
## Department of Statistics

# 2016 Seminars


**How to assign partial credit on an exam of true-false questions?**

Speaker: Jing Liu (Stephen), PhD

Affiliation: Shanghai Jiao Tong University, China

When: Wednesday, 21 December 2016, 1:00 pm to 2:00 pm

Where: 303-G14

True-false questions are adaptable to the measurement of a wide variety of learning outcomes. They are particularly popular in statistics. In contrast to some other disciplines, many questions in statistics do have a single, objective correct answer, with all other answers agreed to be incorrect. So students in statistics classes can be examined objectively and efficiently using true-false questions. However, the format is known to have its limitations. Perhaps the most obvious problem is the zero-tolerance approach to mistakes, which can distort the relationship between aptitude and credit. Secondly, it does not provide diagnostic information. Lastly, it is susceptible to cheating. This talk looks at how to alleviate those issues mathematically while retaining the advantages of using true-false questions in statistics.

**The SaFE Project: Developing an intervention to promote safe sex and healthy relationships among further education students in the UK.**

Speaker: Dr Honor Young

Affiliation: Lecturer in Quantitative Research Methods, Cardiff University

When: Thursday, 15 December 2016, 11:00 am to 12:00 pm

Where: 303-B07

The promotion of safe and healthy sexual behaviour and relationships is key to young people

**Measuring poverty in the European Union**

Speaker: Dr Marco Pomati

Affiliation: Lecturer in Quantitative Sociology

When: Thursday, 15 December 2016, 11:00 am to 12:00 pm

Where: 303-B07

Dr Marco Pomati will outline his survey-based research on the measurement of material deprivation funded by EUROSTAT and how people cope with poverty in the UK and Europe. He will also outline his plans for research on the measurement of living standards starting in January 2017 and funded by the Nuffield Foundation. He will also summarise some lessons from his newly-designed Undergraduate course Knowing the Social World: Online and Offline Surveys run between September and December 2016 for the first time as part of the BSc Social Analytics.

**User Equilibria in Systems of Processor Sharing queues: 2016 ORSNZ conference presentation, extended cut**

Speaker: Niffe Hermansson

Affiliation: University of Auckland

When: Wednesday, 14 December 2016, 10:00 am to 11:00 am

Where: 303-G14

This talk will be an extended version of my talk at the NZSA/ORSNZ Conference, so if you are interested and missed it there, now is your chance.

We consider the behaviour of selfish users in systems of parallel queues operating under processor sharing. Previous work has shown that these systems exhibit interesting, and sometimes perplexing, behaviours. In this presentation we will see some examples of surprising system behaviour, but also some encouraging properties of the system at equilibrium.

**Is more statistics good for everyone? Cardiff Q-Step FE/Schools Initiative**

Speaker: Rhys Jones, Lecturer in Quantitative Methods, Further Education and Admissions Tutor

Affiliation: Cardiff University

When: Tuesday, 13 December 2016, 11:00 am to 12:00 pm

Where: 303-B07

There has been an overwhelmingly positive response to the Further Education (FE)/School engagement work, linked to developing and promoting context-rich statistical courses, across England and Wales. These courses, primarily aimed at year 12 and 13 students, are focussed on the development of a new subject area called Social Analytics (the scientific investigation of social processes using statistical techniques and analysis). Individuals attending this session will gain practical insights into the innovative partnerships developed between universities, exam boards and schools/FE colleges. Examples of the collaborative benefits will also be explored. The session will also focus on the pedagogical basis of the qualifications being created, the interdisciplinary nature and skills-centred approach that has been adopted, and the educational impacts in terms of student attainment and achievement in other subject areas. The case will be made that developing students' critical thinking and conceptual understanding of statistics can have positive impacts on many other subject areas. These positive impacts include attitudes towards mathematics and statistics, as well as educational achievement.

**To Fuel or Not to Fuel? Is that the Question?**

Speaker: Javier Cano, Professor

Affiliation: Rey Juan Carlos University in Madrid, Spain

When: Wednesday, 7 December 2016, 11:00 am to 12:00 pm

Where: 303-412

According to the International Air Transport Association, the industry fuel bill accounts for more than 25% of the annual airline operating costs. In times of severe economic constraints and increasing fuel costs, air carriers are looking for ways to reduce costs and improve fuel efficiency without putting flight safety into jeopardy. In particular, this is inducing discussions on how much additional fuel to put in a planned route to avoid diverting to an alternate airport due to Air Traffic Flow Management delays. We provide here a general model to support such decisions. We illustrate it with a case study and provide comparison with the current practice, showing the relevance of our approach.

**Publish for Pleasure: Embracing modern tools for research publications**

Speaker: Paul Murrell

Affiliation: Department of Statistics, University of Auckland

When: Tuesday, 15 November 2016, 11:00 am to 12:00 pm

Where: 303-B07

This talk will describe some of the workflows and tools that I use to create research publications and I will attempt to explain why these tools are so groovy. The focus will be on tools that improve efficiency (e.g., XML and literate documents), tools that optimize accessibility (e.g., Creative Commons and DIY publishing), and tools that promote reproducibility (e.g., Docker).

http://www.stat.auckland.ac.nz/~paul/

**Polytope Samplers for Network Tomography**

Speaker: Martin Hazelton

Affiliation: Statistics and Bioinformatics Group, Massey U.

When: Wednesday, 12 October 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Volume network tomography is concerned with inference about traffic flow characteristics based on traffic measurements at fixed locations on the network. The quintessential example is estimation of the traffic volume between any pair of origin and destination nodes using traffic counts obtained from a subset of the links of the network. The data provide only indirect information about the target variables, generating a challenging type of statistical linear inverse problem.

Given the observed traffic count, the latent route traffic volumes are constrained to lie in an integer convex polytope (the solution space for an underdetermined linear system with non-negativity constraints). Implementation of inference using MCMC or stochastic EM algorithms requires that we sample from this (high dimensional) polytope. In this talk I will describe some recent progress on developing efficient polytope samplers, and will outline links to related problems such as resampling entries in contingency tables conditional on various marginal totals.

**Hypothesis tests based on large quadratic forms**

Speaker: Thomas Lumley

Affiliation: Dept. Statistics, U. Auckland

When: Wednesday, 5 October 2016, 11:00 am to 12:00 pm

Where: Room 303S-561

When a set of n component tests is combined using a weight matrix other than the inverse of their covariance matrix, the natural large-sample approximation to the distribution is a quadratic form in Gaussian variables. There are three classes of existing ways to evaluate tail probabilities for this distribution: approximations based on matching moments, a saddlepoint approximation, and essentially exact methods based on infinite series. For many purposes all of these are satisfactory. However, when extreme tail probabilities are required, as in DNA resequencing studies, all the existing methods that are sufficiently accurate take n^3 time. With modern DNA resequencing projects reaching 10,000 participants and interest in tests combining as many as 10,000-100,000 variants, these methods are prohibitively slow. I will present a new approximation based on a low-rank approximate SVD, and explain why it is both fast and accurate.
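As a rough illustration of the moment-matching class of approximations mentioned above (not the new method presented in the talk), the sketch below matches the mean and variance of a quadratic form Q = sum(w_i * Z_i^2), with Z_i independent standard normals, to a scaled chi-squared variable. The function name and the example weights are assumptions made for illustration.

```python
def satterthwaite(weights):
    """Match mean and variance of Q = sum(w_i * Z_i**2) to scale * chi2(df).

    E[Q] = sum(w) and Var[Q] = 2 * sum(w**2); a scale * chi2(df) variable
    has mean scale * df and variance 2 * scale**2 * df.
    """
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    scale = s2 / s1
    df = s1 * s1 / s2
    return scale, df

# Hypothetical weights (eigenvalues of the weighted covariance matrix).
scale, df = satterthwaite([2.0, 1.0, 1.0])
print(scale, df)
```

With equal weights the approximation reduces to an ordinary chi-squared distribution. For large n, obtaining the full set of weights by eigendecomposition is itself a cubic-time step.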

**Spatial Modelling with Template Model Builder: Applications to multivariate methods in ecology**

Speaker: Andrea Havron

Affiliation: Dept. Statistics, U. Auckland

When: Thursday, 22 September 2016, 3:00 pm to 4:00 pm

Where: Room 303-310

[Please note the unusual day and time.]

Recent and often rapid changes in environmental states as a result of human impact have led to calls for improved predictive and inferential capabilities in ecological modelling. Due to the complexity of ecological systems and the tendency of observations from these systems to violate assumptions of independence, new methodologies are warranted to better incorporate spatial autocorrelation into model design. Template Model Builder (TMB), an automatic differentiation modelling environment, allows spatial structure to be modelled as spatial random effects within a full likelihood-based framework. The sparse precision matrix, Q, is estimated by a Gaussian Markov Random Field using a Stochastic Partial Differential Equation, which has a Gaussian Field with Matérn covariance function as its solution. By performing optimization procedures with a sparse Q, the computational cost of estimating the spatial random effects is reduced from O(n^3) to O(n^(3/2)). Through this application, more complex ecological processes may be analysed.

In this talk, I will review new spatial modelling methods in R-INLA and TMB to estimate the Gaussian Markov Random Field. I will then discuss multivariate applications of these methods to issues in fisheries management, such as predicting joint species distributions from biomass data and predicting fisheries bycatch hotspots. I will also introduce the development of a new method for estimating community ecology's beta diversity.

**The Horrors of Network and Trajectory spaces: MCMC in Bayesian Phylogenetics and Phylodynamics**

Speaker: Tim Vaughan

Affiliation: Centre for Computational Evolution, U. Auckland

When: Wednesday, 14 September 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Phylogenetics is the study of phylogeny: the tree-like relationships between biological entities such as species, organisms and genes. Phylodynamics is the study of the interplay between pathogen phylogeny and epidemic dynamics. The application of the understanding brought about by this study is very broad: from addressing purely scientific questions regarding the evolution of species, to addressing very practical public health questions about the current state and future behaviour of ongoing epidemics.

In this talk, I will briefly discuss two ways in which my colleagues and I have used MCMC to address these questions using genetic data and Bayesian inference. Firstly, I will discuss the application of MCMC to the inference of phylogenetic networks that relate bacterial samples. Even the simplistic model of bacterial evolution we use presents a host of difficulties for sampling algorithms, such as multiple modes, non-identifiability and an infinite-dimensional network space.

Secondly, I will present our efforts toward applying particle filtering within MCMC (specifically the Particle Marginal Metropolis-Hastings algorithm) to the problem of inferring prevalence dynamics of rapidly evolving pathogens. I will explain how PMMH (which has been described as one of the most important contributions to computational Bayesian inference in the 21st century) allows us to side-step both the problem of explicitly computing a complicated marginal likelihood and the problem of performing MCMC on an infinite-dimensional space of continuous-time trajectories.

**Evidence Wars: Statistical ergonomics and the information content of a P-value**

Speaker: Sander Greenland

Affiliation: Dept. Epidemiology, UCLA School of Public Health

When: Wednesday, 7 September 2016, 6:00 pm to 7:00 pm

Where: MLT3 (101), level 1, Uni Bldg 303 on 38 Princes Street, Auckland CBD

[Please note the unusual time & location]

Prof Sander Greenland, who is visiting the Department of Statistics to deliver a workshop on Bayesian and Penalised Regression Methods of Epidemiological Analysis, will be giving a talk on "Evidence Wars: Statistical ergonomics and the information content of a P-value."

Professor Sander Greenland is one of the most prolific and influential authors on epidemiological methods of the past 2-3 decades. He is a co-author (with K Rothman) of the key reference textbook "Modern Epidemiology" and an author of more than 390 articles in epidemiology and biostatistics journals.

Refreshments will be served following the presentation. For inquiries about this seminar, please contact Rosemary Barraclough, email: rk.barraclough@auckland.ac.nz

**Interactive graphics for genetic data**

Speaker: Karl Broman

Affiliation: Biostatistics & Medical Informatics, University of Wisconsin-Madison

When: Wednesday, 31 August 2016, 11:00 am to 12:00 pm

Where: Room 303-310

The value of interactive graphics for making sense of high-dimensional data has long been appreciated, but such graphics are still not in routine use. I will describe my efforts to develop interactive graphical tools for genetic data, using JavaScript and D3. (The tools are available as an R package: R/qtlcharts, http://kbroman.org/qtlcharts). I will focus on an expression genetics experiment in the mouse, with gene expression microarray data on each of six tissues, plus high-density genotype data, in each of 500 mice. I argue that in research with such data, precise statistical inference is not so important as data visualization.

**Two short talks**

Speaker: Russell Millar and Rachel Fewster

Affiliation: Dept. Statistics, U. Auckland

When: Thursday, 25 August 2016, 3:00 pm to 4:00 pm

Where: Room 303-310

[This seminar will consist of two twenty-minute talks back-to-back. Please note the unusual day and time.]

Russell Millar: Template model builder: a new tool for fitting complex statistical models

Template model builder (TMB) uses automatic differentiation to provide exact derivatives (to machine precision). It is well integrated with R, and allows automatic construction of both an objective function and its derivative with respect to all parameters. It also includes structures for fitting spatio-temporal and mixed-effects models and, if desired, can be used in Bayesian mode via MCMC functions.

Several staff members and graduate students in the department are using TMB. This talk provides an introduction to this new and powerful tool for fitting complex models.

Rachel Fewster: Trace-contrast models: a new way of looking at capture-recapture

Capture-recapture studies increasingly rely upon natural tags that allow animals to be identified by intrinsic features such as coat markings, DNA profiles, acoustic profiles, or spatial locations. These innovations greatly broaden the scope of capture-recapture estimation and the number of capture samples achievable. However, they are imperfect measures of identity, effectively sacrificing sample quality for quantity and accessibility. Drawing on ideas from the Palm likelihood approach to parameter estimation in clustered point processes, I will outline a new inference framework for capture-recapture studies based on comparing pairs of samples. Importantly, no reconstruction of capture histories is needed. The resulting inference is accurate, reasonably precise, and computationally fast. I will illustrate the methods with a camera-trap behavioural study of a partially-marked population of NZ ship rats.

Slides and sample code from Russell's talk are available at the link below.

https://www.stat.auckland.ac.nz/~jgoo625/seminars/20160825SlidesAndExamples/

**Self-regulation of a queue via random priorities**

Speaker: Binyamin Oz

Affiliation: Dept. Statistics, U. Auckland

When: Thursday, 18 August 2016, 3:00 pm to 4:00 pm

Where: Room 303-310

[Please note the unusual day and time.]

We consider an unobservable M/M/1 queue where customers are homogeneous with respect to service valuation and cost per unit of time of waiting. It is well known that, left to themselves, customers in equilibrium join the queue at a rate higher than is socially optimal. Hence, regulation schemes under which the resulting equilibrium joining rate coincides with the socially optimal one should be considered. We suggest a classification of regulation schemes, based on a few desired properties, and use it to classify schemes from the existing literature. To the best of our knowledge, none of the existing schemes possesses all of these properties, and in this talk we suggest such a scheme. This novel scheme is based on assigning a random priority to each customer, prior to his decision whether to join or balk. We also introduce variations of this regulation scheme as well as additional schemes based on randomization.
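The gap between the equilibrium and socially optimal joining rates can be seen in a minimal sketch of the standard unobservable M/M/1 model. The symbols are assumptions introduced for illustration and do not come from the abstract: service rate `mu`, service value `R`, waiting cost `C` per unit of time, and a potential arrival rate large enough not to bind. The regulation scheme proposed in the talk is not implemented here.

```python
import math

def equilibrium_rate(mu, R, C):
    """Joining rate at which a customer is indifferent: R = C / (mu - lam)."""
    return max(0.0, mu - C / R)

def socially_optimal_rate(mu, R, C):
    """Rate maximising total welfare lam * (R - C / (mu - lam))."""
    return max(0.0, mu - math.sqrt(C * mu / R))

mu, R, C = 1.0, 4.0, 1.0  # hypothetical parameter values
lam_e = equilibrium_rate(mu, R, C)
lam_s = socially_optimal_rate(mu, R, C)
print(lam_e, lam_s)  # 0.75 0.5
```

Whenever joining is worthwhile at all (R * mu > C), the equilibrium rate exceeds the socially optimal one, because each customer ignores the delay imposed on others.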

**Real-time prediction of bus arrival**

Speaker: Tom Elliott

Affiliation: Dept. Statistics, U. Auckland

When: Wednesday, 17 August 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Predicting the arrival time of buses has received a good deal of attention over the past two decades. The two main considerations for these models are 1. the accuracy of the model and associated predictions, and 2. the computational constraints: the predictions need to be made in real-time, for hundreds of buses each with multiple scheduled stops. The Kalman Filter has been used in many arrival time prediction applications, although the assumptions that make it computationally efficient also mean that it is not very robust.

I will be presenting an approach to modelling buses in real-time using a particle filter, which has recently been used in several transit applications. Unlike these, however, we will focus solely on its use as a predictive model. Using the state estimates provided by the particle filter, we aim to combine the data from all buses in Auckland to generate an overall "traffic state", to be used to improve arrival time predictions. I will also discuss the use of prediction intervals and journey planning applications based on the particle filter.

**Scale-invariant Random Spatial Networks from line patterns**

Speaker: Wilfrid Kendall

Affiliation: Department of Statistics, University of Warwick

When: Tuesday, 9 August 2016, 3:00 pm to 4:00 pm

Where: Room 303S-561

[Please note the unusual day and room.]

This will be a colloquium-style talk about my work on applications of random (Poisson) line patterns to model transportation networks. Starting with construction of the Poisson line process, I will mention how it leads to a surprising result about efficient network design, a model for traffic in cities, and a further model for Google-map-like behaviour. The talk will aim to be accessible to PhD students.

**When to arrive to a queue?**

Speaker: Moshe Haviv

Affiliation: Department of Statistics and the Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem

When: Wednesday, 3 August 2016, 11:00 am to 12:00 pm

Where: Room 303-310

A bank opens at 8:30am and closes its doors at 12:00pm. You would like to get service. A question arises: when should you arrive so as to wait as little as possible and maybe also to get your service as early as possible (clearing your mind for the rest of the day)? This is not an optimization problem, as you are not the only customer and your disutility is a function of what others do. In fact you are a player in a non-cooperative game with other, similar customers, usually a random number of them. For such problems one looks for a Nash equilibrium profile. Typically, the equilibrium strategy of when to arrive prescribes mixing, leading to a continuous random time of arrival. The talk will address this problem while dealing with some of its variations. Among the variations: with or without respecting seniority for those who arrive prior to 8:30am, or with or without tardiness costs. The main results in this area, commencing with Glazer and Hassin (1983), will be surveyed. Time permitting, fluid approximation models will be reviewed too.

**Mixture distributions and meta-analysis**

Speaker: Christian Roever

Affiliation: Dept. Medical Statistics, Uni. Göttingen

When: Wednesday, 20 July 2016, 11:00 am to 12:00 pm

Where: Room 303-310

I will be presenting some of my recent work in the context of meta-analysis. The random-effects model commonly utilized for meta-analysis has two parameters: the "effect" that is to be estimated by combining results from separate studies, and the "heterogeneity", a variance component (and nuisance parameter) accounting for between-study variability. Within a Bayesian framework, the problem may be solved partly analytically and partly numerically. The effect's posterior distribution is a mixture distribution (of conditionals, conditioning on the heterogeneity parameter), and the same strategy may be applied more generally to derive other types of mixtures as well. I will describe the approach and show how it is implemented in the "bayesmeta" package.
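A minimal sketch of the conditioning step described above, assuming the usual normal-normal random-effects model with known standard errors and a flat prior on the effect (both are assumptions, not stated in the abstract). Given a heterogeneity value `tau`, the effect's conditional posterior is normal with an inverse-variance weighted mean; the marginal posterior is then a mixture of these conditionals over the posterior of `tau`. The function name and data are hypothetical, and this is not the bayesmeta API.

```python
import math

def conditional_posterior(y, sigma, tau):
    """Mean and sd of the effect given heterogeneity tau (flat prior on effect)."""
    weights = [1.0 / (s * s + tau * tau) for s in sigma]
    total = sum(weights)
    mean = sum(w * yi for w, yi in zip(weights, y)) / total
    return mean, math.sqrt(1.0 / total)

y = [1.0, 3.0]       # hypothetical study estimates
sigma = [1.0, 1.0]   # hypothetical standard errors
for tau in (0.0, 1.0):
    print(tau, conditional_posterior(y, sigma, tau))
```

Larger values of `tau` widen each conditional, so the spread of the resulting mixture reflects uncertainty about the heterogeneity as well as about the effect.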

**Matrix analytic methods for analysing stochastic fluid models**

Speaker: Nikki Sonenberg

Affiliation: School of Mathematics and Statistics, U. Melbourne

When: Wednesday, 13 July 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Stochastic fluid models are two-dimensional Markov processes in which the first component is the level of fluid in a buffer and the second component is the state of a continuous time Markov chain that governs the rates of change of the fluid. Such a fluid model can represent the energy level of a battery charged device that is depleted according to its usage.

To determine the stationary density of the models we use matrix analytic methods. Specifically, we use a Markov regenerative approach which involves sample path arguments. We consider models with infinite and finite capacity fluid buffers, and illustrate the relationship between the two cases. The models considered feature reactive boundaries, that is where the behaviour can change upon hitting a boundary.

A novel application of matrix analytic methods is presented with a model of an ad-hoc network as a network of stochastic fluid models, each representing the battery energy at a node.

**Smooth Estimation of Convex Functions**

Speaker: Hongbin Guo

Affiliation: Dept. Statistics, U. Auckland

When: Thursday, 23 June 2016, 3:00 pm to 4:00 pm

Where: Room 303-310

[Please note the unusual day and time.]

Estimation of functions under convexity shape restriction is of considerable interest in many practical applications. Typical examples include dose-response relationships in medicine, utility functions, product functions, profit and cost functions in economics, and hazard rate functions in survival analysis. In 2001, Groeneboom, Jongbloed & Wellner established that under regular conditions, maximum likelihood and least squares estimators of these convex functions are piecewise linear (i.e., linear splines). Since then, much effort has been devoted to the search for smooth and more efficient estimators of convex functions.

In this talk, we describe a new nonparametric smooth estimator of a convex function. We note that the knots of a linear spline along with their associated increments of slopes are essentially a discrete measure, which is the reason for the non-smoothness. Our main idea is to replace the discrete measure with a continuous measure and use a tuning parameter to control the level of smoothness. This gives a smooth estimator of the convex function, which can be computed rapidly after a mathematical reformulation. Preliminary simulation results for solving regression problems show that the new estimator performs better than other smooth/non-smooth estimators in various situations.

**Improving the Accuracy of Automated Occupation Coding**

Speaker: Matthias Schonlau

Affiliation: Dept. Statistics & Act. Sci. and Survey Research Centre, U. Waterloo

When: Wednesday, 22 June 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Occupation coding, an important task in official statistics, refers to coding a respondent's text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually at great expense. We propose new methods for automatic coding that also apply when only a fixed proportion of the text answers are to be coded automatically. Using data from the German General Social Survey (ALLBUS), we show that both methods improve on both the coding accuracy of the underlying statistical/machine learning algorithm and the coding accuracy of duplicates where duplicates exist.

(Co-authors: Hyukjun Gweon, U of W, Lars Kaczmirek, GESIS, Germany, Michael Blohm, GESIS, Germany, Stefan Steiner, U of W)

**Analysis of Laboratory Monitoring in Adults with Diabetes using Repeated Measures and Trajectory Models**

Speaker: Mugdha Manda

Affiliation: Dept. Statistics, U. Auckland

When: Friday, 3 June 2016, 11:00 am to 12:00 pm

Where: Room 303-310

[Please note the unusual day.]

National evidence-based guidelines recommend that people with diabetes have their HbA1c, blood lipids and albumin:creatinine ratio (ACR) monitored at least annually. Previous studies have indicated a lack of systematic protocols and implementation of recommended guidelines (both nationally and internationally), reflecting a quality of care gap between the clinical recommendations and actual clinical practice. These studies used a wide variety of analytical methods, such as Kaplan-Meier survival analyses, multivariate logistic regression, AN(C)OVA and simple Mantel-Haenszel chi-square tests. However, these analytical methods are not appropriate for such data, as they do not take into account the repeated measures on a patient and the correlated nature of such clinical datasets. As a result, many studies excluded important information from their statistical analyses. We aim to provide a better assessment of risk factor monitoring, including examining trajectories of kidney failure models and the role of glycaemic measures in the development of these diseases. This talk will focus on a prospective cohort study that is underway, examining the patterns of laboratory monitoring in people with diabetes by linking health outcome data, laboratory data (from TestSafe) and routine primary care data, and analysing them using alternative multivariate statistical analysis methods.

**Hamiltonian Monte Carlo on the Space of Phylogenies**

Speaker: Arman Bilge

Affiliation: Centre for Computational Evolution, U. Auckland

When: Wednesday, 1 June 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Evolutionary tree inference, or phylogenetics, is an essential tool for understanding biological systems from ancient species divergences to recent viral transmission. The Bayesian paradigm is now commonly used in phylogenetics to describe support for estimated phylogenies or to test hypotheses that can be expressed in phylogenetic terms. However, current Bayesian phylogenetic inference methods are limited to about 1,000 genetic sequences, which is much fewer than are available via modern sequencing technology. This is because they depend on the Metropolis-Hastings Markov chain Monte Carlo (MCMC) algorithm, which utilises a random walk to draw samples from the posterior distribution on phylogenies.

Here we develop phylogenetic Hamiltonian Monte Carlo (HMC) as a new approach to enable phylogenetic inference on larger data sets. HMC is an existing sampling method that transforms the posterior distribution into a physical landscape that can be explored using Newton's laws of motion. By simulating the movement of a particle in this landscape, we can make distant proposals that are likely to be accepted, thus avoiding random walk behaviour. However, because a phylogenetic tree parameter includes both its branch lengths and topology, we must go beyond the current implementations of HMC which cannot consider this special structure of trees. To do so, we develop a probabilistic version of the numerical integrator within HMC which can explore tree space. This algorithm generalises previous algorithms by doing classical HMC on the branch lengths when considering a single topology, but making random choices between the tree topologies at the "intersection" between various trees. We show that our algorithm correctly samples from the posterior distribution on phylogenies and provide a proof-of-concept implementation in open-source software.
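To make the "particle in a landscape" picture concrete, here is a minimal sketch of the leapfrog integrator used in classical HMC, applied to a one-dimensional standard normal target with potential U(q) = q^2 / 2. The target, step size and function names are assumptions for illustration; the tree-space extension described in the talk is not shown.

```python
def leapfrog(q, p, grad_u, step, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog scheme (n_steps >= 1)."""
    p -= 0.5 * step * grad_u(q)          # initial half step for momentum
    for i in range(n_steps):
        q += step * p                    # full step for position
        if i < n_steps - 1:
            p -= step * grad_u(q)        # full step for momentum
    p -= 0.5 * step * grad_u(q)          # final half step for momentum
    return q, p

def energy(q, p):
    """Total energy H = U(q) + K(p) for a standard normal target."""
    return 0.5 * q * q + 0.5 * p * p

q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, lambda q: q, step=0.1, n_steps=10)
print(energy(q0, p0), energy(q1, p1))
```

Because the energy is nearly conserved along the simulated path, a Metropolis correction accepts such distant proposals with high probability, which is what lets HMC avoid random walk behaviour.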

**Students' Statistical Modeling with TinkerPlots(tm)**

Speaker: Jennifer Noll

Affiliation: Portland State University

When: Wednesday, 25 May 2016, 11:00 am to 12:00 pm

Where: Room 303S-561

[Please note the new venue.]

Teaching introductory statistics using curricula focused on computer modeling and simulation is becoming increasingly common and is touted as a more beneficial approach for fostering students' statistical thinking. Yet, surprisingly little research exists that studies the impact of a modeling and simulation curriculum on student thinking, nor do we have research on how students make sense of the computer models they construct. It stands to reason that the curricula and technology used in teaching introductory statistics mediate students' ways of thinking. Thus, in order to best support students' learning in modeling and simulation courses we need to know more about the impact of new curricula and technology on their statistical development. In March of 2015, I received a National Science Foundation grant to study this topic. My research focuses on the impact of a modeling and simulation approach on students' statistical reasoning. In particular, I seek to develop characterizations of students' thinking as they solve problems from the CATALST (Change Agents for the Teaching and Learning of Statistics) curriculum using TinkerPlots(tm) technology. In my presentation I will share some background information about this project as well as some initial results from my work this past year.

**A two-stage approach to inferring the conditional independence graph for a multivariate autoregressive model**

Speaker: Said Maanan

Affiliation: Dept. Statistics, U. Auckland

When: Wednesday, 18 May 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Graphical interaction models have become an important tool for analysing multivariate time series. In these models, the interrelationships among the components of a time series are described by undirected graphs in which the vertices depict the components while the edges indicate possible dependencies between the components. Most of the methods used for the identification of the graphical structure are based on nonparametric spectral estimation, which prevents application of common model selection strategies.

In this talk, we propose a new parametric approach for graphical interaction modelling of multivariate stationary time series. A conceptually and computationally simple two-stage algorithm that uses convex optimization is presented. The order of the vector autoregressive model is selected in the first stage, and the sparsity pattern of the inverse of the spectral density matrix is estimated in the second stage. For the first stage, we derive a novel information theoretic criterion, namely a vector autoregressive form of the renormalized maximum likelihood criterion. For the second stage, we formulate a convex optimization problem for which we compute an exact and efficient solution. An important feature is the guaranteed stability of the estimated model. The performance of the newly proposed method is demonstrated in experiments with simulated and real world data.

**On the analysis of two-phase outcome-stratified designs in cluster-correlated settings**

Speaker: Claudia Rivera

Affiliation: Biostatistics and Epidemiology, Harvard T.H. Chan School of Public Health

When: Thursday, 12 May 2016, 3:00 pm to 4:00 pm

Where: Room 303-B09

[Please note the unusual room.]

In resource-limited settings researchers often only have access to aggregated or group-level data. Analyses based on such data are well-known to be susceptible to a range of biases, collectively termed ecological bias. Unless one is willing to make untestable assumptions, the only reliable approach to valid estimation and inference is to collect a sub-sample of individual-level data. One cost-efficient strategy is the two-phase design in which the population is initially stratified by a binary outcome and a categorical variable or combination of categorical variables. While methods for two-phase designs are well-known, to our knowledge they have focused exclusively on settings in which individual study units are independent; that is, no methods exist for the design and analysis of two-phase designs in cluster-correlated data settings. To fill this gap we develop two methods for valid estimation and inference, the first based on inverse-probability weighting (IPW) and the second on a pseudo-likelihood. For both methods, working correlation matrices can be specified by the user, with inference based on a robust sandwich estimator. For the IPW estimator we develop a calibration algorithm that makes use of the readily-available group-level data to improve efficiency. A comprehensive simulation study is conducted to evaluate small-sample operating characteristics of the proposed methods. Finally, the methods are applied to a large implementation science project examining the effect of an enhanced community health worker program to improve adherence with WHO guidelines for at least 4 antenatal visits, among nearly 200,000 pregnancies in Dar es Salaam, Tanzania.

**How Well Does That Tree Fit Your Data?**

Speaker: Daisy Shepherd

Affiliation: Dept. Statistics, U. Auckland

When: Friday, 15 April 2016, 11:00 am to 12:00 pm

Where: Room 303-310

[Please note the unusual day.]

Phylogenetics focuses on a critical problem in biology - the ability to derive the evolutionary history between groups of organisms. Statistical models are used to describe the changes in DNA that occur over evolutionary time, to help determine how closely related these groups are. As a result, our ability to accurately explain evolutionary relationships depends heavily on the use of an appropriate statistical model.

Tests to assess the relative goodness of fit have been discussed long and wide in a phylogenetic context. As a result, there is a strong and impressive body of work regarding the identification of the best model from among a set of given phylogenetic models.

However, one part of statistical conduct remains missing - the test for goodness of fit between model and data. The purpose of such a test is to provide the possibility to reject the best model due to a lack of fit to the data. Just because our selected model has been deemed the 'best' fit, does not necessarily imply that the model adequately describes the behaviour in the data. This is a fundamental step of the statistical sciences, but is unfortunately one that still remains wanting within the phylogenetic world.

Goodness of fit tests have seldom been discussed in the last decade, and when they have, it has been with little to no exposure. At present, a number of approaches remain. However, these are not without flaws and still require further development. In this talk we will provide an overview of the current approaches, before discussing how our research aims to develop alternative measures for goodness of fit that better address the peculiarities of phylogenetic data.

**Optimizing the cardiac patient's journey: Using mathematical modelling to guide patient flow, staffing, scheduling and resource allocation through the cardiac unit**

Speaker: June Lau

Affiliation: Dept. Statistics, U. Auckland

When: Thursday, 14 April 2016, 3:00 pm to 4:00 pm

Where: Room 303-412

[Please note the unusual place and time.]

Queueing theory has been widely used in telecommunications, manufacturing and healthcare. However, real world complications often lead to discrete event simulations to assist in the study of system performance. In this talk, we introduce a time-varying queueing model for a standalone intensive care unit (ICU) and discuss issues that arise due to time-scales of service durations; relative speed of change in inter-arrival rates and service durations; and server vacations. We present current numerical approximations to time-varying models and their application at different time-scales. Thus motivated, we present a real-world validated simulation model of a joint intensive and high-dependency unit. Such a model is a first step in this research into approximations for time-varying queueing models with multi-server vacations.

**Dependence Logic**

Speaker: Jouko Vaananen

Affiliation: University of Helsinki

When: Wednesday, 13 April 2016, 12:00 pm to 1:00 pm

Where: Room 303S-561

[Cross-listed from the Computer Science seminar series. Please note the unusual place and time. Light refreshments to follow.]

I will give an overview of dependence logic, the goal of which is to establish a basic logical theory of dependence and independence underlying seemingly unrelated subjects such as causality, bound variables in logic, random variables, database theory, the theory of social choice, Mendelian genetics, and even parts of quantum physics. There is an abundance of new results in this field demonstrating a remarkable convergence. The concepts of (in)dependence in the different fields of humanities and sciences have surprisingly much in common and a common logic is starting to emerge.

Bio. Jouko Vaananen is a Professor of Mathematics and the Dean of the Faculty of Science of the University of Helsinki. He is also a Professor Emeritus of Mathematical Logic and Foundations of Mathematics at the University of Amsterdam. He works in several fields of logic, such as set theory, model theory, computer science logic, and foundations of mathematics. Much of the current emphasis of his work is best described by his two books "Dependence logic", Cambridge University Press 2007, and "Models and Games", Cambridge University Press 2011. Jouko is a distinguished visitor of the University of Auckland in 2016.

**Criminal Genius: How We Know What Little We Know about High-IQ Crime**

Speaker: James Oleson

Affiliation: Sociology, U. Auckland

When: Friday, 8 April 2016, 3:00 pm to 4:00 pm

Where: Fale Pasifika Complex, Bldg 273, Level 1, Rm 104

[Cross-listed from the Compass seminar series. Please note the unusual day and place. All welcome; drinks and nibbles to follow.]

Intelligence is said to be the most studied human faculty, and within criminology, below-average intelligence (operationalized as IQ) is a well-established correlate of delinquency and crime. Nevertheless, even though the association between low IQ and crime has been studied for nearly a century, little is known about offenders with high IQ scores. A handful of studies have examined bright delinquents; virtually no criminological research has been conducted with gifted adults. This is an elusive population. The current research describes the self-reported offending of 465 high-IQ individuals (mean IQ = 148.7) and 756 controls (mean IQ = 115.4) across 72 different offences (ranging in seriousness from abuse of work privileges to homicide). This presentation will focus on the design and implementation of the study and the analytical work performed by COMPASS. It will also describe some key findings, such as the unexpected discovery that high-IQ respondents reported higher prevalence and incidence rates than did controls.

James Oleson is an Associate Professor of Criminology at the University of Auckland. After a stint in the US Navy's nuclear propulsion programme, he earned his BA in psychology and anthropology from St. Mary's College of California, his MPhil and PhD in criminology from the University of Cambridge, and his JD from the University of California, Berkeley. He taught criminology and sociology at Old Dominion University until he was selected as one of the four U.S. Supreme Court Fellows for the 2004-05 year. At the end of his fellowship, he was appointed as Chief Counsel to the newly-formed Criminal Law Policy Staff of the Administrative Office of the U.S. Courts, and served in that capacity between 2005 and 2010. Since arriving at the University of Auckland in 2010, he has taught in the areas of psychological criminology, sentencing, and penology. In 2013, he used his sabbatical to study prison museums across Europe and the United States, and is working on a book about prisons in popular culture. His monograph on high-IQ offenders, Criminal Genius: A Portrait of High-IQ Offenders, will be published in late 2016 by the University of California Press.

**De Bruijn Graphs and Fragile Regions in the Human Genome**

Speaker: Pavel Pevzner

Affiliation: Department of Computer Science and Engineering, University of California at San Diego

When: Wednesday, 6 April 2016, 2:00 pm to 3:00 pm

Where: Room 303S-561

[Cross-listed from the Computer Science seminar series. Please note the unusual place and time.]

A fundamental question in chromosome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements are happening over and over again. We demonstrate that the fragile regions do exist and further show that they are subject to a "birth and death" process, implying that fragility has limited evolutionary lifespan. To establish this biological result we will prove some theorems about the breakpoint graphs, the workhorse of genome rearrangement studies. We further illustrate that both breakpoint graphs and de Bruijn graphs are special cases of a more general notion of A-Bruijn graphs that found many applications in computational biology.

Dr. Pevzner is the Ronald R. Taylor distinguished professor of computer science and adjunct professor of mathematics at UCSD, where he directs the National Technology Center for Computational Mass Spectrometry. He holds a Ph.D. (1988) from the Moscow Institute of Physics and Technology, Russia. He was named a Howard Hughes Medical Institute Professor in 2006. He was elected an Association for Computing Machinery Fellow (2010) for "contribution to algorithms for genome rearrangements, DNA sequencing, and proteomics".

**Genetic Risk Prediction Using a Spatial Autoregressive Model**

Speaker: Yalu Wen

Affiliation: Dept. Statistics, U. Auckland

When: Wednesday, 6 April 2016, 11:00 am to 12:00 pm

Where: Room 303-310

The translation of human genome discoveries into personalized prediction and prevention represents one of the major challenges in the coming decades. While the advance in high-throughput sequencing technology enables the investigation of the role of both common and rare variants in disease risk prediction, the massive number of potential predictors and the low frequency of rare variants pose great analytical challenges for risk prediction. In this talk, I will describe a spatial autoregressive model with adaptive lasso to simultaneously select risk predictors and estimate their effect sizes. We showed the sparsity and oracle properties of the estimators. Through simulations, we demonstrated that the proposed method achieved accuracy higher than or comparable to that of the commonly used GBLUP method under various disease models. We further showed that our method could correctly select predictive genes and shrink the effects of noise genes to zero.

**Evaluating the outcomes of health services when you can't do an experiment - how about a quasi-experiment?**

Speaker: Tom Robinson

Affiliation: School of Population Health, U. Auckland

When: Friday, 1 April 2016, 3:00 pm to 4:00 pm

Where: Fale Pasifika Complex, Bldg 273, Level 1, Rm 104

Health services constantly need to change, and in many cases the outcomes of these changes need to be evaluated. However, RCTs usually cannot be undertaken. This presentation will consist of two parts.

1) A review of current NZ practice in health services non-experimental outcome evaluation. After searching 4 databases and the NZ Medical Journal, 52 health service outcome evaluations were found that used non-experimental methods; these were evaluated against the Cochrane Collaboration's Effective Practice and Organisation of Care group guidance. Most studies did not meet the criteria for inclusion in Cochrane reviews because of their study design. Of those that could be included, only a minority had no or few areas of potential bias.

2) A presentation of a quasi-experimental outcome evaluation which was completed in 2013 at Waitemata DHB. An outcome evaluation was undertaken of a programme that aimed to reduce readmission to hospital within a month of discharge. The results will be presented of a number of different evaluation designs including uncontrolled before and after, an interrupted time series, and a regression discontinuity design. Some of the issues encountered in carrying out quasi-experimental studies will be discussed.

Tom Robinson is a public health physician and works three days a week in the Planning & Funding Team for Auckland & Waitemata DHBs. Some of this work involves advising on and undertaking evaluation of health services. He is also undertaking a PhD with the School of Population Health which is looking at the role of quasi-experimental methodology in evaluating the outcomes of our health service changes. Key questions are whether current practice is satisfactory, whether quasi-experimental designs can produce valid assessments of the causal effect of interventions, whether these methods are feasible, and how they should be undertaken.

**Classification on High-dimensional Data Using Nonparametric Mixtures**

Speaker: Shengwei Hu

Affiliation: Dept. Statistics, U. Auckland

When: Thursday, 31 March 2016, 4:00 pm to 5:00 pm

Where: Room 303-310

[Please note the unusual day and time.]

Recently, the problem of classifying high-dimensional data has become very common in genomics, electronic commerce and many other fields. High-dimensional data refers to a data set in which the number of features p is much larger than the number of observations N (p >> N). Unfortunately, standard classification methods are known to be unsuitable for this type of data. We propose a density-based classifier which first uses nonparametric mixtures to provide an estimated density for each class, and then assigns a new observation to the class that has the highest posterior density value. The use of a diagonal covariance matrix in the density estimation process substantially reduces computing time and gives the potential to deal with high-dimensional classification problems. A number of simulated and real-world case studies show that our classifier provides competitive results compared with other commonly used classification methods.
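The classification rule described in the abstract (estimate a density per class, then pick the class with the highest posterior density) can be illustrated with a minimal sketch. It is not the speaker's estimator: as a stand-in for the nonparametric mixtures, it assumes an equal-weight Gaussian mixture centred on the training points with a single diagonal bandwidth. The function names, the `bandwidth` parameter and the toy data are all hypothetical.

```python
import math

def log_kernel_density(point, samples, bandwidth):
    """Log density of an equal-weight Gaussian mixture centred on `samples`.

    Assumption: diagonal covariance with variance bandwidth**2 on every
    coordinate (a stand-in for a fitted nonparametric mixture).
    """
    dim = len(point)
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)
    log_terms = []
    for centre in samples:
        sq_dist = sum((p - c) ** 2 for p, c in zip(point, centre))
        log_terms.append(log_norm - 0.5 * sq_dist / bandwidth ** 2)
    # log-sum-exp keeps the computation stable when p is large
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_sum - math.log(len(samples))

def classify(point, training_sets, bandwidth=1.0):
    """Return the label with the highest log prior plus log class density."""
    total = sum(len(samples) for samples in training_sets.values())
    scores = {}
    for label, samples in training_sets.items():
        log_prior = math.log(len(samples) / total)
        scores[label] = log_prior + log_kernel_density(point, samples, bandwidth)
    return max(scores, key=scores.get)

# Hypothetical toy data: two classes in five dimensions
toy_training = {
    "low": [(0.0, 0.0, 0.0, 0.0, 0.0),
            (0.5, -0.5, 0.2, 0.0, 0.1),
            (-0.3, 0.4, 0.0, 0.2, -0.1)],
    "high": [(3.0, 3.0, 3.0, 3.0, 3.0),
             (2.6, 3.2, 3.1, 2.9, 3.3)],
}
predicted = classify((0.1, 0.0, -0.1, 0.2, 0.0), toy_training)
```

Because the covariance is diagonal, each density evaluation costs a number of operations proportional to p per mixture component, which is the computational saving the abstract alludes to.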

**Issues with Overdispersion and Related Models**

Speaker: John Hinde

Affiliation: School of Mathematics, Statistics and Applied Mathematics, National University of Ireland

When: Wednesday, 30 March 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Many models for overdispersed data originated in single sample applications as descriptive summaries for frequency data, where they are typically easily understood. More recently interest has focused on their use in statistical modeling with regression type models for one, or more, of the model parameters and, typically, linear predictors used with appropriate link functions. While these models provide much flexibility, there are some issues in their application that seem to be not well understood. This talk will raise a few of these, including the testing of fixed effects in overdispersed models, residuals and model diagnostics, goodness-of-fit, and model comparisons involving testing overdispersion and zero-inflation. Examples will be used to highlight possible problems and some possible solutions/approaches presented, although there is still scope for more work on many of these aspects. Also, while software has become available for fitting many of these models (procedures in SAS, packages in R, ...) users need to take care - different implementations can lead to different results!

**Is there a Statistics Crisis in Science?**

Speaker: Miodrag Lovric

Affiliation: Editor-in-Chief of the International Encyclopedia of Statistical Science

When: Wednesday, 23 March 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Despite some recent vigorously promulgated criticisms of statistical methods (particularly significance tests), their methodological limitations, and misuses of statistics, we are still living in the golden age of statistics. Statistics plays a vital role in collecting, summarising, analysing, and interpreting data in almost all branches of science, from supporting the big bang theory to the movement of particles at the sub-atomic level. This seminar will reflect on the past, present and future of statistics based on the following topics:

1. What is Statistics and what is Statistical Science: towards a unified definition

2. Statistics and mathematics, is statistics a separate discipline?

3. The origin of the term statistics - for the first time the correct origin will be disclosed in this seminar

4. Some relatively unknown but important statistical stories of success and high appreciation of statisticians in the past

5. The golden age of statistics versus relatively poor public image of it in many countries - statistics as a grammar of science or vehicle of modern research

6. Reasons for poor public perception, ways and actions to overcome this

7. Future of statistics education - of students, scientific workers, practical researchers and general public

8. Crisis of statistics education in many developing countries

9. Recent harsh attacks on statistics as a discipline

10. Controversies in statistics, short overview, including Jeffreys-Lindley paradox

11. Towards shifting a paradigm in Kuhn's sense in statistical testing

12. Challenges and rise of statistics in 21st century.

The last part of the seminar is based on joint research with Prof. C.R. Rao.

**Mixed Graphical Models with Applications to Integrative Cancer Genomics**

Speaker: Genevera Allen

Affiliation: Rice University

When: Friday, 18 March 2016, 4:00 pm to 5:00 pm

Where: Room 303-310

[Please note the unusual day and time.]

"Mixed Data" comprising a large number of heterogeneous variables (e.g. count, binary, continuous, skewed continuous, among others) is prevalent in varied areas such as imaging genetics, national security, social networking, Internet advertising, and our particular motivation - high-throughput integrative genomics. There have been limited efforts at statistically modeling such mixed data jointly. In this talk, we address this by introducing several new classes of Markov Random Fields (MRFs), or graphical models, that yield joint densities which directly parameterize dependencies over mixed variables. To begin, we present a novel class of MRFs arising when all node-conditional distributions follow univariate exponential family distributions that, for instance, yield novel Poisson graphical models. Next, we present several new classes of Mixed MRF distributions built by assuming each node-conditional distribution follows a potentially different exponential family distribution. Fitting these models and using them to select the mixed graph in high-dimensional settings can be achieved via penalized conditional likelihood estimation that comes with strong statistical guarantees. Simulations as well as an application to integrative cancer genomics demonstrate the versatility of our methods.

Joint work with Eunho Yang, Pradeep Ravikumar, Zhandong Liu, Yulia Baker, and Ying-Wooi Wan.

**Using data from Youth2000 to inform clinicians, research and policy**

Speaker: Theresa Fleming and Simon Denny

Affiliation: Psychological Medicine and Paediatrics, U.

When: Friday, 18 March 2016, 3:00 pm to 4:00 pm

Where: Fale Pasifika Complex, Bldg 273, Level 1, Rm 104

The Adolescent Health Research Group at the University of Auckland has been tracking the health and wellbeing of young people in New Zealand with the Youth2000 survey series in 2001, 2007 and 2012. The team has been using innovative and world leading technology to administer the survey, and has surveyed more than 25,000 young people to date. This presentation will discuss the results from these surveys, place the results in the context of global trends and discuss how the Youth2000 surveys have been used to examine social environments such as socio-economic deprivation, schools and communities, and their relationships with student health and wellbeing. There will be time to discuss plans for the next survey and opportunities for collaboration.

Simon Denny and Theresa (Terry) Fleming are investigators with the Adolescent Health Research Group which carries out the New Zealand adolescent health surveys (The Youth2000 series). They are the nominated co-leads for the next planned Youth2000 survey. Simon is an adolescent specialist physician and is an associate professor in the Department of Paediatrics: Child and Youth Health. Terry is a senior lecturer in youth health/youth mental health in the Departments of Paediatrics and Psychological Medicine. Her major research areas are in online interventions for youth mental health and population youth health and well-being.

http://www.arts.auckland.ac.nz/en/about/our-research/research-centres-and-archives/compass.html

**Interpreting forensic spectrographic evidence using functional data**

Speaker: Anjali Gupta

Affiliation: Dept. Statistics, U. Auckland

When: Friday, 18 March 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Forensic science is a discipline where scientific methods are applied to law enforcement in order to solve crimes. Forensic scientists play a vital role in solving cases, especially when there are no known witnesses. One of the most common types of evidence available in hit-and-run crimes and burglaries is glass. It is therefore important for forensic scientists to measure the physical properties of glass (refractive index and elemental characteristics) precisely and repeatably, and to understand the limitations of the measurement techniques. Forensic practitioners hold the responsibility of examining and interpreting the evidence.

Laser Induced Breakdown Spectroscopy (LIBS) is an analytical chemistry technique that has the potential to identify and measure the elements in a substance of interest. LIBS is applicable to any phase (solid, liquid or gas). LIBS has gained importance in the fields of material identification, biomedical science, forensics, military, art and archaeology in recent years. In spite of its many advantages over other instruments, it has a few drawbacks such as poor precision and repeatability. That is, different spectra may be observed for the same sample over successive runs. Thus, LIBS is still not accepted as an analytical method for legal purposes.

In this talk we discuss an experiment designed to examine the variability in the spectra between the runs on the same day, and the variability between runs on different days using samples from a standard reference glass, and discuss the conclusions that can be drawn from the results. We also talk about the scope of this project, where we will try to use functional data analysis to analyse the spectrographic curves and compute the likelihood ratios, which will further be used to interpret the glass evidence.

**Current Climate in the Teaching of School Level Stats in the USA**

Speaker: Christine Franklin

Affiliation: University of Georgia

When: Wednesday, 16 March 2016, 11:00 am to 12:00 pm

Where: Room 303-310

The United States realizes there is a need to achieve a level of quantitative literacy for its high school graduates to prepare them to thrive in the modern world. There have been serious efforts to integrate more statistics at the school level since the 1980s. The Common Core State Standards for mathematics are the most recent effort at national standards for the United States. Grades 6-12 include standards for the teaching of statistics and probability that range from counting the number in each category to determining statistical significance through the use of simulation and randomization tests.

This presentation will provide a brief summary of the current implementation of the statistics standards at school level in the U.S. (the victories but also the continued struggles), an overview of the resources available and in development that support the statistics standards, the current efforts at teacher preparation, the desired assessment of statistics at the school level on the high stakes national tests versus the actual assessment, and how the USA currently compares to New Zealand. Much of the presentation is based upon my experiences in New Zealand as a 2015 Fulbright Scholar visitor in the University of Auckland Department of Statistics.

**Socioeconomic status and all-cause mortality: Testing life course hypotheses in New Zealand**

Speaker: Liza Bolton

Affiliation: COMPASS and Department of Statistics

When: Friday, 11 March 2016, 3:00 pm to 4:00 pm

Where: Fale Pasifika Complex, Bldg 273, Level 1, Rm 104

[Cross-listed from the COMPASS seminar series. Please note the different time and place. All welcome; drinks and nibbles to follow.]

Socioeconomic status (SES) has been shown to be related to mortality in a range of contexts. Low SES tends to increase mortality risk, but how exposure patterns across the life-course are related to mortality is not well understood and has not been explored in the New Zealand context. This research uses New Zealand longitudinal census data to explore whether there is evidence of associations between mortality and cumulative exposure to low SES (accumulation hypothesis), changes in SES between life stages (social mobility hypothesis) and exposure to low SES during specific life stages (sensitive period hypothesis). Understanding these hypotheses in the New Zealand context may allow for better-targeted interventions to address mortality inequalities, for example, disparities between ethnic groups.

Liza Bolton is a PhD Candidate in Statistics at the University of Auckland, working with the Centre of Methods and Policy Application in the Social Sciences (COMPASS). Liza began her PhD in March 2015, under the supervision of Professor Alan Lee (Department of Statistics) and Dr Barry Milne (COMPASS).

http://www.arts.auckland.ac.nz/en/about/our-research/research-centres-and-archives/compass.html

**Dynamic microsimulation of programs and investments to reduce poverty and support development**

Speaker: Martin Spielauer

Affiliation: Consultant, World Bank

When: Friday, 4 March 2016, 3:00 pm to 4:00 pm

Where: Fale Pasifika Complex, Bldg 273, Level 1, Rm 104

[Cross-listed from the COMPASS seminar series. Please note the different time and place.]

Microsimulation is currently applied mostly in developed countries for the analysis and fine-tuning of policies with a longitudinal component like the sustainability of pension and health systems in the context of demographic change. This discussion aims at assessing the potential strengths and limitations of dynamic microsimulation in the context of applications for the developing world. With its ability to simultaneously handle distributional issues, population change, and the potentially strong demographic down-stream effects of policies, dynamic microsimulation can serve as a powerful tool complementing conventional data analysis and projections. Given the typically early stages in the design and implementation of social security systems, together with highly vulnerable populations, today's policy decisions potentially have huge impacts both on current living conditions and future development.

Dr. Martin Spielauer is an expert in dynamic microsimulation with 15 years of experience in microsimulation modeling. He has developed or contributed to models in a wide range of subject matter fields including demography, education, saving and wealth, pension systems, poverty, and health. He has been engaged in microsimulation projects and microsimulation training around the world. Dr. Spielauer has published both in peer-reviewed journals and in publications of governments and international agencies. He is currently working as an independent consultant providing technical assistance in microsimulation model development.

http://www.arts.auckland.ac.nz/en/about/our-research/research-centres-and-archives/compass.html

**Evaluating stationarity via change-point alternatives with applications to fMRI data**

Speaker: Claudia Kirch

Affiliation: Institute for Mathematical Stochastics, Otto-von-Guericke University

When: Wednesday, 2 March 2016, 11:00 am to 12:00 pm

Where: Room 303-310

Functional magnetic resonance imaging (fMRI) is now a well established technique for studying the brain. However, in many situations, such as when data are acquired in a resting state, it is difficult to know whether the data are truly stationary or if level shifts have occurred. To this end, change-point detection in sequences of functional data is examined where the functional observations are dependent and where the distributions of change-points from multiple subjects are required. Of particular interest is the case where the change-point is an epidemic change -- a change occurs and then the observations return to baseline at a later time. The case where the covariance can be decomposed as a tensor product is considered with particular attention to the power analysis for detection. This is of interest in the application to fMRI, where the estimation of a full covariance structure for the three-dimensional image is not computationally feasible. Using the developed methods, a large study of resting state fMRI data is conducted to determine whether the subjects undertaking the resting scan have non-stationarities present in their time courses. It is found that a sizeable proportion of the subjects studied are not stationary. The change-point distribution for those subjects is empirically determined, and its theoretical properties are examined.
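To make the notion of an epidemic change concrete, here is a minimal sketch for a single univariate series, far simpler than the dependent functional data treated in the talk. It scans all candidate segments with a CUSUM-type statistic and returns the segment whose sum departs most from what a constant mean would predict. The function name and the toy series are hypothetical, and the sketch performs no significance test.

```python
def epidemic_change_scan(series):
    """Locate the most likely epidemic segment in a univariate series.

    Maximises |S_end - S_start - (end - start) / n * S_n| over
    0 <= start < end <= n, where S_k is the sum of the first k values.
    The candidate segment is series[start:end].
    """
    n = len(series)
    prefix = [0.0]
    for value in series:
        prefix.append(prefix[-1] + value)
    best = (0, 0, 0.0)
    for start in range(n):
        for end in range(start + 1, n + 1):
            expected = (end - start) / n * prefix[n]
            stat = abs(prefix[end] - prefix[start] - expected)
            if stat > best[2]:
                best = (start, end, stat)
    return best

# Hypothetical toy series: baseline, a level shift, then a return to baseline
toy_series = [0.0] * 10 + [5.0] * 10 + [0.0] * 10
start, end, statistic = epidemic_change_scan(toy_series)
```

On this noise-free toy series the scan recovers the shifted segment `toy_series[10:20]`. In practice the statistic would need to be normalised and compared with a critical value that accounts for dependence.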

**Terraces, Partial Terraces and Phylogenetic Inference**

Speaker: Arndt von Haeseler

Affiliation: Center for Integrative Bioinformatics Vienna

When: Wednesday, 24 February 2016, 11:00 am to 12:00 pm

Where: Room 303-310

We discuss the concept of phylogenetic terraces to improve the computational efficiency of phylogenetic inference programs. To this end, we provide the rules to detect terraces during tree search. More precisely, we study the induced partition trees (i.e. gene trees that live inside species trees) and how topological rearrangements on the species tree change the associated partition trees.

We characterise the changes for Nearest Neighbour Interchange (NNI), Subtree Pruning and Regrafting and Tree Bisection and Reconnection operations. We further generalize the concept of terraces to partial terraces and study their occurrence for real alignments using NNI neighbourhoods.

Secondly, we provide a phylogenetic terrace aware data structure (PTA) for the efficient analysis of concatenated multiple alignments. Using PTA and the rules developed to detect (partial) terraces in the presence of missing data one saves computational time by avoiding unnecessary recompilations.

**Three talks**

Speaker: Yoonsuh Jung, Chaitanya Joshi and Bob Durrant

Affiliation: Department of Statistics, U. Waikato

When: Wednesday, 17 February 2016, 11:00 am to 12:00 pm

Where: Room 303-310

[Note: These three short talks will take place from **11:30-1:00**]

***Talk 1 - Yoonsuh Jung***

Title:

Robust Regression for Highly Corrupted Response by Shifting Outliers

Abstract:

Outlying observations are often disregarded at the sacrifice of degrees of freedom or downsized via robust loss functions (for example, Huber's loss) to reduce the undesirable impact on data analysis. In this paper, we treat the outlying status of each observation as a parameter and propose a penalization method to automatically adjust the outliers. The proposed method shifts the outliers towards the fitted values, while preserving the non-outlying observations. We also develop a generally applicable iterative algorithm to estimate model parameters and demonstrate the connection with the maximum likelihood based estimation procedure in the case of least squares estimation. We establish asymptotic properties of the resulting parameter estimators under the condition that the proportion of outliers does not vanish as sample size increases. We apply the proposed outlier adjustment method to ordinary least squares and lasso-type penalization procedures and demonstrate its empirical value via numerical studies. Furthermore, we study the applicability of the proposed method to two robust estimators, Huber's robust estimator and the Huberized lasso, and demonstrate its noticeable improvement of model fit in the presence of extremely large outliers.
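The idea of giving each observation its own outlier parameter can be sketched with a mean-shift model, y = Xβ + γ + ε, with an L1 penalty on γ. The code below is a minimal sketch of that textbook formulation, not necessarily the speaker's method: the soft-thresholding update, the penalty level `lam`, the function names and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(r, lam):
    """Shrink each residual towards zero by lam; small residuals become 0."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def fit_with_outlier_shifts(X, y, lam, n_iter=500):
    """Alternate between least squares on the shifted response and
    re-estimating the per-observation shifts gamma (hypothetical helper)."""
    gamma = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        # Regress the outlier-adjusted response on X.
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # Residuals larger than lam are treated as outliers and shifted.
        gamma = soft_threshold(y - X @ beta, lam)
    return beta, gamma

# Simulated data (assumed): true line y = 1 + 2x, with the 20 observations
# at the largest x values corrupted by +30.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(n)
outliers = np.argsort(x)[-20:]
y[outliers] += 30.0

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_shift, gamma = fit_with_outlier_shifts(X, y, lam=1.5)
print("OLS slope:", beta_ols[1], "shifted-outlier slope:", beta_shift[1])
```

With a soft-thresholding update this scheme coincides with a Huber-type fit; other thresholding rules would give different estimators.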

***Talk 2 - Chaitanya Joshi***

Title:

Fast Bayesian Inference using Low Discrepancy Sequences

Abstract:

In some cases, computational benefit can be gained by exploring the parameter space using a deterministic set of grid points instead of a Markov chain. We view this as a numerical integration problem and make three unique contributions. First, we explore the space using low discrepancy point sets instead of a grid. This allows for accurate estimation of marginals of any shape at a much lower computational cost than a grid based approach and thus makes it possible to extend the computational benefit to a parameter space with higher dimensionality (10 or more). Second, we propose a new, quick and easy method to estimate the marginals using least squares polynomials and prove the conditions under which this polynomial will converge to the true marginals. Our results are valid for a wide range of point sets including grids, random points and low discrepancy points. Third, we show that further accuracy and efficiency can be gained by taking into consideration the functional decomposition of the integrand and illustrate how this can be done using anchored f-ANOVA on weighted spaces.
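To make the numerical-integration view concrete, the sketch below estimates the normalising constant of an unnormalised Gaussian-shaped density on the unit cube by averaging it over a low discrepancy point set. The Halton sequence, the three-dimensional integrand and all names are assumptions chosen for illustration; the abstract does not say which point sets or models the talk uses.

```python
import math
import numpy as np

def radical_inverse(i, base):
    """Reflect the base-`base` digits of i about the radix point."""
    result, f = 0.0, 1.0 / base
    while i > 0:
        result += (i % base) * f
        i //= base
        f /= base
    return result

def halton(n, bases):
    """First n points of the Halton sequence, one prime base per dimension."""
    return np.array([[radical_inverse(i, b) for b in bases]
                     for i in range(1, n + 1)])

def unnormalised_density(theta, scale):
    """Gaussian bump centred at 0.5 in every coordinate (assumed example)."""
    return np.exp(-np.sum((theta - 0.5) ** 2, axis=1) / (2.0 * scale ** 2))

dim, scale, n_points = 3, 0.3, 4096
points = halton(n_points, bases=(2, 3, 5))
estimate = unnormalised_density(points, scale).mean()

# Closed-form value of the same integral over [0, 1]^dim, for comparison.
per_dim = scale * math.sqrt(2.0 * math.pi) * math.erf(0.5 / (scale * math.sqrt(2.0)))
exact = per_dim ** dim
print("Halton estimate:", estimate, "exact:", exact)
```

A full grid with the same resolution per axis needs a number of points that grows exponentially with the dimension, which is the cost the low discrepancy approach is meant to avoid.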

***Talk 3 - Bob Durrant***

Title:

Random Projections, Label Flipping, and Classification

Abstract:

Random projection is a simple and computationally cheap linear dimensionality reduction scheme, with some pleasing theoretical properties and a growing list of applications. In this talk I will briefly survey some of the earlier theory, and then give some distribution-free guarantees for the performance of a linear classifier (aka discriminant) working with randomly-projected data versus working with unprojected data, by quantifying the probability of "label flipping" in the projected space. Surprisingly, this "flipping probability" does not depend on the dimensionality of the original data, and I will give some geometric intuition for why this is the case. Finally, I describe some geometric properties of data that ensure the flipping probability will typically be small.
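The claim that the flipping probability does not depend on the original dimensionality can be checked by simulation. The sketch below is only a Monte Carlo illustration under assumptions not stated in the abstract: the projection matrix has independent standard Gaussian entries, the classifier is a fixed direction `h`, and a data point `x` sits at a fixed angle to it. The function name and all parameter values are hypothetical.

```python
import numpy as np

def flip_probability(d, k, angle, trials=4000, seed=0):
    """Estimate how often a k-dimensional Gaussian random projection changes
    the sign of the inner product between h and x in d dimensions."""
    rng = np.random.default_rng(seed)
    h = np.zeros(d)
    h[0] = 1.0                                   # classifier direction
    x = np.zeros(d)
    x[0], x[1] = np.cos(angle), np.sin(angle)    # point at `angle` to h
    flips = 0
    for _ in range(trials):
        R = rng.standard_normal((k, d))          # random projection matrix
        if np.sign((R @ h) @ (R @ x)) != np.sign(h @ x):
            flips += 1
    return flips / trials

# Same angle and projected dimension, very different original dimensions.
p_small = flip_probability(d=20, k=5, angle=np.pi / 3)
p_large = flip_probability(d=500, k=5, angle=np.pi / 3)
print("d=20:", p_small, "d=500:", p_large)
```

The two estimates should agree up to Monte Carlo error, since only the angle between `h` and `x` and the projected dimension `k` enter the distribution of the projected inner product.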

**Some Extensions of Matrix Visualization: the GAP Approach**

Speaker: Chun-Houh Chen

Affiliation: Institute of Statistical Science, Academia Sinica

When: Tuesday, 16 February 2016, 3:00 pm to 4:00 pm

Where: AUT campus - room WT121, corner of Wakefield, Rutland and Queen Streets

[This seminar is cross-listed from the AUT Computer and Mathematical Sciences seminar series, and takes place on the AUT campus at WT121, on the corner of Wakefield, Rutland and Queen Streets. See also the talk by the speaker on Friday 12 February 11:00 in the University of Auckland seminar series.]

Matrix visualization (MV) is suitable for visualizing thousands of variables in a single display. In this talk I'll summarize our work on extending the environment of matrix visualization via the Generalized Association Plots (GAP) approach for the following data types.

1. Matrix visualization for high-dimensional categorical data structure

For categorical data, MCA (multiple correspondence analysis) is the most popular method for visualizing the reduced joint space for samples and variables of categorical nature. But similar to its continuous counterpart, PCA (principal component analysis), MCA loses its efficiency when data dimensionality gets really high. In this study we extend the framework of matrix visualization from continuous data to categorical data. Categorical matrix visualization can effectively present complex information patterns for thousands of subjects on thousands of categorical variables in a single matrix visualization display.

2. Matrix Visualization for High-Dimensional Data with a Cartography Link

When a cartography link is attached to each subject of a high-dimensional categorical data set, it is necessary to use a geographical map to illustrate the pattern of subject (region)-clusters with variable-groups embedded in the high-dimensional space. This study presents an interactive cartography system with systematic color-coding by integrating the homogeneity analysis into matrix visualization.

3. Matrix visualization for symbolic data analysis

Symbolic data analysis (SDA) has gained popularity over the past few years because of its potential for handling data having a dependent and hierarchical nature. Here we introduce matrix visualization (MV) for visualizing and clustering SDA data using interval-valued symbolic data as an example; it is by far the most popular SDA data type in the literature and the most commonly encountered one in practice. Many MV techniques for visualizing and clustering conventional data are converted to SDA data, and several techniques are newly developed for SDA data. Various examples of data with simple to complex structures are brought in to illustrate the proposed methods.

4. Covariate-adjusted matrix visualization via correlation decomposition

In this study, we extend the framework of matrix visualization (MV) by incorporating a covariate adjustment process through the estimation of conditional correlations. MV can explore the grouping and/or clustering structure of high-dimensional large-scale data sets effectively without dimension reduction. The benefit is in the exploration of conditional association structures among the subjects or variables that cannot be done with conventional MV.
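A small sketch of what a covariate adjustment through conditional correlations can look like: each variable is regressed on the covariate and the correlations are computed from the residuals. This is the standard partial correlation construction, used here as an assumption; the abstract does not describe how GAP estimates conditional correlations, and the function name and simulated data are hypothetical.

```python
import numpy as np

def conditional_correlation(data, covariate):
    """Correlation matrix of the columns of `data` after regressing out
    an intercept and `covariate` from every column."""
    Z = np.column_stack([np.ones(len(covariate)), covariate])
    coef, *_ = np.linalg.lstsq(Z, data, rcond=None)
    residuals = data - Z @ coef
    return np.corrcoef(residuals, rowvar=False)

# Simulated data (assumed): two variables that are related only through
# a shared covariate z.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)
a = z + 0.5 * rng.standard_normal(n)
b = z + 0.5 * rng.standard_normal(n)
data = np.column_stack([a, b])

raw = np.corrcoef(data, rowvar=False)[0, 1]
adjusted = conditional_correlation(data, z)[0, 1]
print("raw correlation:", raw, "covariate-adjusted:", adjusted)
```

A matrix display built from the adjusted correlations would show the association structure that remains once the covariate is accounted for, which here is close to none.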

Several biomedical examples will be employed to illustrate the versatility of the GAP approach to matrix visualization.

Generalized Association Plots (GAP) http://gap.stat.sinica.edu.tw/Software/

See also:

https://www.stat.auckland.ac.nz/seminar/#at-2016-02-12

**The population of long-period transiting exoplanets**

Speaker: Dan Foreman-Mackey

Affiliation: Astronomy Department, University of Washington

When: Tuesday, 16 February 2016, 11:00 am to 12:00 pm

Where: Room 303-412

[Please note the non-standard day and room.]

The Kepler Mission has discovered thousands of exoplanets and revolutionized our understanding of their population. This large, homogeneous catalog of discoveries has enabled rigorous studies of the occurrence rate of exoplanets and extra-Solar planetary systems as a function of their physical properties. Transit surveys like Kepler are most sensitive to planets with shorter orbital periods than the gas giant planets that dominate the dynamics of our Solar System. I will present my work to develop and apply a fully-automated search for the transits of long-period exoplanets in the archival Kepler light curves. Since the method involves no human intervention, I can precisely measure the completeness function of the catalog and place constraints on the occurrence rate of exoplanets with orbital periods longer than 2 years. These long-period exoplanets are common and the occurrence rate increases with decreasing planetary radius.

**Timing black holes and neutron stars: unravelling fundamental physics with X-ray variability**

Speaker: Daniela Huppenkothen

Affiliation:

When: Monday, 15 February 2016, 11:00 am to 12:00 pm

Where: Room 303-310

[Please note the non-standard day.]

The sky in X-rays is incredibly dynamic. Neutron stars - the remnants of stellar explosions - and black holes vary on time scales ranging from milliseconds to decades, their brightness occasionally changing by several orders of magnitude within seconds or minutes. Studying this variability is one of the best ways to understand key physical processes that are unobservable on Earth: general relativity in strong gravity, extremely dense matter and the strongest magnetic fields known to us are just a few examples.

In this talk, I will give an overview of X-ray variability in neutron stars and black holes, and show how studying variability helps us unravel the physics governing these sources. I will present key statistical methods and models we have been developing recently as well as point out the opportunities and challenges of the spectral-timing revolution which we are currently moving toward with data from current and future space missions.

**Matrix Visualization: New Generation of Exploratory Data Analysis**

Speaker: Chun-Houh Chen

Affiliation: Institute of Statistical Science, Academia Sinica

When: Friday, 12 February 2016, 11:00 am to 12:00 pm

Where: Room 303-310

[Please note the non-standard day. See also the talk in the AUT seminar series on Tuesday 16 February 3:00]

"It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it" (Exploratory Data Analysis: John Tukey, 1977)

Data analysts and statistics practitioners nowadays face difficulties in understanding data of ever higher dimension and ever more complex nature, while conventional graphics/visualization tools do not meet their needs. It is statisticians' responsibility to come up with graphics/visualization environments that can help users really understand what one CAN DO for complex data generated from modern techniques and sophisticated experiments.

Matrix visualization (MV) for continuous, binary, ordinal, and nominal data, with various types of extensions, provides users with more comprehensive information embedded in complex high-dimensional data than conventional EDA tools such as boxplots and scatterplots used with dimension reduction techniques such as principal component analysis and multiple correspondence analysis.

In this talk I'll summarize our work on creating an MV environment for conducting statistical analyses and on introducing statistical concepts into the MV environment for visualizing more versatile and complex data structures. Many real-world examples will be demonstrated in this talk to illustrate the strength of MV for visualizing all types of datasets collected from scientific experiments and social surveys.

Generalized Association Plots (GAP) http://gap.stat.sinica.edu.tw/Software/

See also:

https://www.stat.auckland.ac.nz/seminar/#at-2016-02-16

**Online appointment scheduling: a taxonomy and review**

Speaker: Maartje van de Vrugt

Affiliation: University of Twente

When: Thursday, 28 January 2016, 2:00 pm to 3:00 pm

Where: Room 303-610

Online appointment scheduling has received increasing academic attention over the past years. In online appointment scheduling customers receive a quick response to their appointment request. Thus, future demand for the same service period is still unknown when appointments are scheduled. We review the literature on online appointment scheduling according to a taxonomy, which consists of the number of appointments each customer requires, the number of resource types at the facility, and the horizon at which the scheduling decisions are made. We provide an overview of the scheduling decisions, the objectives, and the operations research methods applied in different application areas. We identify similarities and differences between application areas and categories of our taxonomy, and highlight gaps in the literature that represent opportunities for future research. By reviewing the literature across various application areas, we aim to stimulate mutual interchange of research results in the field of online appointment scheduling.

**Modern Probability Theory**

Speaker: Kevin Knuth

Affiliation: Departments of Physics and Informatics, University at Albany

When: Wednesday, 27 January 2016, 11:00 am to 12:00 pm

Where: Room 303-310

A theory of logical inference should be all-encompassing, applying to any subject about which inferences are to be made. This includes problems ranging from the early applications of games of chance, to modern applications involving astronomy, biology, chemistry, geology, jurisprudence, physics, signal processing, sociology, and even quantum mechanics. This talk focuses on how the theory of inference has evolved in recent history: expanding in scope, solidifying its foundations, deepening its insights, and growing in calculational power.

http://knuthlab.rit.albany.edu/

**Composite likelihood estimator, mixed model and complex sampling**

Speaker: Xudong Huang

Affiliation: U. of Auckland, Dept. of Statistics

When: Wednesday, 20 January 2016, 11:00 am to 12:00 pm

Where: Room 303-310

We want to fit a mixed model to a population distribution, but we have data from a complex (multistage) sample. The sampling is informative, that is, the model holding for the population is different from the model holding for the (biased) sample. Ignoring the sampling design and just fitting the mixed model to the sample distribution will lead to biased inference. Although both the model and sampling involve "clusters", the model clusters and sample clusters need not be the same. We will use a composite likelihood method to estimate the parameters of the population model. This can be done in more than one way, so we will study the efficiency of different approaches.