Department of Statistics


2020 Seminars


» Integrative analysis of high-dimensional data with applications to soil microbiome data, Innocenter Amima
» Bayes in the time of Big Data, Andrew Holbrook
» The population properties of binary black holes with Bayesian hierarchical modelling, Eric Thrane
» To summarise or not to summarise: A comparison of likelihood-free methods with and without summary statistics, Chris Drovandi
» Subsampling MCMC: Bayesian inference for large data problems, Matias Quiroz
» Bayesian functional regression for prediction and variable selection, Daniel Kowal
» Use of model reparametrization to improve variational Bayes, Linda S. L. Tan
» On the Use of Dictionary Learning in Statistical Inference, Xiaomeng Zheng
» Spectral and heritability analysis of EEG time series data using a nested Dirichlet process, Mark Fiecas
» Understanding surgical outcomes in New Zealand using large national data sets, Luke Boyle
» Likelihood approximations for time series and calibration of approximate Bayesian credible sets, Yifu Tang
» Conditional maximum likelihood for mixed model under two phase design, Felicity Yi Xue
» Posing Investigative Questions About Categorical Data – A Year 9 Case Study, Malia Puloka
» Capstone courses – What, Why and How?, Rachel Passmore
» Robust Bayesian Analysis of Multivariate Time Series, Yixuan Liu
» Developing Biological Models for the Probabilistic Genotyping of NGS Data, Kevin Cheng
» Leveraging Pleiotropic association using sparse group variable selection in GWAS data, Benoit Liquet
» Delineating Student and Teacher Comprehension of Randomness in Distributions, Amy Renelle
» Multiple imputation through denoising and variational autoencoders, Agnes Yongshi Deng
» Two-phase subsampling design for DNA sequencing with application in the relatedness of endangered species, Zoe Luo
» Two “Paradoxes” to enrich your summer holiday, Mark Holmes

Integrative analysis of high-dimensional data with applications to soil microbiome data.

Speaker: Innocenter Amima

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 2 December 2020, 11:00 am to 12:00 pm

Where: 303-310

The Vineyard Ecosystem (VE) project intends to determine the long-term effects of management practices on commercial grapevine productivity and vine longevity. This novel project uses an ecological approach to understand the interconnections between components within vineyard ecosystems.

Microbiome count data are high dimensional, compositional and over-dispersed, with many zero abundances (~50–90% of all values). Statistical methods have been proposed for dimension reduction; however, some of them fail to account for these properties of microbiome data. For our preliminary analysis, we used principal coordinate analysis (PCoA). Because no species-specific parameters were estimated, it was challenging to identify the species present in the components. In this talk, preliminary results from the VE project will be discussed together with future research plans. I will briefly introduce factor analysis (FA), a model-based approach, for dimension reduction and parsimonious modelling.

Bayes in the time of Big Data

Speaker: Andrew Holbrook

Affiliation: Department of Biostatistics, University of California, Los Angeles

When: Thursday, 12 November 2020, 1:00 pm to 2:00 pm

Where: 303-610

Abstract: Big Bayes is the computationally intensive co-application of big data and large, expressive Bayesian models for the analysis of complex phenomena in scientific inference and statistical learning. Standing as an example, Bayesian multidimensional scaling (MDS) can help scientists learn viral trajectories through space and time, but its computational burden prevents its wider use. Crucial MDS model calculations scale quadratically in the number of observations. We mitigate this limitation through massive parallelization using multi-core central processing units, instruction-level vectorization and graphics processing units (GPUs). Fitting the MDS model using Hamiltonian Monte Carlo, GPUs can deliver more than 100-fold speedups over serial calculations and thus extend Bayesian MDS to a big data setting. To illustrate, we employ Bayesian MDS to infer the rate at which different seasonal influenza virus subtypes use worldwide air traffic to spread around the globe. We examine 5392 viral sequences and their associated 14 million pairwise distances arising from the number of commercial airline seats per year between viral sampling locations. To adjust for shared evolutionary history of the viruses, we implement a phylogenetic extension to the MDS model and learn that subtype H3N2 spreads most effectively, consistent with its epidemic success relative to other seasonal influenza subtypes.
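As a back-of-the-envelope check of the quadratic scaling described above (illustrative only, not from the talk): the number of pairwise distances among n objects is n(n-1)/2, which for 5392 sequences matches the figure quoted in the abstract.

```python
# Pairwise distances among n objects grow quadratically: n*(n-1)/2 unordered pairs.
n = 5392                  # viral sequences in the study described above
pairs = n * (n - 1) // 2
print(pairs)              # 14,534,136 -- the "14 million pairwise distances"
```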

Dr Holbrook is an Assistant Professor in the Department of Biostatistics, University of California, Los Angeles. His research interests include Bayesian statistics (theory and methods) and hierarchical modelling, computational statistics and high-performance computing, spatial epidemiology, and Alzheimer’s disease.

https://ph.ucla.edu/faculty/holbrook

The population properties of binary black holes with Bayesian hierarchical modelling

Speaker: Eric Thrane

Affiliation: School of Physics and Astronomy, Monash University

When: Thursday, 22 October 2020, 1:00 pm to 2:00 pm

Where: 303-610

Bayesian inference finds elegant application in gravitational-wave astronomy thanks to the clear predictions of general relativity and the great simplicity with which gravitational-wave sources can be described. Gravitational-wave astronomers use Bayesian inference to solve a variety of problems, for example, to determine the masses of merging black holes and to work out the neutron star equation of state, which governs how matter behaves at the highest possible densities. As the catalog of gravitational-wave signals has grown to dozens, it is increasingly fruitful to apply the Bayesian method of hierarchical modelling to study the population properties of black holes and neutron stars. In this talk, which is pitched for a mixed audience of statisticians and physicists, I will discuss some of the most exciting discoveries from the field of gravitational-wave astronomy, and highlight how hierarchical modelling is used to answer emerging questions about the fate of massive stars and how black holes merge.

Eric Thrane is a professor in the School of Physics and Astronomy at Monash University. His research interests include astrophysical inference using data from gravitational-wave observatories to answer questions such as: how do compact binaries form? What is the fate of massive stars? What is the nature of matter at the highest possible densities? https://users.monash.edu.au/~erict/

To summarise or not to summarise: A comparison of likelihood-free methods with and without summary statistics

Speaker: Chris Drovandi

Affiliation: School of Mathematical Sciences at the Queensland University of Technology (QUT)

When: Thursday, 15 October 2020, 1:00 pm to 2:00 pm

Where: 303-610

Likelihood-free methods are useful for parameter estimation of complex simulable models with intractable likelihood functions. Such models are prevalent in many disciplines including genetics, biology, ecology and cosmology. Likelihood-free methods avoid explicit likelihood evaluation by finding parameter values of the model that generate data close to the observed data. The general consensus has been that it is most efficient to compare datasets on the basis of a low-dimensional informative summary statistic, sacrificing some information for reduced dimensionality. More recently, researchers have explored various approaches for efficiently comparing empirical distributions in the likelihood-free context in an effort to avoid data summarisation. Here I will present some preliminary results comparing likelihood-free methods with and without data summarisation. This is joint work with David Frazier.
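The core likelihood-free idea described above can be sketched in a few lines. The following toy rejection-ABC sampler is an illustration only (a hypothetical normal-mean model, not one of the methods compared in the talk): draw parameters from the prior, simulate data, and keep draws whose summary statistic lands close to the observed one.

```python
import random
import statistics

def rejection_abc(observed, prior_draw, simulate, summary, eps, n_draws=20000):
    """Keep prior draws whose simulated data have a summary close to the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw()
        s_sim = summary(simulate(theta))
        if abs(s_sim - s_obs) < eps:     # compare summaries, not full datasets
            accepted.append(theta)
    return accepted

random.seed(1)
observed = [random.gauss(2.0, 1.0) for _ in range(50)]  # data from N(2, 1)
post = rejection_abc(
    observed,
    prior_draw=lambda: random.uniform(-5, 5),                        # flat prior on the mean
    simulate=lambda th: [random.gauss(th, 1.0) for _ in range(50)],  # model simulator
    summary=statistics.mean,                                         # low-dimensional summary
    eps=0.2,
)
print(round(statistics.mean(post), 2))  # accepted draws concentrate near the true mean
```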

Dr Chris Drovandi is an Associate Professor in the School of Mathematical Sciences at the Queensland University of Technology (QUT), Australia. His research interests are in Bayesian algorithms for complex models, optimal Bayesian experimental design methods and the translation of Bayesian methods across many disciplines.

Subsampling MCMC: Bayesian inference for large data problems

Speaker: Matias Quiroz

Affiliation: School of Mathematical and Physical Sciences at the University of Technology Sydney

When: Thursday, 8 October 2020, 1:00 pm to 2:00 pm

Where: 303-610

Abstract: This talk reviews our work on subsampling MCMC, a posterior sampling algorithm that speeds up inference for large datasets by combining i) data subsampling and ii) the pseudo-marginal Metropolis-Hastings framework. I will outline the general methodology and discuss some recent developments that have enabled the method to i) produce ''exact'' inference, ii) be applied to more complex datasets, and iii) use more efficient likelihood estimators.
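As a rough illustration of the idea (a toy sketch, not the authors' algorithm): estimate the full-data log-likelihood from a random subsample, stabilised by second-order Taylor control variates as is standard in this literature, and plug the estimate into a Metropolis-Hastings accept/reject step. A rigorous pseudo-marginal scheme requires an unbiased estimator of the likelihood itself together with a bias correction, which this sketch glosses over; the model (a Student-t location model) and all tuning constants here are invented for illustration.

```python
import math
import random

random.seed(3)
NU = 5.0
data = [random.gauss(1.0, 1.0) for _ in range(5000)]   # a "large" dataset

def ell(x, th):
    """Per-observation Student-t(NU) location log-likelihood, up to a constant."""
    return -(NU + 1) / 2 * math.log(1 + (x - th) ** 2 / NU)

def d1(x, th):  # first derivative of ell in theta
    r = x - th
    return (NU + 1) * r / (NU + r * r)

def d2(x, th):  # second derivative of ell in theta
    r = x - th
    return (NU + 1) * (r * r - NU) / (NU + r * r) ** 2

theta_star = sum(data) / len(data)           # reference point for the control variates
S0 = sum(ell(x, theta_star) for x in data)   # O(n) sums, computed once up front
S1 = sum(d1(x, theta_star) for x in data)
S2 = sum(d2(x, theta_star) for x in data)

def loglik_hat(th, batch):
    """Taylor (control-variate) part plus a scaled subsample correction."""
    d = th - theta_star
    taylor = S0 + S1 * d + 0.5 * S2 * d * d
    corr = sum(ell(x, th)
               - (ell(x, theta_star) + d1(x, theta_star) * d
                  + 0.5 * d2(x, theta_star) * d * d)
               for x in batch)
    return taylor + len(data) / len(batch) * corr

th, chain = theta_star, []
ll = loglik_hat(th, random.sample(data, 100))
for _ in range(2000):
    prop = th + random.gauss(0, 0.02)
    ll_prop = loglik_hat(prop, random.sample(data, 100))
    if math.log(random.random()) < ll_prop - ll:   # flat prior on the location
        th, ll = prop, ll_prop                     # keep the estimate of the current state
    chain.append(th)
est = sum(chain) / len(chain)
print(round(est, 2))  # stays close to the sample mean (about 1.0 here)
```

For this toy model the control variates make the subsample correction nearly noise-free, which is exactly why such corrections are needed: the naive scaled-subsample estimator has variance far too large for the chain to behave.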

Dr Quiroz is a lecturer in the School of Mathematical and Physical Sciences at the University of Technology Sydney. His research interests lie in the area of Bayesian statistics, particularly Bayesian computation such as Monte Carlo methods and variational Bayes. https://www.uts.edu.au/staff/matias.quiroz

Bayesian functional regression for prediction and variable selection

Speaker: Daniel Kowal

Affiliation: Rice University

When: Thursday, 1 October 2020, 1:00 pm to 2:00 pm

Where: Zoom

As high-resolution monitoring and measurement systems generate vast quantities of complex and highly correlated data, functional data analysis has become increasingly vital for many scientific, medical, business, and industrial applications. Functional data are typically high dimensional, highly correlated, and may be measured concurrently with other variables of interest. In the presence of such complexity, Bayesian models are appealing: they can accommodate multiple sources of dependence concurrently, such as multivariate observations, covariates, and time-ordering, and provide full uncertainty quantification via the posterior distribution. However, key challenges remain: constructing scalable algorithms, providing sufficient modeling flexibility alongside much-needed regularization, and producing accurate predictions of functional data. In this talk, I will present new Bayesian models and algorithms for high-dimensional and dynamic functional regression. These methods are motivated by two applications: (1) selecting which—if any—items from a sleep questionnaire are predictive of intraday physical activity and (2) using dynamic macroeconomic variables to forecast interest rate curves. Model implementations are available in an R package at GitHub.com/drkowal/dfosr.

Dr Kowal is the Dobelman Family Assistant Professor at Rice University, US. His research areas include statistical methodology and algorithms for massive data sets with complex dependence structures, such as functional, time series, and spatial data. https://profiles.rice.edu/faculty/daniel-kowal

https://auckland.zoom.us/j/97233604751

Use of model reparametrization to improve variational Bayes

Speaker: Linda S. L. Tan

Affiliation: National University of Singapore

When: Thursday, 24 September 2020, 1:00 pm to 2:00 pm

Where: Zoom

We propose using model reparametrization to improve variational Bayes inference for hierarchical models whose variables can be classified as global (shared across observations) or local (observation specific). Posterior dependence between local and global variables is minimized by applying an invertible affine transformation on the local variables. The functional form of this transformation is deduced by approximating the posterior distribution of each local variable conditional on the global variables by a Gaussian density via a second-order Taylor expansion. Variational Bayes inference for the reparametrized model is then obtained using stochastic approximation. Our approach can be readily extended to large datasets via a divide-and-recombine strategy. Using generalized linear mixed models, we demonstrate that reparametrized variational Bayes (RVB) provides improvements in both accuracy and convergence rate compared to state-of-the-art Gaussian variational approximation methods.

https://auckland.zoom.us/j/98517239097

On the Use of Dictionary Learning in Statistical Inference

Speaker: Xiaomeng Zheng

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 16 September 2020, 11:00 am to 12:00 pm

Where: Zoom

Dictionary learning (DL) is an active research area in statistics and computer science. The problem addressed by DL is the representation of the data as a sparse linear combination of the columns of a matrix called dictionary. Both the dictionary and the sparse representations are learned from the data.
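The representation step can be sketched with a simple greedy matching-pursuit routine. This is a generic illustration with an invented toy dictionary, not the methods of the talk, which also learn the dictionary from data and choose the sparsity level by information-theoretic criteria.

```python
def matching_pursuit(y, D, k):
    """Greedy sparse coding: approximate y as a k-term combination of atoms in D.
    D is a list of unit-norm atoms (vectors); returns coefficients and residual."""
    resid = y[:]
    coef = [0.0] * len(D)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        scores = [abs(sum(a * r for a, r in zip(atom, resid))) for atom in D]
        j = scores.index(max(scores))
        c = sum(a * r for a, r in zip(D[j], resid))
        coef[j] += c
        resid = [r - c * a for r, a in zip(resid, D[j])]
    return coef, resid

# Toy dictionary of unit-norm atoms in R^4
D = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]]
y = [2.0, 0.0, 0.0, 1.0]
coef, resid = matching_pursuit(y, D, k=2)
print(coef)  # two nonzero coefficients: a sparse representation of y
```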

In this talk, we show how DL can be employed in the imputation of univariate and multivariate time series. For univariate time series, we also introduce an iterative method that is better suited for long sequences of missing data. In the multivariate case, the main contribution consists in the use of a structured dictionary. In all DL imputation methods that we propose, the size of the dictionary and the sparsity level of the representation are selected by using information theoretic criteria. We also evaluate the effect of removing the trend/seasonality before applying DL. We present the results of an extensive experimental study on real-life data. The positions of the missing data are simulated by applying two strategies: (i) sampling without replacement, which leads to isolated occurrences of the missing data, and (ii) sampling via Polya urn model that is likely to produce long sequences of missing data. In all scenarios, the novel DL-based methods compare favourably with the state-of-the-art.

All these results have been obtained during the first year of the PhD studies done under the supervision of Dr. Ciprian Doru Giurcaneanu and Dr. Jiamou Liu. In the second part of the talk, we will present the plan of the research for the next two years, which is mainly focused on designing novel DL algorithms for the case when the representation errors have a non-Gaussian distribution.

https://auckland.zoom.us/j/96197821035?pwd=MEtXRVY5dWEzL3p1K1dJTUpFUC9IQT09

Spectral and heritability analysis of EEG time series data using a nested Dirichlet process

Speaker: Mark Fiecas

Affiliation: Division of Biostatistics, University of Minnesota

When: Thursday, 13 August 2020, 1:00 pm to 2:00 pm

Where: Via Zoom

Abstract: In this talk, we will analyze the spectral features of resting-state EEG time series data collected from twins enrolled in the Minnesota Twin Family Study (MTFS). Our goal is to calculate the heritability of the spectral features of the resting EEG data. Due to the twin design of the MTFS, the time series will have similar underlying characteristics across individuals. To account for this, we develop a Bayesian nonparametric modeling approach for estimating the spectral densities of the EEG data. In our methodology, we use Bernstein polynomials and a Dirichlet process (DP) to estimate each subject-specific spectral density. In order to estimate the spectral densities for the entire sample, we nest this model within a nested DP. Thus, the top-level DP clusters individuals with similar spectral densities and the bottom-level dependent DP fits a functional curve to the individuals within each cluster. We then extract relevant spectral features from the estimates of the spectral densities and estimate their heritability. This is joint work with Dr. Brian Hart (UnitedHealth Group), Dr. Michele Guindani (UC Irvine), and Dr. Stephen Malone (Univ. of Minnesota).

Mark is an assistant professor in the Division of Biostatistics, University of Minnesota. The focus of his research is to understand the structure and function of the human brain through the use of imaging technology. His research focuses on functional connectivity, imaging genetics and time series analysis. https://med.umn.edu/bio/stroke-study-research-team-1/mark-fiecas

https://auckland.zoom.us/j/96962166186?pwd=UGhTZUpiT1cwT1NsL0F5Q2ZNdExqZz09

Understanding surgical outcomes in New Zealand using large national data sets

Speaker: Luke Boyle

Affiliation: University of Auckland

When: Wednesday, 29 July 2020, 11:00 am to 12:00 pm

Where: 303-310

Weighing the benefits of surgery is complicated and involves many assumptions about future benefits and perceived risks. Shared decision making, where a patient and a clinician share responsibility for a clinical decision, is widely regarded as the best approach to weigh the risks and benefits of an operation. A clinician can provide clinical context and risk assessment, while a patient will explain their values and together, they reach a conclusion about how to proceed. A clinician’s assessment of risk largely comes from experience; however, it has been shown that models for mortality can better estimate risk than clinician assessments (Moonesinghe et al., 2013). Ultimately, the decision to undergo surgery needs to be the patient’s choice. Providing accurate and relevant information allows patients to make the best choice for themselves. We believe New Zealand is an ideal place to experiment with new ideas about risk communication and to develop new tools for assessing risk as we have access to high-quality, longitudinal, national datasets. This PhD project will investigate different approaches to providing information about risk to patients as well as considering how to fairly compare risk between groups.

Likelihood approximations for time series and calibration of approximate Bayesian credible sets

Speaker: Yifu Tang

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 22 July 2020, 11:00 am to 12:00 pm

Where: 303-310

In time series analysis, the exact likelihood of the data, under the Gaussian assumption, is often complicated or even intractable due to the difficulty of evaluating the inverse of the autocovariance matrix. Whittle (1957) proposed an approximation (known as the Whittle likelihood) to the exact likelihood to facilitate computation. Various approximations have been suggested since then. These approximations often rely on certain assumptions, and it is interesting to see what happens to them when some of those assumptions do not hold. Appropriate methods will be adopted to assess the performance of the approximations under various conditions.
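For concreteness (my own toy example, not taken from the talk): the Whittle approximation replaces the exact Gaussian likelihood with a sum over periodogram ordinates, -sum_j [log f(w_j) + I(w_j)/f(w_j)], where f is the model spectral density and I is the periodogram. The sketch below fits an AR(1) coefficient by maximising this criterion over a grid, with the innovation variance held fixed for simplicity.

```python
import cmath
import math
import random

def periodogram(x):
    """Raw periodogram at the Fourier frequencies 2*pi*j/n, j = 1..floor((n-1)/2)."""
    n = len(x)
    out = []
    for j in range(1, (n - 1) // 2 + 1):
        w = 2 * math.pi * j / n
        d = sum(x[t] * cmath.exp(-1j * w * t) for t in range(n))
        out.append((w, abs(d) ** 2 / (2 * math.pi * n)))
    return out

def ar1_spec(w, phi, sigma2):
    """Spectral density of the AR(1) process x_t = phi*x_{t-1} + e_t, e_t ~ N(0, sigma2)."""
    return sigma2 / (2 * math.pi * abs(1 - phi * cmath.exp(-1j * w)) ** 2)

def whittle_loglik(pg, phi, sigma2):
    """Whittle approximation: -sum over ordinates of log f(w_j) + I(w_j)/f(w_j)."""
    ll = 0.0
    for w, I in pg:
        f = ar1_spec(w, phi, sigma2)
        ll -= math.log(f) + I / f
    return ll

# Simulate an AR(1) with phi = 0.6 and recover phi by maximising the Whittle likelihood
random.seed(0)
x, prev = [], 0.0
for _ in range(300):
    prev = 0.6 * prev + random.gauss(0, 1)
    x.append(prev)
pg = periodogram(x)   # computed once, then reused across the grid
best = max((p / 100 for p in range(-90, 91)), key=lambda p: whittle_loglik(pg, p, 1.0))
print(best)  # close to the true value 0.6
```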

Bayesian spectral analysis of time series, whether parametric or nonparametric, often relies on the above-mentioned approximate likelihoods. Therefore, the resulting posterior distribution (often called the pseudo-posterior) differs from the true posterior derived from the exact likelihood. The credible sets produced by the pseudo-posterior are thus called approximate credible sets. It is of interest to know the coverage of an approximate credible set under the true posterior. Calibration procedures are proposed to solve this problem.

In my provisional year review seminar, I will briefly introduce likelihood approximations in the time series setting, nonparametric Bayesian spectral analysis of time series, and general calibration procedures for approximate Bayesian credible sets. I will also discuss the preliminary results I have obtained and my future PhD research.

Conditional maximum likelihood for mixed model under two phase design

Speaker: Felicity Yi Xue

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 24 June 2020, 11:00 am to 12:00 pm

Where: via Zoom

Collecting detailed and useful information from a whole population is expensive and time-consuming. An appealing approach to this problem is a two-phase design. The literature on two-phase designs has mainly focused on populations whose members are independent. However, this assumption is not met when populations are naturally clustered, as in studies of families or studies with repeated measures. There are several well-known methods for the analysis of independent two-phase samples, for example, maximum likelihood (ML), weighted likelihood (WL) and conditional maximum likelihood (CML). However, there are fewer methods for the analysis of such samples when the population is cluster-correlated; these include weighted generalised estimating equations and mixed models with sampling weights.

This project aims to develop a new method for the analysis of correlated data by combining the theory of CML and the theory of mixed models.

Join Zoom Meeting

https://auckland.zoom.us/j/8239678860

Meeting ID: 823 967 8860


Posing Investigative Questions About Categorical Data – A Year 9 Case Study

Speaker: Malia Puloka

Affiliation: Department of Statistics, University of Auckland

When: Friday, 19 June 2020, 2:00 pm to 3:00 pm

Where: via Zoom

Posing investigative questions about data is a critical step in the investigation of statistical data because the question defines the purpose of the investigation. I found in my prior research with Year 13 students that they could not pose investigative questions about categorical variables when reasoning from an eikosogram, a tool for visualising categorical data. A pilot study for this PhD research, with undergraduate introductory statistics students reasoning about categorical representations, confirmed these findings. The findings from both studies revealed that students also lacked important reasoning skills. It is therefore proposed that strengthening the foundations for reasoning with categorical data should start early, ideally at the Year 9 or 10 school level.

In my provisional year review seminar, I will present the rationale for my research on posing investigative questions about categorical data. I will also talk about my plan and methodology, and the data collection, which has been completed.

Zoom meeting ID 91969802393

https://auckland.zoom.us/j/91969802393

Capstone courses – What, Why and How?


Speaker: Rachel Passmore

Affiliation: Department of Statistics, University of Auckland


When: Wednesday, 20 May 2020, 3:00 pm to 4:00 pm

Where: via Zoom

Why has the Faculty of Science decided to introduce compulsory capstone courses? What has been the catalyst for this new initiative and what are the predicted benefits?

The first compulsory capstones will be delivered in 2021. Capstone courses are not a particularly new phenomenon globally, but they are relatively rare in Science disciplines. In my provisional year review seminar, I will present findings from a literature review that identifies a number of issues that have contributed to the growth in popularity of capstone courses. I will briefly describe cognitive apprenticeship theory and explain how this theory can support statistics capstone course development. Six specific research questions are posed; the methodology and data collection procedures to answer them will be described. Provisional results from data collected from a capstone-like course in Semester 2, 2019, will also be presented.

Join Zoom Meeting

https://auckland.zoom.us/j/2715549005

Meeting ID: 271 554 9005


Robust Bayesian Analysis of Multivariate Time Series

Speaker: Yixuan Liu

Affiliation: Department of Statistics, The University of Auckland

When: Wednesday, 18 March 2020, 11:00 am to 12:00 pm

Where: 303-310

There has been a surge in the literature on nonparametric Bayesian inference for multivariate time series over the last decade. Many approaches model the spectral density matrix using the Whittle likelihood, an approximation of the true likelihood commonly employed for Gaussian time series. Meier et al. (2019) propose a nonparametric Whittle likelihood procedure with a Bernstein polynomial prior weighted by a Hermitian positive definite Gamma process. However, nonparametric techniques are known to be less efficient and less powerful than parametric techniques when the latter model the observations correctly. Kirch et al. (2019) therefore suggest a nonparametric correction to the parametric likelihood in the univariate case, which retains the efficiency of parametric models while amending their sensitivity to misspecification through the nonparametric correction. Along with this novel likelihood, a Bernstein polynomial prior equipped with a Dirichlet process weight is employed.

In this talk, I will review these two approaches. Meanwhile, my current PhD research will be discussed which focuses on the extension of the corrected Whittle likelihood procedure to the multivariate case.

Developing Biological Models for the Probabilistic Genotyping of NGS Data

Speaker: Kevin Cheng

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 4 March 2020, 3:00 pm to 4:00 pm

Where: 303-310

Forensic DNA analysis using short tandem repeat (STR) and capillary electrophoresis (CE) methods has been used within forensic DNA laboratories internationally for over 20 years [1]. More recently, laboratories have started investigating next generation sequencing (NGS) methods for the analysis of forensic samples. There are known limitations of a CE-based approach that NGS methods may be able to resolve [2].

An example of the benefits of an NGS approach is its capability of resolving iso-alleles. These are alleles with the same number of STRs but different DNA sequences [3]. Using the sequence information within these iso-alleles, different repeat patterns of the same repeat motif(s) can be observed, allowing for increased discrimination of profiles and possibly improved resolution of contributors [4].

Analogous to how interpretation methods were limited during the early implementation of CE-based technology, the current methods for the interpretation of DNA profiles obtained through an NGS approach are limited. There is an increasing need for more sophisticated models to be developed to enable the probabilistic genotyping of NGS DNA profiles.

In order to develop biological models for the interpretation of NGS DNA profiles, we must first understand how these profiles are generated. In this presentation we describe models such as stutter, peak height variance, and locus-specific PCR efficiency. We describe experiments designed to determine these models and present some findings.

Key words: NGS, probabilistic genotyping, DNA

References

1. Coble, M.D. and J.-A. Bright, Probabilistic genotyping software: An overview. Forensic Science International: Genetics, 2019. 38: p. 219-224.

2. de Knijff, P., From next generation sequencing to now generation sequencing in forensics. Forensic Science International: Genetics, 2019. 38: p. 175-180.

3. Warshauer, D.H., J.L. King, and B. Budowle, STRait Razor v2.0: The improved STR Allele Identification Tool – Razor. Forensic Science International: Genetics, 2015. 14: p. 182-186.

4. Young, B.A., et al., Estimating number of contributors in massively parallel sequencing data of STR loci. Forensic Science International: Genetics, 2019. 38: p. 15-22.

Leveraging Pleiotropic association using sparse group variable selection in GWAS data

Speaker: Benoit Liquet

Affiliation: University of Pau & Pays de L'Adour, ACEMS (QUT)

When: Wednesday, 4 March 2020, 11:00 am to 12:00 pm

Where: 303-310

Results from genome-wide association studies (GWAS) suggest that complex diseases are often affected by many variants with small effects, known as polygenicity. Bayesian methods provide attractive tools for identifying signal in data where the effects are small but clustered. For example, by incorporating biological pathway membership in the prior, they are able to integrate the ideas of gene set enrichment to identify groups of biologically significant genetic variants. Accumulating evidence suggests that genetic variants may affect multiple different complex diseases, a phenomenon known as pleiotropy.

In this work we propose frequentist and Bayesian statistical methods to leverage pleiotropic effects and incorporate prior pathway knowledge to increase statistical power and identify important risk variants. We offer novel feature selection methods for group variable selection in the multi-task regression problem. We develop methods using both penalised likelihood and Bayesian spike-and-slab priors to induce structured sparsity at the pathway, gene or SNP level.

Delineating Student and Teacher Comprehension of Randomness in Distributions

Speaker: Amy Renelle

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 26 February 2020, 11:00 am to 12:00 pm

Where: 303-310

As a naturally occurring phenomenon, randomness is a curious and complex statistical concept. You can see randomness everywhere: in a statistics classroom, within scientific fields, and in everyday experiences. However, it has been frequently reported that we are all prone to holding misconceptions about randomness. In my provisional year review seminar, I will present findings from the literature on Social Constructivism, heuristic thinking, and the potential benefits of using digital learning tools and multisensory learning to help mitigate potential misconceptions. This will be followed by comments on my current and prospective methodology, before a brief discussion of some preliminary findings regarding teachers' perceptions of randomness.

Multiple imputation through denoising and variational autoencoders

Speaker: Agnes Yongshi Deng

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 19 February 2020, 10:00 am to 11:00 am

Where: 303-310

Missing values are ubiquitous in clinical and social science data. Incomplete data not only lead to loss of information but can also introduce bias, which poses a significant challenge for data analysis. Various imputation procedures have been designed to handle incomplete data under different missingness mechanisms. Rubin (1977) introduced multiple imputation to attain valid inference from data with ignorable nonresponse. Several techniques and R packages have been developed to implement multiple imputation, such as mice, Amelia and missForest. However, the running time of imputation using these methods can be excessive for large datasets. We propose a scalable multiple imputation method based on variational and denoising autoencoders. Our R package misle is built on the tensorflow package in R, which enables fast computation and thus provides a scalable solution for missing data.

In this talk, I will demonstrate some features of the R package misle and compare the performance of several commonly used multiple imputation techniques. Future work of my PhD research will also be discussed.

Two-phase subsampling design for DNA sequencing with application in the relatedness of endangered species

Speaker: Zoe Luo

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 22 January 2020, 11:00 am to 12:00 pm

Where: 303-310

Whole-genome sequencing has been completed for the entire kākāpō species. However, this sort of effort is not feasible in most situations, and only some individuals can be sequenced. Despite the decreasing cost of DNA sequencing, budget remains a substantial constraint for most research funders. A cost-saving strategy is to resequence a small subsample from the original sample at higher resolution, then use the data from the subsample to make inferences about the rest of the sample. This strategy is called two-phase sampling, because the initial sampling of the cohorts is followed by a subsampling of individuals to be resequenced. In this talk, I will review four existing subsampling strategies designed for human cohorts, along with their benefits and drawbacks. The aim of my PhD is to investigate the optimal way to choose a subsample in both endangered species and human cohorts, based on known relatedness, phenotypes, or genotypes at a small number of sites.

Two “Paradoxes” to enrich your summer holiday

Speaker: Mark Holmes

Affiliation: University of Melbourne

When: Wednesday, 8 January 2020, 11:00 am to 12:00 pm

Where: 303-310

I will describe two counterintuitive results that I learned of recently. The first concerns the expected time to select a ball of a given colour in “Polya’s urn”, and has probably been known (to some people) for a long time. The second concerns the expected time for Batman to catch the Joker if they walk independently on a square or hexagon, etc., and is an original result obtained jointly with Peter Taylor.
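The abstract does not say which property of Polya's urn is at issue, but one classically counterintuitive fact fits the description: starting from one ball of each colour, the waiting time for a given colour is finite almost surely yet has infinite expectation, since P(T > n) = 1/(n+1). A small simulation (illustrative only) checks this heavy tail.

```python
import random

def first_blue_draw(max_draws):
    """Polya's urn with 1 red, 1 blue ball: draw a ball, return it plus one more
    of the same colour. Return the index of the first blue draw, or None if blue
    is not seen within max_draws."""
    red, blue = 1, 1
    for t in range(1, max_draws + 1):
        if random.random() < blue / (red + blue):
            return t
        red += 1  # a red was drawn, so another red is added
    return None

# P(first n draws are all red) = (1/2)(2/3)...(n/(n+1)) = 1/(n+1), so the waiting
# time T for blue satisfies P(T > n) = 1/(n+1) and E[T] = sum_n P(T > n) diverges.
random.seed(42)
trials, n = 20000, 9
frac = sum(first_blue_draw(n) is None for _ in range(trials)) / trials
print(frac)  # close to 1/(n+1) = 0.1
```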
