Department of Statistics
Seminars
Speaker: Ruiting Mao
Affiliation: Department of Statistics, University of Auckland
When: Thursday, 5 September 2024, 10:00 am to 11:00 am
Where: 303-B05
Bayesian statistical methods have played a pivotal role in signal detection and the physical parameter estimation of gravitational waveform models. The future space-based gravitational wave (GW) detector, the Laser Interferometer Space Antenna (LISA), which is sensitive to the millihertz frequency band, makes it possible to detect some promising sources of GWs. However, Bayesian inference for features of interest and noise characterization is often computationally expensive and subject to model misspecification with complex waveforms and nonstationary noise artifacts in the LISA data stream. Through this work, I will present the application of deep learning models to address these challenges inherent in LISA data analysis. Specifically, I will discuss two key issues: 1) Exploring calibration techniques to quantify and correct the approximation errors introduced by using computationally faster but less accurate waveform models in Bayesian parameter estimation, and 2) Investigating deep learning methods to fill in data gaps from the LISA data stream effectively.
(This is a PhD PYR talk.)
Engaging in, and teaching, ethical practice of statistics and data science
Speaker: Rochelle Tractenberg
Affiliation: Georgetown University, Washington DC
When: Tuesday, 23 July 2024, 2:00 pm to 3:00 pm
Where: 303-310
The American Statistical Association's Ethical Guidelines for Statistical Practice define "Statistical Practice" to include designing the collection of, summarizing, processing, analyzing, interpreting, or presenting, data; as well as model or algorithm development and deployment. The Guidelines are intended to support every individual who uses "statistical practice", irrespective of their level, training, degree or job title, to do so in an ethical way. When it comes to encouraging (and teaching) "ethical statistical practice", there are two dimensions that must be recognized:
(i) To practice ethically, i.e., execute each task in accordance with ethical practice standards (like the Guidelines); and
(ii) To identify, and respond to, unethical actions/requests.
In this talk we will explore how a Stakeholder Analysis can be used with the ASA Ethical Guidelines (or any guidance) to practice ethically, and teach ethical statistical practice. We will also consider an Ethical Reasoning paradigm that facilitates identifying and making an informed decision about responding to ethical dilemmas. This paradigm is also useful for both engaging in, and teaching, ethical statistical practice. Both of these tools will be examined in the context of a 7-task "statistics and data science pipeline", which itself can help instructors to reinforce student learning about the scientific method, the Problem, Plan, Data, Analysis, Conclusion cycle, and even the eight-step UN-based Generic Statistical Business Process model which was developed to support "official statistics", a special case of statistical practice.
Bio: Rochelle Tractenberg is a tenured professor in the Department of Neurology, with appointments in Biostatistics, Bioinformatics & Biomathematics and Rehabilitation Medicine, at Georgetown University in Washington, DC. She is a multi-disciplinary research methodologist and ASA-accredited Professional Statistician (PStat®), as well as a cognitive scientist focused on higher education curriculum design and evaluation. Her clinical and translational work integrates theories and principles of statistics, psychometrics, and domain-specific measurement to problems of assessment and the determination of changes in cognition, brain aging, and other difficult-to-measure constructs, using qualitative and quantitative methods. She is also an internationally recognized expert on ethical statistics and data science practice, having published two books, Ethical Practice of Statistics and Data Science and Ethical Reasoning for a Data-Centered World, in 2022. In addition to ethical statistics and data science practice, she has also contributed to guidelines for ethical mathematical practice (US based) and particularly, on how to integrate ethical content into quantitative courses. She is developing a new edition of Ethical Practice of Statistics and Data Science, specifically for government settings (expected 2025) and is collaborating on a forthcoming UN Handbook on Ethical Practice in Official Statistics. Professor Tractenberg is an elected Fellow of the American Statistical Association, the International Statistics Institute, and the American Association for the Advancement of Science, and was nominated for the 2022 Einstein Foundation Award for Promoting Quality in Research. Each of these nominations highlighted her commitment to, and support for, ethical statistical practice and scientific stewardship.
Designing to Support Doing Data Science and Statistics in Schools
Speaker: Hollylynne Lee
Affiliation: NC State University
When: Tuesday, 16 July 2024, 4:00 pm to 5:00 pm
Where: 303-310
Abstract: The U.S. often looks to New Zealand for resources and research related to teaching and learning statistics. In this talk, Hollylynne will discuss two recent projects situated in the U.S. that are advancing the teaching and learning of statistics and data science for secondary schools. These projects have designed curricula and online professional learning experiences for teachers at all stages of their career, from undergraduate education through life-long learning as a practicing teacher. We collaborate with a team at CODAP to integrate advanced data experiences into classrooms. The presentation will have something for everyone related to research, design of educational materials, and ideas for secondary classrooms.
Bio: Dr. Hollylynne Lee is a Distinguished University Professor of Mathematics and Statistics Education in the STEM Education department at NC State University, Raleigh NC, USA. She is also a Senior Faculty Fellow at the Friday Institute for Educational Innovation where she directs the Hub for Innovation and Research in Statistics and Data Science Education (https://fi.ncsu.edu/teams/hirise/). With experience teaching in elementary, middle, and high school classrooms, she brings a depth of practical perspectives to her research, and ensures her research and designs of educational resources are directly applicable to teachers and students. Her current work includes a focus on teachers’ professional learning for teaching with data using tools like CODAP and transforming undergraduate teacher preparation related to teaching statistics and data science. She loves reading, kayaking, watching volleyball, spending time with family, and her dog and cat. https://ced.ncsu.edu/people/hstohl/
Investigating Statistical Literacy of Health Professionals in Papua New Guinea
Speaker: Deborah Kakis
Affiliation: UoA
When: Tuesday, 11 June 2024, 10:00 am to 11:00 am
Where: 303-310
Abstract:
In today’s healthcare landscape, where evidence-based practice is considered the gold standard, data and statistical literacy are important skills for healthcare professionals. These competencies enable practitioners to collect, store and manage medical data, analyse data, interpret research findings, and make informed decisions. In Papua New Guinea (PNG), a developing nation with unique healthcare challenges, fostering these literacies becomes critical.
Healthcare professionals in PNG deal with data daily, whether patient records, public health data, or administrative information. Ensuring the reliability and utility of this data for evidence-based practice requires strong data literacy skills to guarantee accurate collection and storage, while statistical literacy enables practitioners to extract meaningful insights to inform their practice.
However, many challenges hinder healthcare professionals in PNG from developing strong foundations in these areas, leading to a lack of confidence in their data and statistical literacy skills, which are necessary for evidence-based practice.
To address this gap, this proposed study aims to assess the current data and statistical literacy levels among healthcare professionals in PNG. By evaluating their proficiency, we can identify areas for improvement and tailor targeted training programs accordingly. Enhancing statistical and data literacy equips healthcare professionals to evaluate treatment efficacy confidently, identify emerging trends, and actively contribute to evidence-based care.
This is the PYR seminar.
Modern Variable Selection for Vector Generalized Linear Models
Speaker: Wenqi Zhao
Affiliation: UoA
When: Monday, 27 May 2024, 1:00 pm to 2:00 pm
Where: 303-257
Abstract:
The generalized linear model (GLM) is a framework in statistics for modeling the relationship between a response variable and one or more predictor variables; it is typically used to fit a regression to a random response and predict observations. While GLMs offer relatively straightforward interpretation of coefficients, they may not capture complex interactions or nonlinear relationships in the data. Vector generalized linear models (VGLMs) and vector generalized additive models (VGAMs) can greatly extend GLMs. VGAM currently implements over 150 family functions and provides a large, flexible framework for varying model elements.
Variable selection is a crucial step in statistical modeling: identifying the most relevant predictors for predicting the response variable. In the VGLM/VGAM framework, this is usually done by minimizing some information criterion (IC). Among these, the Akaike IC (AIC) and Bayesian IC (BIC) are the most common.
VGAMs can also penalize regression splines using P-spline smoothers, which we term ‘P-spline VGAMs’. However, fitting VGAMs with penalized regression splines can be computationally intensive, particularly when dealing with large datasets or high-dimensional predictor spaces. When the variables are greater than the observations,
In this project, we propose to combine the elastic net and the VGLM/VGAM framework to create a new model selection method. Elastic net regularization techniques can help prevent overfitting and multicollinearity. The elastic net can result in sparser models with fewer predictors. This regularization path helps in identifying and handling multicollinearity by favoring models with fewer predictors in the VGLM/VGAM framework.
This is the PYR seminar.
Childhood Risk and Resilience Factors for Pasifika Youth Respiratory Health: Accounting for Attrition and Missingness
Speaker: Dawson Zhai
Affiliation: UoA
When: Friday, 24 May 2024, 1:00 pm to 2:00 pm
Where: 303-310
Abstract:
In New Zealand, 7% of deaths are related to respiratory diseases, with Pacific people at higher risk. Based on knowledge of lung development, lung function can be damaged in two ways: 1) Lung function reduction: early insults may lower the maximum lung function and/or accelerate its decline after the peak; 2) Predisposition to later respiratory disease: early disease raises the risk of later disease occurring. Conversely, some resilience factors can create beneficial effects on respiratory function and/or provide protection to stop subsequent respiratory diseases; among these factors are childhood levels of physical activity, smoke exposure, immunisation, housing conditions, and breastfeeding.
Using Pacific Island Family Study (PIFS) cohort data, this work will investigate the causal effects of identified early-life factors on early-adulthood lung function, quality of life and comorbidities. The PIFS cohort is a longitudinal cohort, the participants of which were enrolled at birth in Middlemore Hospital (n=1398) between March and December 2000. A respiratory assessment (n=466) was conducted within the cohort when participants were 18 years old. In this PIFS birth cohort respiratory study, the primary respiratory outcome was the z-score of the Forced Expiratory Volume in 1 second (FEV1). Secondary outcomes consisted of FEV1 adjusted for height and sex; the healthy lung function (HLF) indicator, defined as the z-score exceeding -1.64; health-related and respiratory-health-related quality of life scores; and respiratory condition indicators. The attrition and missingness present in the group undergoing respiratory assessment will inform much of the analysis plan, as will the longitudinal character of the risk and protective factors and their confounders.
This is the PYR seminar.
Statistical Methods and Designs for Multi-Wave Validation Studies
Speaker: Gustavo, Guimaraes DeCastro Amorim
Affiliation: Vanderbilt University Medical Center
When: Thursday, 23 May 2024, 11:00 am to 12:00 pm
Where: 303-310
Abstract:
Measurement errors are present in many data collection procedures and can harm analyses by biasing estimates. To correct for measurement errors, researchers often validate a subsample of records and then incorporate the information learned from this validation sample into the estimation. In practice, the validation sample is often selected using simple random sampling (SRS). However, SRS leads to inefficient estimates because it ignores information on the error-prone variables, which can be highly correlated to the unknown truth. Applying and extending ideas from the two-phase sampling literature, we propose optimal and nearly-optimal designs for selecting the validation sample in the classical measurement-error framework. We also present novel extensions of estimators that make use of all available data collected in two or more waves. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. These works are motivated by and illustrated in Multi-National HIV Research Cohorts.
About the speaker:
Dr Amorim is an Assistant Professor of Biostatistics. His research interests include developing novel statistical methods for problems arising in public health studies, semiparametric models for model misspecification, two-phase designs, measurement-error problems and ordinal data analysis.
https://www.vumc.org/biostatistics/person/gustavo-amorim
COVID-19 vaccine fatigue in Scotland: How do the trends in attrition rates for the second and third doses differ by age, sex, and council area?
Speaker: Robin Muegge
Affiliation: University of Glasgow
When: Thursday, 16 May 2024, 11:00 am to 12:00 pm
Where: 303-310
Abstract: Vaccine fatigue is the propensity for individuals to start but not finish a vaccination program with several doses, which thus means they are less protected. This is an especially important topic for COVID-19, where the vaccination program commonly consists of two doses followed by a booster vaccine dose to get full protection. COVID-19 vaccine hesitancy (the delay in acceptance or refusal of the first dose of the vaccine) has been studied extensively, and a few studies investigated the willingness to receive the booster vaccine dose. In contrast, attrition rates across subsequent doses caused by evolving vaccine fatigue have yet to be examined, which is the novel contribution of this paper. Our study focuses on Scotland, where the vaccine rollout began on 8th December 2020. We model vaccine attrition rates in the first transition (from doses one to two) and the second transition (from doses two to three) for the 32 council areas in Scotland. We estimate the effects of sex and transition, examine trends and patterns in the attrition rates by age group and council area and evaluate if these differ by sex or transition. We model the attrition rates with a hierarchical binomial logistic regression model that allows for flexible autocorrelation estimation for the corresponding neighbourhood and age group structures via correlated random effects models. Inference is based on a Bayesian paradigm, using integrated nested Laplace approximation (INLA). Our main findings are that attrition rates smoothly decrease with increasing age, that they are much higher in the second transition than in the first, that they are generally higher for males than females, and that the variation in attrition rates between age groups is greater for males than females.
At the end of the seminar, I will introduce my current work on outlier detection in areal data, titled “Disease mapping: What if Tobler's First Law of Geography doesn't hold?”
Bio: Robin Muegge is a PhD student in statistics from the University of Glasgow, UK. His research is in spatial and spatio-temporal areal data modelling under the supervision of Duncan Lee, Nema Dean, and Eilidh Jack. Robin completed his B.Sc. Mathematics at the Leibniz University of Hanover in Germany, and his M.Sc. Statistics at Portland State University, USA. He spent 11 weeks at the University of Wollongong, collaborating with Andrew Zammit Mangion, and is visiting the University of Auckland from the 13th to the 17th of May before returning to Glasgow.
Advanced methods for time series data applied to prediction of operating modes and detection of anomalies for wind turbines
Speaker: Hannah Yun
Affiliation: UoA
When: Monday, 13 May 2024, 10:00 am to 11:00 am
Where: 303-257
Abstract:
Wind turbines can be characterised by distinct operating modes that reflect production efficiency. In this talk, we focus on the forecasting problem for univariate discrete-valued time series of operating modes of a wind turbine. We define three prediction strategies to overcome the difficulties associated with missing data. These strategies are evaluated through experiments using five forecasting methods across two real-life datasets. Two of the forecasting methods have been introduced in the statistical literature as extensions of the well-known context algorithm: variable length Markov chains and Bayesian context tree. Additionally, we consider a Bayesian method based on conditional tensor factorisation and two different smoothers from the classical tools for time series forecasting. Each pairing of prediction strategy and forecasting method is evaluated in terms of prediction accuracy versus computational complexity. We provide guidance on the methods that are suitable for forecasting the time series of operating modes. The prediction results demonstrate that high accuracy can be achieved with reduced computational resources.
We will also briefly discuss how recent advances in the field of dictionary learning can be tailored to detect equipment health deterioration in the case of wind turbines.
This is the PYR seminar.
New Methods for Fitting Hawkes Models with Large Data
Speaker: Conor Kresin
Affiliation: University of Otago
When: Thursday, 2 May 2024, 11:00 am to 12:00 pm
Where: 303-310
Abstract:
Hawkes processes are concise mathematical representations of diverse point process data, ranging from disease spread and wildfire occurrences to non-physical phenomena such as financial asset price movements. Models for point process data are often fit using maximum likelihood estimation (MLE) or Markov Chain Monte Carlo (MCMC), but such methods are slow or computationally intractable for data with large n. In this talk, I will present a novel estimator based on the Stoyan-Grabarnik (sum of inverse intensity) statistic. Unlike MLE or MCMC approaches, the proposed estimator does not require approximation of a computationally expensive integral. I will show that under quite general conditions, this estimator is consistent for estimating parameters governing spatial-temporal point processes such as the Hawkes process and present simulations demonstrating the performance of the estimator. In the second portion of the talk, I will discuss increasingly flexible parametric Hawkes models, culminating in Continuous Long Short Term Memory (cLSTM) recurrent neural networks.
About the speaker:
Conor Kresin is a lecturer in the Department of Mathematics and Statistics, University of Otago. His research interests include point process theory and applications, stochastic geometry, disease modelling, information theory, and causal inference.