Department of Statistics



Past seminars

Investigating linkage bias in the Integrated Data Infrastructure

Speaker: Eileen Li

Affiliation: The University of Auckland

When: Tuesday, 14 December 2021, 2:00 pm to 3:00 pm


Linked administrative data can provide rich information on a wide range of outcomes, and its usage is on the rise both in New Zealand and internationally. The Integrated Data Infrastructure (IDI) is a database maintained by Statistics New Zealand (Stats NZ) that contains linked administrative data at the individual level. In the absence of a unique personal identifier, probabilistic record linkage is performed, which unavoidably introduces linkage errors. However, the majority of IDI analyses are completed without understanding, measuring or correcting for potential linkage bias. We aim to quantify linkage errors in the IDI and provide feasible approaches to adjust for linkage biases in IDI analysis. In this talk, I will briefly explain how linkage errors (false links and missed links) may occur in the IDI, followed by approaches for identifying false links and missed links. Some key limitations will also be addressed.
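Probabilistic record linkage of the kind described above is commonly framed in Fellegi-Sunter terms: each candidate record pair gets a log-likelihood-ratio weight, and pairs scoring above a threshold are declared links (a source of both false links and missed links). A minimal generic sketch, with hypothetical field probabilities and not the IDI's actual linker:

```python
import math

def match_weight(agreements, m_probs, u_probs):
    # Fellegi-Sunter style log-likelihood-ratio weight for a candidate
    # record pair: add log(m/u) for each agreeing field and
    # log((1-m)/(1-u)) for each disagreeing field.
    w = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        w += math.log(m / u) if agree else math.log((1 - m) / (1 - u))
    return w

# Hypothetical fields: name, birth date, address.
m_probs = [0.95, 0.90, 0.70]   # P(field agrees | records truly match)
u_probs = [0.01, 0.001, 0.05]  # P(field agrees | records do not match)

# Name and birth date agree, address differs: strongly positive weight,
# so this pair would typically be linked.
print(match_weight([True, True, False], m_probs, u_probs) > 0)
```

Thresholding such weights is exactly where the two error types arise: too low a cut-off admits false links, too high a cut-off creates missed links.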

Estimating the approximation error for the saddlepoint maximum likelihood estimate

Speaker: Godrick Maradona Oketch

Affiliation: The University of Auckland

When: Wednesday, 8 December 2021, 2:00 pm to 3:00 pm


Saddlepoint approximation to a density function is increasingly being used, primarily because of its remarkable accuracy. A common application of this approximation is to interpret it as a likelihood function, especially when the true likelihood function does not exist or is intractable, with the aim of obtaining parameter estimates from the saddlepoint-based likelihood. This study examines the likelihood function (based on first- and second-order saddlepoint approximations) to estimate the difference between the true but unknown maximum likelihood estimates (MLEs) and the saddlepoint-based MLEs. We propose an expression that estimates this difference (error) by computing the gradient of the neglected term in the first-order saddlepoint approximation. Then, using common distributions whose true likelihood functions are known to perform confirmatory tests on the proposed error expression, we show that the results are consistent with the difference between the true MLEs and the saddlepoint MLEs. These tests indicate that the proposed formula could complement the simulation studies that have been widely used to justify the accuracy of such saddlepoint MLEs.
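For reference, the standard first-order saddlepoint approximation to a density, whose neglected higher-order term is the target of the error expression described above, takes the familiar form:

```latex
% First-order saddlepoint approximation to a density f(x), where
% K(s) is the cumulant generating function and \hat{s} = \hat{s}(x)
% solves the saddlepoint equation K'(\hat{s}) = x:
\hat{f}(x) = \frac{1}{\sqrt{2\pi\, K''(\hat{s})}}
             \exp\!\left\{ K(\hat{s}) - \hat{s}\,x \right\}
```

Interpreting \(\hat{f}\) as a likelihood in the parameters and maximising it yields the saddlepoint MLEs compared against the true MLEs in the talk.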

Disease risk prediction using deep neural networks

Speaker: Xiaowen Li

Affiliation: The University of Auckland

When: Thursday, 2 December 2021, 2:00 pm to 3:00 pm


Accurate disease risk prediction is an essential step towards precision medicine, an emerging model of health care that tailors treatment strategies to individuals' profiles. The recent abundance of genome-wide data provides unprecedented opportunities to systematically investigate complex human diseases. However, the ultra-high dimensionality and the complex relationships between biomarkers and outcomes have brought tremendous analytical challenges; dimension reduction is therefore crucial for analysing high-dimensional genomic data. Deep learning models are promising approaches for modelling features of high complexity, and thus have the potential to offer a unified approach to efficiently modelling diseases with different underlying genetic architectures. The overall objective of this project is to develop a hybrid deep neural network incorporating the multi-kernel Hilbert-Schmidt Independence Criterion Lasso (MK-HSIC-lasso) to efficiently select important predictors from ultra-high-dimensional genomic data and model their complex relationships, for risk prediction analysis on high-dimensional genomic data.

Accessing 'grid' from 'ggplot2'

Speaker: Paul Murrell

Affiliation: The University of Auckland

When: Thursday, 18 November 2021, 3:00 pm to 4:00 pm


The 'ggplot2' package for R is a very popular package for producing statistical plots (in R). 'ggplot2' provides a high-level interface that makes it easy to produce complex images from small amounts of R code. The 'grid' package for R is an unpopular package for producing arbitrary images (in R). 'grid' provides a low-level interface that requires a lot of work to produce complex images. However, 'grid' provides complete control over the fine details of an image. 'ggplot2' uses the low-level package 'grid' to do its drawing so, in theory, users should be able to get the best of both worlds. This talk will discuss the surprising fact that 'ggplot2' users cannot easily get the best of both worlds and it will introduce the 'gggrid' package, which is here to save the day (and both worlds).

Applications of scoring rules

Speaker: Matthew Parry

Affiliation: University of Otago

When: Thursday, 21 October 2021, 3:00 pm to 4:00 pm


Suppose you publicly express your uncertainty about an unobserved quantity by quoting a distribution for it. A scoring rule is a special kind of loss function intended to measure the quality of your quoted distribution when an outcome is actually observed. In statistical decision theory, you seek to minimise your expected loss. A scoring rule is said to be proper if the expected loss under your quoted distribution is minimised by quoting that distribution. In other words, you cannot game the system!

In addition to having a rich theoretical structure – for example, associated with every scoring rule is an entropy and a divergence function – scoring rules can be tailored to the problem at hand and consequently have a wide range of applications. They are used in statistical inference, for evaluating and ranking forecasters, for assessing the quality of predictive distributions, and in exams.

I will talk about a range of scoring rules and discuss their application in areas such as classification and time series. In addition to so-called local scoring rules that do not depend on the normalisation of the quoted distribution, I will also discuss recently discovered connections between scoring rules and the Whittle likelihood.
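The properness condition described above can be checked numerically: under a strictly proper rule such as the log score, your expected loss is minimised by quoting the true distribution. A minimal sketch with hypothetical toy distributions:

```python
import math

def log_score(q, outcome):
    # Log score: negative log probability the quoted distribution q
    # assigns to the observed outcome (lower is better).
    return -math.log(q[outcome])

def expected_score(p_true, q_quoted, score):
    # Expected loss when outcomes follow p_true but q_quoted is quoted.
    return sum(p * score(q_quoted, k) for k, p in enumerate(p_true))

p = [0.7, 0.2, 0.1]   # true distribution over three outcomes
q = [0.5, 0.3, 0.2]   # a mis-quoted distribution

# Honesty wins: quoting p gives strictly lower expected loss than
# quoting q, so the system cannot be gamed.
print(expected_score(p, p, log_score) < expected_score(p, q, log_score))
```

The gap between the two expected scores is exactly the Kullback-Leibler divergence from p to q, illustrating the divergence function that the abstract says accompanies every scoring rule.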

Current state and prospects of R-package for the design of experiments

Speaker: Emi Tanaka

Affiliation: Monash University

When: Thursday, 14 October 2021, 3:00 pm to 4:00 pm


The critical role of data collection is well captured in the expression "garbage in, garbage out" -- in other words, if the collected data are rubbish then no analysis, however complex it may be, can make something of them. The gold standard for data collection is through well-designed experiments. Re-running an experiment is generally expensive, in contrast to statistical analysis, where re-doing it is generally low-cost; the stakes of getting an experimental design wrong are therefore higher. But how do we design experiments in R? In this talk, I will review the current state of R packages for the design of experiments and present my prototype R package {edibble}, which implements a framework that I call the "grammar of experimental design".


Dr. Emi Tanaka is a lecturer in statistics at Monash University whose primary interest is to develop impactful statistical methods and tools that can readily be used by practitioners. Her research areas include data visualisation, mixed models and experimental design, motivated primarily by problems in bioinformatics and the agricultural sciences. She is currently the President of the Statistical Society of Australia Victorian Branch and the recipient of the Distinguished Presenter's Award from the Statistical Society of Australia for her delivery of a wide range of R workshops.

Highly comparative time-series analysis

Speaker: Ben Fulcher

Affiliation: School of Physics, The University of Sydney

When: Thursday, 7 October 2021, 3:00 pm to 4:00 pm


Over decades, an interdisciplinary scientific literature has contributed myriad methods for quantifying patterns in time series. These methods can be encoded as features that summarize different types of time-series structure as interpretable real numbers (e.g., the shape of peaks in the Fourier power spectrum, or the estimated dimension of a time-delay reconstructed attractor). In this talk, I will show how large libraries of time-series features (>7k, implemented in the hctsa package) and time series (>30k, in the CompEngine database) enable new ways of analyzing time-series datasets, and of assessing the novelty and usefulness of time-series analysis methods. I will highlight new open tools that we’ve developed to enable these analyses, and discuss specific applications to neural time series.
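The idea of encoding time-series structure as interpretable real numbers can be sketched with a few classic features; this is a toy illustration in the spirit of feature-based analysis, not hctsa's actual (much larger) feature set:

```python
def ts_features(x):
    # Map a time series to a few interpretable scalar features:
    # its mean, variance, and lag-1 autocorrelation.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    # Lag-1 autocorrelation: positive for slowly varying series,
    # near zero for this alternating example.
    ac1 = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1)) / (n * var)
    return {"mean": mean, "variance": var, "ac1": ac1}

feats = ts_features([1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0])
print(sorted(feats))  # ['ac1', 'mean', 'variance']
```

Stacking thousands of such features across thousands of series gives the feature-by-series matrix on which the comparative analyses in the talk operate.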

Merging Modal Clusters via Significance Assessment

Speaker: Yong Wang

Affiliation: The University of Auckland

When: Thursday, 19 August 2021, 3:00 pm to 4:00 pm


In this talk, I will describe a new procedure that merges modal clusters step by step and produces a hierarchical clustering tree. This is useful for dealing with superfluous clusters and for reducing the number of clusters, as is often desired in practice. Based on some new properties we establish for Morse functions, the procedure merges clusters in a sequential manner without causing unnecessary density distortion. Each cluster is evaluated for its significance relative to the other clusters, using the Kullback-Leibler divergence or its log-likelihood approximation, by truncating the density for the cluster at an appropriate level. The least significant cluster is then merged into one of its adjacent clusters, using the novel concept of cluster adjacency we define. The resulting hierarchical clustering tree is useful for determining the number of clusters, as may be preferred by a specific user or in a general, meaningful manner. Numerical studies show that the new procedure handles difficult clustering problems well and often produces intuitively appealing and numerically more accurate clustering results, as compared with several other popular clustering methods in the literature.
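The significance measure above is built on the Kullback-Leibler divergence; a minimal discrete sketch of that quantity (generic, and not the truncated-density construction specific to the talk):

```python
import math

def kl_divergence(p, q):
    # Discrete Kullback-Leibler divergence D(p || q) between two
    # probability vectors: zero iff p == q, positive otherwise.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A peaked "cluster" distribution diverges from a flat reference,
# so it would register as significant; an identical pair would not.
peaked = [0.8, 0.1, 0.1]
flat = [1 / 3, 1 / 3, 1 / 3]
print(kl_divergence(peaked, flat) > 0)
```

In the merging procedure, the cluster whose (truncated-density) divergence is smallest is deemed least significant and absorbed into an adjacent cluster.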

Generally-altered, -inflated and -truncated regression, with application to heaped and seeped counts

Speaker: Thomas Yee

Affiliation: University of Auckland

When: Thursday, 22 July 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

A very common aberration in retrospective self-reported survey data is digit preference (heaping), whereby multiples of 10 or 5 are measured in excess upon rounding, creating spikes in spikeplots. Handling this problem requires great flexibility. To this end, and for seeped data also, we propose GAIT regression to unify truncation, alteration and inflation simultaneously, applied to general sets rather than just {0}. Models such as the zero-inflated and zero-altered Poisson are special cases. Parametric and nonparametric alteration and inflation mean our combo model has five types of 'special' values. Consequently it spawns a novel method for overcoming underdispersion through general truncation by expanding out the support. Full estimation details involving Fisher scoring/iteratively reweighted least squares are presented, as well as working implementations for three 1-parameter distributions: Poisson, logarithmic and zeta. Previous methods for heaped data have been found wanting; GAIT regression, however, holds great promise by allowing the joint flexible modelling of counts having absences, deficiencies and excesses at arbitrary multiple special values. It is now possible to analyze the joint effects of alteration, inflation and truncation on under- and over-dispersion. The methodology is implemented in the VGAM R package, available on CRAN.

Does It Add Up? Hierarchical Bayesian Analysis of Compositional Data

Speaker: Em Rushworth

Affiliation: University of Auckland

When: Thursday, 22 July 2021, 2:00 pm to 3:00 pm

Where: 303-B05

Compositional data are everywhere: in mineral analysis, demographics, and species abundance, for example. However, despite the well-known difficulties in analysing such data, research into expanding the existing methods or exploring alternative approaches has been limited. The past five years have seen a resurgence of interest in the field popularised by Aitchison (1982), which uses a family of log-ratio transformations with traditional statistical methodology to analyse compositional data. Most recent publications using this methodology focus solely on application, despite the innate limitations of log-ratios preventing wider adoption. This research aims to fill in many of the blanks, including studying approaches outside the log-ratio transformation family, by proposing a consistent definition of compositional data regardless of approach, methodological developments in the treatment of zeroes and sparse data, and demonstrations of applications across multiple domains.

Bayesian hierarchical models are prominent in many of the domains considered in this research, such as ecology and movement studies, and provide a useful framework for considering compositional data. Although the crossover between these two fields is consistently mentioned in both Bayesian and compositional data modelling papers, there is very little literature on it and it remains largely unexplored. Leininger et al. (2013) successfully used a Bayesian hierarchical framework to model the presence of zeroes as a separate hierarchical level, but there has not been any research outside of the log-ratio transformation. This research will seek to present a Bayesian approach to compositional data analysis using hierarchical models and, hopefully, help make the field more accessible for future researchers.
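The Aitchison log-ratio family referred to above can be illustrated by its simplest member, the centred log-ratio (CLR) transform, which maps a composition to unconstrained coordinates. A minimal sketch (generic, with a hypothetical three-part composition):

```python
import math

def clr(composition):
    # Centred log-ratio transform: divide each part by the geometric
    # mean of the parts and take logs. The resulting coordinates are
    # unconstrained apart from summing to zero.
    g = math.exp(sum(math.log(x) for x in composition) / len(composition))
    return [math.log(x / g) for x in composition]

parts = [0.2, 0.3, 0.5]          # hypothetical composition (sums to 1)
coords = clr(parts)
print(abs(sum(coords)) < 1e-12)  # CLR coordinates sum to zero
```

The transform's requirement that every part be strictly positive is precisely the zero-handling limitation that motivates the alternative approaches this research explores.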

