Department of Statistics
Seminars
Speaker: Guoping Hu
Affiliation: UoA
When: Monday, 7 April 2025, 12:00 pm to 1:00 pm
Where: 303-310
Many applications require forecasts of multiple time series at different levels of aggregation, a task commonly known as hierarchical time series forecasting. Supply chain management is a typical application, requiring demand forecasts at the store, city, or country level for decision-making. In hierarchical forecasting, top-down, bottom-up, and optimal linear combination are the most common methods. While top-down and bottom-up methods use only information from the top and bottom levels, respectively, linear combination methods use individual forecasts from all series and levels and combine them linearly, often outperforming traditional top-down and bottom-up methods. Despite this, these approaches do not directly use the explanatory information that may exist at various levels of the hierarchy. In addition to producing accurate forecasts, it is necessary to select a suitable method that generates base and reconciled forecasts simultaneously. Prediction reconciliation involves adjusting predictions to be consistent across different levels. In this talk, we introduce a novel end-to-end hierarchical time series forecasting framework based on deep learning that jointly optimizes forecasting and reconciliation tasks. The framework incorporates a spatiotemporal forecasting module based on a graph convolutional network and gated recurrent unit to improve base forecast accuracy. It employs a multilayer perceptron for forecast reconciliation to ensure hierarchical consistency, using hierarchical information throughout the process to generate accurate and consistent predictions for all the time series within the hierarchy. We evaluate the proposed methodology on two real-world large-scale retail datasets.
The results indicate that our method achieves superior performance on hierarchical forecasting tasks compared to state-of-the-art methods, especially in scenarios with promotional information.
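For readers unfamiliar with reconciliation, the following is a minimal sketch of the classical bottom-up method mentioned in the abstract, not the speaker's deep learning framework. The two-store hierarchy, the forecast values, and the names `S`, `bottom_up`, and `is_coherent` are all hypothetical and chosen only for illustration.

```python
# Hypothetical two-store hierarchy: total = store_a + store_b.
# Each row of S maps the bottom-level series to one node of the hierarchy.
S = [
    [1, 1],  # total
    [1, 0],  # store_a
    [0, 1],  # store_b
]


def bottom_up(bottom_forecasts):
    """Rebuild forecasts for every node by summing bottom-level forecasts."""
    return [sum(w * f for w, f in zip(row, bottom_forecasts)) for row in S]


def is_coherent(forecasts, tol=1e-9):
    """Check that the top-level forecast equals the sum of the bottom level."""
    total, *bottom = forecasts
    return abs(total - sum(bottom)) <= tol


# Independent base forecasts (total, store_a, store_b); made-up numbers.
base = [105.0, 60.0, 40.0]

# Bottom-up reconciliation discards the top-level base forecast
# and rebuilds it from the bottom-level forecasts.
reconciled = bottom_up(base[1:])

print(is_coherent(base), is_coherent(reconciled), reconciled)
```

The base forecasts here are inconsistent because the total does not equal the sum of the stores; after bottom-up reconciliation the hierarchy adds up, at the cost of ignoring whatever information the top-level forecast carried.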
This is a PYR seminar.
Evaluating NLP tools designed to assist instructors with formative assessment for large-enrollment STEM classes
Speaker: Matt Beckman
Affiliation: The Pennsylvania State University
When: Wednesday, 9 April 2025, 11:00 am to 12:00 pm
Where: 303-310
Abstract:
This talk seeks to articulate the benefit of free-response tasks and timely formative assessment feedback, a roadmap for developing human-in-the-loop natural language processing (NLP) assisted feedback, and results from a pilot study establishing proof of principle. If we are to pursue Statistics and Data Science Education across disciplines, we will surely encounter both opportunity and necessity to develop scalable solutions for pedagogical best practices. Research suggests that “write-to-learn” tasks improve learning outcomes, yet constructed-response methods of formative assessment become unwieldy when class sizes grow large. In the pilot study, several short-answer tasks completed by nearly 2000 introductory tertiary statistics students were evaluated by human raters and an NLP algorithm. After briefly describing the tasks, the student contexts, the algorithm, and the raters, this talk discusses the results, which indicate substantial inter-rater agreement and group consensus. The talk will conclude with recent developments building upon this pilot, as well as implications for teaching and future research.
Bio: Matt Beckman is an Associate Research Professor of Statistics at Penn State University, Director of the Consortium for the Advancement of Undergraduate Statistics Education (CAUSE), and 2025 NZSA Visiting Lecturer. He is co-founder of a Statistics & Data Science Education Research Lab and affiliated faculty with the Social Science Research Institute and the Center for Socially Responsible Artificial Intelligence at Penn State. Matt’s primary research interests tend to focus on assessment and he is currently PI for the NSF-funded “Project CLASSIFIES” which investigates the use of NLP tools to assist instructors of large-enrollment classes with providing formative assessment feedback on short-answer, free response tasks.
Speaker: Stephanie Casey
Affiliation: Eastern Michigan University
When: Wednesday, 2 April 2025, 11:00 am to 12:00 pm
Where: 303-310
ABSTRACT: In this talk, I will be sharing my journey from a high school teacher to a statistics teacher educator at the university level. My focus will be on how I’ve turned my teaching experiences as a practicing teacher into research efforts, and then the research results into products used by teacher educators to improve the preparation of teachers to teach statistics.
BIO: Dr. Stephanie Casey is a Professor of Mathematics Education at Eastern Michigan University, USA. As a 2025 Fulbright Scholar, she is researching students' interpretations of modern, big data visualizations in collaboration with the University of Canberra's STEM Education Research Centre (SERC). Her research focuses on the teaching and learning of data science and statistics, motivated by her experience teaching secondary mathematics for fourteen years. She has co-authored two sets of statistics teacher education curriculum materials that are widely used with preservice secondary STEM teachers throughout the United States.
Link: https://sites.google.com/site/stephaniecaseymath/
The health data paradox - simultaneously simple and complex
Speaker: Pernille Christensen
Affiliation: Noted
When: Wednesday, 26 March 2025, 11:00 am to 12:00 pm
Where: 303-310
Ask any one person involved with health data what they need it for, and they’ll likely be able to give you a straightforward, simple answer. These could be to know the proportion of the population struggling with mental health, to see if the improvement project had the intended effect, to meet funding requirements, to know the caseload and allocate resources, to communicate with a team about an individual patient, and so on.
Individually simple and straightforward. Each of these individual purposes, however, is linked together in an intricate, interconnected, and highly complex web. They are linked together by issues such as limitations in time and resources for data collection, differing priorities, siloed, rigid data systems, questions on ownership, privacy, and ethics rules. The complexity is further compounded by deeper questions such as: is the data truthful? Does it tell the whole story? Is it durable when the political scene changes or new discoveries are made?
In this session, we will explore this complex web from different points of view of the types of people involved with health data and question whether the complexities naturally occur or are introduced.
Biography:
As a data architect at Noted, I design and build data warehousing and business insight tools for Noted’s customers - giving them valuable insights into their business, supporting them in achieving the best health outcomes for their clients.
I have worked in the health sector for more than two decades, in both Denmark and New Zealand, and I hold a medical degree and PhD in Health and Medical Sciences from the University of Copenhagen, Denmark, and a Master's degree in Applied Statistics from Pennsylvania State University, USA. My work includes clinical work as a medical doctor and researcher, with international research collaborations and scientific task force positions, and non-clinical work related to biostatistics, health intelligence, and machine learning, working with diverse data sets from national health surveys, complex research data, health systems data, and data related to wellbeing in general.
Cumulant-based approximation for fast and efficient prediction for species distribution
Speaker: Osamu Komori
Affiliation: Seikei University, Japan
When: Thursday, 20 March 2025, 2:00 pm to 3:00 pm
Where: 303-610
Abstract:
This talk describes the Japan Biodiversity Mapping Project and the Poisson point process regression used in its analysis. An analysis of Japanese vascular plant data is presented, supported by simulation studies and a comparison with Maxent and other methods.
Bio:
The speaker is Professor, Department of Science and Technology at Seikei University, and Visiting Professor, School of Statistical Thinking, Institute of Statistical Mathematics, Japan. His research interests include data science & statistical modeling, species distribution modeling, machine learning & predictive modeling.
Modeling Population-Scale Commuting Patterns in New Zealand
Speaker: Michael J. Kane
Affiliation: MD Anderson Cancer Center, The University of Texas
When: Wednesday, 5 March 2025, 11:00 am to 12:00 pm
Where: 303-310
Abstract: Human mobility patterns reveal how individuals and populations navigate spatial environments, offering critical insights for urban planning and transportation policy, among other areas. In this talk, we explore two modeling approaches applied to New Zealand’s 2018 census data to capture the dynamics of commuting behavior across Statistical Area 2 (SA2) regions. The first model identifies “loci” within the commuting network: locations that exhibit disproportionately high rates of both destination and transit movement. By analyzing the interconnectivity between SA2 areas, this approach reveals the hubs and corridors that shape everyday commuting patterns. The second model leverages an attention-based architecture, inspired by techniques used in large language models, to encode individual commuting trajectories. This model not only assesses the likelihood of a given sequence of locations but also enables the synthesis of plausible new trajectories. By capturing the dependencies in movement sequences, the attention model provides a powerful tool for predicting and simulating commuting behaviors.
Mathematical Reasoning over Multimodal Large Language Models
Speaker: Shuangyan Deng
Affiliation: UoA
When: Wednesday, 19 February 2025, 2:00 pm to 3:00 pm
Where: 303-310
Abstract: Mathematical reasoning is a fundamental challenge in artificial intelligence, particularly for Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), which must integrate textual and visual data to solve complex problems. Despite recent advancements, MLLMs still face significant challenges, including the lack of comprehensive benchmarks, the inability to learn effectively from reasoning errors, and difficulties in leveraging external tools for mathematical problem-solving. This research addresses three key problems: (1) evaluating MLLMs' mathematical reasoning capabilities through FinMR, a novel multimodal benchmark designed for financial reasoning tasks; (2) improving reasoning accuracy using in-context learning (ICL) with AI-generated error feedback, enabling models to refine their problem-solving strategies based on prior mistakes; and (3) developing a tool-integrated planner that allows MLLMs to dynamically utilize external computational tools for solving complex mathematical problems. By introducing these innovations, this study enhances the reasoning ability of MLLMs and paves the way for more reliable and interpretable AI-driven mathematical reasoning. Future work will focus on expanding dataset diversity, refining multimodal learning strategies, and improving the alignment between textual and visual modalities for greater accuracy and generalization.
This is a PYR seminar.
Image-Derived Phenotypes in Whole-Body MRI
Speaker: Brandon Whitcher
Affiliation: University of Westminster, London
When: Wednesday, 19 February 2025, 11:00 am to 12:00 pm
Where: 303-310
Population imaging studies, like the UK Biobank, provide us with an unprecedented amount of medical imaging data. At the Research Centre for Optimal Health we have focused on the abdominal protocol. Multiple data analysis pipelines have been developed to extract a wide variety of quantitative features from these data, what we call image-derived phenotypes (IDPs). The IDPs are then used to describe the UK adult population, investigate diseases, and provide input into genome-wide association studies with our collaborators.
Bio:
Brandon Whitcher has spent the last 20+ years focused on quantitative imaging biomarkers for clinical, pharmaceutical, and wellness applications. He has been employed in the pharmaceutical industry and a variety of startup companies. For the last seven years Dr Whitcher has been a member of the Research Centre for Optimal Health at the University of Westminster, London, UK. Since 2023 he has divided his time between the university and the Radiology Department at the Royal Marsden Hospital, South London, where he is an NHS employee.
Exploring Deep Learning Techniques for Subtype Classification Modeling in Alzheimer's Disease
Speaker: Xiaoyan Sun
Affiliation: Department of Statistics, University of California Irvine
When: Wednesday, 22 January 2025, 10:00 am to 11:00 am
Where: 303-310
The growing prevalence of Alzheimer's disease poses a significant healthcare challenge, particularly with the ageing global population. Understanding disease subtypes through multi-omics data integration is pivotal for advancing personalised medicine and improving patient outcomes. However, existing methods often struggle with high-dimensional data, heterogeneous information integration, and effective disease subtype classification. In this study, we propose a novel deep learning framework designed to address three core challenges in multi-omics analysis: dimensionality reduction, data integration, and subtype classification. The framework employs a multi-head attention mechanism for feature selection, capturing complex relationships in high-dimensional data. It integrates Graph Convolutional Networks (GCN) for robust data fusion, while leveraging contrastive learning to enhance subtype classification accuracy. The framework effectively handles complex, high-dimensional data, addressing challenges of missing data and heterogeneity while capturing both global and local patterns in multi-omics data, all while maintaining interpretability.
(This is a talk for Confirmation Review.)
Teaching and Learning Bayesian Statistics with {bayesrules}
Speaker: Mine Dogucu
Affiliation: Department of Statistics, University of California Irvine
When: Wednesday, 11 December 2024, 4:00 pm to 5:00 pm
Where: 303-310
Abstract: Bayesian statistics is becoming more popular in data science. Data scientists are often not trained in Bayesian statistics, and if they are, it is usually part of their graduate training. During this talk, we will present an introductory course in Bayesian statistics for learners at the undergraduate level and comparably trained practitioners. We will share tools for teaching (and learning) the first course in Bayesian statistics, specifically the {bayesrules} R package that accompanies the open-access book Bayes Rules! An Introduction to Applied Bayesian Modeling. We will provide an outline of the curriculum and examples for novice learners and their instructors.
Speaker: Mine Dogucu, Associate Professor of Teaching and Vice Chair for Undergraduate Studies, Department of Statistics, University of California Irvine
Bio: Mine Dogucu is Associate Professor of Teaching and Vice Chair of Undergraduate Studies in the Department of Statistics at University of California Irvine. Her goal is to create educational resources for statistics and data science that are accessible physically and cognitively. Her work focuses on modern pedagogical approaches in the statistics curriculum, making data science education accessible, and undergraduate Bayesian education. She is the co-author of the book Bayes Rules! An Introduction to Applied Bayesian Modeling. She works on a few projects funded by the United States National Science Foundation and the National Institutes of Health. She writes blog posts about data, pedagogy, and data pedagogy at DataPedagogy.com.
Nonparametric Density Estimation for Compositional Data
Speaker: Jiajin (George) Xie
Affiliation: Department of Statistics, University of Auckland
When: Thursday, 28 November 2024, 12:00 pm to 1:00 pm
Where: 303-310
This study addresses the challenges of density estimation for compositional data, a type of data constrained to reflect relative proportions within a whole. Such data are prevalent across diverse fields, including microbiome analysis, geology, and machine learning. The research develops and evaluates nonparametric methods for high-dimensional compositional data density estimation, focusing on the mixture-based density estimation (MDE) approach. Two types of mixture components are explored: Gaussian distributions applied to log-ratio-transformed compositional data, which offer excellent flexibility, and Dirichlet distributions applied directly to compositions, effectively handling cases with zero values. The performance of these methods is assessed through simulation studies and compared with finite mixture and kernel density estimation techniques. Results demonstrate the superior accuracy and adaptability of the proposed methods in capturing intricate data structures across various scenarios.
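As background on the log-ratio transformation mentioned above, here is a minimal sketch using the additive log-ratio (alr) transform. The abstract does not say which log-ratio transform the study uses, so alr is an assumption made for illustration; the function names and the example composition are hypothetical.

```python
import math


def alr(composition):
    """Additive log-ratio: log of each part relative to the last part."""
    *parts, ref = composition
    return [math.log(p / ref) for p in parts]


def alr_inverse(coords):
    """Map unconstrained coordinates back to a composition that sums to 1."""
    exps = [math.exp(c) for c in coords] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up three-part composition (proportions summing to 1).
x = [0.2, 0.3, 0.5]
z = alr(x)             # two unconstrained real coordinates
back = alr_inverse(z)  # recovers the original proportions

print(z, back)
```

The transformed coordinates are unconstrained real numbers, which is what makes Gaussian components usable on them. A zero part makes the logarithm undefined (`math.log` raises `ValueError`), which is consistent with the abstract's point that components defined directly on the compositions are better suited to data containing zeros.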
(This is a PYR talk.)
Visualization and Analysis of Suicide Methods in Tokyo Using Interactive Graphs
Speaker: Takafumi KUBOTA
Affiliation: Tama University, Japan
When: Wednesday, 30 October 2024, 11:00 am to 12:00 pm
Where: 303-310
This study aims to visualize the trends in suicide methods in Tokyo, using Japan's regional suicide statistics to provide insights that can inform effective prevention strategies. Suicide is a significant social issue, and analyzing regional data can offer valuable perspectives for targeted interventions. The research focuses on visualizing the trends for different suicide methods by creating bar graphs, line charts, and choropleth maps. These visualizations are generated after data cleaning to clearly depict the occurrence and trends associated with each method.
The application is developed using the R packages shiny and plotly, enabling users to interactively explore the data. With shiny, users can select the items of interest, such as region, time period, or suicide method, from a menu, while plotly allows for the implementation of interactive graphs that dynamically update based on the selected parameters. This approach facilitates the identification of specific regional trends, such as railway suicides or jumps from high-rise buildings that are more prevalent in Tokyo.
Through the development and analysis of this application, the study aims to enhance the understanding of regional and method-specific suicide trends, providing recommendations for suicide prevention measures. The visualized data is expected to serve as a valuable tool for policymakers and researchers, contributing to the strengthening of suicide prevention efforts.