Department of Statistics Seminars

» Conformal invariance and self-avoiding walks, Professor Greg Lawler
» Capacity allocation and rostering for an intensive care unit, Dr. Ilze Ziedins
» Explanatory vs. Predictive Modeling in Scientific Research, A. Prof. Galit Shmueli
» R-in-Finance: Rmetrics, An Environment for Financial Computing and Computational Finance, Dr. Diethelm Wurtz
» iTRAQing through the magical MudPIT, Dr. Katya Ruggiero
» How from where? Spatial analysis of species-rich plant communities, Dr. George Perry
» Automating the Marking of R Code, Dr. Paul Murrell
» Robustness of efficiency in semiparametric models for incomplete data, A. Prof. Thomas Lumley
» Teaching Applied Statistics Using a Virtual Manufacturing Process, A. Prof. Stefan Steiner
» Supercomputing and You, Stephen Cope
» Weighted Regressions: How Wrong Can They Be?, A. Prof. David Fletcher
» Probability - a slippery concept, Prof. Chris Triggs
» The interaction between copyright and digital rights management, Richard Stallman
» Detecting anomalies in sensor network data, Dr. Richard Jarrett
» Approximate Bayesian computation, Dr. David Welch
» Bayesian Locally-Optimal Design of Knockout Tournaments, A. Prof. Mark E. Glickman
» Developing a Simulation Model of the Primary Care System: Work-in-Progress, Prof. Peter Davis and colleagues from the Social Statistics Research Group
» Tails of natural hazards: Implications for risk, erosion and ecology, Dr. Bruce Malamud
» Bayesian Methods in Software Testing and Capability Maturity Model, Dr. Salilesh Mukhopadhyay
» Quantification of sonar impacts on marine mammals, Dr. Carl Donovan
» Haplotypic analysis of large-scale genetic data sets, Dr. Brian Browning
» A brief introduction to StatWeave/Sweave/Beamer, Nick Horton and David Scott
» Marginal modelling of capture-recapture data, Dr. Alain Vandal
» Dynamic, Interactive Documents for Reproducible Research and Pedagogy, A. Prof. Duncan Temple Lang
» Developing probabilistic statistical thinking, Prof. Helen MacGillivray
» Learning statistical thinking in large classes through statistical discovery journeys (Real data, real projects, real students), Prof. Helen MacGillivray
» Data Saving and Sharing in a Digital Age: Issues and Implications, Prof. Peter Davis
» From a single heart beat - exploring the dynamics of cardiac function using ultrasound images, Katrina Poppe
» Statistical issues in palaeoclimatology, Dr. Anthony Fowler
» Analysis of Probability-Linked Data, Prof. Ray Chambers
» Bayesian Hierarchical Occupancy Modelling of the Swedish Bird Survey, Dr. James Russell
» Much ado about nothing: a review of the state of the art of incomplete data, Dr. Nicholas J. Horton
» Four Case Studies in non-vaccine HIV Prevention Trial Design, Dr. Deborah Donnell
» Bayesian Modelling for Ecological Count Data, Yilin (Sammie) Jia
» The Tyranny of Power, Prof. J. Martin Bland
» Variable Inclusion and Shrinkage Algorithms, Dr Gareth James
» "Mathematical learning": A powerful bridge between different fields of mathematics and statistics. Applications in biological network prediction, Kevin Bleakley
Conformal invariance and self-avoiding walks
Professor Greg Lawler

Speaker: Professor Greg Lawler

Affiliation: Dept. of Mathematics, U. Chicago

When: Friday, 19 December 2008, 10:00 am to 11:00 am

Where: Statistics Seminar Room 222, Science Centre

Professor Lawler is one of the world's top probabilists. His recent work with Oded Schramm and Wendelin Werner was awarded the Polya Prize in 2006 and, in part, led to the Fields Medal for Werner, also in 2006. Professor Lawler is the current editor of the Annals of Probability. Perhaps most significantly, he also appears on Wikipedia!

Self-avoiding walks are a model of polymers. In two dimensions, it was predicted by physicists that there is a conformally invariant scaling limit. A different conjecture was made by Mandelbrot relating planar self-avoiding walks to the outer boundary of Brownian motion. Recent advances, especially the Schramm-Loewner evolution created by the late Oded Schramm, have allowed us to understand the continuum model very well. In particular, Mandelbrot's conjecture about Brownian motion has been solved and we have good theoretical understanding as to why the typical self-avoiding walk of n steps has diameter n^{3/4}. However, the discrete problem is still very open. This talk will be a survey of the work on this problem including the role of conformal invariance in understanding the limit. Much of this talk will discuss work in collaboration with Schramm and Wendelin Werner.

http://www.math.uchicago.edu/~lawler/

Capacity allocation and rostering for an intensive care unit
Dr. Ilze Ziedins

Speaker: Dr. Ilze Ziedins

Affiliation: Dept. of Statistics, U. Auckland

When: Thursday, 4 December 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

How many beds does an intensive care unit need? What is a good nursing roster when the aim is to treat as many patients as possible, while keeping cancellations low? This talk describes a simulation model that was built to help answer these questions for the Cardiovascular Intensive Care Unit at Auckland City Hospital. The model has been used to assist in determining the number of beds needed, and work is currently underway on using it to design good rosters. This is joint work with William Chen, Ross Ihaka and staff at Auckland City Hospital, including Andrew McKee, Pam McCormack, Elizabeth Shaw, Louise Watson and Steve Withy.

http://www.stat.auckland.ac.nz/showperson?firstname=Ilze&surname=Ziedins

Explanatory vs. Predictive Modeling in Scientific Research
A. Prof. Galit Shmueli

Speaker: A. Prof. Galit Shmueli

Affiliation: Dept of Decision, Operations & Information Technologies, Smith School of Business, U. Maryland

When: Friday, 28 November 2008, 12:00 pm to 1:00 pm

Where: Statistics Seminar Room 222, Science Centre

Explanatory models are designed for testing hypotheses that specify how and why certain empirical phenomena occur. Predictive models are aimed at predicting new observations with high accuracy. An age-old debate in philosophy of science deals with the difference between predictive and explanatory goals. In mainstream statistical research, however, the distinction between explanatory and predictive modeling is mostly overlooked, and there is a near-exclusive focus on explanatory methodology. This focus has permeated into empirical research in many fields such as information systems, economics and in general, the social sciences. We discuss the issue from a practical statistical modeling perspective. Our premise is that (1) both explanatory and predictive statistical models are essential for advancing scientific research; and (2) the different goals lead to key differences at each step of the modeling process. In this talk we discuss the statistical divergences between modeling for an explanatory goal and modeling for a predictive goal. In particular, we analyze each step of the statistical modeling process (from data collection to model use) and describe the different statistical components and issues that arise in explanatory modeling vs. predictive modeling. We close with a discussion of implications of this work to the general scientific community and to the field of statistics.

(Joint with Otto Koppius, Erasmus University, The Netherlands)

http://www.smith.umd.edu/faculty/gshmueli/web/html/

R-in-Finance: Rmetrics, An Environment for Financial Computing and Computational Finance
Dr. Diethelm Wurtz

Speaker: Dr. Diethelm Wurtz

Affiliation: Institute for Theoretical Physics, ETH Zurich

When: Tuesday, 18 November 2008, 12:00 pm to 1:00 pm

Where: Room 303.279, Science Centre

R/Rmetrics has become the premier open source solution for teaching financial market analysis and the valuation of financial instruments. With hundreds of functions built on modern methods, R/Rmetrics combines exploratory data analysis and statistical modeling. Rmetrics is embedded in R, and together they form an environment that gives students a first-class system for applications in statistics and finance.

At the heart of the software environment are powerful time/date and time series management tools, functions for analyzing financial time series, functions for forecasting, decision making and trading, functions for the valuation of financial instruments, and functions for portfolio design, optimization and risk management.

In this talk I will give an overview of R/Rmetrics and present new directions and recent developments.

http://www.itp.phys.ethz.ch/research/comp/econophys

iTRAQing through the magical MudPIT
Dr. Katya Ruggiero

Speaker: Dr. Katya Ruggiero

Affiliation: School of Biological Sciences, U. Auckland

When: Thursday, 13 November 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

The ability to comprehensively view a proteome - the entire complement of proteins expressed within an organism's tissue, biofluid or cell - is of fundamental importance since this would enable insights into areas such as the effects of disease or therapeutic treatment, processes such as growth and ageing, or adaptation to environment or stimulus. This is the domain of quantitative proteomics - addressing questions beyond protein identification and delving into questions of differential expression and expression profiles.

MudPIT (Multidimensional Protein Identification Technology) coupled with iTRAQ (isobaric Tags for Relative and Absolute Quantitation) is becoming a popular tool for proteome interrogation. But how well understood are the data that are generated? And what are the challenges associated with the design and analysis of data from a MudPIT-iTRAQ experiment? The first half of this talk will be devoted to describing these technologies and the data that they generate, and will be followed by a discussion of some of the experimental design and analysis challenges I have encountered with MudPIT-iTRAQ experiments conducted in the School of Biological Sciences.

How from where? Spatial analysis of species-rich plant communities
Dr. George Perry

Speaker: Dr. George Perry

Affiliation: SGGES, U. Auckland

When: Thursday, 6 November 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Although spatial processes are believed to play a crucial role in the maintenance of local species richness, explicitly linking these processes to ecological patterns is difficult. In this seminar I will describe attempts to unravel the links between ecological pattern and ecological process in four fire-prone, high-diversity plant communities using high resolution spatial data and spatial point pattern analysis and process modelling. More specifically, I will consider: (i) the spatial patterns formed by individuals of the same species (conspecifics); (ii) the patterns formed by individuals of pairs of different species (i.e., bivariate interactions between heterospecifics); and (iii) whether site-level factors (e.g., soil resources) or species-level factors (e.g., life history traits) are useful predictors of the spatial arrangement of these hyper-diverse plant communities.

http://www.sges.auckland.ac.nz/the_school/our_people/perry_george/index.shtm

Automating the Marking of R Code
Dr. Paul Murrell

Speaker: Dr. Paul Murrell

Affiliation: Dept. of Statistics, U. Auckland

When: Thursday, 23 October 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

The course STATS 220 has weekly labs where the students submit computer code to perform various tasks. In the latter half of the course, the students submit R code. This talk will describe an R package that allows these student submissions to be marked (semi-)automatically, including allocating partial marks for answers that are "close" to the correct answer.

http://www.stat.auckland.ac.nz/showperson?firstname=Paul&surname=Murrell

Robustness of efficiency in semiparametric models for incomplete data
A. Prof. Thomas Lumley

Speaker: A. Prof. Thomas Lumley

Affiliation: Biostatistics, U. Washington

When: Thursday, 2 October 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

An interaction term between two binary exposures in a logistic regression model can be estimated by case-control logistic regression. If the two exposures are known to be independent, the same interaction term can also be estimated more efficiently by a case-only analysis. Unfortunately, the case-only analysis loses its efficiency advantage under small departures from independence that cannot reliably be diagnosed from the data.

The same phenomenon occurs very generally when comparing estimators based on sampling weights with the semiparametric efficient estimator in regression models for incomplete data. The estimators based on sampling weights are always consistent for the same value as would be obtained with complete data, even if the regression model is misspecified. The semiparametric efficient estimator, which is more efficient when the model is correctly specified, loses its efficiency advantage when the model is misspecified by an amount too small to be reliably detected.
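
For readers unfamiliar with the case-only idea, here is a minimal simulation sketch (an illustration constructed for this page, not taken from the talk): with two independent binary exposures, the exposure-exposure association among cases estimates the same interaction term as case-control logistic regression, typically with a smaller standard error. All names and numbers below are assumptions.

    # Toy illustration (assumed setup, not from the talk): two independent
    # binary exposures G and E, logistic disease model with an interaction.
    set.seed(1)
    N <- 2e5
    G <- rbinom(N, 1, 0.3)
    E <- rbinom(N, 1, 0.4)                      # independent of G
    p <- plogis(-5 + 0.4 * G + 0.5 * E + 0.7 * G * E)
    D <- rbinom(N, 1, p)

    # Case-control sample: all cases plus an equal number of controls.
    cases    <- which(D == 1)
    controls <- sample(which(D == 0), length(cases))
    cc <- data.frame(D = D, G = G, E = E)[c(cases, controls), ]

    # Case-control logistic regression: interaction coefficient is about 0.7.
    coef(glm(D ~ G * E, family = binomial, data = cc))["G:E"]

    # Case-only analysis: regress G on E among cases only; under G-E
    # independence (and a rare disease) this also estimates about 0.7,
    # usually more precisely.
    coef(glm(G ~ E, family = binomial, data = cc[cc$D == 1, ]))["E"]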

http://www.biostat.washington.edu/people/faculty.php?netid=tlumley

Teaching Applied Statistics Using a Virtual Manufacturing Process
A. Prof. Stefan Steiner

Speaker: A. Prof. Stefan Steiner

Affiliation: Director, Business and Industrial Statistics Research Group (BISRG), Department of Statistics, U. Waterloo

When: Wednesday, 24 September 2008, 3:00 pm to 4:00 pm

Where: Statistics Seminar Room 222, Science Centre

This non-technical talk describes an innovative and successful use of technology, through a virtual process, to aid in the teaching of statistical concepts and methodology. The virtual process simulates a manufacturing process for automobile camshafts that has a number of processing steps and many inputs.

At the start of an upper year undergraduate course Stat 435/835: Statistical Methods for Process Improvement, each team of students is given a budget and assigned the task of reducing variation in a critical output characteristic of a different version of the virtual process. Throughout the term, the teams plan and analyze a series of process investigations (~1/week) to first learn about how their process works and, by the end of term, how to improve it. The teams interact with the virtual process through a web interface. Each team submits a weekly written report describing their recent progress and twice per term presents to the class at a "management review meeting." The virtual process is also used as the context for all midterms and exams. Based on anecdotal evidence and survey results, students find interacting with the virtual process fun, stimulating and challenging.

The goals of this talk are to show how the virtual process aids in the teaching of material and concepts in Stat 435/835 and to describe its main pedagogical benefits. With thought and some adaptation, something similar should be possible for other applied statistics courses.

http://www.bisrg.uwaterloo.ca/

Supercomputing and You
Stephen Cope

Speaker: Stephen Cope

Affiliation: Department of Statistics, University of Auckland

When: Thursday, 11 September 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Huge amounts of processing power are available to researchers at the University of Auckland.

I bring news from the High Performance Research Symposium and the APAN26 Sustainable Networking conference, with an overview of the latest trends in supercomputing.

I follow that up with some practical ways your programs can be modified to take advantage of the computing resources that we have, not just in our Department, but distributed throughout New Zealand.

This is aimed at non-technical researchers and is particularly useful for anyone looking forward to writing software to do analysis or simulations.

http://www.stat.auckland.ac.nz/~kimihia/sun-grid#talk-2008-09-11

Weighted Regressions: How Wrong Can They Be?
A. Prof. David Fletcher

Speaker: A. Prof. David Fletcher

Affiliation: Dept. of Mathematics and Statistics, U. Otago

When: Thursday, 28 August 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

In many regression-type settings, we know that the error in the response variable differs from one observation to another. In ecology, for example, we might have several estimates of the annual breeding rate of an animal in each of several years, with each estimate having a different level of precision. The aim of the analysis might be to relate the breeding rate to one or more environmental variables measured in the same years.

A modern and useful approach to the analysis of this type of data is to fit a hierarchical model using Markov chain Monte Carlo methods. In a consulting context, however, it may be useful to provide the client with an analysis that is simpler for them to implement, e.g. a weighted regression.

I will consider problems involved in implementing weighted regression correctly and how one might determine when a standard (unweighted) regression is sufficient. An important and common special case occurs when there is no explanatory variable and we simply want to calculate a weighted mean.
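
As a simple sketch of the setting described above (hypothetical numbers and variable names, not from the talk), a weighted regression with known standard errors typically uses inverse-variance weights; the talk's question is precisely when such weights help and when an unweighted fit is adequate.

    # Toy weighted regression (assumed example): annual breeding-rate estimates
    # with known standard errors, regressed on an environmental covariate.
    dat <- data.frame(
      rate = c(0.62, 0.55, 0.71, 0.48, 0.66),   # estimated breeding rates
      se   = c(0.05, 0.09, 0.04, 0.10, 0.06),   # their standard errors
      rain = c(820, 640, 900, 600, 850)         # environmental variable
    )

    fit_w  <- lm(rate ~ rain, data = dat, weights = 1 / se^2)  # weighted
    fit_uw <- lm(rate ~ rain, data = dat)                      # unweighted
    summary(fit_w)
    summary(fit_uw)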

http://www.maths.otago.ac.nz/home/department/staff/_staffscript.php?s=david_fletcher

Probability - a slippery concept
Prof. Chris Triggs

Speaker: Prof. Chris Triggs

Affiliation: Dept. of Statistics, U. Auckland

When: Monday, 11 August 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

I have been invited to speak to the annual conference of the judges of the Environment Court. In this talk I review the interpretation of probability by the legal process. In passing I will discuss misconceptions, fallacies, and misunderstandings which have led statisticians into conflict with the courts.

http://www.stat.auckland.ac.nz/showperson?firstname=Chris&surname=Triggs

The interaction between copyright and digital rights management

Speaker: Richard Stallman

Affiliation:

When: Friday, 8 August 2008, 11:00 am to 12:30 pm

Where: Conference Center

Copyright developed in the age of the printing press, and was designed to fit with the system of centralized copying imposed by the printing press. But the copyright system does not fit well with computer networks, and only draconian punishments can enforce it.

The global corporations that profit from copyright are lobbying for draconian punishments, and to increase their copyright powers, while suppressing public access to technology. But if we seriously hope to serve the only legitimate purpose of copyright--to promote progress, for the benefit of the public--then we must make changes in the other direction.

http://www.stallman.org/

Detecting anomalies in sensor network data
Dr. Richard Jarrett

Speaker: Dr. Richard Jarrett

Affiliation: CSIRO Mathematical and Information Sciences

When: Thursday, 7 August 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Sensor networks have the potential to deliver huge amounts of data in real time, which need to be processed and acted upon. A key concern for water distribution utilities is the identification of anomalies in the sensor network data, as a means of detecting system perturbations that may require immediate attention. This talk will describe a number of techniques which we have tried to allow real-time assessment of the data. We will also discuss some of the other issues that arise, including mixing of water from different sources and the estimation of travel time for water in the network.

http://www.cmis.csiro.au/Richard.Jarrett/

Approximate Bayesian computation
Dr. David Welch

Speaker: Dr. David Welch

Affiliation: Center for Infectious Disease Dynamics, Pennsylvania State University

When: Thursday, 31 July 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

This talk will give an overview of approximate Bayesian computation (ABC), sometimes known as likelihood-free Bayesian inference. ABC is an approach to statistical inference that works within the Bayesian paradigm but uses simulation-based approximations to avoid explicit computation of the likelihood. ABC has become widely used in genetics, and is gaining acceptance in other application areas such as epidemiology and systems biology.

I'll outline different sampling schemes that have been proposed for use within the ABC framework, including rejection sampling, Markov chain Monte Carlo and sequential Monte Carlo. I'll also discuss novel density estimation methods first introduced by Beaumont et al. (2002) which use linear regression to correct some of the bias in using ABC techniques.
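
As a concrete illustration of the rejection-sampling scheme mentioned above (a generic toy example constructed for this page, not from the talk), ABC replaces likelihood evaluation with a simulate-and-compare step. The model, tolerance and numbers are all assumptions.

    # Toy ABC rejection sampler (illustrative only): infer a binomial
    # success probability p without evaluating the likelihood.
    set.seed(1)
    n_trials <- 50
    y_obs    <- 31                      # observed number of successes

    n_sim   <- 1e5
    p_prior <- runif(n_sim)                        # candidate parameters from the prior
    y_sim   <- rbinom(n_sim, n_trials, p_prior)    # simulate data for each candidate

    eps  <- 1                           # tolerance on the summary statistic
    keep <- abs(y_sim - y_obs) <= eps   # accept candidates whose data are "close"

    p_post <- p_prior[keep]             # approximate posterior sample
    c(mean = mean(p_post), quantile(p_post, c(0.025, 0.975)))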

This is joint work with David Balding and Mark Beaumont.

http://www.cidd.psu.edu/people/bio_welch.html

Bayesian Locally-Optimal Design of Knockout Tournaments
A. Prof. Mark E. Glickman

Speaker: A. Prof. Mark E. Glickman

Affiliation: Dept of Health Policy and Management, Boston University School of Public Health

When: Monday, 28 July 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

The elimination or knockout format is one of the most common designs for pairing competitors in tournaments and leagues. In each round of a knockout tournament, the losers are eliminated while the winners advance to the next round. Typically, the goal of such a design is to identify the overall best player. Using a common probability model for expressing relative player strengths, we develop an adaptive approach to pairing players for each round that maximizes the probability that the best player advances to the next round. We evaluate our method using simulated game outcomes under several data-generating mechanisms, and compare it to random pairings, to the standard knockout format, and to variants of the standard format by Hwang (1982) and Schwenk (2000).
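
To make the evaluation criterion concrete, here is a small simulation sketch (constructed for this page, not the speaker's code) that assumes a Bradley-Terry model for game outcomes and estimates the probability that the strongest of eight players wins a knockout under a random draw. The strengths and bracket size are arbitrary.

    # Toy knockout simulation (illustrative only), assuming Bradley-Terry
    # win probabilities: P(i beats j) = s[i] / (s[i] + s[j]).
    set.seed(1)
    strength <- c(8, 5, 4, 3, 2.5, 2, 1.5, 1)   # player 1 is the best

    play_round <- function(players) {
      winners <- integer(0)
      for (k in seq(1, length(players), by = 2)) {
        i <- players[k]; j <- players[k + 1]
        p_i <- strength[i] / (strength[i] + strength[j])
        winners <- c(winners, if (runif(1) < p_i) i else j)
      }
      winners
    }

    simulate_knockout <- function() {
      players <- sample(1:8)            # random bracket
      while (length(players) > 1) players <- play_round(players)
      players
    }

    champs <- replicate(20000, simulate_knockout())
    mean(champs == 1)                   # estimated P(best player wins)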

http://math.bu.edu/people/mg/

Developing a Simulation Model of the Primary Care System: Work-in-Progress.

Speaker: Prof. Peter Davis and colleagues from the Social Statistics Research Group

Affiliation: Dept. of Sociology, U. Auckland

When: Thursday, 24 July 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Health costs are increasing inexorably, and New Zealand is moving steadily towards a society with a greater proportion of the population in the retirement years. Does this inevitably mean that health expenditure is going to increase dramatically in the near future, or are there other likely scenarios? This is an HRC-funded project designed to construct a computer-based model of the primary care system, using existing data sets and imputed information to test various scenarios for the future by simulation. The team working on the project are all recent graduates of the Department of Statistics - Janet Pearson, Martin von Randow and Sanat Pradhan - and so the project also demonstrates the variety of ways in which basic statistical concepts and training can be applied to inform policy and public decision-making.

Tails of natural hazards: Implications for risk, erosion and ecology
Dr. Bruce Malamud

Speaker: Dr. Bruce Malamud

Affiliation: King's College London

When: Tuesday, 22 July 2008, 4:00 pm to 5:00 pm

Where: Room 120, Science Centre 303

There is increasing evidence that the extremes of many natural hazards satisfy power-law or other heavy-tailed frequency-size statistics. Examples include earthquakes, volcanic eruptions, landslides, snow avalanches, forest and wildfires, meteorite impacts, and possibly floods. Although power-law distributions are commonly associated with the frequency-size distribution of small to large earthquakes, the frequency-size statistics of many other natural hazards are frequently associated with distributions that are more thin-tailed. The occurrence estimated for large and very large events using power-law frequency-size distributions is often much more conservative, with a greater chance of a large event occurring in a given period of time, compared to thinner-tailed distributions. The choice of the statistical distribution used or assumed has many implications for Earth Sciences research. In this paper we will present the frequency-size distributions for wildfires and landslides, both found to be robustly power-law for the medium and large events, and the implications of these statistics for erosion, ecology, and risk.
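
A small numerical illustration of the point about conservatism (arbitrary numbers, not from the talk): calibrate a power-law tail and an exponential tail to agree at a moderate event size, then compare the probability each assigns to a much larger event.

    # Toy tail comparison (illustrative only): a Pareto (power-law) tail versus
    # an exponential tail, both calibrated so that P(X > 10) = 0.1.
    x_ref <- 10; p_ref <- 0.1

    alpha  <- 1.5                                   # assumed power-law exponent
    c_par  <- p_ref * x_ref^alpha                   # P(X > x) = c * x^(-alpha)
    lambda <- -log(p_ref) / x_ref                   # P(X > x) = exp(-lambda * x)

    x_big <- 100
    c(power_law   = c_par * x_big^(-alpha),
      exponential = exp(-lambda * x_big))
    # The power-law tail assigns a far larger probability to the size-100 event.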

http://www.kcl.ac.uk/schools/sspp/geography/people/acad/malamud/

Bayesian Methods in Software Testing and Capability Maturity Model

Speaker: Dr. Salilesh Mukhopadhyay

Affiliation: Feasible Solution LLC

When: Thursday, 10 July 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Most of the time, the testing phase of an application does not allow end-to-end testing with all the interfaces up and running in a QA environment. To remedy the situation, the common practice in large financial applications is to rely mainly on coverage analysis with appropriate risk mitigation. However, the testing scenarios of e-commerce, business-to-business models, electronic raw material acquisition and auction engines are not at all different. The present paper provides a statistical analysis of these scenarios and calculates the associated risk for each phase of the testing cycle: Unit Testing, System Integration Testing, End-to-End Testing and User Acceptance Testing.

The purpose of the present paper is to provide an outline of Bayesian methods (prior and posterior analysis) in software testing, with special emphasis on the Capability Maturity Model. Different types of testing scenarios will be analyzed for each phase of testing: Unit Testing, System Integration Testing, End-to-End Testing, User Acceptance Testing and Post Production Maintenance Testing. Manual and automated testing will be discussed in detail for a stable QA environment. Finally, the benefits of using quantitative analysis to mitigate the associated risk in each phase will be discussed in detail.

ABOUT THE SPEAKER

Dr. Salilesh Mukhopadhyay is a leading expert in Quality Assurance. He brings more than 25 years of teaching, research and professional experience at universities and in the software industry in India, Australia and the USA. He has worked for companies such as Paine-Webber, IXL Inc., INautix, CSFB Direct, e-STEEL, America's Job Bank, ACNielsen, Detroit Edison Energy Company, UBS and Keane Inc. He is CEO of Feasible Solution LLC, which he founded in 1998.

Quantification of sonar impacts on marine mammals
Dr. Carl Donovan

Speaker: Dr. Carl Donovan

Affiliation: School of Mathematics and Statistics, University of St Andrews

When: Thursday, 26 June 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

The potential effects of new powerful sonars on the marine environment are currently high profile, particularly the potential link with marine mammal strandings. The University of St Andrews and BAE Systems Insyte have been involved in the development of an advisory software tool for the mitigation of sonar impacts on marine mammals. The end result of this collaboration was the SAFESIMM algorithm and subsequent software for UK Governmental bodies.

The SAFESIMM algorithm quantifies the likely physical effects of a sonar trial on marine mammals based on simulations, sonar propagation models, and extensive databases of oceanographic recordings and marine mammal distributions and characteristics.

The calculation framework, the required data, and the trials and tribulations of this process will be presented.

http://creem2.st-andrews.ac.uk/

Haplotypic analysis of large-scale genetic data sets
Dr. Brian Browning

Speaker: Dr. Brian Browning

Affiliation: Dept. of Statistics, U. Auckland

When: Thursday, 19 June 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 279, Science Centre

Large-scale genetic data sets contain hundreds of thousands of genetic markers genotyped on thousands of individuals. At each genetic marker, the data for each individual are an unordered pair of alleles (i.e. variants) called a genotype. One allele is inherited from each parent, and the alleles inherited together from a single parent are called a haplotype. Haplotypes are not directly observed, but they are a natural biological unit for statistical analysis. Haplotypic analyses of large-scale genetic data face several statistical and computational challenges:

  • A statistical model for the haplotype frequencies must be developed.
  • Haplotypes must be inferred from the observed genotype data.
  • Haplotypes must be clustered so that the clusters can be tested for association with the trait status.
  • The algorithms must scale to data sets with > 15 billion data points.

In this talk, I will review the existing methods for haplotypic analysis of large-scale data sets, including our own method that is implemented in the Beagle software package, and I will present our haplotypic analysis of the Wellcome Trust Case Control Consortium study dataset (14,000 cases of 7 common diseases and 3,000 shared controls genotyped for 500,000 markers).

A brief introduction to StatWeave/Sweave/Beamer

Speaker: Nick Horton and David Scott

Affiliation: Smith College and Dept. of Statistics, U. Auckland

When: Tuesday, 10 June 2008, 10:00 am to 11:00 am

Where: Statistics Seminar Room 222, Science Centre

Part I:

Nick will provide a brief introduction to StatWeave, an open source package similar in flavour to Sweave. It allows users to include R, SAS, Stata, Matlab and/or Maple code in a LaTeX or OpenOffice document, which may be executed to create a final document updated with output (text and graphics) from the most recent data and code. This can largely automate analyses and is a step towards replicable research (as well as saying goodbye to typical "copy and paste" errors).

Part II:

David will give a brief introduction to Beamer. Beamer is a LaTeX package that allows users to produce "PowerPoint-like" slides. It has great advantages in that typesetting mathematical expressions becomes trivial - especially inline mathematics. With the addition of Sweave (or perhaps StatWeave) you can include R code which can be executed and the output nicely formatted for your slides.
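
As a rough sketch of what such a source file looks like (a generic, hypothetical example, not material from either talk), a Sweave-style .Rnw file interleaves LaTeX with executable R chunks, and the same chunks can sit inside a Beamer frame:

    % toy.Rnw -- minimal Sweave-style sketch (illustrative only)
    \documentclass{beamer}
    \usepackage{Sweave}
    \begin{document}
    \begin{frame}[fragile]{A reproducible slide}
    The estimates below are recomputed every time the document is built:
    <<echo=TRUE, results=verbatim>>=
    fit <- lm(dist ~ speed, data = cars)   # built-in R dataset
    coef(fit)
    @
    <<fig=TRUE, echo=FALSE>>=
    plot(dist ~ speed, data = cars); abline(fit)
    @
    \end{frame}
    \end{document}

Running R CMD Sweave on the file executes the chunks and writes a .tex file ready for pdflatex.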

Marginal modelling of capture-recapture data
Dr. Alain Vandal

Speaker: Dr. Alain Vandal

Affiliation: Department of Mathematics and Statistics, McGill University

When: Thursday, 22 May 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

In the absence of individual covariate information, the usual representation for capture-recapture data from k lists is that of an incomplete 2^k-1 contingency table, possibly subject to stratification. The unobserved cell in the table is the number of unobserved individuals in the population of interest. Log-linear models form a natural and popular class of models to predict the value in the unobserved cell (and thus the size of the population). Such models usually take the cells of the contingency table to be distributed multinomially and model the cell expectations: this is joint log-linear modelling.
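
As a minimal sketch of the joint log-linear approach described above (a hypothetical three-list example with made-up counts, not the speaker's data), one can fit a Poisson log-linear model to the seven observed cells and predict the unobserved cell:

    # Toy capture-recapture example (illustrative only): three lists A, B, C,
    # with counts for the 2^3 - 1 observable capture histories.
    d <- expand.grid(A = 0:1, B = 0:1, C = 0:1)
    d$count <- c(NA, 30, 22, 10, 25, 12, 8, 5)   # NA = unobserved (0,0,0) cell

    obs <- subset(d, !(A == 0 & B == 0 & C == 0))
    fit <- glm(count ~ A + B + C + A:B + A:C + B:C, family = poisson, data = obs)

    # Predicted number of individuals missed by all three lists, and the
    # implied population size estimate:
    n00 <- predict(fit, newdata = data.frame(A = 0, B = 0, C = 0), type = "response")
    c(missed = n00, population = sum(obs$count) + n00)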

An alternative is to model the expected number of individuals in the intersection of lists in each subset of the set of all lists, an option called marginal log-linear modelling. Using the M

http://www.math.mcgill.ca/vandal/

Dynamic, Interactive Documents for Reproducible Research and Pedagogy
A. Prof. Duncan Temple Lang

Speaker: A. Prof. Duncan Temple Lang

Affiliation: Department of Statistics, UC Davis

When: Thursday, 15 May 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

I'll describe a framework for authoring documents that provides a different and richer way to think about reproducible documents. Our concept of a document acts as a notebook in capturing not only the results, code and data used in the analysis, simulation, etc., but also aspects of the thought process involved in the work. The documents can contain explorations of alternative approaches that made no difference, dead-ends, possible directions, etc. that are often as important as the "final" result. This aids the author when returning to a project, or a reviewer attempting to understand different details than would be found in a regular "edited" paper. But importantly, such documents can also provide "students" with a view to statistical thinking and practice in new, richer ways.

Different projections or views of the document can be created for different audiences and in different formats. Such documents can also contain information that allows the readers to interactively alter the inputs to the computations and explore different claims and decisions in the analysis. I'll discuss the concepts and possibilities of this and provide a high-level overview of an evolving implementation of this framework using R, XML, XSL, and HTML/Web browsers.

This is joint work with Deborah Nolan (UC Berkeley) that is still in development.

http://www.stat.ucdavis.edu/~duncan/

Developing probabilistic statistical thinking
Prof. Helen MacGillivray

Speaker: Prof. Helen MacGillivray

Affiliation: Department of Mathematics and Statistics, Queensland University of Technology

When: Thursday, 8 May 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

In the focus over the past decade on data-driven, realistic approaches to building statistical literacy and data analysis curriculum, the explicit development of probability reasoning beyond coins and dice has received less attention. There are two aspects of probability at the introductory tertiary level: for use in introductory data analysis; and as foundation for further study in statistical modelling and applications, and increasingly in areas in information technology, engineering, finance, health and others. This paper advocates a minimalist objective-oriented approach in the former, and a constructivist, collaborative and data-linked approach in the latter. The latter is the main focus here, discussing learning and assessment strategies and analysis of student and tutor feedback, and student performance. Objectives of the course include helping students to unpack, analyse and extend what they have brought with them to tertiary study, develop problem-solving skills, link with data and real investigations and processes, and consolidate and synthesize foundation mathematical skills.

http://www.sci.qut.edu.au/about/staff/mathsci/statistics/macgillivrayh.jsp

Learning statistical thinking in large classes through statistical discovery journeys (Real data, real projects, real students)
Prof. Helen MacGillivray

Speaker: Prof. Helen MacGillivray

Affiliation: Mathematics and Statistics, QUT

When: Thursday, 8 May 2008, 12:00 pm to 1:00 pm

Where: Statistics Seminar Room 222, Science Centre

Statistical empowerment is

- confidence to use the basics in real problems, and to continue to learn;

- understanding what one is doing. In service or introductory tertiary courses, this may not necessarily include the theory of why, but must develop sufficient confidence with the what;

- knowing your toolbox - no matter how basic - choosing tools appropriately and using with awareness;

- interpreting output with discretion and synthesizing results in context;

- inclusive of communicating and problem-tackling.

This paper reports on a decade of using free-choice group projects in planning, implementing, analysing and reporting a data investigation in introductory university statistics courses for science, engineering, mathematics and related degree programs. Overall, the strategy has proved highly successful for learning and teaching across diverse cohorts in a number of classes of sizes typically between 200 and 500. For example, in semester 1, 2008, there are 750 students doing data investigation projects in contexts entirely of their choosing. The paper discusses how the challenges of managing, assessing and resourcing such a strategy in large introductory classes have been met, and identifies some interesting pedagogical questions for debate. But the main focus of the paper is on the students

http://www.sci.qut.edu.au/about/staff/mathsci/statistics/macgillivrayh.jsp

Data Saving and Sharing in a Digital Age: Issues and Implications

Speaker: Prof. Peter Davis

Affiliation: Dept. of Sociology and Dept. of Statistics, University of Auckland

When: Thursday, 1 May 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Governments, research funders and academic communities are getting increasingly interested in the potential of saving research data for the purposes of secondary and shared use. This is now made much easier with the arrival of GRID technologies (the Kiwi Advanced Research and Education Network (KAREN) in New Zealand) and with the wholesale digitisation of data of all kinds. There are implications for researchers. For example, the NIH in the UK requires all grantees to make their data available within a year of the publication of their first major research output. The ESRC in the UK requires grantees to deposit their data. Should we follow this lead? What are the issues? What are the implications? This session will provide an overview of a TEC-funded project to initiate data saving and sharing using the GRID for the social sciences, but with much wider implications. The session will end with a hands-on demonstration showing the ease and power of access to multiple data sets locally and internationally.

http://www.nzssds.org.nz/

From a single heart beat - exploring the dynamics of cardiac function using ultrasound images

Speaker: Katrina Poppe

Affiliation: Dept. of Statistics and Cardiovascular Research Laboratory, Dept. of Medicine, University of Auckland

When: Wednesday, 23 April 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Just as ultrasound is used to view a foetus during pregnancy, cardiac ultrasound, or echocardiography, is a commonly used clinical tool that allows us to see the heart as it beats.

Cardiac motion is a continuum that depends on a close interrelationship between the contraction and relaxation phases of the cardiac cycle (heart beat). However measurements that assess cardiac function are traditionally made at specific brief moments during the cardiac cycle.

In an effort to maximise the information gained from cardiac images, we are exploring the use of functional data analysis to represent cardiac function. This presentation will give an overview of our work so far using information obtained from three-dimensional ultrasound images of the left ventricle.

Statistical issues in palaeoclimatology
Dr. Anthony Fowler

Speaker: Dr. Anthony Fowler

Affiliation: SGES, University of Auckland

When: Thursday, 17 April 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Researchers at the University of Auckland have been collecting and measuring kauri tree-ring samples for about 30 years. We now have a substantial collection of material for the last several thousand years, from a combination of living trees, colonial-era buildings, and sub-fossil wood from Northland and Waikato swamps. Our attention has recently been focused on extracting evidence of the past climate of northern New Zealand from ring-width variations at inter-annual through millennial time scales. Although an understanding of both climate and biology is a prerequisite, the construction of the equations used to relate tree-rings to climate is essentially a statistical exercise, and the research has now reached the stage where, in my opinion, it will be most effectively progressed by raising the statistical bar. To this end, this seminar will: a) background the kauri tree-ring research; b) highlight some of the statistical issues we are encountering; and, c) identify areas where I think collaboration would be particularly fruitful (e.g. re-sampling, time series analysis, Bayesian inference). I will also take the opportunity to comment on multi-proxy issues and opportunities, including a potential research project to reconstruct a (probabilistic) El Nino - Southern Oscillation event chronology for the last several hundred years.

http://www.sges.auckland.ac.nz/the_school/our_people/fowler_anthony/index.shtm

Analysis of Probability-Linked Data
Prof. Ray Chambers

Speaker: Prof. Ray Chambers

Affiliation: Centre for Statistical and Survey Methodology, University of Wollongong

When: Tuesday, 15 April 2008, 11:00 am to 12:00 pm

Where: MLT 3, Science Centre

Over the last 25 years, advances in information technology have led to the creation of linked individual level databases containing vast amounts of information relevant to research in health, epidemiology, economics, demography, sociology and many other scientific areas. In many cases this linking is not perfect but can be modelled as the outcome of a stochastic process, with a non-zero probability that a unit record in the linked database is actually based on data drawn from distinct individuals. The impact of the resulting linkage errors on analysis of data extracted from such a source is only slowly being appreciated. In this talk I will describe a framework for statistical analysis of such probability-linked data. Applications to linear and logistic regression modelling of this type of data will be discussed.

http://www.socstats.soton.ac.uk/staff/chambers/

Bayesian Hierarchical Occupancy Modelling of the Swedish Bird Survey
Dr. James Russell

Speaker: Dr. James Russell

Affiliation: Université de la Réunion

When: Thursday, 10 April 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Species presence or absence across a landscape, and functions of it such as species richness, are all fundamental ecological parameters. Species presence is confounded by species detectability, however, and repeat surveys are required to estimate detectability and so ascertain true presence-absence across a landscape - and hence, for multiple species, the overall species richness.

Bayesian hierarchical models with species augmentation provide a novel method for analysing species richness across a landscape for large datasets with imperfect detection. National bird monitoring surveys lend themselves to this form of analysis. I will present results of one such analysis on the Swedish Bird Survey from 1996-2007*. This large dataset surveys 716 sites, with over 200 species and 8 replicated surveys. Species detectability and occupancy are modelled with site-specific covariates for latitude, elevation and habitat. Results are encouraging and provide guidance for long-term monitoring of the Swedish avifauna.

  * The talk will only present results from the analysis of the reduced 1996 dataset; the computational burden has so far prevented the author from analysing the entire dataset.

http://www.stat.auckland.ac.nz/~jrussell/

Much ado about nothing: a review of the state of the art of incomplete data
Dr. Nicholas J. Horton

Speaker: Dr. Nicholas J. Horton

Affiliation: Department of Mathematics and Statistics, Smith College, and Department of Statistics, University of Auckland

When: Thursday, 20 March 2008, 4:00 pm to 5:00 pm

Where: Computer Science Seminar Room 279, Science Centre

*** Please note that this seminar is in room 303S.279, and not 303.222. ***

Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The development of statistical methods to address missingness has been actively pursued in recent years, including imputation, likelihood and weighting approaches (Ibrahim et al., JASA, 2005; Horton and Kleinman, TAS, 2007). Each approach is considerably more complicated when there are many patterns of missing values and both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available, though not commonly used. We review these methods in the context of a motivating example from a large health services research dataset. Some discussion of the feasibility of sensitivity analyses to the missing at random assumption will also be provided. While there are still limitations to the current implementations, and additional efforts are required of the analyst, it is feasible and scientifically desirable to incorporate partially observed values as well as undertake sensitivity analyses to modelling and missingness assumptions.

http://www.math.smith.edu/~nhorton

Four Case Studies in non-vaccine HIV Prevention Trial Design
Dr. Deborah Donnell

Speaker: Dr. Deborah Donnell

Affiliation: Vaccine and Infectious Disease Institute, Fred Hutchinson Cancer Research Center, Seattle, WA

When: Thursday, 13 March 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

The HIV Prevention Trials Network, funded by the US National Institutes of Health, has fielded 10 Phase III efficacy trials in the last 10 years, with non-vaccine interventions ranging from behavioral modification to microbicides and antiretroviral therapy for the prevention of HIV transmission. Each intervention strategy carries its own set of parameters - choice of population, public health relevance, probable efficacy and available resources - that play into the choices made in the design of the trial.

This talk compares the designs of four of these HIV prevention trials currently in the field, and raises some of the challenges we have faced in designing these trials. Using these case studies I will illustrate how the different parameters of each prevention modality led to different choices in detectable efficacy, primary analysis, and interim monitoring guidelines.

http://www.scharp.org/bios/deborah_donnell.html

Bayesian Modelling for Ecological Count Data
Yilin (Sammie) Jia

Speaker: Yilin (Sammie) Jia

Affiliation: Department of Statistics, The University of Auckland

When: Thursday, 6 March 2008, 4:00 pm to 5:00 pm

Where: Seminar Room 222, Science Centre

In this talk I will present some likelihood-based models, fitted in a Bayesian approach, for the ecological count data sampled by Warwick et al. (R.M. Warwick, K.R. Clarke and J.M. Gee, 1990. The effect of disturbance by soldier crabs Mictyris platycheles H. Milne Edwards on meiobenthic community structure. Journal of Experimental Marine Biology and Ecology, Vol. 135, No. 1, pp. 19-33).

We use generalized linear (mixed) models to fit the count data. Poisson (POI) modelling is one of the traditional techniques for modelling count data. However, ecological count data often show two features - excess zeros and over-dispersion - which the POI model may not take care of. It has been argued that the Poisson-Gamma (PGA) model can cater for such features. My talk will show how the PGA and POI models fit this dataset.
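
For concreteness, here is a non-Bayesian sketch of the same comparison (my own simulated data, fitted by maximum likelihood rather than the Bayesian approach used in the talk): the Poisson-Gamma mixture corresponds to the negative binomial, which adds a dispersion parameter to handle over-dispersed counts.

    # Toy comparison of Poisson vs Poisson-Gamma (negative binomial) fits
    # (illustrative only; the talk fits these models in a Bayesian framework).
    library(MASS)
    set.seed(1)
    n <- 200
    block <- gl(4, n / 4)                       # e.g. sampling blocks
    mu <- exp(1 + as.numeric(block) * 0.3)
    y  <- rnbinom(n, mu = mu, size = 1.2)       # over-dispersed counts

    poi <- glm(y ~ block, family = poisson)
    nb  <- glm.nb(y ~ block)

    c(AIC_poisson = AIC(poi), AIC_negbin = AIC(nb))
    # Residual deviance far above the residual df in the Poisson fit
    # signals over-dispersion.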

The results from the PGA (negative binomial) and POI models will be presented. Some posterior predictive checks will be shown to compare the differences between the PGA and POI models. The results will also be compared to three multivariate techniques: ANOSIM, non-metric MDS and permutational MANOVA using PRIMER. Some problems encountered when applying the Bayesian approach will be discussed, and some of my further studies will be introduced.

*** Please note that the half hour talk by Dr Russell Millar which was to accompany this talk has been cancelled. ***

http://www.stat.auckland.ac.nz/~yjia012/

The Tyranny of Power

Speaker: Prof. J. Martin Bland

Affiliation: Dept. of Health Sciences, University of York

When: Tuesday, 26 February 2008, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 222, Science Centre

Power calculations are widely used to decide (or justify) the planned sample sizes for medical research studies. They are usually required by grant-giving bodies, ethics committees, and others. They have several problems, including the need to decide on a target difference to detect. They rely on the idea that the analysis will take the form of a significance test. At least twenty years ago, statisticians began to argue for the results of medical research to be presented in the form of estimates of risk or treatment effects, with interval estimates, rather than as P values. These arguments have been accepted by the medical research community and now most major journals in the field include this in their instructions to authors. Hence we have a mismatch between the principles used in design and those used in analysis. I propose that we design studies according to the required width of an interval estimate rather than power to detect or exclude a given difference. I shall give an example of a funded clinical trial designed in this way.
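
To make the proposal concrete (a generic calculation with assumed numbers, not the funded trial mentioned in the abstract), one can choose the per-group sample size so that a 95% confidence interval for a difference in means has a desired total width:

    # Precision-based sample size (illustrative only): choose n per group so
    # that the 95% CI for a difference of two means has a given total width,
    # assuming a common standard deviation sigma.
    n_for_ci_width <- function(sigma, width, conf = 0.95) {
      z    <- qnorm(1 - (1 - conf) / 2)
      half <- width / 2
      ceiling(2 * (z * sigma / half)^2)   # per-group sample size
    }

    n_for_ci_width(sigma = 10, width = 5)  # e.g. SD 10, CI no wider than +/- 2.5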

Professor Bland is a recognised expert in the field of medical statistics.

Variable Inclusion and Shrinkage Algorithms

Speaker: Dr Gareth James

Affiliation: Marshall School of Business, U.S.C.

When: Thursday, 14 February 2008, 4:00 pm to 5:00 pm

Where: Seminar Room 222, Science Centre

The Lasso is a popular and computationally efficient procedure for automatically performing both variable selection and coefficient shrinkage on linear regression models. One limitation of the Lasso is that the same tuning parameter is used for both variable selection and shrinkage. As a result, it may end up selecting a model with too many variables to prevent over-shrinkage of the regression coefficients. We will discuss a new class of methods called Variable Inclusion and Shrinkage Algorithms (VISA). This approach is capable of selecting sparse models while avoiding over-shrinkage problems. VISA uses a path algorithm, so it is computationally efficient. It will be shown through extensive simulations that the new approach significantly outperforms the Lasso and also provides improvements over more recent procedures, such as the Dantzig selector, Relaxed Lasso and Adaptive Lasso. VISA also possesses interesting theoretical results which we will briefly touch on.
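
To see the single tuning parameter at work (a generic sketch using the glmnet package with simulated data; this illustrates the Lasso itself, not the VISA method), the same lambda that sets coefficients to zero also shrinks the survivors:

    # Lasso sketch (illustrative only): one penalty lambda controls both
    # variable selection and coefficient shrinkage.
    library(glmnet)
    set.seed(1)
    n <- 100; p <- 20
    x <- matrix(rnorm(n * p), n, p)
    beta <- c(3, -2, 1.5, rep(0, p - 3))        # only 3 truly non-zero coefficients
    y <- drop(x %*% beta + rnorm(n))

    fit <- glmnet(x, y, alpha = 1)              # lasso path over a grid of lambdas
    cv  <- cv.glmnet(x, y, alpha = 1)           # choose lambda by cross-validation

    coef(cv, s = "lambda.min")                  # selected model: the retained
                                                # coefficients are shrunk towards 0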

http://www-rcf.usc.edu/~gareth/

"Mathematical learning": A powerful bridge between different fields of mathematics and statistics. Applications in biological network prediction
Kevin Bleakley

Speaker: Kevin Bleakley

Affiliation: Institut de Mathématiques et de Modélisation de Montpellier

When: Thursday, 24 January 2008, 4:00 pm to 5:00 pm

Where: Seminar Room 222, Science Centre

On a midnight-snack trip to the fridge, I'm dying for an orange. But the electricity is out and I know there are grapefruit in the fridge too. I hate grapefruit. How will I know I've grabbed an orange? I suddenly remember that grapefruit are usually heavier than oranges and have a smoother skin. Based on this memory, I grab what I think is an orange, and start to peel... What is the chance I am right?

Mathematical learning is a formalisation of this kind of question. We take what we know from the past (weight, smoothness) and try and find the "best" rule to predict the class (orange or grapefruit) of a novel object. A recent family of methods, based on what are called "kernels", used in tandem with the SVM algorithm, are causing a (not-so?) quiet revolution in all things "prediction", such as voice, text and image recognition. Furthermore, they have found a new home in post-genomic biology, where masses of numerical information, if used well, can provide real biological insight.

In this talk, I'll give a (hopefully) not-too-technical introduction to these methods, highlighting their theoretical origins in functional analysis and statistics. Then I'll introduce a way to use/choose kernels and the SVM algorithm to predict protein-protein interaction networks and metabolic networks in the cell, and show results on two benchmark biological data sets.

http://www.math.univ-montp2.fr/~bleakley/school.html
