Department of Statistics


Seminars

On Akaike and likelihood cross-validation criteria for model selection
Dr. Benoit Liquet

Speaker: Dr. Benoit Liquet

Affiliation: INSERM, Victor Segalen University, Bordeaux 2

When: Thursday, 16 February 2012, 4:00 pm to 5:00 pm

Where: ECE Seminar Room 303.257, Science Centre

The talk discusses Akaike and likelihood cross-validation (LCV) criteria for model/estimator choice. After a presentation of the main concepts of model selection, we will focus on the choice of estimators in non-standard cases. First, we study two examples arising when we wish to assess the quality of estimators on a particular set of information, while the estimators may use a larger set of information. The first example occurs when we construct a model for an event which happens if a continuous variable is above a certain threshold. We can compare estimators based on the observation of only the event or on the whole continuous variable. The other example is that of predicting survival based on survival information only, or using in addition information on the patient's disease. We develop modified AIC and LCV criteria to compare estimators in this non-standard situation. Second, we study the choice of estimators in prognostic studies. Estimators for a clinical event may use repeated measurements of markers in addition to fixed covariates. These measurements can be linked to the clinical event by joint modelling involving latent structures. When the objective is to choose between different estimators based on joint models for prediction, the conventional Akaike information criterion (AIC) is not well adapted and the decision should be based on predictive accuracy. We define an adapted risk function called expected prognostic cross entropy (EPCE) and further modify it for right-censored observations. The risk functions can be estimated by leave-one-out cross-validation, for which we give approximate formulas and asymptotic distributions.

http://www.biostatisticien.eu/liquet

How big are the real mortality reductions produced by cancer screening? Why do so many trials report only 20%?
Professor Jim Hanley

Speaker: Professor Jim Hanley

Affiliation: Department of Epidemiology, Biostatistics, and Occupational Health, McGill University

When: Monday, 20 February 2012, 11:00 am to 12:00 pm

Where: ECE Seminar Room 303.257, Science Centre

Influential reports on the reductions produced by screening for cancers of the prostate, colon and lung have appeared recently. The reported reductions in these randomized trials have been modest, and smaller than expected. But even more surprisingly, all three figures are very similar. I explain why these figures are underestimates and why the seemingly-universal 20% reduction is an artifact of the prevailing data-analysis methods and stopping rules. A different approach to the analysis of data from cancer screening trials is called for.

http://www.medicine.mcgill.ca/epidemiology/hanley/

Delights of directional statistics: (a) free-lunch learning, (b) crystals, earthquakes and orthogonal axial frames
Professor Peter Jupp

Speaker: Professor Peter Jupp

Affiliation: U. St. Andrews

When: Wednesday, 22 February 2012, 11:00 am to 12:00 pm

Where: ECE briefing room 257

Observations that are directions, axes, or rotations require the techniques of directional statistics. This talk aims to illustrate the special flavour of this area through glimpses at two topics.

(a) Free-lunch learning

Free-lunch learning (FLL) is a phenomenon in which relearning partially-forgotten mental associations induces recovery of other associations. When memory is modelled in terms of an artificial neural network, the extent of FLL can be quantified in geometrical terms and involves Grassmann manifolds of subspaces of the weight space. Joint work with Jim Stone (Psychology, Sheffield) will be described, in which simple properties of uniform distributions yield results on the expected amount of FLL. The form of forgetting plays an important role.

(b) Crystals, earthquakes and orthogonal axial frames

Orthogonal axial frames are (ordered) sets of orthogonal axes. They arise as (i) key geometrical elements (known in seismology as 'focal mechanisms') of earthquakes, (ii) principal axes of certain physical tensors (e.g. stress tensors), (iii) axes of orthorhombic crystals. Some tools for the analysis of data that are orthogonal axial frames will be described. This is joint work with Richard Arnold (Wellington).

http://www.mcs.st-and.ac.uk/~pej/

Optimal Asset Pricing
Dr. Rolf Turner

Speaker: Dr. Rolf Turner

Affiliation: Department of Statistics, University of Auckland

When: Thursday, 8 March 2012, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.279, Science Centre

It is a well-known phenomenon that airline passengers travelling on the same flight (same origin and same destination) and in the same class (cabin) will often have paid substantially different fares. This apparent anomaly in the pricing pattern arises because there is a time-varying elasticity of demand (or "price sensitivity") for this particular "product".

My co-author Pradeep Banerjee and I have developed a differential equations model which permits one to derive an optimal pricing policy in such a setting. (The policy is "optimal" in terms of the expected value of a stock of goods at a specified time.) Deriving the optimal policy requires a model for the price sensitivity and for an inhomogeneous Poisson arrival rate of customers. So far we have worked with smooth price sensitivity functions. However, it is somewhat easier to translate intuitive conjectures about price sensitivity into a function framed as being piecewise linear in price.

In this talk I will explain a bit about how the differential equations for the optimal prices are derived, and then discuss how the technique must be adjusted to deal with the piecewise linear setting. I will also discuss some of the techniques that I and my Summer Scholarship student Ray Shahlori have used to code up the solution procedure in R. I will show some examples of solutions.

http://www.stat.auckland.ac.nz/showperson?firstname=Rolf&surname=Turner

Analysis of Low-Copy Number Forensic DNA Profiles
Professor David Balding

Speaker: Professor David Balding

Affiliation: Institute of Genetics, University College London

When: Thursday, 24 November 2011, 4:00 pm to 5:00 pm

Where: Room 303.279, Science Centre

Recently, forensic DNA profiling has been used with far smaller volumes of DNA than was previously thought possible. This "low copy number" profiling enables DNA to be recovered from the slightest traces left by touch or even merely breath, but brings with it serious interpretation problems that courts have not yet adequately solved. The most important challenge to interpretation arises when either or both of "dropout" and "dropin" create discordances between the crime scene DNA profile and that expected under the prosecution allegation. Stochastic artefacts affecting the peak heights read from the electropherogram (epg) are also problematic, in addition to the effects of masking from the profile of a known contributor. I will outline a framework for assessing such evidence, based on likelihoods that involve dropout and masking by stutter and other artefacts, and discuss possible options for modelling dropin. I will apply it to casework examples and reveal serious deficiencies in some reported analyses. In particular, analysis based on inclusion probabilities, widely used in the USA and other countries, can be systematically unfair to defendants, sometimes extremely so.

http://www.ucl.ac.uk/ugi/research/DavidBalding

Queues with advance reservations
Prof. Peter Taylor

Speaker: Prof. Peter Taylor

Affiliation: U. Melbourne

When: Wednesday, 23 November 2011, 11:00 am to 12:00 pm

Where: ECE briefing room, 303-257

Queues where, on "arrival", customers make a reservation for service at some time in the future are endemic. However, there is surprisingly little about them in the literature.

Simulations illustrate some interesting implications of the facility to make such reservations. For example, introducing independent and identically distributed reservation periods into an Erlang loss system can either increase or decrease the blocking probability from that given by Erlang's formula, despite the fact that the process of 'reserved arrivals' is still Poisson.
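As a point of reference, the blocking probability given by Erlang's formula for a loss system without reservations can be computed with the standard Erlang B recursion. The following is a minimal Python sketch, not taken from the talk; the function name `erlang_b` and its parameter names are illustrative assumptions.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Blocking probability of an Erlang loss system (Erlang B formula).

    servers      -- number of servers (illustrative name)
    offered_load -- arrival rate times mean service time, in erlangs
    """
    if servers < 0 or offered_load < 0:
        raise ValueError("servers and offered_load must be non-negative")
    # Standard recursion: B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)).
    blocking = 1.0
    for n in range(1, servers + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


# Example: two servers offered one erlang of traffic.
example_blocking = erlang_b(2, 1.0)
```

The recursion avoids the large factorials of the closed-form expression. It describes only the classical system; the effect of reservation periods discussed in the abstract is not modelled here.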

In this talk we shall discuss a number of ways of looking at such queues. In particular, we shall obtain various transient and stationary distributions associated with the 'bookings diary' for the infinite server system. However, this does not immediately answer the question of how to calculate the above-mentioned blocking probabilities. We shall conclude with a few suggestions as to how this calculation might be carried out.

(joint work with Robert Maillardet)

http://www.ms.unimelb.edu.au/Personnel/profile.php?PC_id=147

Nonparametric inference for an inverse queueing problem
Dr. Susan Pitts

Speaker: Dr. Susan Pitts

Affiliation: U. Cambridge

When: Thursday, 17 November 2011, 4:00 pm to 5:00 pm

Where: 303S.279, Science Centre

Consider an M/G/1 queueing model, where customers arrive at a single server in a Poisson process with rate lambda, the service-time distribution function is G, and customers are served in order of arrival. Output quantities of interest, for example the distribution of the workload, are determined once lambda and G are known.
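To illustrate how an output quantity follows from lambda and G in the forward direction, the stationary mean workload of a stable M/G/1 queue is given by the Pollaczek-Khinchine formula, which needs only lambda and the first two moments of G. The sketch below is an illustration under that assumption and is not the estimator studied in the talk; the function and argument names are hypothetical.

```python
def mg1_mean_workload(arrival_rate: float,
                      mean_service: float,
                      second_moment_service: float) -> float:
    """Stationary mean workload of an M/G/1 queue (Pollaczek-Khinchine).

    arrival_rate          -- Poisson arrival rate lambda
    mean_service          -- E[S], first moment of the service time
    second_moment_service -- E[S^2], second moment of the service time
    """
    traffic_intensity = arrival_rate * mean_service  # rho = lambda * E[S]
    if not 0 <= traffic_intensity < 1:
        raise ValueError("the queue is stable only if 0 <= rho < 1")
    # Mean workload = lambda * E[S^2] / (2 * (1 - rho)).
    return arrival_rate * second_moment_service / (2 * (1 - traffic_intensity))


# Example: exponential service with mean 1 (so E[S^2] = 2) and lambda = 0.5.
example_workload = mg1_mean_workload(0.5, 1.0, 2.0)
```

The inverse problem described in the abstract runs the other way: the workload is observed and G and the traffic intensity are to be estimated.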

We consider an inverse statistical estimation problem for this queueing model, where the service-time distribution G and the traffic intensity are unknown, but sampled data are available on the workload. We use these data to construct nonparametric estimators of G and the traffic intensity, and we study asymptotic properties of these estimators.

This is joint work with M.B. Hansen (Aalborg).

http://www.statslab.cam.ac.uk/Dept/People/pitts.html

Managing Uncertainty in Complex Crop Models

Speaker: Esther Meenken

Affiliation: Plant and Food Research, Lincoln

When: Thursday, 10 November 2011, 4:00 pm to 5:00 pm

Where: Room 303.279, Science Centre

Crop models are tools to help us understand the physiological mechanisms controlling crop development. They are used for on-farm management, research simulations and policy-making decisions. Almost all such models are deterministic. Describing the effects of uncertainty from estimation of the input parameters on the predicted outputs is important for many reasons, including: predictions within the whole system, obtaining a range of possibilities rather than single point estimates, helping to identify mechanisms that are poorly understood or parameterised, obtaining more robust model predictions, and being more able to assess risk.

In this talk I will discuss a particular crop model used to predict flowering time in wheat, and the importance of collaboration between the quantitative sciences of crop modelling and statistics to describe such a system. I conclude with an outline of the current milestones of my doctoral project.

http://www.stat.auckland.ac.nz/showperson?firstname=Esther&surname=Meenken

Tidy data
Dr Hadley Wickham

Speaker: Dr Hadley Wickham

Affiliation: Rice U.

When: Thursday, 3 November 2011, 4:00 pm to 5:00 pm

Where: 303s.279, Science Centre

It's often said that 80% of the effort of analysis is spent just getting the data ready to analyse, the process of data cleaning. Data cleaning is not only a vital first step, but it is often repeated multiple times over the course of an analysis as new problems come to light. Despite the amount of time it takes up, there has been little research on how to clean data well. Part of the challenge is the breadth of activities that cleaning encompasses, from outlier checking to date parsing to missing value imputation. To get a handle on the problem, this talk focusses on a small, but important, subset of data cleaning that I call data "tidying": getting the data in a format that is easy to manipulate, model, and visualise.

In this talk you'll see some of the crazy data sets that I've struggled with over the years, and learn the basic tools for making messy data tidy. I'll also discuss tidy tools, tools that take tidy data as input and return tidy data as output. The idea of a tidy tool is useful for critiquing existing R functions, and will help to explain why some tasks that seem like they should be easy are in fact quite hard. This work ties together `reshape2`, `plyr` and `ggplot2` with a consistent philosophy of data. Once you master this data format, you'll find it much easier to manipulate, model and visualise your data.
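One common tidying step is reshaping a "wide" table, with one column per measured variable, into a "long" table with one row per observation. The sketch below shows the idea in plain Python. It is an illustration only and does not reproduce the API of `reshape2` or any other package; the function name `melt_rows`, its parameters and the example data are all made up for this purpose.

```python
def melt_rows(rows, id_key, var_name="variable", value_name="value"):
    """Reshape wide records into long records, one row per observation.

    rows   -- list of dicts, one dict per wide-format record
    id_key -- key that identifies the record and is kept in every output row
    """
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is carried over, not melted
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy


# Hypothetical wide data: one column per measurement occasion.
wide = [
    {"subject": "s1", "before": 5, "after": 7},
    {"subject": "s2", "before": 6, "after": 4},
]
long_rows = melt_rows(wide, "subject")
```

Each output row now holds one subject, one variable name and one value, which is a form that is straightforward to filter, group and plot.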

http://had.co.nz/

Information criteria for semi-parametric models
Dr. Yuichi Hirose

Speaker: Dr. Yuichi Hirose

Affiliation: U. Victoria

When: Thursday, 27 October 2011, 4:00 pm to 5:00 pm

Where: 303s.279, Science Centre

Since Akaike proposed an Information Criterion, this approach to model selection has been an important part of statistical data analysis. Since then, many information criteria have been proposed and it is still an active field of research. Despite there being many contributors to this topic, we have not had proper information criteria for semi-parametric models. In this talk, we give ideas for developing an information criterion for semi-parametric models.

http://www.victoria.ac.nz/smsor/staff/yuichi-hirose.aspx

Recent developments in exploratory and integrative multivariate approaches for 'omics' data. Application to a kidney transplant study.
Dr Kim-Anh Le Cao

Speaker: Dr Kim-Anh Le Cao

Affiliation: Queensland Facility for Advanced Bioinformatics, University of Queensland

When: Thursday, 15 September 2011, 4:00 pm to 5:00 pm

Where: Room 303.257, Science Centre

With the availability of many 'omics' data, such as transcriptomics, proteomics or metabolomics, the integrative or joint analysis of multiple datasets from different technology platforms is becoming crucial to unravel the relationships between different biological functional levels. However, the development of such analyses is a major computational and technical challenge: most approaches suffer from high data dimensionality, because the number of measured biological entities (the variables) is much greater than the number of samples or patients.

Promising exploratory and integrative approaches have been recently developed for that purpose, such as sparse variants of Principal Component Analysis, and Partial Least Squares regression, in order to select the relevant variables related to the system under study.

I will apply a whole range of these methodologies to a kidney transplant study from the PROOF Centre (Centre of Excellence for the Prevention of Organ Failure, Vancouver) that includes transcriptomics, proteomics and clinical data. I will show how we can get a deeper understanding of the data and select potential biomarkers to classify acute rejection or non-rejection of kidney transplant. All these methodologies are implemented in the R package mixOmics as well as in its associated web-interface http://mixomics.qfab.org/.

http://www.math.univ-toulouse.fr/~lecao/

A statistician, a politician and a grandma walk into a bar: explaining the evidence that the earth is getting hotter.
Charlotte Wickham

Speaker: Charlotte Wickham

Affiliation: U.C. Berkeley

When: Thursday, 18 August 2011, 4:00 pm to 5:00 pm

Where: Room 303.257, Science Centre

An estimate of the history of the Earth's surface temperature is central to monitoring and explaining climate change, but production of the estimate isn't the end of the story. How should the explanation of the estimate change depending on the audience? I will discuss the basic temperature data involved, highlighting some of the peculiarities that are common points of discussion, then present some ideas for making the field more accessible to all.

http://www.stat.berkeley.edu/~wickham/

Predicting daily fishing success: The assessment of lunar and indigenous fishing calendars

Speaker: Ben Stevenson

Affiliation: University of Auckland

When: Tuesday, 16 August 2011, 4:00 pm to 5:00 pm

Where: PLT1 (303-G20)

Recreational fishers in New Zealand often make use of lunar and indigenous Maori fishing calendars in order to predict fishing success on specific days. Little is known about the performance of such predictions and whether or not they are of any practical use to the everyday angler. Here, the investigation primarily uses generalised nonlinear mixed effects models fitted in AD Model Builder (ADMB) using data provided by the Ministry of Fisheries. These data are based on catches of snapper (Pagrus auratus), New Zealand's most popular recreational species, and take the form of diary and boatramp surveys. Our analyses also allow for the evaluation of ADMB in its capabilities of explaining large datasets using complex nonlinear mixed effects models.

Sequence diversity under the multispecies coalescent
Dr. Joseph Heled

Speaker: Dr. Joseph Heled

Affiliation: Computational Evolution Group, Dept. of Computer Science, University of Auckland

When: Thursday, 4 August 2011, 4:00 pm to 5:00 pm

Where: 301.407

The study of sequence diversity under phylogenetic models is a classic. Theoretical studies of diversity under the Kingman coalescent appeared shortly after the introduction of the coalescent in 1982. In this talk I revisit the topic under the multispecies coalescent, an extension of the single population model to multiple populations. I derive exact formulas for the sequence dissimilarity under a basic multispecies setup, discuss the effects of relaxing some of the model assumptions, and show some simple uses with real data.

http://compevol.auckland.ac.nz/people/
