
## Department of Statistics

# 2011 Seminars


**Analysis of Low-Copy Number Forensic DNA Profiles**

Speaker: Professor David Balding

Affiliation: Institute of Genetics, University College London

When: Thursday, 24 November 2011, 4:00 pm to 5:00 pm

Where: Room 303.279, Science Centre

Recently, forensic DNA profiling has been used with far smaller volumes of DNA than was previously thought possible. This "low copy number" profiling enables DNA to be recovered from the slightest traces left by touch or even merely breath, but brings with it serious interpretation problems that courts have not yet adequately solved. The most important challenge to interpretation arises when either or both of "dropout" and "dropin" create discordances between the crime scene DNA profile and that expected under the prosecution allegation. Stochastic artefacts affecting the peak heights read from the electropherogram (epg) are also problematic, in addition to the effects of masking from the profile of a known contributor. I will outline a framework for assessing such evidence, based on likelihoods that involve dropout and masking by stutter and other artefacts, and discuss possible options for modelling dropin. I will apply it to casework examples and reveal serious deficiencies in some reported analyses. In particular, analysis based on inclusion probabilities, widely used in the USA and other countries, can be systematically unfair to defendants, sometimes extremely so.

http://www.ucl.ac.uk/ugi/research/DavidBalding

**Queues with advance reservations**

Speaker: Prof. Peter Taylor

Affiliation: U. Melbourne

When: Wednesday, 23 November 2011, 11:00 am to 12:00 pm

Where: ECE briefing room, 303-257

Queues where, on "arrival", customers make a reservation for service at some time in the future are endemic. However, there is surprisingly little about them in the literature.

Simulations illustrate some interesting implications of the facility to make such reservations. For example, introducing independent and identically distributed reservation periods into an Erlang loss system can either increase or decrease the blocking probability from that given by Erlang's formula, despite the fact that the process of "reserved arrivals" is still Poisson.

In this talk we shall discuss a number of ways of looking at such queues. In particular, we shall obtain various transient and stationary distributions associated with the "bookings diary" for the infinite server system. However, this does not immediately answer the question of how to calculate the above-mentioned blocking probabilities. We shall conclude with a few suggestions as to how this calculation might be carried out.

(joint work with Robert Maillardet)
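Erlang's formula, which the blocking probabilities in this abstract are compared against, can be evaluated with a short Python sketch. For c servers offered A erlangs of Poisson traffic, B(c, A) = (A^c / c!) / sum_{k=0..c} A^k / k!; the sketch uses the standard numerically stable recursion rather than the direct sum. It models only the classical loss system without advance reservations, and the function names are hypothetical.

```python
import math


def erlang_b(servers: int, offered_load: float) -> float:
    """Blocking probability B(c, A) from Erlang's loss formula.

    Evaluated with the standard recursion
        B(0, A) = 1,   B(k, A) = A*B(k-1, A) / (k + A*B(k-1, A)),
    which avoids the large factorials of the direct formula.
    """
    if servers < 0 or offered_load < 0:
        raise ValueError("servers and offered_load must be non-negative")
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def erlang_b_direct(servers: int, offered_load: float) -> float:
    """Direct evaluation of the formula, useful as a check for small c."""
    terms = [offered_load ** k / math.factorial(k) for k in range(servers + 1)]
    return terms[-1] / sum(terms)


if __name__ == "__main__":
    # 10 servers offered 7 erlangs of traffic (illustrative numbers only)
    print(erlang_b(10, 7.0), erlang_b_direct(10, 7.0))
```

The recursion and the direct sum agree; the recursion is preferred for large c, where A^c and c! overflow long before their ratio does.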

http://www.ms.unimelb.edu.au/Personnel/profile.php?PC_id=147

**Nonparametric inference for an inverse queueing problem**

Speaker: Dr. Susan Pitts

Affiliation: U. Cambridge

When: Thursday, 17 November 2011, 4:00 pm to 5:00 pm

Where: 303S.279, Science Centre

Consider an M/G/1 queueing model, in which customers arrive at a single server according to a Poisson process with rate λ, the service-time distribution function is G, and customers are served in order of arrival. Output quantities of interest, for example the distribution of the workload, are determined once λ and G are known.

We consider an inverse statistical estimation problem for this queueing model, where the service-time distribution G and the traffic intensity are unknown, but sampled data are available on the workload. We use these data to construct nonparametric estimators of G and the traffic intensity, and we study asymptotic properties of these estimators.

This is joint work with M.B. Hansen (Aalborg).
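The forward direction of this problem can be made concrete with a short Python sketch. For a stable first-come-first-served M/G/1 queue (traffic intensity rho = lambda * E[S] < 1), the Pollaczek-Khinchine formula gives the mean stationary workload as lambda * E[S^2] / (2 * (1 - rho)); because Poisson arrivals see time averages, this equals the mean waiting time, which the Lindley recursion W_{n+1} = max(0, W_n + S_n - T_{n+1}) lets us simulate. The function names and the exponential-service example are illustrative assumptions; the talk's estimators work in the inverse direction, recovering G and the traffic intensity from workload data.

```python
import random


def pk_mean_workload(arrival_rate, mean_service, second_moment_service):
    """Pollaczek-Khinchine mean stationary workload of an FCFS M/G/1 queue."""
    rho = arrival_rate * mean_service  # traffic intensity
    if rho >= 1.0:
        raise ValueError("unstable queue: traffic intensity must be below 1")
    return arrival_rate * second_moment_service / (2.0 * (1.0 - rho))


def simulate_mean_wait(arrival_rate, sample_service, n_customers, seed=0):
    """Average waiting time over n_customers via the Lindley recursion
    W_{n+1} = max(0, W_n + S_n - T_{n+1}), starting from an empty queue.

    sample_service(rng) must return one service time drawn from G.
    """
    rng = random.Random(seed)
    wait, total = 0.0, 0.0
    for _ in range(n_customers):
        total += wait
        service = sample_service(rng)
        interarrival = rng.expovariate(arrival_rate)  # Poisson arrivals
        wait = max(0.0, wait + service - interarrival)
    return total / n_customers


if __name__ == "__main__":
    lam = 0.3  # exponential service with mean 1, so E[S^2] = 2
    print(pk_mean_workload(lam, 1.0, 2.0))
    print(simulate_mean_wait(lam, lambda r: r.expovariate(1.0), 200_000, seed=1))
```

Only the first two moments of G enter the mean; the full Pollaczek-Khinchine formula for the workload distribution involves all of G, which is what makes the inverse problem informative about G.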

http://www.statslab.cam.ac.uk/Dept/People/pitts.html

**Managing Uncertainty in Complex Crop Models**

Speaker: Esther Meenken

Affiliation: Plant and Food Research, Lincoln

When: Thursday, 10 November 2011, 4:00 pm to 5:00 pm

Where: Room 303.279, Science Centre

Crop models are tools to help us understand the physiological mechanisms controlling crop development. They are used for on-farm management, research simulations and policy-making decisions. Almost all such models are deterministic. Describing how uncertainty in the estimated input parameters affects the predicted outputs is important for many reasons, including: predictions within the whole system, obtaining a range of possibilities rather than single point estimates, helping to identify mechanisms that are poorly understood or parameterised, obtaining more robust model predictions, and being better able to assess risk.

In this talk I will discuss a particular crop model used to predict flowering time in wheat, and the importance of collaboration between the quantitative sciences of crop modelling and statistics in describing such a system. I will conclude with an outline of the current milestones of my doctoral project.

http://www.stat.auckland.ac.nz/showperson?firstname=Esther&surname=Meenken

**Tidy data**

Speaker: Dr Hadley Wickham

Affiliation: Rice U.

When: Thursday, 3 November 2011, 4:00 pm to 5:00 pm

Where: 303s.279, Science Centre

It's often said that 80% of the effort of analysis is spent just getting the data ready to analyse, the process of data cleaning. Data cleaning is not only a vital first step, but it is often repeated multiple times over the course of an analysis as new problems come to light. Despite the amount of time it takes up, there has been little research on how to clean data well. Part of the challenge is the breadth of activities that cleaning encompasses, from outlier checking to date parsing to missing value imputation. To get a handle on the problem, this talk focusses on a small, but important, subset of data cleaning that I call data "tidying": getting the data in a format that is easy to manipulate, model, and visualise.

In this talk you'll see some of the crazy data sets that I've struggled with over the years, and learn the basic tools for making messy data tidy. I'll also discuss tidy tools, tools that take tidy data as input and return tidy data as output. The idea of a tidy tool is useful for critiquing existing R functions, and will help to explain why some tasks that seem like they should be easy are in fact quite hard. This work ties together `reshape2`, `plyr` and `ggplot2` with a consistent philosophy of data. Once you master this data format, you'll find it much easier to manipulate, model and visualise your data.
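The core tidying move of turning one-column-per-measurement data into one-row-per-observation data is what `melt` in reshape2 does. A minimal Python sketch of the same idea on a list of dicts is below; the function and the toy site/year data are hypothetical and only illustrate the reshaping, not the talk's own examples.

```python
def melt(rows, id_vars, var_name="variable", value_name="value"):
    """Reshape wide records into long ("tidy") records.

    Each non-identifier column of each input row becomes its own output row
    holding the identifiers, the column name and the cell value.
    """
    id_vars = list(id_vars)
    long_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_vars}
        for key, value in row.items():
            if key in id_vars:
                continue
            long_rows.append({**ids, var_name: key, value_name: value})
    return long_rows


# Toy wide table: one column per year (values are made up)
wide = [
    {"site": "A", "2010": 5, "2011": 7},
    {"site": "B", "2010": 3, "2011": 4},
]

for record in melt(wide, id_vars=["site"], var_name="year", value_name="count"):
    print(record)
```

The output is itself tidy (one observation per row, one variable per column), so it can be passed straight to another tidy tool, which is the input/output property the abstract describes.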

**Information criteria for semi-parametric models**

Speaker: Dr. Yuichi Hirose

Affiliation: U. Victoria

When: Thursday, 27 October 2011, 4:00 pm to 5:00 pm

Where: 303s.279, Science Centre

Since Akaike proposed his information criterion (AIC), this approach to model selection has been an important part of statistical data analysis. Many information criteria have since been proposed, and it remains an active field of research. Despite the many contributors to this topic, we do not yet have proper information criteria for semi-parametric models. In this talk, we present ideas for developing an information criterion for semi-parametric models.

http://www.victoria.ac.nz/smsor/staff/yuichi-hirose.aspx

**Recent developments in exploratory and integrative multivariate approaches for 'omics' data. Application to a kidney transplant study.**

Speaker: Dr Kim-Anh Le Cao

Affiliation: Queensland Facility for Advanced Bioinformatics, University of Queensland

When: Thursday, 15 September 2011, 4:00 pm to 5:00 pm

Where: Room 303.257, Science Centre

With the availability of many 'omics' data, such as transcriptomics, proteomics or metabolomics, the integrative or joint analysis of multiple datasets from different technology platforms is becoming crucial to unravel the relationships between different biological functional levels. However, developing such analyses is a major computational and technical challenge, since most approaches suffer from high data dimensionality: the number of measured biological entities (the variables) is much greater than the number of samples or patients.

Promising exploratory and integrative approaches have recently been developed for that purpose, such as sparse variants of Principal Component Analysis and Partial Least Squares regression, in order to select the relevant variables related to the system under study.

I will apply a whole range of these methodologies to a kidney transplant study from the PROOF Centre (Centre of Excellence for the Prevention of Organ Failure, Vancouver) that includes transcriptomics, proteomics and clinical data. I will show how we can get a deeper understanding of the data and select potential biomarkers to classify acute rejection or non-rejection of kidney transplants. All these methodologies are implemented in the R package mixOmics as well as in its associated web interface http://mixomics.qfab.org/.

http://www.math.univ-toulouse.fr/~lecao/

**A statistician, a politician and a grandma walk into a bar: explaining the evidence that the earth is getting hotter.**

Speaker: Charlotte Wickham

Affiliation: U.C. Berkeley

When: Thursday, 18 August 2011, 4:00 pm to 5:00 pm

Where: Room 303.257, Science Centre

An estimate of the history of the Earth's surface temperature is central to monitoring and explaining climate change, but production of the estimate isn't the end of the story. How should the explanation of the estimate change depending on the audience? I will discuss the basic temperature data involved, highlighting some of the peculiarities that are common points of discussion, then present some ideas for making the field more accessible to all.

http://www.stat.berkeley.edu/~wickham/

**Predicting daily fishing success: The assessment of lunar and indigenous fishing calendars**

Speaker: Ben Stevenson

Affiliation: University of Auckland

When: Tuesday, 16 August 2011, 4:00 pm to 5:00 pm

Where: PLT1 (303-G20)

Recreational fishers in New Zealand often make use of lunar and indigenous Maori fishing calendars to predict fishing success on specific days. Little is known about the performance of such predictions, or whether they are of any practical use to the everyday angler. This investigation primarily uses generalised nonlinear mixed-effects models fitted in AD Model Builder (ADMB) to data provided by the Ministry of Fisheries. These data are based on catches of snapper (*Pagrus auratus*), New Zealand's most popular recreational species, and take the form of diary and boat-ramp surveys. Our analyses also allow us to evaluate how well ADMB can fit complex nonlinear mixed-effects models to large datasets.

**Sequence diversity under the multispecies coalescent**

Speaker: Dr. Joseph Heled

Affiliation: Computational Evolution Group, Dept. of Computer Science, University of Auckland

When: Thursday, 4 August 2011, 4:00 pm to 5:00 pm

Where: 301.407

The study of sequence diversity under phylogenetic models is a classic topic. Theoretical studies of diversity under the Kingman coalescent appeared shortly after the introduction of the coalescent in 1982. In this talk I revisit the topic under the multispecies coalescent, an extension of the single-population model to multiple populations. I derive exact formulas for the sequence dissimilarity under a basic multispecies setup, discuss the effects of relaxing some of the model assumptions, and show some simple applications to real data.

http://compevol.auckland.ac.nz/people/

**Unravelling the puzzle of**

Speaker: Professor Dalton Conley

Affiliation: NYU

When: Thursday, 28 July 2011, 4:00 pm to 5:00 pm

Where: ECE briefing room 303.257

Recently there has been increased interest in collecting biomarkers, in general, and genetic data, in particular, among social scientists. Indeed, scientists are being inundated with enormous amounts of genetic data that would seem to augur for an intense focus on the genetic roots of behavior. On the other hand, genome-wide association studies (GWAS) have failed to (additively) account for prior broad sense heritability estimates, sending human geneticists off onto a mystery novel plot in search of the

**Uses of Two-Phase Case-Control Designs in Genetics**

Speaker: Professor Duncan C. Thomas

Affiliation: Preventive Medicine (Division of Biostatistics) Keck School of Medicine University of Southern California

When: Friday, 22 July 2011, 2:00 pm to 3:00 pm

Where: Comp Sci. Seminar Room 303.279, Science Centre

The elegant theory of two-phase sampling designs that Norm Breslow has pioneered has arguably been under-utilized by practicing epidemiologists. In this lecture, I will briefly review the various approaches to the analysis of two-phase designs, compare the relative efficiencies of two-phase and counter-matched designs, and discuss recent theoretical work on optimization of these designs in certain rarefied situations where analytic treatment of the full likelihood is possible. After a brief update on applications in environmental epidemiology for selection of locations to deploy exposure monitors for spatial modeling of air pollution levels, I will devote the balance of the talk to potential applications in genetic epidemiology. Following a genome-wide association study (GWAS), the next stage may be either fine mapping with additional single nucleotide polymorphism (SNP) markers or direct sequencing to try to identify the causal variants responsible for GWAS hits. Because of the high cost of such studies, it is generally feasible to do this only on a subset of subjects. Two-phase sampling studies, sampling jointly on disease and associated SNPs, can offer great efficiency in selecting individuals for further genotyping. Increasingly, such studies are being conducted using DNA pooling, posing additional statistical complications. Another area of potential application concerns subsampling for biomarker measurement to elucidate latent disease pathways. Here, there is potential efficiency gain by sampling jointly on the basis of disease status, exposure, and genotypes. Outstanding issues requiring further methodological research include the feasibility of using semi-parametric maximum likelihood approaches and dealing with variable selection in high-dimensional settings, as well as the problem of reverse causation in biomarker studies.

http://www.usc.edu/programs/pibbs/site/faculty/thomas_d.htm

**a random Random Walk walk**

Speaker: Dr. Mark Holmes

Affiliation: Department of Statistics, University of Auckland

When: Thursday, 21 July 2011, 2:00 pm to 3:00 pm

Where: 303S.279

[This is a mathematics dept. colloquium]

A simple (symmetric) random walk is the basic model used to describe random movement in discrete time and space. It is a sum of independent and identically distributed increments, and therefore obeys the two fundamental convergence laws in probability, the "law of large numbers" and the "central limit theorem". In one dimension the model is recurrent, which is equivalent to saying that a gambler repeatedly playing a fair game will eventually become bankrupt. We will begin with a discussion of simple random walks in all dimensions, and will make our way towards much more complicated random walk models, where the increments need not be independent or identically distributed.
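A minimal Python sketch of the simple symmetric random walk on Z^d: each step moves to one of the 2d nearest neighbours uniformly at random. Because the increments are independent with mean zero and unit squared length, E|S_n|^2 = n in every dimension, which is the square-root-of-n scaling behind the central limit theorem. The function names are hypothetical and the Monte Carlo estimate is only an illustration.

```python
import random


def simple_random_walk(n_steps, dim=1, rng=None):
    """Return the path S_0, ..., S_n of a simple symmetric random walk on Z^dim."""
    rng = rng or random.Random()
    position = [0] * dim
    path = [tuple(position)]
    for _ in range(n_steps):
        axis = rng.randrange(dim)              # choose a coordinate direction
        position[axis] += rng.choice((-1, 1))  # step forward or back along it
        path.append(tuple(position))
    return path


def mean_squared_displacement(n_steps, n_walks, dim=1, seed=0):
    """Monte Carlo estimate of E|S_n|^2 / n, which should be close to 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        end = simple_random_walk(n_steps, dim, rng)[-1]
        total += sum(c * c for c in end)
    return total / (n_walks * n_steps)


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, mean_squared_displacement(100, 1000, dim=d))
```

The squared displacement does not depend on the dimension, even though recurrence does: the walk is recurrent in dimensions 1 and 2 and transient from dimension 3 on.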

http://www.stat.auckland.ac.nz/~mholmes/

**A multivariate omnibus test: Swiss Army Knife or plastic spork?**

Speaker: Associate Professor Brian McArdle

Affiliation: Department of Statistics, University of Auckland

When: Thursday, 23 June 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

I introduce a way of combining p-values (based on Fisher's omnibus test) from separate univariate tests to allow a test of a multivariate hypothesis. I investigate its potential to handle common situations in Multivariate Analysis of Variance that are currently intractable, and I hope to show its potential for a wide variety of other situations.
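Fisher's omnibus statistic is simple to compute, as in the minimal Python sketch below. Its chi-square reference distribution assumes the univariate tests are independent, which generally fails for correlated responses in a MANOVA setting. The abstract does not say how the null distribution is obtained in that case (a permutation distribution of the same statistic is one possible option), so treat this only as the classical starting point. The function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method for combining k p-values.

    Under the null hypothesis, if the k tests are independent and each
    p-value is uniform on (0, 1), X = -2 * sum(log p_i) follows a
    chi-square distribution with 2k degrees of freedom.
    Returns (X, combined p-value).
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # The upper tail of a chi-square with an even number (2k) of degrees of
    # freedom has a closed form: exp(-x/2) * sum_{j<k} (x/2)^j / j!
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return x, math.exp(-half) * total


if __name__ == "__main__":
    # three univariate p-values (illustrative numbers only)
    print(fisher_combined_pvalue([0.01, 0.02, 0.03]))
```

With a single p-value the method returns that p-value unchanged, a quick sanity check on the tail formula.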

http://www.stat.auckland.ac.nz/~bmcardle/

**Studying Evolutionary Models of Mutation using Genomic Data**

Speaker: Dr. Reed A. Cartwright

Affiliation: Dept. of Ecology and Evolutionary Biology, Rice University

When: Monday, 30 May 2011, 3:00 pm to 4:00 pm

Where: Statistics Seminar (303.222)

Mutation is the engine of evolution, generating the fundamental variation that powers the diversification of life. Although mutation is necessary for evolution, it comes with a cost because many mutations play significant roles in disease susceptibility. Thus both evolutionary biology and biomedical science are interested in finding and studying mutations. The focus of this talk will be studying evolutionary models of point mutation and indel formation, including identifying point mutations from deep sequencing of human families. The talk will also discuss statistical and computational strategies to estimate mutations in the presence of uncertainties in molecular sequence data, such as those that arise in second generation sequencing.

http://dererumnatura.us/about/

**Genetic variation in human traits**

Speaker: Professor Bruce Weir

Affiliation: Dept. of Biostatistics, University of Washington and Dept. of Statistics, UoA

When: Thursday, 12 May 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

Whole-genome searches for genetic variants that influence human traits often find several variants that meet criteria to be declared statistically significant. The contribution of these variants to the genetic variance of the traits, however, is generally much less than has been found previously by family studies, giving rise to the "missing heritability" problem. Peter Visscher and colleagues in Australia have used all the variants in a whole-genome analysis (not just those significantly associated with the trait) to estimate relatedness among members of the study group, and then estimate genetic variation in the trait. Their results are closer to those found in family studies. The methodology will be reviewed with reference to data collected in the GENEVA project.

Background publications include:

Lango Allen, H., Estrada, K., Lettre, G., Berndt, S.I., Weedon, M.N., Rivadeneira, F., Willer, C.J., Jackson, A.U., Vedantam, S., Raychaudhuri, S., et al. (2010). Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832-838.

Yang, J., Benyamin, B., McEvoy, B.P., Gordon, S., Henders, A.K., Nyholt, D.R., Madden, P.A., Heath, A.C., Martin, N.G., Montgomery, G.W., et al. (2010). Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-569.

http://sph.washington.edu/faculty/fac_bio.asp?url_ID=Weir_Bruce

**The Distribution of the Maximum of Two Multivariate ARMA Processes.**

Speaker: Dr. Kit Withers

Affiliation: formerly Industrial Research Limited

When: Thursday, 5 May 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

It's a great pleasure to be giving a talk at my alma mater a day after my Doctor of Science degree was awarded.

Today I've chosen to talk on the behaviour of the extremes of correlated observations. This is one of many papers I've written motivated by the study of climate change. I should say that the model today does not include trends.

For simplicity, the correlation models I consider today are first-order moving averages and first-order autoregressive schemes. However, we shall be considering multivariate series.

Today's paper combines the familiar problem of recurrence relations with new work extending Fredholm integral equation theory to non-symmetric kernels, and extending the Jordan form of a matrix to a function of two variables.

The distribution of the multivariate maximum of n correlated observations is given as a weighted sum of the nth powers of the eigenvalues of a non-symmetric Fredholm kernel.

http://homepages.slingshot.co.nz/~kitw

**A generalization of Fisher's exact test: rock-paper-scissors and friendly ghosts**

Speaker: Dr. Robin Hankin

Affiliation:

When: Thursday, 21 April 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

Fisher's exact test, named for Sir Ronald Aylmer Fisher, tests contingency tables for homogeneity of proportion. Here I discuss a generalization of Fisher’s exact test for the case where some of the table entries are constrained to be zero. The null hypothesis suggests a new distribution, dubbed the "hyperdirichlet", which is conjugate to a broad and interesting class of discrete observations.

The test and distribution are implemented in the R programming language in the form of two packages.
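For reference, the classical test being generalized can be sketched in a few lines of Python. Conditioning on both margins of a 2x2 table makes the top-left count hypergeometric, and the two-sided p-value below sums the probabilities of all tables no more probable than the observed one (one common two-sided convention among several). This covers only the unconstrained 2x2 case; it does not implement the structural-zero generalization or the hyperdirichlet distribution from the talk, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    With both margins fixed, the top-left count follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    denom = comb(n, col1)

    def prob(x):
        # P(top-left count = x | margins)
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    # the small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Fisher's "lady tasting tea" layout: 3 of 4 cups identified correctly
    print(fisher_exact_2x2(3, 1, 1, 3))
```

Here the two-sided p-value is 34/70, since only the perfectly balanced table (probability 36/70) is more likely than the observed one.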

**Generalizations of Ward's method in hierarchical clustering**

Speaker: Prof. Alan Lee

Affiliation: U. Auckland

When: Thursday, 14 April 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

In this paper, we consider several generalizations of the popular Ward's method for agglomerative hierarchical clustering. Our work was motivated by clustering software, such as the R function hclust, which accepts a distance matrix as input and applies Ward's definition of inter-cluster distance to produce a clustering.

The standard version of Ward's method uses squared Euclidean distance to form the distance matrix. We explore the effect on the clustering of using other definitions of distance, such as powers of the Minkowski distance.

(Joint work with Bobby Wilcox)
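Software of this kind only ever sees a dissimilarity matrix. The minimal Python sketch below makes that concrete with the Lance-Williams update for Ward's method: when cluster k is compared with the merger of clusters i and j, d(k, i+j) = [(n_i + n_k) d(k, i) + (n_j + n_k) d(k, j) - n_k d(i, j)] / (n_i + n_j + n_k). This reproduces classical Ward clustering (with merge heights equal to twice the increase in within-cluster sum of squares) only when d is squared Euclidean distance; otherwise it applies the same rule to whatever dissimilarity it is given. The `power` argument for raising Minkowski distances to a power is a hypothetical illustration of the kind of alternative the talk considers, not its actual set of choices.

```python
from itertools import combinations


def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def distance_matrix(points, p=2.0, power=2.0):
    """Pairwise Minkowski distances raised to `power` (p=2, power=2 gives
    squared Euclidean distance, the classical input to Ward's method)."""
    return [[minkowski(x, y, p) ** power for y in points] for x in points]


def ward_lance_williams(dist):
    """Agglomerative clustering of a symmetric dissimilarity matrix using the
    Lance-Williams update for Ward's method.

    Returns a list of merges (cluster_i, cluster_j, height, new_size); the
    original observations are clusters 0..n-1 and the cluster created at
    step s gets id n + s.
    """
    n = len(dist)
    size = {i: 1 for i in range(n)}  # active cluster id -> number of members
    d = {frozenset((i, j)): dist[i][j] for i, j in combinations(range(n), 2)}
    merges = []
    next_id = n
    while len(size) > 1:
        # merge the closest pair of active clusters
        i, j = min(combinations(sorted(size), 2), key=lambda ij: d[frozenset(ij)])
        d_ij = d.pop(frozenset((i, j)))
        n_i, n_j = size.pop(i), size.pop(j)
        for k, n_k in size.items():
            d_ki = d.pop(frozenset((k, i)))
            d_kj = d.pop(frozenset((k, j)))
            d[frozenset((k, next_id))] = (
                (n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij
            ) / (n_i + n_j + n_k)
        size[next_id] = n_i + n_j
        merges.append((i, j, d_ij, n_i + n_j))
        next_id += 1
    return merges


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
for merge in ward_lance_williams(distance_matrix(points)):
    print(merge)
```

For these four points the final merge height is 200, twice the within-cluster sum-of-squares increase of 100 from joining {0, 1} with {10, 11}; changing `p` or `power` changes the input matrix while the update rule stays the same.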

http://www.stat.auckland.ac.nz/showperson?firstname=Alan&surname=Lee

**Trial designs to measure the impact of HIV prevention interventions**

Speaker: Dr. Deborah Donnell

Affiliation: SCHARP

When: Thursday, 31 March 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

Recent successes in HIV prevention have led to a call for trials that assess the population impact of HIV prevention interventions. Researchers are debating the merits of combination prevention packages that simultaneously increase uptake of multiple prevention strategies: HIV testing and linkage to treatment; strategies for increasing the number of HIV-infected people on treatment; and uptake of adult male circumcision. A variety of designs can be used to measure impact, including community randomized designs, stepped wedge designs and before vs. after assessments, each of which presents different challenges. In the context of a campaign to scale up male circumcision to 80% coverage in the Kingdom of Swaziland, I will discuss the design of the Swaziland HIV Incidence Study, which aims to assess the impact on HIV incidence of the male circumcision campaign in concert with scale-up of antiretroviral therapy and other prevention programs.

http://www.scharp.org/bios/deborah_donnell.html

**Personalised Medicine: any real prospects from genetics?**

Speaker: Prof. Thomas Lumley

Affiliation: Department of Statistics, The University of Auckland

When: Wednesday, 23 March 2011, 12:30 pm to 1:30 pm

Where: Graham Hill Lecture Theatre, Level 12, Hospital Support building, Grafton Campus

*Professor of Biostatistics, Dr Thomas Lumley, will be giving a talk in the medical school at Grafton.*

The Human Genome Project, the HapMap project, and the continuing fall in costs of genotyping have led to outbreaks of enthusiasm and/or concern over the prospects for predicting health and response to treatment from genetic variation. Many papers have been published from large-scale genetic association studies. Very little has come of it from a clinical or public health viewpoint, though there are a small number of promising findings. I will review three possible reasons for this: there isn't much to find, we couldn't find it if there was, and it wouldn't be much use if we could find it. I will also discuss the exceptions where there is a real prospect of directly using genomic findings in clinical medicine.

**The Convergence of Loop-Erased Random Walk to SLE(2) in the Natural Parametrization**

Speaker: Dr. Michael Kozdron

Affiliation: U. Regina

When: Tuesday, 1 March 2011, 11:00 am to 12:00 pm

Where: 303-257

The Schramm-Loewner evolution is a one-parameter family of random growth processes in the complex plane introduced by Oded Schramm in 1999. In the past decade, SLE has been successfully used to describe the scaling limits of various two-dimensional lattice models. One of the first proofs of convergence was due to Greg Lawler, Oded Schramm, and Wendelin Werner who gave a precise statement that the scaling limit of loop-erased random walk is SLE with parameter 2. However, their result was only for curves up to reparameterization. There is reason to believe that the scaling limit of loop-erased random walk is SLE(2) with the very specific natural time parameterization that was recently introduced by Greg Lawler and Scott Sheffield, and further studied by Greg Lawler and Wang Zhou. I will describe several possible choices for the parameterization of the discrete curve that should all give the natural time parameterization in the limit, but with the key difference being that some of these discrete time parameterizations are easier to analyze than the others. This talk is based on joint work in progress with Tom Alberts and Robert Masson.

http://stat.math.uregina.ca/~kozdron/

**The discrete-time parabolic Anderson model with heavy-tailed potential**

Speaker: Dr. Nicholas Petrelis

Affiliation: U. Nantes

When: Tuesday, 1 March 2011, 10:00 am to 11:00 am

Where: 303-257

We consider a discrete-time version of the parabolic Anderson model. This may be described as a model for a directed (1+d)-dimensional polymer interacting with a random potential, which is constant in the deterministic direction and i.i.d. in the d orthogonal directions.

The potential at each site is a positive random variable with a polynomial tail at infinity. We show that, as the size of the system diverges, the polymer extremity is localized almost surely at one single point which grows ballistically.

We give an explicit characterization of the localization point and of the typical paths of the model.

http://www.math.sciences.univ-nantes.fr/~petrelis/

**Competing risks and multi-state survival analysis**

Speaker: Jim Lewsey

Affiliation: Institute of Health and Wellbeing, University of Glasgow

When: Monday, 28 February 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

My interest in this area was stimulated when studying the risk of recurrent non-fatal stroke following a first-ever stroke. Due to the high risk of death following a stroke, it was clear that the non-informative censoring assumption would be violated if the Kaplan-Meier approach was used to estimate risk of recurrence.

Using a competing risks approach [1], the cumulative incidence of events of interest can be estimated taking into account other (competing) events that could occur. In terms of modelling, cause-specific hazards can be modelled using standard Cox and parametric approaches, or the cumulative incidence curves can be modelled directly [2]. A competing risk model is a simple example of a multi-state model where different transitions between different states are simultaneously modelled and predictions of risk can be made by applying Markov theory.

In this seminar I will illustrate the difference between the Kaplan-Meier and cumulative incidence approaches and show applied results of competing risks and multi-state models from research into the epidemiology of cardiovascular disease.

1. H Putter, M Fiocco, RB Geskus. Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine 2007; 26:2389-2430

2. JP Fine and RJ Gray. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association 1999; 94: 496-509
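The difference between the Kaplan-Meier and cumulative incidence approaches in this abstract can be seen in a small Python sketch of the nonparametric cumulative incidence estimator (the Aalen-Johansen estimator in the competing-risks case): CIF_k(t) = sum over event times s <= t of S(s-) d_k(s) / n(s), where S is the Kaplan-Meier estimate of remaining free of any event. It is shown alongside the naive 1 - KM estimate that treats competing events as censored. The data layout (status 0 = censored, otherwise the cause of the event) and the function name are assumptions for illustration.

```python
from itertools import groupby


def cumulative_incidence(data, cause=1):
    """data: iterable of (time, status); status 0 = censored, otherwise the
    cause of the event. Returns (time, cif, naive_one_minus_km) at each
    distinct event time."""
    data = sorted(data)
    n_at_risk = len(data)
    surv_any = 1.0  # KM estimate of being free of *any* event, S(t-)
    km_cause = 1.0  # KM estimate that treats competing events as censored
    cif = 0.0
    out = []
    for t, group in groupby(data, key=lambda r: r[0]):
        statuses = [s for _, s in group]
        d_cause = sum(1 for s in statuses if s == cause)
        d_any = sum(1 for s in statuses if s != 0)
        if d_any:
            cif += surv_any * d_cause / n_at_risk
            surv_any *= 1.0 - d_any / n_at_risk
            km_cause *= 1.0 - d_cause / n_at_risk
            out.append((t, cif, 1.0 - km_cause))
        # events and censorings at time t both leave the risk set afterwards
        n_at_risk -= len(statuses)
    return out


# Four subjects, no censoring: causes 1, 2, 1, 2 at times 1..4 (toy data).
# The last line shows CIF 0.5 versus 1 - KM 0.625.
for t, cif, naive in cumulative_incidence([(1, 1), (2, 2), (3, 1), (4, 2)]):
    print(t, round(cif, 3), round(naive, 3))
```

With no censoring, the cumulative incidence equals the observed proportion failing from the cause of interest (2 of 4), whereas 1 - KM overstates it because it treats the competing deaths as if those subjects could still have the event later.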

**Fold-up derivatives of set-valued functions: Further problems and applications to probability theory**

Speaker: Prof. Estate Khmaladze

Affiliation: Victoria U.

When: Monday, 28 February 2011, 11:00 am to 12:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

Suppose C(t), defined for non-negative t, is a set-valued function, such that each C(t) is a bounded Borel subset of the reals. Suppose it is continuous in the Hausdorff metric. Can it also be differentiable?

We will explain why a statistician or specialist of image analysis may be interested in this notion, and define a "fold-up" derivative.

We discuss how this derivative compares with some other infinitesimal characteristics of C(t). We will also consider a certain non-crossing problem for d-dimensional Brownian motion, the solution of which we do not yet know.

http://www.victoria.ac.nz/smsor/staff/estate-khmaladze.aspx

**Random walks and pseudo-branching processes**

Speaker: Dr. Mark Holmes

Affiliation: U. Auckland

When: Monday, 28 February 2011, 10:00 am to 11:00 am

Where: Statistics Seminar Room 303.222, Science Centre

We'll discuss a family of random walks called multi-excited random walks, some unusual population-type models, and the relationship between them.

http://www.stat.auckland.ac.nz/~mholmes/

**Prophetic constructions of branching and related processes**

Speaker: Prof. Tom Kurtz

Affiliation: U. Wisconsin (Madison)

When: Monday, 28 February 2011, 9:00 am to 10:00 am

Where: Statistics Seminar Room 303.222, Science Centre

A collection of well-known population models (branching processes, branching Markov processes, branching processes in random environments, etc.) is constructed in a manner that associates with each individual in the population a characteristic called a level. If the levels are known to an observer, then a great deal is known about the future behavior of individuals (e.g., the exact time of death). If the levels are not known, then the models evolve as the observer would expect from their classical descriptions. The constructions enable straightforward proofs of a variety of known and not-so-well-known results including limit theorems, conditioning arguments, and derivation of properties of genealogies.

http://www.math.wisc.edu/~kurtz/

**Saddlepoint approximations, likelihood asymptotics, and approximate conditional inference**

Speaker: Jared Tobin

Affiliation: Memorial U. Newfoundland

When: Thursday, 24 February 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

Maximum likelihood methods may be inadequate for parameter estimation in models where many nuisance parameters are present. The modified profile likelihood (MPL) of Barndorff-Nielsen (1983) serves as a highly accurate approximation to the marginal or conditional likelihood, when either exists, and can be viewed as an approximate conditional likelihood when they do not. We examine the modified profile likelihood, its variants, and its connections with Laplace and saddlepoint approximations under both theoretical and pragmatic lenses.

http://mun.academia.edu/JaredTobin

**Particle representations for continuum models**

Speaker: Prof. Tom Kurtz

Affiliation: U. Wisconsin (Madison)

When: Tuesday, 15 February 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

Many stochastic and deterministic models are derived as continuum limits of discrete stochastic systems as the size of the systems tends to infinity. Discrete "particles" are each assigned a small mass and the limiting "mass distribution," typically characterized as a solution of a deterministic or stochastic partial differential equation, gives the desired model. A number of examples will be described in which keeping the discrete particles in the limit provides a useful tool for justifying the limit and analyzing the limiting model. Examples include derivation of fluid models for internet protocols, models of stock prices set by infinitely many competing traders, and consistency of numerical schemes for filtering equations.

http://www.math.wisc.edu/~kurtz/

**Nonparametric tests in case of heterogeneous variances**

Speaker: Prof. Markus Neuhauser

Affiliation: Koblenz U. of Applied Sciences

When: Thursday, 10 February 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

Statisticians are often faced with a situation where they need to compare the central tendencies of two samples. The standard tools, the t test and the Wilcoxon rank-sum test, are unreliable when the variances of the groups are different. The problem is particularly severe when sample sizes differ between groups. The unequal-variance t test (Welch test) may not be suitable for non-normal data. Here, we propose the use of Brunner and Munzel's generalized Wilcoxon test. Alternative tests will be discussed as well. We demonstrate that a permutation test is possible even under heteroscedasticity, which allows for small sample sizes.
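A minimal, dependency-free Python sketch of the Brunner-Munzel statistic follows, using the usual form W = n1*n2*(R2 - R1) / (N * sqrt(n1*S1^2 + n2*S2^2)), where R1 and R2 are mean ranks in the pooled sample, S1^2 and S2^2 are rank-based variance estimates, and W is referred to a t distribution with Satterthwaite degrees of freedom. The permutation p-value enumerates every split of the pooled data, which is only feasible for small samples; that this studentized permutation matches the procedure in the talk is an assumption. The function names are hypothetical, and SciPy's `scipy.stats.brunnermunzel` can be used to cross-check the statistic.

```python
import math
from itertools import combinations


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def brunner_munzel(x, y):
    """Return (W, df, p_hat) for the Brunner-Munzel test of samples x and y.

    p_hat estimates P(X < Y) + 0.5 * P(X = Y); W is referred to a t
    distribution with df (Satterthwaite) degrees of freedom.
    """
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each sample needs at least two observations")
    pooled = midranks(list(x) + list(y))
    rcx, rcy = pooled[:nx], pooled[nx:]
    rx, ry = midranks(list(x)), midranks(list(y))
    mx, my = sum(rcx) / nx, sum(rcy) / ny
    # within-sample midranks always average (n + 1) / 2
    sx = sum((a - b - mx + (nx + 1) / 2) ** 2 for a, b in zip(rcx, rx)) / (nx - 1)
    sy = sum((a - b - my + (ny + 1) / 2) ** 2 for a, b in zip(rcy, ry)) / (ny - 1)
    p_hat = (my - (ny + 1) / 2) / nx
    numer = nx * ny * (my - mx)
    spread = nx * sx + ny * sy
    if spread == 0.0:
        # complete separation or identical samples: W is degenerate
        w = math.copysign(math.inf, numer) if numer else 0.0
        return w, math.nan, p_hat
    w = numer / ((nx + ny) * math.sqrt(spread))
    df = spread ** 2 / ((nx * sx) ** 2 / (nx - 1) + (ny * sy) ** 2 / (ny - 1))
    return w, df, p_hat


def brunner_munzel_permutation_pvalue(x, y):
    """Two-sided p-value from the permutation distribution of |W| over all
    ways of splitting the pooled data (only feasible for small samples)."""
    pooled = list(x) + list(y)
    w_obs = abs(brunner_munzel(x, y)[0])
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(brunner_munzel(xs, ys)[0]) >= w_obs * (1 - 1e-9):
            hits += 1
    return hits / total


if __name__ == "__main__":
    # toy samples with unequal spread (made-up numbers)
    a, b = [1.1, 2.3, 1.9, 2.8], [2.5, 6.0, 0.4, 9.7, 5.2]
    print(brunner_munzel(a, b))
    print(brunner_munzel_permutation_pvalue(a, b))
```

Permuting the studentized statistic W, rather than a raw rank sum, is what keeps the permutation approach reasonable when the two groups have different variances.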

http://www.rheinahrcampus.de/Prof-Dr-Markus-Neuhaeuser.1804.0.html

**Enhancing Data Literacy: The contribution of the ESDS to teaching quantitative methods in the social sciences**

Speaker: Louise Corti

Affiliation: U. Essex

When: Wednesday, 9 February 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

http://www.data-archive.ac.uk/about/staff?sid=13

**Propensity Score Matching in Cluster Randomized Trials**

Speaker: Prof. John Kalbfleisch

Affiliation: U. Michigan

When: Thursday, 27 January 2011, 4:00 pm to 5:00 pm

Where: Statistics Seminar Room 303.222, Science Centre

Cluster randomization trials with relatively few clusters have been widely used in recent years for evaluation of health care strategies. On average, randomized treatment assignment achieves balance in both known and unknown confounding factors between treatment groups; however, in practice investigators can only introduce a small amount of stratification and cannot balance on all the important variables simultaneously. The limitation arises especially when there are many confounding variables and in small studies. Such is the case in the INSTINCT trial, designed to investigate the effectiveness of an education program in enhancing tPA use in stroke patients. In this paper, we introduce a new randomization design, the balance match weighted (BMW) design, which applies the optimal matching with constraints technique to a prospective randomized design and aims to minimize the mean squared error of the treatment effect estimator. A simulation study shows that, under various confounding scenarios, the BMW design can yield substantial reductions in the MSE for the treatment effect estimator compared to a completely randomized or matched-pair design. We illustrate these methods in proposing a design for the INSTINCT trial.

http://www.sph.umich.edu/iscr/faculty/profile.cfm?uniqname=jdkalbfl