Summer Scholarships 2009-2010 (Complete List)

For general information regarding scholarships, please see the Scholarships and Research Grants webpage.

These Summer Scholarships are available to suitable undergraduate students.

Aim: To give students research experience and acquaint them with the research activities of the Department and with potential graduate supervisors.

Emolument: $5,000 non-taxable.

Duration: Full-time employment for eight to twelve weeks (400 hours).

Applications close Friday 28th August 2009.

Value-added measures to evaluate magnitude of change

This project is offered by the Woolf Fisher Research Centre, The University of Auckland.

A national project was set up to conduct the first nation-wide evaluation of Schooling Improvement Initiatives. One key goal was to examine the improvements in achievement across these initiatives. The first part of the summer scholarship involves a literature review on value-added measures appropriate for measuring educational improvements (e.g., cost-benefit analysis of educational programmes/studies, treatment of missing data and its influence on value-added studies, educational studies that have applied segmentation methods or other multivariate analysis methods for survey data in relation to student achievement). Based on the literature review, in the second part of the scholarship the student will undertake a series of exploratory studies on the data (e.g., applying suitable multivariate analysis methods to the data in relation to student achievement, or studying student-level absence/transience patterns and then relating the patterns to student achievement using a value-added specification from the predictors collected in the study).

This would suit students who have familiarity with SAS and R and have good marks in STATS 302 and/or STATS 330.

Contact: Mei Kuin Lai, mei.lai@auckland.ac.nz

Multilocus analysis of genome-wide association data

The Wellcome Trust Case Control Consortium genotyped cohorts of 2000 individuals with each of 7 common complex diseases (such as diabetes and hypertension) and 3000 control individuals on 500,000 genetic markers. They published analysis of these data using standard methods.

In this project, the student will take one of the disease cohorts and the controls and analyze the data with new methodologies (haplotypic analysis and/or identity by descent detection) with the goal of finding new associations between genetic variation and the disease, in order to improve understanding of genetic factors contributing to susceptibility to the disease. The student will perform a literature search in order to find published associations from other data sets on the same disease, and will compare these results from other studies with the results of his/her analysis on the Wellcome Trust data set.

Contact: Sharon Browning, browning@stat.auckland.ac.nz

Statistical analysis of sensory data relating to people's ability to smell and characterise food/beverage flavours

This project is offered by Plant and Food Research.

You will get your hands on real large-scale data sets. We need your help to analyse and make sense of the information contained in the data.

This project involves full-time statistical analysis of previously collected data. It primarily involves the development of statistical protocols that can be repeated with similar types of data at a later stage. The developed routines must be easy to run in standard software, preferably SAS.

The successful candidate must be able to handle large-scale data sets and be proficient in tasks such as descriptive statistics and univariate analyses. Some experience in multivariate analyses, notably PCA/factor analysis, MANOVA, discriminant analysis and cluster analysis, is required. There will also be issues to consider relating to variable selection and weighting of cases and variables. The successful applicant will also be expected to show initiative and suggest additional analyses that may uncover systematic patterns in the data.

The successful candidate will work in close collaboration with the scientists who collect the focal data, as well as with experienced biometricians.

Contact: Nihal de Silva, http://careers.plantandfood.com/jobseeker/, Job reference: 1186

Analysing large data-sets to compare gains in achievement across Schooling Improvement Initiatives

This project is offered by the Woolf Fisher Research Centre, The University of Auckland.

A national project was set up to conduct the first nation-wide evaluation of Schooling Improvement Initiatives. One key goal was to assess the achievement of students in these initiatives. The summer scholarship will involve examining the achievement patterns for reading and writing across 20 schooling improvement initiatives (up to 30 schools per cluster). In particular, this work strand will compare gains in achievement and achievement levels within and between Schooling Improvement Initiatives, and examine whether subgroups of interest (e.g., gender, ethnicity, students with differing achievement levels) gain and achieve at similar rates.

The student undertaking the scholarship will produce the summary statistics for the project based on methods taught in courses STATS 20X and STATS 210. The tools we will use are mainly Excel, R, and SAS.

Contact: Mei Lai, mei.lai@auckland.ac.nz

Analysis of a GC-mass spectroscopy dataset derived from the global metabolomic profiles recorded in various biofluids collected during severe surgical illness

This project is offered by the School of Biological Sciences, The University of Auckland.

Acute pancreatitis (AP) is a common inflammatory disease which presents itself in a wide range of severities from mild to very severe. While it can be managed by surgeons, it remains a challenging disease to treat and the causes for different severities are unknown. Metabolites, the chemical fingerprints left behind by specific cellular processes, may provide clues as to why a wide spectrum of severities is observed among sufferers. As part of an ongoing collaboration with the University’s Department of Surgery Pancreas Research Group, global metabolic profiles of control and pancreatitis conditioned biofluids are being collected (using gas chromatography coupled with mass spectrometry) to gather new mechanistic clues for understanding AP with a view to generating new targets for therapeutics. The aim of this project is to analyse this large dataset by using both univariate and multivariate statistical methods to identify key metabolomic changes in a target biofluid during the development of AP.

Key techniques/skills that the student will learn: Understanding of how GC-MS metabolomics data is generated; Application of univariate and multivariate statistical methods to GC-MS metabolomics data; Statistical analysis using SAS and the R software.

Prerequisites: Minimum Stage 2 statistics (although Stage 3 statistics is preferred). Experience with SAS and the R software. Basic knowledge of biochemistry or physiology would be an advantage but is not an absolute requirement. Enthusiasm is an essential prerequisite!

Contact: Dr Kathy Ruggiero, k.ruggiero@auckland.ac.nz

How to make use of unknown peaks from GC-MS metabolomics data?

This project is offered by the School of Biological Sciences, The University of Auckland.

Metabolomics is the study of the chemical fingerprints left behind by specific cellular processes. Gas chromatography coupled with mass spectrometry (GC-MS) provides a powerful analytical technique for the identification and quantification of the metabolites, or compounds, within a cell, tissue or biofluid of an organism. During GC-MS, information is collected on each compound’s GC retention time and its specific fragmentation pattern, or mass spectrum. Only around 30% of compounds are identified by cross-referencing this information against a mass spectral library, severely limiting our ability to understand the biological functions of cellular processes. The aim of this project is, therefore, to explore methods which will enable us to separate ‘real’ mass spectral peaks from noise. These peaks will then be analysed using univariate and/or multivariate statistical techniques to explore how they change under different experimental conditions.

Key techniques/skills that the student will learn: Understanding of how GC-MS metabolomics data is generated; Application of statistics to GC-MS metabolomics data; Statistical analysis and programming using the R software.

Prerequisites: Minimum Stage 2 statistics (although Stage 3 statistics is preferred). Experience with the R software and some knowledge of biochemistry would be an advantage.

Contact: Dr Kathy Ruggiero, k.ruggiero@auckland.ac.nz

How muddy are Auckland’s reefs? Analysis of sediment trap data from the Hauraki Gulf

University of Auckland researchers have been monitoring subtidal reef communities and sedimentation levels on shallow reefs in the Hauraki Gulf for the Auckland Regional Council (ARC) since 1999. This provides a valuable long-term baseline to assess the effects of continued urbanization of coastal catchments on the adjacent reefs. Increased sedimentation from development is considered to be one of the major threats to coastal ecosystems. Sediment traps are used to estimate levels of sedimentation on the reef and to relate these to variation in the reef communities among sites over time. However, it is poorly understood how well sediment trap data actually relates to sedimentation on the reef versus other environmental drivers (e.g., rainfall, wind and waves). Gaining a better understanding of how sediment trap data relates to sedimentation and other environmental factors is necessary for management to evaluate the monitoring program and interpret changes in reef communities. This project will involve a spatial and temporal analysis of how sediment trap data relates to a suite of environmental factors including data on water quality (e.g., chlorophyll, turbidity, nutrients), rainfall and wave exposure. Analyses will also be carried out to investigate how these variables relate to variation in reef communities among sites in space and time. There may also be opportunities to work in the laboratory and in the field (potentially involving scuba diving) with other researchers.

Contact: Nick Shears, nickshears@xtra.co.nz

The World Health Organisation situation updates

“The World Health Organisation (WHO) has (so far) published 59 'situation updates' on the swine flu pandemic, each of which includes tables of counts giving the number of cases and the number of deaths in countries around the world.

The basic motivation for this project is to investigate the spread of swine flu around the world over time.

The project will have two main focuses:

  1. developing R code to automatically harvest the case and death counts from the WHO web site; and
  2. developing an effective visualisation of the spread of swine flu. In other words, it will have a focus on statistical computing and graphics.”

Permission is granted to use WHO data for research or education (http://www.who.int/about/copyright/en/).

The ideal student would have a familiarity with R and not be afraid of learning about new computer technologies. Students who have good marks in STATS 220 and/or STATS 380 would fit the bill.

Contact: Paul Murrell, paul@stat.auckland.ac.nz

Determining the foraging cycle of Scyphax ornatus

Scyphax ornatus is an isopod that lives above the high tide mark on beaches. It forages according to two cycles: the daily light-dark cycle, and the tidal cycle. Specifically, Scyphax will only forage when it is dark and when the tide is going out. Biologists are interested in whether the foraging cycle of the Scyphax is pre-programmed, or whether the individuals respond to stimuli on a daily basis. They have collected data for individuals that are experimentally manipulated by modifying the hours of daylight to which the Scyphax is exposed. The aim of this project will be to estimate the lengths of the resulting foraging cycles of individuals under several experimental conditions, and the associated uncertainty.

This will enable us to test hypotheses about whether the animals are responding to the experimental stimuli or operating according to a pre-programmed clock. We will use a combination of time series and state space models.

Contact: Rachel Fewster, fewster@stat.auckland.ac.nz

Applications of DNA fingerprinting in forensic science and parentage testing

The University of Auckland has a spin-off company, DNA Diagnostics, which carries out paternity and family testing. This is based on DNA samples of blood or body cells from the set of individuals whose relatedness is being tested. Most of our genetic information comes with two copies; one from Mum, one from Dad. A male has a single Y chromosome which is inherited directly from his father. The statistics of Y chromosome data are very different from those of the more commonly used autosomal data.

Using Y filer technology (http://www.appliedbiosystems.com/yfilerdatabase/), DNA Diagnostics has assembled a New Zealand data set of Y chromosome data. We need somebody to analyse this data for the first time. New Zealand was the last land mass on Earth to be settled by humans and, until the arrival of large numbers of Europeans about 150 years ago, the New Zealand population had been separated from other populations for a large number of generations. As a consequence, the genetic structure of the New Zealand population is both complex and very interesting. The only other substantial populations of comparable interest are found in South America with the admixture of the original inhabitants and substantial numbers of immigrants from Africa and Europe.

In this project we will produce the summary statistics for DNA Diagnostics and then investigate what this New Zealand data tells us about our ancestry. The tools we will use are Excel and R, and the ability to access web databases. The statistical methods are based on material in courses STATS 20x and STATS 210. This project requires a student with an interest in genetics.

Contact: Chris Triggs, triggs@stat.auckland.ac.nz

Revising the Package HyperbolicDist

The current version of the package HyperbolicDist grew out of software developed to fit the hyperbolic distribution to some data from the Department of Geography at Auckland University. Since that time I have gained much more experience in the development of software for distributions and with Dr Diethelm Wuertz of ETH Zurich have developed a standard approach to the design of distribution software. Unfortunately, HyperbolicDist does not conform to this design and I would like to modify it to conform to the design principles Dr Wuertz and I wish to follow for software which is part of the Rmetrics group of packages.

The changes which need to be made are fairly simple, principally the modification of argument lists for functions. Not much needs to be altered in the way of functionality. However, despite the modifications being reasonably simple, there is quite a lot of software to be modified. Modifications to the documentation are required to go along with the software modifications.

The approach to be taken is fairly simple: work through the functions in HyperbolicDist, modifying them and the related documentation.

The student undertaking the project will gain a good grounding in R programming and development of R packages, including the use of editors and version control software. They will also develop extensive knowledge of the generalized hyperbolic distribution.

The student undertaking the project should have some knowledge of R and have an interest in programming.

Contact: David Scott, d.scott@auckland.ac.nz

Software for the Normal Laplace Distribution

The normal Laplace distribution is a flexible distribution with four parameters which has been used in a number of contexts by Professor Bill Reed from the University of Victoria, Canada. Currently there is no readily-available implementation of the distribution in software. The development of some software for this distribution will be of use for researchers and others interested in the distribution.

Implementation of the generalized normal Laplace distribution is not straightforward since no closed form exists for the density of the distribution. Numerical inversion of the characteristic function will be necessary for this case. Otherwise, development of functions for the normal Laplace and generalized normal Laplace distribution should follow in a similar fashion to other distributions. I have experience in this area due to my work on the hyperbolic distribution and development of my R packages HyperbolicDist and VarianceGamma.

The distribution is of interest for finance and it can be included in the Rmetrics software which is on R-Forge and CRAN.

The first step will be to implement routines for the normal Laplace distribution (not generalized normal Laplace) which will be straightforward but will introduce the student working on the project to the distribution and the technical requirements of this sort of work. After this we will work on the generalized normal Laplace. I do not expect that in this time frame we will be able to achieve a great amount but a start can be made and the software can be extended in future projects.

Contact: David Scott, d.scott@auckland.ac.nz

Finding times of significant change in animal abundance trends

Many biologists are interested in long-term trends in population abundance, especially for endangered species or ‘indicator’ species such as birds or fish. This project will look at ways of quantifying when the abundance trend has a significant change of direction, which might be due to a sudden environmental change or disease, and can spring a management alert that some intervention is needed.

Contact: Rachel Fewster, fewster@stat.auckland.ac.nz

The model discrimination properties of small 2-level screening designs

Screening experiments are used to identify active factors from a candidate set. Often small 2-level designs are used for this purpose. This project would explore how effective specific types of 2-level designs are at screening for active factors. The project would involve writing computer programs (using R) to evaluate these designs.

Contact: Arden Miller, miller@stat.auckland.ac.nz

A study of human diversity through forensic Y-STR data

This project will involve a lot of data extraction and processing as well as statistical analysis. It would suit someone who has done one of STATS 220, STATS 779, STATS 380 or STATS 782 and wants to learn something about SQL, R, Perl, cluster analysis and forensic genetics.

Contact: James Curran, curran@stat.auckland.ac.nz

Popgen II

In 1995/1996 I wrote a programme which demonstrated graphically various population genetic phenomena such as inbreeding, drift, migration and mutation. The programme, called Popgen, ran under Windows 3.11 and Windows 95, and was written in C++.

I would like to revive Popgen and make it publicly available for download.

In this project, the student would re-write and update Popgen in a modern (portable) language such as Java or C#. The skills needed are moderate to strong programming with at least the rudiments of computer graphics (i.e. you understand the concept of drawing on the screen, changing colours etc). From this project you will learn some population genetics, random number generation, and some ideas about writing efficient fast programs.

Contact: James Curran, curran@stat.auckland.ac.nz

Pricing options using GARCH models

Keywords: time-series, financial mathematics, R software.

The “traditional way” of pricing options is through the Black-Scholes formula. This formula is based on strong assumptions that are not met in practice. An alternative way of pricing options consists of using GARCH time-series models. The objective of the project is to:

  1. learn about GARCH time-series models
  2. use these models in R
  3. understand how they can be used to price options

The desired outcome of the project is a detailed report containing both a description of the theory and R code to carry out the computations.
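As a point of reference for the "traditional way" mentioned above, the following is a minimal sketch of the Black-Scholes price of a European call. It is written in Python purely for illustration (the project itself uses R), and the function names and example inputs are hypothetical. The constant volatility `sigma` is one of the strong assumptions that GARCH-based pricing relaxes by letting the conditional variance change over time.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, rate: float,
                       sigma: float, maturity: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying asset.

    spot: current asset price; strike: exercise price;
    rate: continuously compounded risk-free rate;
    sigma: volatility, assumed constant (the key simplification);
    maturity: time to expiry in years.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Illustrative inputs only (not taken from any data set).
price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05,
                           sigma=0.2, maturity=1.0)
print(round(price, 4))
```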

Contact: Ivan Kojadinovic, ivan@stat.auckland.ac.nz

Associations of microalbuminuria in the general population

We have identified an unexplained halving of the prevalence of microalbuminuria in Maori and Pacific people in the general population between the late 1980s and early 2000s in Auckland. The aims of this research are to:

  1. investigate factors that may have contributed to the decline in prevalence of microalbuminuria in Maori and Pacific adults, but not in Europeans.
  2. investigate demographic, clinical and laboratory associations related to albuminuria.
  3. examine factors associated with increase/decrease in urinary albumin concentrations in adults who had urinary albumin measurements in both 1988-1990 and 1995-1997.

Methods: Data previously collected from 4,049 adults aged 35 to 74 years between 2002 and 2003 will be used to investigate whether factors previously identified as being associated with microalbuminuria (see section 4) have contributed to the unexpected decline in prevalence in microalbuminuria that has been observed between the 1988 Workforce Diabetes Survey (WDS) of 5,670 working adults aged 40 years and over (which would have included a healthy worker effect) and the 2002-2003 Auckland Diabetes, Heart and Health Survey (DHAH). The demographic, clinical and laboratory associations with albuminuria will be examined using the DHAH data. In addition, data from the 1995-1997 Workforce Diabetes Follow-up Survey (WDF) of 4,053 participants of the original WDS will be used to investigate factors associated with change in urinary albumin concentrations within an individual.

Significance: Maori and Pacific people have a 2- to 10-fold higher incidence of end-stage kidney failure. Dialysis treatment costs up to $65,000/year in New Zealand, depending on the modality of therapy utilized. Microalbuminuria is a modifiable risk factor associated with future development of kidney failure. A better understanding of the demographic, clinical and laboratory associations with albuminuria and the factors associated with change in urinary albumin concentrations within an individual may lead to interventions that will result in a reduction in the future requirement for dialysis. It is also important to establish whether or not there is a true decline in the prevalence of increased urinary albumin concentrations in these people, and to look at the associations that may be contributing to this.

Contact: Patricia Metcalf, metcalf@stat.auckland.ac.nz

Parameter estimation for selected univariate distributions

This project is to

  • Implement functions to compute the maximum likelihood estimates for a dozen selected univariate distributions including the Skellam, Weibull (and variants), Poisson-lognormal, glog-normal, exponential-Poisson, truncated negative binomial;
  • Investigate the use of model-selection methods such as AIC and BIC;
  • Apply the models to specific data sets such as the 2008 World Fly Fishing Championships.

My existing theoretical framework enables the parameters to be naturally modelled as functions of explanatory variables. Time allowing, we will also implement a multivariate technique called multidimensional scaling (MDS). The project would be suitable for a student with good grades in statistical theory and R programming.
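To make the model-selection step concrete, here is a minimal sketch of how AIC and BIC are computed from a maximised log-likelihood. It is written in Python for illustration only (the project itself uses R); the Poisson model, the function names and the counts are all made up for the example and are not taken from the project's data.

```python
import math


def poisson_log_likelihood(counts, rate):
    """Log-likelihood of i.i.d. Poisson counts at a given rate."""
    return sum(x * math.log(rate) - rate - math.lgamma(x + 1) for x in counts)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Made-up counts, for illustration only.
counts = [0, 2, 1, 3, 0, 1, 4, 2]

# For the Poisson distribution the maximum likelihood estimate of the
# rate is the sample mean, so no numerical optimiser is needed here.
rate_hat = sum(counts) / len(counts)
loglik = poisson_log_likelihood(counts, rate_hat)

print("AIC:", aic(loglik, n_params=1))
print("BIC:", bic(loglik, n_params=1, n_obs=len(counts)))
```

Lower values indicate a better trade-off between fit and complexity, so the same two functions can be applied to each candidate distribution fitted to the same data and the results compared.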

Contact: Thomas Yee, t.yee@auckland.ac.nz

Fisher scoring and mixture models

This project is to search for and document expected information matrices in the statistical literature, and then implement these within my VGAM package for R. For each of these, initial values need to be derived and, where possible, the associated dpqr-type functions (density, distribution function, quantile function and random variate generation) written. The consequence of this project is that many multivariate distributions will be implemented within the useful framework provided by the supervisor. We will also implement mixture models such as two negative binomial distributions. The project would be suitable for a student with a good grasp of statistical theory (including logistic regression) and R programming.

Contact: Thomas Yee, t.yee@auckland.ac.nz

Ordinal ordination

Ordination is a multivariate technique for modelling multispecies-environmental data simultaneously. Currently there are no methods to handle ordinal species data, which is a very common form.

This project is to implement a new method for ordinal ordination recently developed by the supervisor. It will firstly involve converting general Fortran/Ratfor code to C. Applications to several animal and vegetation data sets will be made. Time permitting, the switch from LINPACK to LAPACK will be investigated and implemented. The project would be suitable for a student with very good programming skills (R and C, and ideally Fortran too, though this is not essential) and experience with logistic regression.

Contact: Thomas Yee, t.yee@auckland.ac.nz

New Zealand MetService

“The New Zealand MetService posts weather forecasts on the web, e.g., http://metservice.co.nz/public/localWeather/auckland.html, for many locations in New Zealand, for up to five days in advance.

The basic motivation for this project is to investigate the accuracy of these weather forecasts, e.g., how much does the forecasted temperature five days out differ from the observed temperature on the day.

The project will have two main focuses:

  1. developing R code to automatically harvest forecasts from the MetService web site; and
  2. developing an effective visualisation of the forecast accuracy. In other words, it will have a focus on statistical computing and graphics.”

Permission has been obtained from MetService to harvest data from their web site for a student project.

The ideal student would have a familiarity with R and not be afraid of learning about new computer technologies. Students who have good marks in STATS 220 and/or STATS 380 would fit the bill.

Contact: Paul Murrell, paul@stat.auckland.ac.nz

Chris Wild

Chris Wild has a number of projects for students with good STATS 310 grades and reasonably good R programming skills.

These projects will contribute to the real biostatistical research projects being undertaken by Profs Chris Wild, Alastair Scott and Alan Lee, or educational projects with Dr Maxine Pfannkuch.

They will include

  1. Investigating the performance of a variety of estimating equation methods for sample survey data.
  2. Investigating the performance of multiple imputation and other missing data techniques for regression data.
  3. Developing on-screen animations in R for statistical concepts.

Contact: Chris Wild, c.wild@auckland.ac.nz

Deciphering the patterns of variations of evolutionary rates along yeast genomes

The rates of evolution, i.e., the pace at which molecular sequences accumulate substitutions, vary extensively along genes and chromosomes. For the sake of simplicity, modern phylogenetic methods assume that rates can vary freely along genomes (the substitution rate of a given gene is considered as an independently and identically distributed random variable). However, experimental evidence suggests that genes that are involved in similar functions tend to cluster together on the genome. Such clustering is expected to impact on the autocorrelation of substitution rates along genomes, because genes involved in a common function need to evolve in a concerted fashion. The goal of this project is to investigate the patterns of autocorrelation of substitution rates along genomes. The statistical tools that will be developed during the first part of the project will then be applied to the analysis of a large yeast data set for which information on gene and chromosome positions is available.

This project will be supervised by Stéphane Guindon (Department of Statistics). The student will be hosted by the Computational Evolution group and will therefore interact with graduate students in statistics, mathematics and computer sciences. Also, the student is expected to attend the seminars organised by the Bioinformatics Institute (School of Biological Sciences). A background in statistics, biology or bioinformatics is required. Programming skills are also expected.

Contact: Stéphane Guindon, guindon@stat.auckland.ac.nz

A graphical interface to the program PhyML

PhyML is a widely used software program that estimates phylogenetic trees from alignments of biological sequences. It is written in ANSI C and relies on a standard Unix-type command line user interface. The goal of this project is to provide a graphical user interface to PhyML that could be used on most operating systems.

This project therefore requires basic programming skills but the student will also be given the opportunity to learn about maximum likelihood estimation of phylogenetic trees and molecular evolution.

Contact: Stéphane Guindon, guindon@stat.auckland.ac.nz

Web based simulation of random walks in random environments

I have some existing code, written in R, for generating random environments and also random walks in those random environments. I would like people to be able to do the same kinds of simulations interactively via my webpage. For an example for a different problem, see http://www.stat.auckland.ac.nz/~mholmes/javastuff/urn_model/urn_model.html

This project therefore involves writing simulation code in an appropriate language (e.g. Java) as well as creating an appropriate web interface from which to run the simulations.
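For readers unfamiliar with the idea, the sketch below shows one simple kind of random walk in a random environment. It is not the supervisor's R code, which is not reproduced here; it is a hypothetical one-dimensional example in Python in which each site is first given its own randomly drawn probability of stepping right (the environment), and a nearest-neighbour walk then moves through that fixed environment. The ring-shaped state space and the uniform distribution for the site probabilities are assumptions made only to keep the example short.

```python
import random


def make_environment(n_sites, rng):
    """Draw an i.i.d. probability of stepping right for every site.

    Assumption: probabilities are uniform on [0, 1); other choices of
    environment distribution would slot in here.
    """
    return [rng.random() for _ in range(n_sites)]


def simulate_walk(environment, n_steps, rng):
    """Run a nearest-neighbour walk through a fixed environment.

    Assumption: sites are arranged on a ring, so the walk wraps around
    at the ends rather than leaving the environment.
    """
    n_sites = len(environment)
    position = n_sites // 2
    path = [position]
    for _ in range(n_steps):
        step = 1 if rng.random() < environment[position] else -1
        position = (position + step) % n_sites
        path.append(position)
    return path


rng = random.Random(2009)          # fixed seed so a run can be repeated
environment = make_environment(50, rng)
path = simulate_walk(environment, 200, rng)
print(path[:10])
```

Keeping the environment and the walk in separate functions mirrors the two stages described above: the environment is generated once, and many walks can then be simulated in it.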

Contact: Mark Holmes, mholmes@stat.auckland.ac.nz

Mutual connectivity in a random graph model.

This project would suit a very able math student.

Contact: Mark Holmes, mholmes@stat.auckland.ac.nz

Statistics Education Research - Assessing Student Performance

In this summer school project the student would be introduced to some assessment aspects of statistics education research based on two current research projects of the supervisors, Maxine Pfannkuch and Stephanie Budgett.

In the research project on developing students’ statistical literacy the summer school student would

  • Search out existing tools, instruments, questions or tasks for assessing statistical reasoning
  • Determine which questions may be suitable for assessing statistical literacy as defined in the project
  • Develop some new questions for assessing statistical literacy
  • Pilot some questions

In the research project on building students’ informal statistical inferential reasoning the summer school student would

  • Be an independent marker for student assessment responses using given criteria
  • Statistically analyse pre- and post-test assessment responses of students

Contact: Stephanie Budgett, s.budgett@auckland.ac.nz

Analysis of large data sets from biomedical investigations

Research programmes in biomedical science, such as Nutrigenomics New Zealand, can generate very large data sets using new technologies.

In this project we will look at data on mice generated by nuclear magnetic resonance of blood plasma. We will compare a group of normal mice with mice with a particular genetic defect. In humans a similar genetic condition gives rise to Crohn’s Disease, an inflammatory disease of the digestive system. The statistical problem is how to prepare the raw data for statistical analysis. Even the data from the ‘normal’ mice is highly variable. Several methods to standardise the data have been suggested by biologists and chemists. In this project we will compare these methods and see which method leads to the most sensitive and precise comparisons of the normal and diseased mice.

The tools we will use are Excel and R, and the statistical methods are based on material in courses STATS 302 and STATS 330.

Contact: Chris Triggs, triggs@stat.auckland.ac.nz

Presenting DNA results in parentage testing using ethnicity information

The University of Auckland has a spin-off company, DNA Diagnostics, which carries out paternity and family testing. This is based on DNA samples of blood or body cells from the set of individuals whose relatedness is being tested. Most of our genetic information comes in two copies: one from Mum, one from Dad. In paternity testing we examine the DNA profiles of the mother, the child, and the man who is claimed to be the father of the child.

DNA Diagnostics has databases of New Zealand Europeans, Maori (New Zealand and Cook Islands), Pacific Islands (Samoan and Tongan), and New Zealand Asian (predominantly Chinese). From these databases we can estimate the frequency of a particular gene. These frequencies can vary markedly between ethnic groups.

In a particular paternity case the mother and putative father can declare their ethnicity. Should we take account of this, or just use the database which gives the most conservative result? In this project we will carry out a retrospective study for DNA Diagnostics. We will compute paternity indices for a sample of their old cases using each of their databases. This will allow us to estimate the size of the change if the ‘wrong’ ethnicity database is used. We will also be able to assess the robustness of the conclusions. This is of importance to DNA Diagnostics because they are frequently called on to defend their conclusions in the courts.
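As a rough illustration of the calculation involved, the sketch below computes a paternity index under two databases. It is written in Python for illustration only (the project itself uses Excel and R) and makes simplifying assumptions: the allele the child inherited from the father is known unambiguously at each locus, mutation is ignored, and loci are treated as independent so that per-locus indices multiply. The database names, locus names and allele frequencies are invented for the example and are not DNA Diagnostics data.

```python
from math import prod


def transmission_probability(father_genotype, allele):
    """Chance the alleged father passes on `allele`:
    1 if he carries two copies, 0.5 if one, 0 if none."""
    return father_genotype.count(allele) / 2


def paternity_index(father_genotype, paternal_allele, allele_freqs):
    """Single-locus paternity index: the alleged father's transmission
    probability divided by the allele's population frequency."""
    return (transmission_probability(father_genotype, paternal_allele)
            / allele_freqs[paternal_allele])


def combined_paternity_index(case, database):
    """Product of the single-locus indices, assuming independent loci."""
    return prod(
        paternity_index(genotype, allele, database[locus])
        for locus, (genotype, allele) in case.items()
    )


# Hypothetical allele frequencies from two ethnicity databases.
databases = {
    "database_A": {"locus1": {"C": 0.10}, "locus2": {"F": 0.20}},
    "database_B": {"locus1": {"C": 0.25}, "locus2": {"F": 0.05}},
}

# One hypothetical case: for each locus, the alleged father's genotype
# and the allele the child must have inherited from the true father.
case = {
    "locus1": (("C", "D"), "C"),
    "locus2": (("F", "F"), "F"),
}

if __name__ == "__main__":
    for name, database in databases.items():
        cpi = combined_paternity_index(case, database)
        print(f"{name}: combined paternity index = {cpi:.1f}")
```

Running the same case against every database, and repeating this over many old cases, gives the distribution of changes that the retrospective study is interested in.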

The tools we will use are Excel and R; the ability to access web databases is also required. The statistical methods are based on material in courses STATS 20x and STATS 210. This project requires a student with an interest in genetics.

Contact: Chris Triggs, triggs@stat.auckland.ac.nz

Modifying & archiving survey data sets for teaching purposes

This project is offered by The Department of Statistics and Centre of Methods and Policy Application in the Social Sciences (COMPASS), Faculty of Arts, The University of Auckland.

The New Zealand Social Science Data Service (NZSSDS) provides access to a number of survey data sets through the use of Nesstar software. One-way frequency tables and descriptive statistics are freely viewable while cross tabulations, basic regression analyses and data downloads are available through registration and the meeting of other criteria, terms & conditions.

The use of NZSSDS in teaching has been piloted, using a subset of data from the New Zealand Quality of Healthcare Survey. Students signed our standard ‘user undertaking’ agreement forms and accessed the data in a lab environment. However, the ideal would be to avoid any security issues entirely by having data sets that are freely available for students to download and work with on their own. This would also support the use and learning of standard statistical packages that is already under way, at least in the statistics department.

To this end, data sets would likely need to be subsetted, and then ‘perturbed’ in other ways so as to make them ‘safe’ for release while still giving believable results.
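One way to picture this, as a minimal sketch only, is a random subsample followed by data swapping, where the values of a sensitive variable are shuffled among records within the same group. The sketch is in Python for illustration, and the choice of data swapping, the variable names (`region`, `age`) and the toy records are all assumptions; the project may well settle on different perturbation methods.

```python
import random


def subsample(records, fraction, rng):
    """Return a random subset containing roughly `fraction` of the records."""
    k = round(fraction * len(records))
    return rng.sample(records, k)


def swap_within_groups(records, swap_var, group_var, rng):
    """Shuffle the values of `swap_var` among records that share the same
    `group_var` value. Within each group the distribution of `swap_var`
    is unchanged, but values are no longer tied to individual records.
    The input records are not modified."""
    perturbed = [dict(r) for r in records]
    groups = {}
    for index, record in enumerate(perturbed):
        groups.setdefault(record[group_var], []).append(index)
    for indices in groups.values():
        values = [perturbed[i][swap_var] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            perturbed[i][swap_var] = value
    return perturbed


if __name__ == "__main__":
    rng = random.Random(2009)
    # Toy survey records with hypothetical variables.
    survey = [
        {"id": i, "region": "North" if i % 2 else "South", "age": 20 + i}
        for i in range(10)
    ]
    release = swap_within_groups(subsample(survey, 0.6, rng), "age", "region", rng)
    for record in release:
        print(record)
```

Because swapping preserves each group's distribution of the swapped variable, simple summaries by group still give believable results, while relationships with other variables are weakened; how much weakening is acceptable for teaching would need to be judged for each data set.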

In the first instance we would be targeting four surveys that have been discussed extensively in STATS 740 over the years:

  • Adult Oral Health in New Zealand, 1976
  • New Zealand Partner Relations Survey, 1991
  • New Zealand Quality of Healthcare Survey, 2000
  • National Primary Medical Care Survey, 2001/2

The first three of these have already been archived in some form on NZSSDS, and the last would be expected to be there before the commencement of a studentship. This will provide the basis for teaching data sets to be archived: the metadata will be marked up and the data made viewable within the Nesstar Publisher software, which the student will need to learn to use. It would also be most useful to combine this work with writing documentation on how to use Nesstar Publisher for marking up and archiving data.

Contact: Peter Davis, pb.davis@auckland.ac.nz

Underpinning transparency in research: establishing a template for a research repository with real-world examples

This project is offered by The Department of Statistics and Centre of Methods and Policy Application in the Social Sciences (COMPASS), Faculty of Arts, The University of Auckland.

A new collaborative approach in research is to make traditional academic papers obsolete by creating research objects that contain all the material needed to understand a piece of research, including underlying data, metadata and research outputs. In practice, this combines a data repository with a research method, making published research more transparent, reusable and reproducible.

The NZ Social Science Data Service (NZSSDS) is an ongoing initiative being undertaken by COMPASS. The vision is that data sets, all well documented with metadata for users, will be made publicly available to the wider research community for examination, basic online analysis and authorised download (via Nesstar software). Key published papers will be documented along with computer code for data manipulation and analysis. There are already over 20 data sets available on the NZSSDS website. The scholarship will contribute to this system, so that the service can easily be used by people coming to the website. The student will:

  • Help package existing data sets and put them up on the NZSSDS website. This will require documenting each data set and bringing it up to standard.
  • Draw up guidelines as to how this job could be done so that in future someone could easily follow such instructions.
  • Check other data archives (e.g. ASSDA) to get a sense of what we want to achieve, so as to set the context and the standard.
  • Investigate other initiatives and how they relate to our work, and report that knowledge back to be incorporated in NZSSDS protocols.
  • Read background material, and compile and collate analytical work (including SAS code) related to specific journal article(s) published by the Centre. Document and package this work in preparation for putting it up on the NZSSDS website.

The student will have good computing and statistical skills.

Contact: Peter Davis, pb.davis@auckland.ac.nz

Visualization of a statistical model of genetic data

In this project, the student will develop interactive software to display and explore a graphical model of genetic data. The student will learn about hidden Markov models, computer graphics, and efficient manipulation of large-scale data sets.

Requirement: Experience programming in Java (experience in computer graphics is not required).

Contact: Brian Browning, bbrowning@stat.auckland.ac.nz

How to Apply

These Summer Scholarships are available to suitable undergraduate students.

Aim: To give students research experience and acquaint them with the research activities of the Department and with potential graduate supervisors.

Emolument: $5,000 non-taxable.

Duration: Full-time employment for eight to twelve weeks (400 hours).

Applications close Friday 28th August 2009.

For general information regarding scholarships, please see the Scholarships and Research Grants webpage.

 
scholarships-2010-complete.txt · Last modified: 2009/09/04 15:49 by webmaster
 
