Generally-Altered, -Inflated, -Truncated and -Deflated Regression,
With Application to Heaped and Seeped Counts
Zero-altered, zero-inflated and zero-truncated count regression
models are now well established, especially for Poisson and
binomial parents.
Recently these methods were extended to Generally-Altered,
-Inflated, -Truncated and -Deflated Regression (GAITD regression)
and implemented in the VGAM R package for three 1-parameter
families and one 2-parameter family. In GAITD regression the
four operators apply to general sets rather than {0}. Also,
the four operators may appear simultaneously in a single model.
Elements of the four mutually disjoint sets of support values
are called 'special'. Parametric and nonparametric variants are
proposed: the latter based on the multinomial logit model (MLM),
and the former on a finite mixture of the parent distribution
on nested or partitioned support. The resultant "GAITD Mix-MLM
combo" model has seven special value types. GAITD regression offers
much potential for the analysis of heaped (digit preference due
to self-reporting) and seeped data.
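
As a rough illustration of how alteration, inflation and truncation
can act together on one parent, the following Python sketch builds
the probability function of a Poisson parent with nonparametric
(MLM-style) probabilities attached to the special values. It is a
simplified sketch under stated assumptions, not VGAM's implementation
or parameterization: the function names, the argument layout and the
example values are hypothetical, and deflation and the parametric
mixture variants are left out.

```python
import math


def poisson_pmf(y, lam):
    """Parent Poisson probability P(Y = y), computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def gait_poisson_pmf(y, lam, altered=None, inflated=None, truncated=()):
    """Sketch of a generally altered, inflated and truncated Poisson pmf.

    altered   : dict {value: omega}, probability assigned outright
    inflated  : dict {value: phi}, probability added on top of the parent
    truncated : iterable of values that cannot occur
    (All three argument names are illustrative assumptions.)
    """
    altered = altered or {}
    inflated = inflated or {}
    a_set, i_set, t_set = set(altered), set(inflated), set(truncated)

    # The sets of special values must be mutually disjoint.
    if len(a_set | i_set | t_set) != len(a_set) + len(i_set) + len(t_set):
        raise ValueError("special value sets must be mutually disjoint")

    # Probability mass left over for the (rescaled) parent distribution.
    delta = 1.0 - sum(altered.values()) - sum(inflated.values())
    if delta < 0.0:
        raise ValueError("special value probabilities exceed 1")

    if y < 0 or y in t_set:
        return 0.0
    if y in altered:
        return altered[y]

    # Renormalise the parent over the support that is neither altered
    # nor truncated, so that the whole pmf sums to 1.
    norm = 1.0 - sum(poisson_pmf(s, lam) for s in a_set | t_set)
    return inflated.get(y, 0.0) + delta * poisson_pmf(y, lam) / norm


if __name__ == "__main__":
    # Made-up settings: 0 is altered, 5 and 10 are heaped, 1 is truncated.
    for y in range(12):
        p = gait_poisson_pmf(y, 4.0, altered={0: 0.10},
                             inflated={5: 0.05, 10: 0.05}, truncated=(1,))
        print(y, round(p, 4))
```

Here an altered value takes its probability entirely from its own
parameter, whereas an inflated value keeps a contribution from the
parent and gains extra mass on top, which is the pattern one would
expect at heaped values.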
This project aims to consolidate the above and to investigate some
extensions. Specific tasks include:
1. Find new data sets, from a wide range of fields, that exhibit
heaping and seeping, and analyse them with GAITD regression.
2. Find any bugs in the software. Suggest any improvements (such as
initial values) and additions.
3. Marginal effects: extend margeff() to compute the first
derivatives of the MLM terms.
4. Find data sets that are underdispersed with respect to the
Poisson. Apply the GT-Expansion method of analysis.
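
For item 3, the first derivatives in question can be illustrated on
a plain multinomial logit model. The Python sketch below assumes a
baseline-category MLM with the last class as reference, and uses the
standard result that the derivative of p_j with respect to x_k is
p_j * (beta_jk - sum_l p_l * beta_lk). The function names and the
example coefficients are hypothetical; this is not the margeff()
code in VGAM, and it ignores the remaining parts of a GAITD model.

```python
import math


def mlm_probs(x, betas):
    """Class probabilities of a baseline-category multinomial logit model.

    x     : covariate vector (include a leading 1.0 for an intercept)
    betas : one coefficient vector per non-baseline class; the baseline
            (last) class has linear predictor fixed at 0
    """
    etas = [sum(b_k * x_k for b_k, x_k in zip(b, x)) for b in betas] + [0.0]
    top = max(etas)                      # subtract the max for stability
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


def mlm_marginal_effects(x, betas):
    """Matrix of first derivatives d p_j / d x_k (rows j, columns k)."""
    p = mlm_probs(x, betas)
    full = [list(b) for b in betas] + [[0.0] * len(x)]   # add baseline row
    n_class, n_cov = len(full), len(x)
    # Probability-weighted average coefficient for each covariate.
    avg = [sum(p[l] * full[l][k] for l in range(n_class))
           for k in range(n_cov)]
    return [[p[j] * (full[j][k] - avg[k]) for k in range(n_cov)]
            for j in range(n_class)]


if __name__ == "__main__":
    x = [1.0, 0.5]                        # intercept and one covariate
    betas = [[0.2, -1.0], [-0.3, 0.7]]    # made-up coefficients
    for row in mlm_marginal_effects(x, betas):
        print([round(v, 4) for v in row])
```

Because the class probabilities sum to 1, each column of the
derivative matrix sums to 0, which gives a quick sanity check on any
implementation. The column for the intercept slot is only a formal
derivative and would normally be ignored.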
Ideally, a student working on this project would have strong
computational skills and a solid understanding of generalized linear
models (GLMs; e.g., STATS 330 & 310).
Contact: Thomas Yee (t.yee@auckland.ac.nz)