Wednesday, May 7, 2008

Application of Transformations in Parametric Inference

by Naomi Brownstein and Marianna Pensky
University of Central Florida
Journal of Statistics Education, Volume 16, Number 1 (2008), www.amstat.org/publications/jse/v16n1/brownstein.html
Copyright © 2008 by Naomi Brownstein and Marianna Pensky, all rights reserved.


Abstract
The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate the performance of this powerful tool on examples of the construction of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for stress-strength systems. We argue that the tool of transformations not only should be used more widely in statistical research but should become a routine part of calculus-based statistics courses. Finally, we provide sample problems for such a course as well as possible undergraduate research projects which utilize transformations of variables.

Keywords: Transformations of variables; Estimation; Testing; Stress-strength model; Bayesian inference.
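
As a minimal illustration of the transformation method the abstract describes, consider a standard textbook case (chosen here for illustration, not necessarily one of the paper's own examples). Assume X_1, ..., X_n are i.i.d. exponential with mean θ. The change of variables Y_i = 2X_i/θ gives a pivotal quantity and therefore an exact confidence interval for θ, where χ²_{2n,p} denotes the p-quantile of the chi-square distribution with 2n degrees of freedom:

```latex
% Assumed model (illustrative): X_1, ..., X_n i.i.d. with density
% f_X(x) = (1/\theta) e^{-x/\theta}, x > 0.
\begin{align*}
Y_i = \frac{2X_i}{\theta}
  &\;\Rightarrow\; f_Y(y) = f_X\!\left(\tfrac{\theta y}{2}\right)\left|\frac{dx}{dy}\right|
   = \frac{1}{\theta}\,e^{-y/2}\cdot\frac{\theta}{2}
   = \tfrac{1}{2}\,e^{-y/2}, \quad y > 0,
   \quad\text{i.e. } Y_i \sim \chi^2_2, \\
T = \frac{2}{\theta}\sum_{i=1}^{n} X_i
  &\;\sim\; \chi^2_{2n}
   \quad\text{(sum of $n$ independent $\chi^2_2$ variables)}, \\
P\!\left(\chi^2_{2n,\alpha/2} \le T \le \chi^2_{2n,1-\alpha/2}\right) = 1-\alpha
  &\;\Rightarrow\;
  \frac{2\sum_{i} X_i}{\chi^2_{2n,1-\alpha/2}} \;\le\; \theta \;\le\;
  \frac{2\sum_{i} X_i}{\chi^2_{2n,\alpha/2}} .
\end{align*}
```

The distribution of T does not depend on θ, which is what allows the probability statement about T to be inverted into an interval for θ.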


A Parametric Model for Ordinal Response Data, with Application to Estimating Age-specific Reference Intervals

by Patrick Royston (1)
(1) MRC Clinical Trials Unit, 222 Euston Road, NW1 2DA, London, UK

Abstract
A model for ordinal response data based on an underlying (but unobserved) Normal distribution is proposed. The model is particularly useful for highly discrete data with a large proportion of zero values. It is applied to the estimation of age-specific reference intervals in two substantive example datasets.


A Meta-analysis of Case-control and Cohort Studies with interval-censored Exposure Data: Application to Chorionic Villus Sampling

by Babette A. Brumback (1), Richard J. Cook (2) and Louise M. Ryan (3)
(1) Department of Biostatistics, University of Washington, Seattle, WA 98195-7232, USA brumback@biostat.washington.edu
(2) Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
(3) Department of Biostatistics, Harvard School of Public Health and the Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA

Abstract
Chorionic villus sampling (CVS) is a valued method of prenatal diagnosis that is often preferred over amniocentesis because it can be performed earlier, but which has also raised concern over a possible association with increased risk of terminal transverse limb deficiency (TTLD). We present and apply a meta-analytic method for estimating a combined dose–response effect from a series of case-control and cohort studies in which the exposure variable is interval-censored. Assuming coarsening at random for the interval-censoring, and calling upon the familiar result of Cornfield to pool case-control and cohort information on the association between a rare binary outcome and a multilevel exposure variable, we form a likelihood-based model to assess the effect of gestational age at the time of CVS on the presence or absence of a rare birth defect. Effect estimates are computed with a variant of the EM algorithm termed the method of weights, which enables the use of standard weighted regression software. Our findings suggest that CVS exposure at early gestational age leads to an increased risk of TTLD.

Keywords: Coarsening at random; Dose–response; Method of weights; Selection bias
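
The Cornfield result invoked in the abstract is, in its usual form, the observation that the odds ratio from a case-control study approximates the risk ratio from a cohort study when the outcome is rare. The following is a small numerical sketch of that approximation only; the risk values are hypothetical and are not taken from the CVS data.

```python
def risk_ratio(p_exposed, p_unexposed):
    """Ratio of outcome risks, as estimable from a cohort study."""
    return p_exposed / p_unexposed


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of outcome odds, as estimable from a case-control study."""
    return odds(p_exposed) / odds(p_unexposed)


# Hypothetical risks (illustration only): a rare and a common outcome,
# both with a true risk ratio of 3.
rare = (0.0015, 0.0005)
common = (0.6, 0.2)

rr_rare, or_rare = risk_ratio(*rare), odds_ratio(*rare)
rr_common, or_common = risk_ratio(*common), odds_ratio(*common)

print("rare outcome:   RR = %.4f, OR = %.4f" % (rr_rare, or_rare))
print("common outcome: RR = %.4f, OR = %.4f" % (rr_common, or_common))
```

For the rare outcome the two measures nearly coincide, which is what makes pooling the two study types on a common scale reasonable; for the common outcome they diverge substantially.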


Generalized Linear Mixture Models for Handling Nonignorable Dropouts in Longitudinal Studies

by Garrett M. Fitzmaurice (1) and Nan M. Laird (1)
(1) Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA fitzmaur@hsph.harvard.edu

Abstract
This paper presents a method for analysing longitudinal data when there are dropouts. In particular, we develop a simple method based on generalized linear mixture models for handling nonignorable dropouts for a variety of discrete and continuous outcomes. Statistical inference for the model parameters is based on a generalized estimating equations (GEE) approach (Liang and Zeger, 1986). The proposed method yields estimates of the model parameters that are valid when nonresponse is nonignorable under a variety of assumptions concerning the dropout process. Furthermore, the proposed method can be implemented using widely available statistical software. Finally, an example using data from a clinical trial of contracepting women is used to illustrate the methodology.

Keywords: Discrete data; Generalized estimating equations; Missing data; Nonresponse; Repeated measures


Combining Diagnostic Test Results to Increase Accuracy

by Margaret Sullivan Pepe (1) and Mary Lou Thompson (2)
(1) Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, PO Box 19024, Seattle, WA 98109-1024, USA
(2) Department of Biostatistics, University of Washington Seattle, WA 98195, USA

Abstract
When multiple diagnostic tests are performed on an individual or multiple disease markers are available it may be possible to combine the information to diagnose disease. We consider how to choose linear combinations of markers in order to optimize diagnostic accuracy. The accuracy index to be maximized is the area or partial area under the receiver operating characteristic (ROC) curve. We propose a distribution-free rank-based approach for optimizing the area under the ROC curve and compare it with logistic regression and with classic linear discriminant analysis (LDA). It has been shown that the latter method optimizes the area under the ROC curve when test results have a multivariate normal distribution for diseased and non-diseased populations. Simulation studies suggest that the proposed non-parametric method is efficient when data are multivariate normal.
The distribution-free method is generalized to a smooth distribution-free approach to: (i) accommodate some reasonable smoothness assumptions; (ii) incorporate covariate effects; and (iii) yield optimized partial areas under the ROC curve. This latter feature is particularly important since it allows one to focus on a region of the ROC curve which is of most relevance to clinical practice. Neither logistic regression nor LDA necessarily maximizes partial areas. The approaches are illustrated on two cancer datasets, one involving serum antigen markers for pancreatic cancer and the other involving longitudinal prostate specific antigen data.

Keywords: Biomarkers; Classification; Disease screening; ROC curve; Sensitivity; Specificity
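
The rank-based idea in the abstract can be sketched roughly as follows: compute the empirical (Mann-Whitney) area under the ROC curve for a linear combination of two markers, and search over combination directions for the largest value. This is a simplified sketch under stated assumptions, not the authors' implementation: the grid search over an angle, the function names and the simulated data are all choices made here for illustration.

```python
import math
import random


def empirical_auc(case_scores, control_scores):
    """Mann-Whitney estimate of P(case score > control score); ties count 1/2."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def linear_scores(markers, angle):
    """Combine two markers as cos(angle) * y1 + sin(angle) * y2."""
    a, b = math.cos(angle), math.sin(angle)
    return [a * y1 + b * y2 for (y1, y2) in markers]


def best_direction(cases, controls, steps=72):
    """Grid search over directions on the full circle (the AUC is unchanged
    by positive rescaling of the coefficients, so only direction matters)."""
    best_angle, best_auc = 0.0, -1.0
    for k in range(steps):
        angle = 2.0 * math.pi * k / steps
        auc = empirical_auc(linear_scores(cases, angle),
                            linear_scores(controls, angle))
        if auc > best_auc:
            best_angle, best_auc = angle, auc
    return best_angle, best_auc


# Simulated (hypothetical) data: two markers, both shifted upward in cases.
random.seed(0)
cases = [(random.gauss(1.0, 1.0), random.gauss(0.5, 1.0)) for _ in range(100)]
controls = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(100)]

auc_marker1 = empirical_auc([y1 for y1, _ in cases], [y1 for y1, _ in controls])
best_angle, best_auc = best_direction(cases, controls)

print("AUC of marker 1 alone:     %.3f" % auc_marker1)
print("best combined AUC:         %.3f" % best_auc)
print("best direction (radians):  %.3f" % best_angle)
```

Because the grid includes the direction that uses marker 1 alone, the combined AUC found this way can never be smaller than that of marker 1 on the same data. The apparent gain is optimistic when evaluated on the data used for the search.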


Assessing the Effect of an Influenza Vaccine in an Encouragement Design

by Keisuke Hirano (1), Guido W. Imbens (1), Donald B. Rubin (2) and Xiao-Hua Zhou (3)
(1) Department of Economics, University of California, Los Angeles, CA 90095, USA
(2) Department of Statistics, Science Center 709, Harvard University, Cambridge, MA 02138, USA
(3) Division of Biostatistics, Department of Medicine, Indiana University School of Medicine and Regenstrief Institute for Health Care, Indianapolis, IN 46202, USA

Abstract
Many randomized experiments suffer from noncompliance. Some of these experiments, so-called encouragement designs, can be expected to have especially large amounts of noncompliance, because encouragement to take the treatment rather than the treatment itself is randomly assigned to individuals. We present an extended framework for the analysis of data from such experiments with a binary treatment, binary encouragement, and background covariates. There are two key features of this framework: we use an instrumental variables approach to link intention-to-treat effects to treatment effects and we adopt a Bayesian approach for inference and sensitivity analysis. This framework is illustrated in a medical example concerning the effects of inoculation for influenza. In this example, the analyses suggest that positive estimates of the intention-to-treat effect need not be due to the treatment itself, but rather to the encouragement to take the treatment: the intention-to-treat effect for the subpopulation who would be inoculated whether or not encouraged is estimated to be approximately as large as the intention-to-treat effect for the subpopulation whose inoculation status would agree with their (randomized) encouragement status whether or not encouraged. Thus, our methods suggest that global intention-to-treat estimates, although often regarded as conservative, can be too coarse and even misleading when taken as summarizing the evidence in the data for the effects of treatments.

Keywords: Bayesian analysis; Causal inference; Instrumental variables; Noncompliance; Rubin Causal Model; Potential outcomes; Treatment effects; Sensitivity analysis
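
The link between intention-to-treat (ITT) effects and treatment effects mentioned in the abstract is, in its simplest moment-based form, the ratio of the ITT effect on the outcome to the ITT effect on treatment receipt. The sketch below shows only that textbook ratio, which assumes the exclusion restriction and monotonicity; the paper's Bayesian framework is more general and examines what happens when the exclusion restriction fails. The data and variable names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def wald_iv(z, d, y):
    """Moment-based instrumental variables estimate for binary encouragement z.

    Returns (ITT effect on outcome, ITT effect on treatment receipt, ratio).
    The ratio estimates the effect among compliers only under the exclusion
    restriction and monotonicity (assumptions of this sketch).
    """
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d1 = [di for zi, di in zip(z, d) if zi == 1]
    d0 = [di for zi, di in zip(z, d) if zi == 0]
    itt_y = mean(y1) - mean(y0)
    itt_d = mean(d1) - mean(d0)
    return itt_y, itt_d, itt_y / itt_d


# Hypothetical toy data: z = encouraged, d = vaccinated, y = hospitalized.
z = [1] * 10 + [0] * 10
d = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
y = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0] + [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]

itt_y, itt_d, complier_effect = wald_iv(z, d, y)
print("ITT effect on outcome:   %.3f" % itt_y)
print("ITT effect on treatment: %.3f" % itt_d)
print("ratio estimate:          %.3f" % complier_effect)
```

If encouragement affects outcomes directly, as the abstract suggests it may for subjects who would be inoculated regardless, this ratio no longer isolates the effect of the treatment itself.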


The Validation of Surrogate Endpoints in Meta-Analyses of Randomized Experiments

by M. Buyse (1), G. Molenberghs (2), T. Burzykowski (2), D. Renard (2) and H. Geys (2)
(1) International Institute for Drug Development, 430 avenue Louise B14, B1050 Brussels, Belgium mbuyse@id2.be
(2) Center for Statistics, Limburgs Universitair Centrum, B3590 Diepenbeek, Belgium


Abstract
The validation of surrogate endpoints has been studied by Prentice (1989). He presented a definition as well as a set of criteria, which are equivalent only if the surrogate and true endpoints are binary. Freedman et al. (1992) supplemented these criteria with the so-called ‘proportion explained’. Buyse and Molenberghs (1998) proposed replacing the proportion explained by two quantities: (1) the relative effect linking the effect of treatment on both endpoints and (2) an individual-level measure of agreement between both endpoints. The latter quantity carries over when data are available on several randomized trials, while the former can be extended to be a trial-level measure of agreement between the effects of treatment of both endpoints. This approach suggests a new method for the validation of surrogate endpoints, and naturally leads to the prediction of the effect of treatment upon the true endpoint, given its observed effect upon the surrogate endpoint. These ideas are illustrated using data from two sets of multicenter trials: one comparing chemotherapy regimens for patients with advanced ovarian cancer, the other comparing interferon-α with placebo for patients with age-related macular degeneration.

Keywords: Ovarian cancer; Macular degeneration; Random-effects model; Surrogate endpoint; Two-stage model; Validation


Linear Regression Analysis of Censored Medical Costs

by D.Y. Lin
Department of Biostatistics, Box 357232, University of Washington, Seattle, WA 98195, USA danyu@biostat.washington.edu



Abstract
This paper deals with the problem of linear regression for medical cost data when some study subjects are not followed for the full duration of interest so that their total costs are unknown. Standard survival analysis techniques are ill-suited to this type of censoring. The familiar normal equations for the least-squares estimation are modified in several ways to properly account for the incompleteness of the data. The resulting estimators are shown to be consistent and asymptotically normal with easily estimated variance–covariance matrices. The proposed methodology can be used when the cost database contains only the total costs for those with complete follow-up. More efficient estimators are available when the cost data are recorded in multiple time intervals. A study on the medical cost for ovarian cancer is presented.


Keywords: Censoring; Cost analysis; Economic evaluation; Health economics; Incomplete data; Medical care; Survival analysis




Should we take Measurements at an Intermediate Design Point?

by Andrew Gelman
Department of Statistics, Columbia University, New York, NY, 10027, USA gelman@stat.columbia.edu

Abstract
It is well known that, for estimating a linear treatment effect with constant variance, the optimal design divides the units equally between the two extremes of the design space. If the dose–response relation may be nonlinear, however, intermediate measurements may be useful in order to estimate the effects of partial treatments. We consider the decision of whether to gather data at an intermediate design point: do the gains from learning about nonlinearity outweigh the loss in efficiency in estimating the linear effect? Under reasonable assumptions about nonlinearity, we find that, unless sample size is very large, the design with no interior measurements is best, because with moderate total sample sizes, any nonlinearity in the dose–response will be difficult to detect. We discuss in the context of a simplified version of the problem that motivated this work—a study of pest-control treatments intended to reduce asthma symptoms in children.

Keywords: Asthma; Bayesian inference; Dose–response experimental design; Pest control; Statistical significance
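
The "well known" fact in the abstract's first sentence follows from the least-squares slope variance, sigma^2 / sum((x_i - xbar)^2), which is smallest when the design points are as spread out as possible. The following short sketch compares a two-point design with a design that places a third of the units at the midpoint; the sample size and the unit design interval are assumptions made for illustration, and the paper's own analysis of when the midpoint is worthwhile is not reproduced here.

```python
def slope_variance(design, sigma=1.0):
    """Variance of the least-squares slope for design points x_1..x_n
    under a linear model with constant error variance sigma**2."""
    n = len(design)
    xbar = sum(design) / n
    sxx = sum((x - xbar) ** 2 for x in design)
    return sigma ** 2 / sxx


# Hypothetical designs with n = 60 units on the design interval [0, 1].
two_point = [0.0] * 30 + [1.0] * 30
three_point = [0.0] * 20 + [0.5] * 20 + [1.0] * 20

var_two = slope_variance(two_point)
var_three = slope_variance(three_point)
ratio = var_three / var_two

print("slope variance, two-point design:   %.4f" % var_two)
print("slope variance, three-point design: %.4f" % var_three)
print("variance ratio (three / two):       %.2f" % ratio)
```

Moving a third of the units to the midpoint inflates the variance of the linear effect estimate by half, which is the efficiency loss that must be weighed against the ability to learn about nonlinearity.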


A score test for the linkage analysis of qualitative and quantitative traits based on identity by descent data from sib-pairs

by Sandrine Dudoit (1) and Terence P. Speed (1)
(1) Department of Statistics, University of California, Berkeley, 367 Evans Hall, #3860, Berkeley, CA 94720-3860, USA sandrine@stat.berkeley.edu

Abstract
We propose a general likelihood-based approach to the linkage analysis of qualitative and quantitative traits using identity by descent (IBD) data from sib-pairs. We consider the likelihood of IBD data conditional on phenotypes and test the null hypothesis of no linkage between a marker locus and a gene influencing the trait using a score test in the recombination fraction between the two loci. This method unifies the linkage analysis of qualitative and quantitative traits into a single inferential framework, yielding a simple and intuitive test statistic. Conditioning on phenotypes avoids unrealistic random sampling assumptions and allows sib-pairs from differing ascertainment mechanisms to be incorporated into a single likelihood analysis. In particular, it allows the selection of sib-pairs based on their trait values and the analysis of only those pairs having the most informative phenotypes. The score test is based on the full likelihood, i.e. the likelihood based on all phenotype data rather than just differences of sib-pair phenotypes. Considering only phenotype differences, as in Haseman and Elston (1972) and Kruglyak and Lander (1995), may result in important losses in power. The linkage score test is derived under general genetic models for the trait, which may include multiple unlinked genes. Population genetic assumptions, such as random mating or linkage equilibrium at the trait loci, are not required. This score test is thus particularly promising for the analysis of complex human traits. The score statistic readily extends to accommodate incomplete IBD data at the test locus, by using the hidden Markov model implemented in the programs MAPMAKER/SIBS and GENEHUNTER (Kruglyak and Lander, 1995; Kruglyak et al., 1996). Preliminary simulation studies indicate that the linkage score test generally matches or outperforms the Haseman–Elston test, the largest gains in power being for selected samples of sib-pairs with extreme phenotypes.

Keywords: Linkage analysis; Complex traits; Qualitative and quantitative phenotypes; Identity by descent (IBD); Score test; Incomplete IBD data; Hidden Markov models; Haseman–Elston test


Health Characteristics of the Asian Adult Population: United States, 2004–2006

by Patricia M. Barnes, M.A.; Patricia F. Adams; and Eve Powell-Griner, Ph.D., Division of Health Interview Statistics


Abstract
Objective—This report compares national estimates for selected health status indicators, health behaviors, health care utilization, health conditions, immunizations, and human immunodeficiency virus (HIV) testing status among selected non-Hispanic Asian adult subgroups. Comparison estimates for the non-Hispanic white, non-Hispanic black, non-Hispanic American Indian or Alaska Native (AIAN), and Hispanic adult populations are also presented.
Methods—The estimates in this report were derived from the Family Core and the Sample Adult Core components of the 2004–2006 National Health Interview Surveys (NHIS), conducted by the Centers for Disease Control and Prevention’s National Center for Health Statistics (NCHS). Estimates were generated and comparisons conducted using the SUDAAN statistical package to account for the complex sample design.
Results—In general, non-Hispanic Asian adults were least likely to be current smokers, be obese, have hypertension, delay or not receive medical care because of cost, be tested for HIV, or be in fair or poor health compared with non-Hispanic white, non-Hispanic black, non-Hispanic AIAN, or Hispanic adults.
Across non-Hispanic Asian subgroups, Vietnamese adults were least likely to have a bachelor’s degree or higher and most likely to be poor, be in fair or poor health, and abstain from alcohol use. Korean adults were most likely to be uninsured, be current smokers, and be without a usual place for health care. Japanese adults were most likely to be current moderate or heavier drinkers, and Filipino adults were most likely to be obese.

Keywords: Asian; health behaviors; health care utilization; conditions; mental health status; health status; immunizations; HIV test; National Health Interview Survey
