Keywords:
asymptotic variance
consistency
cutoff point
training and holdout samples
ERROR RATE
CONSISTENCY
VALIDATION
NORMALITY
EXISTENCE
SELECTION
MODELS
Abstract:
Logistic regression is widely used to build models that classify observations into two groups by relating a binary dependent variable to a set of covariates. Splitting the sample into a training sample and a holdout sample lets researchers assess the predictive ability of such models: the training sample is used to estimate the coefficient vector of the logistic model, and studies that split the sample then report the proportion of holdout observations correctly classified with this estimated vector. In this paper, under the assumption that the ratio of the training sample size to the holdout sample size is approximately fixed, we prove that the proportion of correct classifications is consistent and asymptotically normal. We also provide an estimate of the asymptotic variance of this proportion, and we prove that this variance does not depend on the cutoff point used for classification. When the sample size is large, researchers can report the asymptotic variance alongside the proportion of correct classifications, and together the two give a more complete picture of the predictive ability of their models. © 2001 Elsevier Science B.V. All rights reserved. MSC: primary 62F12; secondary 62J12.
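The training/holdout procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch on simulated data, not the paper's code: the function names `fit_logistic` and `proportion_correct`, the Newton-Raphson fitting routine, the simulated coefficients, the equal split, and the cutoff of 0.5 are all assumptions made for the example. It computes only the holdout proportion of correct classifications; the paper's asymptotic variance estimator is not given in the abstract and is not reproduced here.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit a logistic model by Newton-Raphson; an intercept column is added here.

    The tiny ridge term only stabilises the linear solve (an assumption of
    this sketch, not part of the paper).
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (y - p)                      # score vector
        w = p * (1.0 - p)                          # Bernoulli variances
        hess = (Xd * w[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)  # Newton step
    return beta


def proportion_correct(beta, X, y, cutoff=0.5):
    """Share of observations whose predicted class (p >= cutoff) equals y."""
    Xd = np.column_stack([np.ones(len(X)), X])
    p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
    predicted = (p >= cutoff).astype(int)
    return float(np.mean(predicted == y))


# Simulated data (hypothetical coefficients, chosen only for illustration).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
true_beta = np.array([-0.5, 1.0, -2.0])
prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = (rng.random(n) < prob).astype(int)

# Fixed training-to-holdout ratio (here 1:1, an assumed choice).
n_train = n // 2
beta_hat = fit_logistic(X[:n_train], y[:n_train])

# Coefficients come from the training sample; accuracy is measured on the holdout.
pcc = proportion_correct(beta_hat, X[n_train:], y[n_train:], cutoff=0.5)
print(f"holdout proportion of correct classifications: {pcc:.3f}")
```

Keeping the fit and the evaluation on disjoint sub-samples is what makes the reported proportion an out-of-sample measure; the `cutoff` argument is exposed so that the proportion can be recomputed at other cutoff points.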