Estimation methods: JMLE, PROX, XMLE
Winsteps implements three methods of estimating Rasch parameters from ordered qualitative observations: JMLE, PROX and XMLE. Estimates of the Rasch measures are obtained by iterating through the data. Initially all unanchored parameter estimates (measures) are set to zero. Then the PROX method is employed to obtain rough estimates. Each iteration through the data improves the PROX estimates until they are usefully good. Then those PROX estimates are the initial estimates for JMLE which fine-tunes them, again by iterating through the data, in order to obtain the final JMLE estimates. The iterative process ceases when the convergence criteria are met. These are set by MJMLE=, CONVERGE=, LCONV= and RCONV=. Depending on the data design, this process can take hundreds of iterations (Convergence: Statistics or Substance?). When only rough estimates are needed, force convergence by pressing Ctrl+F or by selecting "Finish iterating" on the File pull-down menu.
Extreme scores: (perfect, maximum possible scores, and zero, minimum possible scores) are dropped from the main estimation procedure. Their measures are estimated separately using EXTRSC=.
Missing data: most Rasch estimation methods do not require that missing data be imputed, or that there be case-wise or list-wise omission of data records with missing data. For datasets that accord with the Rasch model, missing data lower the precision of the measures and lessen the sensitivity of the fit statistics, but do not bias the measure estimates.
Likelihood: Using the current parameter estimates (Rasch measures), the probability of observing each data point is computed, assuming the data fit the model. The probabilities of all the data points are multiplied together to obtain the likelihood of the entire data set. The parameter estimates are then improved (in accordance with the estimation method) and a new likelihood for the data is obtained. The values of the parameters for which the likelihood of the data has its maximum are the "maximum likelihood estimates" (Ronald A. Fisher, 1922).
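The likelihood computation described above can be sketched for the dichotomous Rasch model. This is only an illustration, not the Winsteps implementation: the function names and the tiny data set are invented, and log-probabilities are summed instead of multiplying probabilities, which is equivalent and avoids numerical underflow.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of success (score 1) on a dichotomous Rasch item."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def log_likelihood(data, abilities, difficulties):
    """Sum of the log-probabilities of every observed response.

    data[n][i] is 1, 0, or None (missing); missing responses are skipped.
    """
    total = 0.0
    for n, row in enumerate(data):
        for i, x in enumerate(row):
            if x is None:
                continue
            p = rasch_probability(abilities[n], difficulties[i])
            total += math.log(p if x == 1 else 1.0 - p)
    return total

# Hypothetical 3-person by 3-item data set, invented for illustration.
data = [[1, 0, 0],
        [1, 1, 0],
        [1, None, 0]]

# All measures at zero (the starting point), then a set of estimates
# that orders the items from easy to hard.
start = log_likelihood(data, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
better = log_likelihood(data, [-0.5, 0.5, 0.0], [-1.0, 0.0, 1.0])
print(start, better)
```

A larger (less negative) log-likelihood means the data are more probable under those parameter estimates.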
JMLE "Joint Maximum Likelihood Estimation" is also called UCON, "Unconditional maximum likelihood estimation". It was devised by Wright & Panchapakesan, www.rasch.org/memo46.htm. In this formulation, the estimate of the Rasch parameter (for which the observed data are most likely, assuming those data fit the Rasch model) occurs when the observed raw score for the parameter matches the expected raw score. "Joint" means that the estimates for the persons (rows) and items (columns) and rating scale structures (if any) of the data matrix are obtained simultaneously. The iterative estimation process is described at Iteration.
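The score-matching idea can be illustrated with a toy JMLE for complete dichotomous data that contain no extreme scores. This is a sketch under stated assumptions, not the Winsteps algorithm: the function `jmle`, the Newton-Raphson step capped at 1 logit, the item-centering convention, and the data set are all choices made for this illustration.

```python
import math

def p_success(ability, difficulty):
    """Dichotomous Rasch probability of success."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def jmle(data, max_iter=500, tol=1e-6):
    """Toy JMLE: adjust each measure until expected score = observed score."""
    n_persons, n_items = len(data), len(data[0])
    abilities = [0.0] * n_persons        # all estimates start at zero
    difficulties = [0.0] * n_items
    person_scores = [sum(row) for row in data]
    item_scores = [sum(row[i] for row in data) for i in range(n_items)]

    for _ in range(max_iter):
        largest_step = 0.0
        for i in range(n_items):
            probs = [p_success(b, difficulties[i]) for b in abilities]
            # Expected score too high means the item is harder than estimated.
            step = (sum(probs) - item_scores[i]) / sum(p * (1 - p) for p in probs)
            step = max(-1.0, min(1.0, step))
            difficulties[i] += step
            largest_step = max(largest_step, abs(step))
        # Fix the origin of the scale at the mean item difficulty.
        mean_d = sum(difficulties) / n_items
        difficulties = [d - mean_d for d in difficulties]
        for n in range(n_persons):
            probs = [p_success(abilities[n], d) for d in difficulties]
            step = (person_scores[n] - sum(probs)) / sum(p * (1 - p) for p in probs)
            step = max(-1.0, min(1.0, step))
            abilities[n] += step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return abilities, difficulties

# Hypothetical data: 5 persons (rows) by 4 items (columns), no extreme scores.
data = [[1, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 0],
        [0, 1, 1, 1]]
abilities, difficulties = jmle(data)
print(abilities, difficulties)
```

Persons 2 and 3 have the same raw score on the same items, so they receive the same measure, as do items 2 and 3.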
Advantages - these are implementation dependent, and are implemented in Winsteps:
(1) independence from specific person and item distributional forms.
(2) flexibility with missing data.
(3) the ability to analyze test lengths and sample sizes of any size.
(4) symmetrical analysis of person and item parameters so that transposing rows and columns does not change the estimates.
(5) flexibility with person, item and rating scale structure anchor values.
(6) flexibility to include different variants of the Rasch model in the same analysis (dichotomous, rating scale, partial credit, etc.).
(7) unobserved intermediate categories of rating scales can be maintained in the estimation with exact probabilities.
(8) all non-extreme scores are estimable (after elimination of extreme scores and rarely-observed Guttman subsets).
(9) all persons with the same total raw scores on the same items have the same measures; all items with the same raw scores across the same persons have the same measures.
Disadvantages:
(11) measures for extreme (zero, perfect) scores for persons or items require post-hoc estimation.
(12) estimates are statistically inconsistent.
(13) estimation bias, particularly with small samples or short tests, inflates the logit distance between estimates.
(14) chi-squares reported for fit tests (particularly global fit tests) may be somewhat inflated, exaggerating misfit to the Rasch model.
Comment on (9): An on-going debate is whether measures should be adjusted up or down based on the misfit in response patterns. With conventional test scoring and Rasch JMLE, a lucky guess counts as a correct answer exactly like any other correct answer. Unexpected responses can be identified by fit statistics. With the three-parameter-logistic item-response-theory (3-PL IRT) model, the score value of an unexpected correct answer is diminished whether it is a lucky guess or due to special knowledge. In Winsteps, responses to off-target items (the locations of lucky guesses and careless mistakes) can be trimmed with CUTLO= and CUTHI=, or be diminished using TARGET=Yes.
Comment on (13): JMLE exhibits some estimation bias in small data sets (for reasons, see XMLE below), but this rarely exceeds the precision (model standard error of measurement, SEM) of the measures. Estimation bias is only of concern when exact probabilistic inferences are to be made from short tests or small samples. It can be exactly corrected for paired-comparison data with PAIRED=Yes. For other data, it can be approximately corrected with STBIAS=Yes, but, in practice, this is not necessary (and sometimes not advisable).
PROX is the Normal Approximation Algorithm devised by Cohen (1979). This algorithm capitalizes on the similar shapes of the logistic and normal ogives. It models both the persons and the items to be normally distributed. The variant of PROX implemented in Winsteps allows missing data. The form of the estimation equations is:
Ability of person = Mean difficulty of items encountered + log ( (observed score - minimum possible score on items encountered) / (maximum possible score on items encountered - observed score) ) * square-root ( 1 + (variance of difficulty of items encountered) / 2.9 )
In Winsteps, PROX iterations cease when the variance of the items encountered does not increase substantially from one iteration to the next.
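The PROX equation for a person can be evaluated directly. The sketch below is illustrative only: the function name `prox_ability` and the example values are invented, and the population variance of the item difficulties is an assumption, since the text does not say which variance estimator is used.

```python
import math

def prox_ability(observed, min_score, max_score, item_difficulties):
    """PROX ability estimate from the items this person encountered.

    Assumes a non-extreme score (min_score < observed < max_score) and
    uses the population variance of the difficulties (an assumption).
    """
    k = len(item_difficulties)
    mean_d = sum(item_difficulties) / k
    var_d = sum((d - mean_d) ** 2 for d in item_difficulties) / k
    log_odds = math.log((observed - min_score) / (max_score - observed))
    return mean_d + log_odds * math.sqrt(1.0 + var_d / 2.9)

# Hypothetical person: 2 successes on 3 dichotomous items.
ability = prox_ability(observed=2, min_score=0, max_score=3,
                       item_difficulties=[-1.0, 0.0, 1.0])
print(ability)
```

When all encountered items have the same difficulty, the variance term vanishes and the estimate reduces to that difficulty plus the log-odds of the score.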
Advantages - these are implementation dependent, and are implemented in Winsteps: (2)-(9) of JMLE. Computationally the fastest estimation method.
Disadvantages: (1) person and item measures are assumed to be normally distributed. (11)-(14) of JMLE.
Other estimation methods in common use (but not implemented in Winsteps):
Gaussian least-squares finds the Rasch parameter values which minimize the overall difference between the observations and their expectations, Sum((Xni - Eni)^2) where the sum is over all observations, Xni is the observation when person n encounters item i, and Eni is the expected value of the observation according to the current Rasch parameter estimates. Effectively, off-target observations are down-weighted, similar to TARGET=Yes in Winsteps.
Minimum chi-square finds the Rasch parameter values which minimize the overall statistical misfit of the data to the model, Sum((Xni - Eni)^2 / Vni) where Vni is the modeled binomial or multinomial variance of the observation around its expectation. Effectively off-target observations are up-weighted to make them less improbable.
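The two criteria differ only in whether each squared residual is divided by its model variance. The following sketch evaluates both sums for dichotomous data at given parameter values; the function names and the one-response examples are invented for illustration, and no minimization is performed.

```python
import math

def expectation_and_variance(ability, difficulty):
    """Expected score Eni and binomial variance Vni of a dichotomous response."""
    p = 1.0 / (1.0 + math.exp(difficulty - ability))
    return p, p * (1.0 - p)

def fit_criteria(data, abilities, difficulties):
    """Return (Sum((Xni-Eni)^2), Sum((Xni-Eni)^2 / Vni)) over observed data."""
    least_squares = 0.0
    chi_square = 0.0
    for n, row in enumerate(data):
        for i, x in enumerate(row):
            if x is None:          # missing responses are skipped
                continue
            e, v = expectation_and_variance(abilities[n], difficulties[i])
            least_squares += (x - e) ** 2
            chi_square += (x - e) ** 2 / v
    return least_squares, chi_square

# One unexpected failure, first on an on-target item, then on an item
# 3 logits below the person (off-target). Values are hypothetical.
ls_on, chi_on = fit_criteria([[0]], [0.0], [0.0])
ls_off, chi_off = fit_criteria([[0]], [3.0], [0.0])
print(ls_on, chi_on, ls_off, chi_off)
```

The off-target failure changes the least-squares sum by less than 1 but contributes exp(3), about 20, to the chi-square sum, so the chi-square criterion gives off-target observations much more influence.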
Gaussian least-squares and Minimum chi-square: Advantages - these are implementation dependent: (1)-(8) All those of JMLE.
Disadvantages: (9) persons with the same total raw scores on the same items generally have different measures; items with the same raw scores across the same persons generally have different measures. (11)-(13) of JMLE (14) global fit tests uncertain.
CMLE. Conditional maximum likelihood estimation. Item difficulties are structural parameters. Person abilities are incidental parameters, conditioned out for item difficulty estimation by means of their raw scores. The item difficulty estimates are those that maximize the likelihood of the data given the person raw scores and assuming the data fit the model. The item difficulties are then used for person ability estimation using a JMLE approach.
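Conditioning on the raw score removes the person ability from the likelihood. The sketch below checks this numerically for dichotomous items by brute-force enumeration of response patterns; the function names and item difficulties are invented, and practical CMLE software uses more efficient algorithms than enumeration.

```python
import math
from itertools import product

def pattern_probability(pattern, ability, difficulties):
    """Rasch probability of a whole dichotomous response pattern."""
    prob = 1.0
    for x, d in zip(pattern, difficulties):
        p = 1.0 / (1.0 + math.exp(d - ability))
        prob *= p if x == 1 else 1.0 - p
    return prob

def conditional_probability(pattern, ability, difficulties):
    """Probability of the pattern given its raw score."""
    score = sum(pattern)
    same_score = [q for q in product((0, 1), repeat=len(pattern))
                  if sum(q) == score]
    total = sum(pattern_probability(q, ability, difficulties)
                for q in same_score)
    return pattern_probability(pattern, ability, difficulties) / total

# Hypothetical item difficulties; the same pattern for a low and a high person.
difficulties = [-1.0, 0.0, 1.0]
low = conditional_probability((1, 0, 0), -2.0, difficulties)
high = conditional_probability((1, 0, 0), 3.0, difficulties)
print(low, high)
```

Both values are the same, so the conditional likelihood depends only on the item difficulties, which is what allows them to be estimated without estimating the person abilities.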
Advantages - these are implementation dependent: (1), (6)-(9) of JMLE (3) the ability to analyze person sample sizes of any size (5) flexibility with item and rating scale structure anchor values (12) statistically-consistent item estimates (13) minimally estimation-biased item estimates (14) exact global fit statistics
Disadvantages: (2) limited flexibility with missing data (3) test length severely limited by mathematical precision of the computer (4) asymmetric analysis of person and item parameters so that transposing rows and columns changes the estimates (5) no person anchor values (11) of JMLE (13) estimation bias of person estimates small but uncertain
EAP. Expected A Posteriori estimation derives from Bayesian statistical principles. This requires assumptions about the expected parameter distribution. An assumption is usually normality, so EAP estimates are usually more normally distributed than Winsteps estimates (which are as parameter-distribution-free as possible). EAP is not implemented in Winsteps.
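As an illustration of how the distributional assumption enters, the sketch below computes an EAP ability for dichotomous responses with known item difficulties. The function name, the normal prior, and the evenly spaced grid over plus or minus 4 standard deviations are all assumptions made for this example; they do not describe any particular EAP program.

```python
import math

def eap_ability(responses, difficulties, prior_mean=0.0, prior_sd=1.0,
                n_points=81):
    """Posterior mean of ability under a normal prior (grid approximation)."""
    grid = [prior_mean + prior_sd * (-4.0 + 8.0 * k / (n_points - 1))
            for k in range(n_points)]
    weights = []
    for theta in grid:
        prior = math.exp(-0.5 * ((theta - prior_mean) / prior_sd) ** 2)
        likelihood = 1.0
        for x, d in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(d - theta))
            likelihood *= p if x == 1 else 1.0 - p
        weights.append(prior * likelihood)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

# Hypothetical item difficulties and response strings.
difficulties = [-1.0, 0.0, 1.0]
two_of_three = eap_ability((1, 1, 0), difficulties)
perfect = eap_ability((1, 1, 1), difficulties)
print(two_of_three, perfect)
```

Because of the prior, even a perfect score receives a finite estimate, and all estimates are drawn toward the prior mean.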
MMLE. Marginal maximum likelihood estimation. Item difficulties are structural parameters. Person abilities are incidental parameters, integrated out for item difficulty estimation by imputing a person measure distribution. The item difficulties are then used for person ability estimation using a JMLE approach.
Advantages - these are implementation dependent: (3), (6)-(9) of JMLE (1) independence from specific item distributional forms. (2) flexibility with missing data extends to minimal length person response strings (5) flexibility with item and rating scale structure anchor values (11) extreme (zero, perfect) scores for persons are used for item estimation. (12) statistically-consistent item estimates (13) minimally estimation-biased item estimates (14) exact global fit statistics
Disadvantages: (1) specific person distribution required (4) asymmetric analysis of person and item parameters so that transposing rows and columns changes the estimates (5) no person anchor values (11) measures for extreme (zero, perfect) scores for specific persons or items require post-hoc estimation. (13) estimation bias of person estimates small but uncertain
PMLE. Pairwise maximum likelihood estimation. Person abilities are incidental parameters, conditioned out for item difficulty estimation by means of pairing equivalent person observations. The item difficulties are then used for person ability estimation using a JMLE approach.
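The building block of a pairwise approach is the comparison of two items among persons who succeeded on exactly one of them. The sketch below computes that single comparison for dichotomous data; it is not a full PMLE, which combines all item pairs, and the function name and data are invented for illustration.

```python
import math

def pairwise_logit_difference(data, i, j):
    """Estimate D_j - D_i from persons who succeeded on exactly one of i, j.

    Under the dichotomous Rasch model, among such persons the odds of
    'success on i, failure on j' are exp(D_j - D_i), whatever their ability.
    """
    i_not_j = sum(1 for row in data if row[i] == 1 and row[j] == 0)
    j_not_i = sum(1 for row in data if row[j] == 1 and row[i] == 0)
    return math.log(i_not_j / j_not_i)

# Hypothetical two-item data: twice as many successes on item 1 as on item 2.
# Persons with (1, 1) or (0, 0) carry no information about the difference.
data = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
distance = pairwise_logit_difference(data, 0, 1)
print(distance)
```

Here item 2 is estimated to be log(2) logits more difficult than item 1.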
Advantages - these are implementation dependent: (1), (3), (6), (7) of JMLE (5) flexibility with item and rating scale structure anchor values (9) all persons with the same total raw scores on the same items have the same measure (12) statistically-consistent item estimates
Disadvantages: (11) of JMLE (2) reduced flexibility with missing data (4) asymmetric analysis of person and item parameters so that transposing rows and columns changes the estimates (5) no person anchor values (9) items with the same total raw scores across the same persons generally have different measures. (13) estimation bias of item and person estimates small but uncertain (14) global fit tests uncertain. (15) uneven use of data in estimation renders standard errors and estimates less secure
WMLE. Warm's (1989) Weighted Maximum Likelihood Estimation. WMLE (or WLE) estimates are usually slightly more central than Winsteps estimates. Standard MLE estimates are located at the maximum of the likelihood function and so are statistical modes. Warm shows that the likelihood function is skewed, leading to an additional source of estimation bias. The mean likelihood estimate is less biased. Warm suggests an unbiasing correction that can be applied, in principle, to any MLE method, but there are computational constraints. Even when feasible, this fine tuning appears to be less than the relevant standard errors and to have no practical benefit. It is not currently implemented in Winsteps.
XMLE, "Exclusory Maximum Likelihood Estimation", implements Linacre's (1989) XCON algorithm in Winsteps. Statistical "consistency" is the property that an estimation method will yield the "true" value of a parameter when there is infinite data. Statistical "estimation bias" is the degree to which an estimate differs from its "true" value with a finite amount of data. JMLE is statistically inconsistent under some conditions, and noticeably estimation-biased for short tests or small samples, because it includes the possibility of extreme scores in the estimation space, but cannot estimate them. The XMLE algorithm removes the possibility of extreme response vectors from the estimation space, to a first approximation. This makes XMLE consistent, and much less estimation-biased than JMLE. In fact, XMLE is even less biased than CMLE for small samples, because CMLE only eliminates the possibility of extreme person response vectors, not the possibility of extreme item response vectors.
XMLE and JMLE use the same estimation methods. The difference is in the probability terms used in the estimation equations. For JMLE, for the dichotomous case,
loge(Pni1 / Pni0) = Bn - Di
where Pni1 is the probability that person n succeeds on item i. For XMLE,
Rni1 = Pni1 - Product(Pmi1) - Product(Pnj1) + Product(Pmi1) * Product(Pnj1)
where m = 1,N and j = 1,L, so that Product(Pmi1) is the likelihood of the sample all succeeding on item i, and Product(Pnj1) is the likelihood of a perfect score for person n. Similarly,
Rni0 = Pni0 - Product(Pmi0) - Product(Pnj0) + Product(Pmi0) * Product(Pnj0)
So the XMLE estimation equation for person n or item i is based on
Expected raw score = Sum(Rni1 / (Rni1 + Rni0)), summed over the items i for person n, or over the persons n for item i
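The Rni1 and Rni0 terms can be computed directly from a matrix of JMLE-style success probabilities. The sketch below does this for the dichotomous case; the function names and the probability matrix are invented for illustration and are not taken from Winsteps.

```python
import math

def xmle_terms(P, n, i):
    """Return (Rni1, Rni0) for person n and item i.

    P[m][j] is the Rasch probability that person m succeeds on item j.
    """
    p1 = P[n][i]
    p0 = 1.0 - p1
    # Likelihood that the whole sample succeeds (or fails) on item i.
    item_all_1 = math.prod(P[m][i] for m in range(len(P)))
    item_all_0 = math.prod(1.0 - P[m][i] for m in range(len(P)))
    # Likelihood of a perfect (or zero) score for person n.
    person_all_1 = math.prod(P[n][j] for j in range(len(P[n])))
    person_all_0 = math.prod(1.0 - P[n][j] for j in range(len(P[n])))
    r1 = p1 - item_all_1 - person_all_1 + item_all_1 * person_all_1
    r0 = p0 - item_all_0 - person_all_0 + item_all_0 * person_all_0
    return r1, r0

def xmle_expected_score(P, n, i):
    """Expected score of person n on item i: Rni1 / (Rni1 + Rni0)."""
    r1, r0 = xmle_terms(P, n, i)
    return r1 / (r1 + r0)

# Hypothetical 2-person by 2-item probability matrix.
P = [[0.8, 0.6],
     [0.7, 0.4]]
r1, r0 = xmle_terms(P, 0, 0)
print(r1, r0, xmle_expected_score(P, 0, 0))
```

Summing `xmle_expected_score` over the items for a person, or over the persons for an item, gives the expected raw score used in the estimation equation.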
Example: Consider a two-item dichotomous test. Possible person scores are 0, 1, 2. Person scores of 0 and 2 are dropped from estimation as extreme. The remaining very large sample of N persons all score 1 success so all have the same measure, 0 logits for convenience. Twice as many successes are observed on item 1 as item 2. Under these conditions, in the estimation sample, a success on item 1 requires a failure on item 2 and vice-versa. So, according to the Rasch model, the logit distance between item 1 and item 2 = log (frequency of success on item 1 / frequency of success on item 2) = log(2). And the expected score on item 1 is 2/3 and on item 2 is 1/3.
JMLE considers observations of item 1 and item 2 to be independent of the total raw score, and computes the distance between item 1 and item 2 = log (frequency of success on item 1 / frequency of failure on item 1) - log (frequency of success on item 2 / frequency of failure on item 2) = log(2/1) - log(1/2) = 2 log(2), i.e., twice that of the direct Rasch model. This is the worst case of JMLE estimation bias and occurs with pairwise comparison data. For such data, this estimation-bias of 2 can be automatically corrected with PAIRED=Yes. As test length increases, the bias reduces and is considered to be non-consequential for test lengths of 10 items or more, Wright's Memo 45.
For XMLE, let's assume the item difficulties are -0.5 * log(2) and 0.5 * log(2). Set C = square-root(2). Then Pn11 = Pn20 = C / (1+C) and Pn10 = Pn21 = 1 / (1+C). Rn11 = C/(1+C) - C/(1+C) * 1/(1+C) - 0 + 0 (due to very large sample) = (C/(1+C))^2 and Rn10 = 1/(1+C) - 1/(1+C) * C/(1+C) - 0 + 0 (again due to very large sample) = (1/(1+C))^2. Then expected score for a person on item 1 = Rn11/(Rn11+Rn10) = (C/(1+C))^2 / ( (C/(1+C))^2 + (1/(1+C))^2 ) = C^2 / (C^2 + 1) = 2/3. Similarly, the expected score for a person on item 2 = Rn21/(Rn21+Rn20) = 1/3, as required. And the logit distance between item 1 and item 2 is 0.5 * log(2) - (-0.5 * log(2)) = log(2), as required. There is no estimation bias in this example.
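The arithmetic of the two-item example can be checked numerically. In this sketch the persons are placed at 0 logits, the sample-wide products are set to zero to represent the very large sample, and the function names are invented for illustration.

```python
import math

def p_success(ability, difficulty):
    """Dichotomous Rasch probability of success."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def jmle_expected_scores(d1, d2, ability=0.0):
    """JMLE expected scores on items 1 and 2 for a person at 'ability'."""
    return p_success(ability, d1), p_success(ability, d2)

def xmle_expected_scores(d1, d2, ability=0.0):
    """XMLE expected scores; sample-wide products taken as 0 (very large N)."""
    p11, p21 = p_success(ability, d1), p_success(ability, d2)
    p10, p20 = 1.0 - p11, 1.0 - p21
    perfect = p11 * p21      # likelihood of a perfect score for the person
    zero = p10 * p20         # likelihood of a zero score for the person
    r11, r10 = p11 - perfect, p10 - zero
    r21, r20 = p21 - perfect, p20 - zero
    return r11 / (r11 + r10), r21 / (r21 + r20)

# JMLE matches the observed proportions (2/3, 1/3) only when the items
# are 2*log(2) logits apart.
jmle_scores = jmle_expected_scores(-math.log(2), math.log(2))
# XMLE matches them when the items are log(2) logits apart.
xmle_scores = xmle_expected_scores(-0.5 * math.log(2), 0.5 * math.log(2))
print(jmle_scores, xmle_scores)
```

Both calls reproduce the observed success rates of 2/3 and 1/3, but JMLE needs an item distance of 2 log(2) to do so, while XMLE needs only log(2).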
Considerations with XMLE=YES include: (1) Anchoring values changes the XMLE probabilities. Consequently, measures from a Table 20 score table do not match measures from the estimation run. Consequently, it may be necessary to estimate item calibrations with XMLE=YES. Then anchor the items and perform XMLE=NO. (2) Items and persons with extreme (zero and perfect) scores are deleted from the analysis. (3) For particular data structures, measures for finite scores may not be calculable.
Advantages - these are implementation dependent, and are implemented in Winsteps: (1)-(8) of JMLE (12) estimates are statistically consistent (13) estimation bias is small
Disadvantages: (11) measures for extreme (zero, perfect) scores for persons or items require post-hoc estimation, and even then may not be estimable (14) global fit tests uncertain
Cohen L. (1979). Approximate expressions for parameter estimates in the Rasch model. British Journal of Mathematical and Statistical Psychology, 32, 113-120
Fisher R.A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A, 222, 309-368
Warm T.A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427-450
Help for WINSTEPS® Rasch Measurement Software: www.winsteps.com.