# Standard errors: model and real

A standard error quantifies the precision of a measure or an estimate. It is the standard deviation of an imagined error distribution representing the possible distribution of observed values around their "true" theoretical value. This precision is based on information within the data. The quality-control fit statistics report on accuracy, i.e., how closely the measures or estimates correspond to a reference standard outside the data, in this case, the Rasch model.

Standard errors reported by Winsteps include neither sampling error nor the imprecision in the estimates of all the other persons or items. When estimating the standard error for a person or item, the other persons and items are treated as though their distributions exactly match their populations and their estimated values are their true values. The imprecision in the estimates due to sampling error, and due to basing person estimates on item estimates and vice-versa, is usually an order of magnitude less than the reported standard errors.

Note: Survey-style "sample" standard errors and confidence intervals are equivalent to Rasch item-calibration standard errors. For a dichotomous (binary) item reported with a proportion-correct value as a percentage:

Survey sample 95% confidence interval (± %) = 1.96 * 100% / (item logit standard error * sample size)

Example: survey report gives: p = 90%, sample size=100, confidence interval (95%) = 90±6%

Winsteps: logit S.E. of item calibration = 1/sqrt(100*.9*.1) = ±.33 logits.

So survey C.I. % = ±1.96 * 100 /(.33 * 100) = ±6%
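The arithmetic of this example can be checked with a short Python sketch. The function names are illustrative and are not part of Winsteps; the sketch assumes the item S.E. approximation 1/sqrt(N*p*(1-p)) used in the example.

```python
import math

def item_logit_se(p, n):
    """Approximate logit S.E. of a dichotomous item calibration: 1/sqrt(N*p*(1-p))."""
    return 1.0 / math.sqrt(n * p * (1.0 - p))

def survey_ci_percent(logit_se, n, z=1.96):
    """Half-width of the survey-style confidence interval, in percentage points."""
    return z * 100.0 / (logit_se * n)

se = item_logit_se(0.9, 100)      # p = 90%, sample size = 100
ci = survey_ci_percent(se, 100)   # 95% interval half-width
print(f"logit S.E. = {se:.2f}, survey C.I. = +/-{ci:.1f}%")
```

With the unrounded S.E. the half-width is 5.88%, which rounds to the ±6% in the survey report.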

## Standard Errors of Items

The size of a standard error of an estimate is most strongly influenced by the number of observations used to make the estimate. We need measurement precision (standard error size) adequate for the purpose for which we are using the measures.

Probably the only time we need to be concerned about item standard errors within a test is when we want to say "Item A is definitely more difficult than Item B". For this to be true, their measures need to be more than 3 S.E.s different.

When comparing item difficulties estimated from different datasets, we use the item standard errors to identify when differences between the item difficulties of the same item are probably due to chance, and when they may be due to a substantive change, such as item drift.
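One way such a comparison might be sketched in Python is shown here. It rests on assumptions not stated in this Help page: the two estimates are treated as independent, they are already in the same frame of reference, and their standard errors are combined as sqrt(SE1² + SE2²). The function name and the example values are hypothetical.

```python
import math

def difficulty_shift(d1, se1, d2, se2):
    """Return (difference, joint S.E., standardized difference) for two estimates.

    Assumes independent estimates on a common scale (an assumption of this sketch).
    """
    diff = d2 - d1
    joint_se = math.sqrt(se1 ** 2 + se2 ** 2)
    return diff, joint_se, diff / joint_se

# Hypothetical values: the same item calibrated in two datasets (logits)
diff, joint_se, z = difficulty_shift(0.50, 0.12, 0.95, 0.16)
print(f"difference = {diff:.2f}, joint S.E. = {joint_se:.2f}, standardized = {z:.2f}")
```

A standardized difference near zero is consistent with chance; how large a value should prompt a substantive investigation (such as item drift) is a judgment for the analyst.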

## Model "Ideal" Standard Error

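The highest possible precision for any measure is that obtained when every other measure is known, and the data fit the Rasch model. The model standard error is 1/square root (Fisher information). For well-constructed tests with clean data (as confirmed by the fit statistics), the model standard error is usefully close to, but slightly smaller than, the actual standard error. The "model" standard error is the "best case" error. It is the asymptotic value for JMLE. For dichotomous data this is, summed over items i=1,L for person n, or over persons n=1,N for item i:

$$\text{Model S.E.} = \frac{1}{\sqrt{\sum P_{ni}\,(1 - P_{ni})}}$$

where $P_{ni}$ is the probability of success for person n on item i.

For polytomies (rating scales, partial credit, etc.), with categories j=0,m:

$$\text{Model S.E.} = \frac{1}{\sqrt{\sum \left[ \sum_{j=0}^{m} j^{2} P_{nij} - \left( \sum_{j=0}^{m} j\,P_{nij} \right)^{2} \right]}}$$

and, for the Rasch-Andrich thresholds $F_k$,

$$\text{Model S.E.}(F_k) = \frac{1}{\sqrt{\sum_{n=1}^{N} \sum_{i=1}^{L} \left[ \sum_{j=k}^{m} P_{nij} - \left( \sum_{j=k}^{m} P_{nij} \right)^{2} \right]}}$$

where $P_{nij}$ is the probability of observing category j for person n on item i.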
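A minimal Python sketch of the model standard error as 1/sqrt(Fisher information) follows. It assumes the standard Rasch results that a dichotomous observation contributes information P(1-P) and that a polytomous observation contributes the model variance of its category scores. The function names are illustrative and are not Winsteps functions.

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch probability of success, measures in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def person_model_se(ability, difficulties):
    """Model S.E. of a person measure: 1/sqrt(sum of P*(1-P) over items)."""
    info = 0.0
    for d in difficulties:
        p = rasch_probability(ability, d)
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)

def polytomous_model_se(category_probs):
    """Model S.E. from per-observation category probabilities P[0..m].

    Each observation contributes sum(j^2 * P_j) - (sum(j * P_j))^2.
    """
    info = 0.0
    for probs in category_probs:
        expected = sum(j * p for j, p in enumerate(probs))
        info += sum(j * j * p for j, p in enumerate(probs)) - expected ** 2
    return 1.0 / math.sqrt(info)

# Hypothetical person at 0 logits taking four items of difficulty 0 logits
print(person_model_se(0.0, [0.0, 0.0, 0.0, 0.0]))
```

The same functions give an item's model S.E. when the sum runs over persons instead of items. A dichotomy passed to `polytomous_model_se` as `[1-P, P]` reproduces the dichotomous result.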

## Misfit-Inflated "Real" Standard Error

Wright and Panchapakesan (1969, www.rasch.org/memo46.htm) discovered an important result for tests in which each examinee takes more than a handful of items, and each item is taken by more than a handful of examinees: the imprecision introduced into the target measure by using estimated measures for the non-target items and examinees is negligibly small. Consequently, in almost all data sets except those based on very short tests, it is only misfit of the data to the model that increases the standard errors noticeably above their model "ideal" errors. Misfit to the model is quantified by fit statistics. But, according to the model, these fit statistics also have a stochastic component, i.e., some amount of misfit is expected in the data. Discovering "perfect" data immediately raises suspicions! Consequently, to consider that every departure of a fit statistic from its ideal value indicates failure of the data to fit the model is to take a pessimistic position. What is useful, however, is to estimate "real" standard errors by enlarging the model "ideal" standard errors by the model misfit encountered in the data.

Recent work by Jack Stenner shows that the most useful misfit inflation formula is

Real S.E. of an estimated measure = Model S.E. * Maximum [1.0, sqrt(INFIT mean-square)]

In practice, this "Real" S.E. sets an upper bound on measure imprecision. It is the "worst case" error. The actual S.E. lies between the "model" and "real" values. But since we generally try to minimize or eliminate the most aberrant features of a measurement system, we will probably begin by focusing attention on the "Real" S.E. as we establish that measurement system. Once we become convinced that the departures in the data from the model are primarily due to modeled stochasticity, then we may base our decision-making on the usually only slightly smaller "Model" S.E. values.

What about Infit mean-squares less than 1.0? These indicate overfit of the data to the Rasch model, but do not reduce the standard errors. Instead they flag data that is lacking in randomness, i.e., is too deterministic. Guttman data are like this. Their effect is to push the measures further apart. With perfect Guttman data, the mean-squares are zero, and the measures are infinitely far apart. It would seem that inflating the S.E.s would adjust for this measure expansion, but Jack Stenner's work indicates that this is not so. In practice, some items overfit and some underfit the model, so that the overall impact of low infit on the measurement system is diluted.
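The misfit inflation formula can be sketched in a few lines of Python. The function name and the example values are illustrative assumptions; only the formula itself comes from the text.

```python
import math

def real_se(model_se, infit_mnsq):
    """Real S.E. = Model S.E. * max(1.0, sqrt(INFIT mean-square))."""
    return model_se * max(1.0, math.sqrt(infit_mnsq))

# Hypothetical measures, each with a model S.E. of 0.30 logits
print(real_se(0.30, 1.44))  # underfit: S.E. is inflated by sqrt(1.44) = 1.2
print(real_se(0.30, 0.70))  # overfit: S.E. stays at the model value
```

The `max(1.0, ...)` term is what keeps an infit mean-square below 1.0 from shrinking the standard error.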

## Standard Errors with Anchor Values

Anchored measures are shown in the Winsteps output Tables with "A". These are set with IAFILE=, PAFILE= and SAFILE=. Anchor values are treated as exactly precise, with zero standard error. Nevertheless, each anchor value is reported with a standard error: the standard error that the anchor value would have if it were the freely estimated maximum-likelihood value of the parameter.

## Plausible Values

"Plausible values" are random draws from a parameter's posterior distribution. Here the posterior distribution for each parameter is taken to be a normal distribution, N(mean=estimated measure, S.D.=standard error). The Excel formula to draw one value is =(Measure + S.E.*NORMSINV(RAND( ))), which can be entered in an extra column of a PFILE= or IFILE= written to Excel.
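For readers working outside Excel, the same draw can be sketched in Python with the standard library. The function name, the `seed` argument and the example values are illustrative assumptions, not Winsteps features.

```python
import random

def plausible_values(measure, se, draws=5, seed=None):
    """Draw values from N(mean=measure, S.D.=se), as in the Excel formula."""
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    return [rng.gauss(measure, se) for _ in range(draws)]

# Hypothetical person: measure 1.20 logits, S.E. 0.35 logits
for value in plausible_values(1.20, 0.35, draws=5, seed=1):
    print(f"{value:.2f}")
```

Each call to `rng.gauss(measure, se)` plays the role of Measure + S.E.*NORMSINV(RAND( )).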

Help for Winsteps Rasch Measurement Software: www.winsteps.com. Author: John Michael Linacre
