Table 3.1 Summaries of persons and items

(controlled by REALSE=, UMEAN=, USCALE=, ISUBTOTAL=, PSUBTOTAL=)

This table summarizes the person, item and structure information.

Table 3.1: Gives summaries for all persons and items.

Table 3.2: Summary of rating categories, probability curves and category (confusion) matrix

Table 27.3: Gives subtotal summaries for items, controlled by ISUBTOT=

Table 28.3: Gives subtotal summaries for persons, controlled by PSUBTOT=

SUMMARY OF 34 MEASURED (NON-EXTREME) KID
-------------------------------------------------------------------------------
|          TOTAL                         MODEL         INFIT        OUTFIT    |
|          SCORE     COUNT     MEASURE    S.E.      MNSQ   ZSTD   MNSQ   ZSTD |
|-----------------------------------------------------------------------------|
| MEAN      10.0      18.0        -.18    1.01      1.00   -.15    .67   -.10 |
| SEM         .4        .0         .34     .02       .17    .20    .22    .12 |
| P.SD       2.1        .0        1.94     .10       .95   1.14   1.25    .68 |
| S.SD       2.1        .0        1.97     .10       .96   1.16   1.27    .69 |
| MAX.      14.0      18.0        3.72    1.11      4.16   2.49   6.06   2.23 |
| MIN.       5.0      18.0       -4.29     .81       .18  -1.48    .08   -.73 |
|-----------------------------------------------------------------------------|
| REAL RMSE   1.18 TRUE SD    1.54  SEPARATION  1.30  KID    RELIABILITY  .63 |
|MODEL RMSE   1.01 TRUE SD    1.66  SEPARATION  1.63  KID    RELIABILITY  .73 |
| S.E. OF KID MEAN = .34                                                      |
-------------------------------------------------------------------------------

MINIMUM EXTREME SCORE: 1 KID 2.9%

SUMMARY OF 35 MEASURED (EXTREME AND NON-EXTREME) KID
-------------------------------------------------------------------------------
|          TOTAL                         MODEL         INFIT        OUTFIT    |
|          SCORE     COUNT     MEASURE    S.E.      MNSQ   ZSTD   MNSQ   ZSTD |
|-----------------------------------------------------------------------------|
| MEAN       9.8      18.0        -.36    1.03                                |
| SEM         .4        .0         .38     .03                                |
| P.SD       2.3        .0        2.19     .17                                |
| S.SD       2.4        .0        2.22     .18                                |
| MAX.      14.0      18.0        3.72    1.85                                |
| MIN.       3.0      18.0       -6.58     .81                                |
|-----------------------------------------------------------------------------|
| REAL RMSE   1.21 TRUE SD    1.83  SEPARATION  1.51  KID    RELIABILITY  .70 |
|MODEL RMSE   1.05 TRUE SD    1.93  SEPARATION  1.84  KID    RELIABILITY  .77 |
| S.E. OF KID MEAN = .38                                                      |
-------------------------------------------------------------------------------

KID RAW SCORE-TO-MEASURE CORRELATION = 1.00

CRONBACH ALPHA (KR-20) KID RAW SCORE "TEST" RELIABILITY = .75 SEM = 1.17

STANDARDIZED (50 ITEM) RELIABILITY = .90

SUMMARY OF 14 MEASURED (NON-EXTREME) TAP
-------------------------------------------------------------------------------
|          TOTAL                         MODEL         INFIT        OUTFIT    |
|          SCORE     COUNT     MEASURE    S.E.      MNSQ   ZSTD   MNSQ   ZSTD |
|-----------------------------------------------------------------------------|
| MEAN      16.9      35.0         .00     .71       .96    .05    .67   -.06 |
| SEM        3.6        .0         .96     .06       .07    .18    .15    .14 |
| P.SD      12.9        .0        3.47     .21       .26    .63    .55    .52 |
| S.SD      13.4        .0        3.60     .22       .27    .66    .57    .54 |
| MAX.      32.0      35.0        4.80    1.07      1.56   1.23   2.06   1.05 |
| MIN.       1.0      35.0       -4.34     .45       .66  -1.03    .11   -.63 |
|-----------------------------------------------------------------------------|
| REAL RMSE    .77 TRUE SD    3.38  SEPARATION  4.40  TAP    RELIABILITY  .95 |
|MODEL RMSE    .74 TRUE SD    3.39  SEPARATION  4.56  TAP    RELIABILITY  .95 |
| S.E. OF TAP MEAN = .96                                                      |
-------------------------------------------------------------------------------

MAXIMUM EXTREME SCORE: 3 TAP 16.7%

MINIMUM EXTREME SCORE: 1 TAP 5.6%

SUMMARY OF 18 MEASURED (EXTREME AND NON-EXTREME) TAP
-------------------------------------------------------------------------------
|          TOTAL                         MODEL         INFIT        OUTFIT    |
|          SCORE     COUNT     MEASURE    S.E.      MNSQ   ZSTD   MNSQ   ZSTD |
|-----------------------------------------------------------------------------|
| MEAN      19.0      35.0        -.75     .96                                |
| SEM        3.3        .0        1.03     .12                                |
| P.SD      14.0        .0        4.24     .51                                |
| S.SD      14.2        .0        4.36     .52                                |
| MAX.      35.0      35.0        6.12    1.85                                |
| MIN.        .0      35.0       -6.53     .45                                |
|-----------------------------------------------------------------------------|
| REAL RMSE   1.10 TRUE SD    4.09  SEPARATION  3.71  TAP    RELIABILITY  .93 |
|MODEL RMSE   1.09 TRUE SD    4.10  SEPARATION  3.76  TAP    RELIABILITY  .93 |
| S.E. OF TAP MEAN = 1.03                                                     |
-------------------------------------------------------------------------------

TAP RAW SCORE-TO-MEASURE CORRELATION = -.98

Global statistics: please see Table 44.

UMEAN=.0000 USCALE=1.0000

EXTREME AND NON-EXTREME SCORES	All items or persons with estimated measures
NON-EXTREME SCORES ONLY	Items or persons with non-extreme scores (omits items or persons with 0% and 100% success rates)
ITEM or PERSON COUNT	count of items or persons. "ITEM" is the name assigned with ITEM= : "PERSON" is the name assigned with PERSON=
MEAN MEASURE	average measure of items or persons.
SEM row	standard error of the mean statistic in the row above
REAL/MODEL S.E. column	standard errors of the measures (REAL = inflated for misfit). The S.E. column summarizes the S.E.s in the measurement Table. So S.E. column, S.D. row, is the S.D.s of the S.E.s in the measurement table. It is not the S.E. of the S.D. to its left in this Table.
REAL/MODEL RMSE	statistical "root-mean-square" average of the standard errors. This is the average conditional standard error of measurement CSEM for this sample.
TRUE P.SD (previously ADJ.SD)	The "true" population standard deviation is the observed population S.D. adjusted for measurement error (RMSE). This is an estimate of the measurement-error-free S.D.
REAL/MODEL SEPARATION	the separation coefficient: G = TRUE P.SD / RMSE; Strata = (4*G + 1)/3
REAL/MODEL RELIABILITY	the measure reproducibility = ("True" item measure variance / Observed variance) = Separation ² / (1 + Separation ²)
S.E. MEAN	standard error of the mean measure of items or persons

For valid observations used in the estimation,

NON-EXTREME persons or items - summarizes persons (or items) with non-extreme scores (omits zero and perfect scores).

EXTREME AND NON-EXTREME persons or items - summarizes persons (or items) with all estimable scores (includes zero and perfect scores). Extreme scores (zero, minimum possible and perfect, maximum possible scores) have no exact measure under Rasch model conditions. Using a Bayesian technique, however, reasonable measures are reported for each extreme score, see EXTRSC=. Totals including extreme scores are reported, but are necessarily less inferentially secure than those totals only for non-extreme scores. Extreme persons and extreme items (minimum possible scores and maximum possible scores) have no infit nor outfit statistics, so those statistics are omitted from "extreme and non-extreme".

RAW SCORE is the raw score (number of correct responses excluding extreme scores, TOTALSCORE=N).

TOTAL SCORE is the raw score (number of correct responses including extreme scores, TOTALSCORE=Y).

COUNT is the number of responses made.

MEASURE is the estimated measure (for persons) or calibration (for items).

REAL/MODEL: REAL is computed on the basis that misfit in the data is due to departures in the data from model specifications. This is the worst-case situation. MODEL is computed on the basis that the data fit the model, and that all misfit in the data is merely a reflection of the stochastic nature of the model. This is the best-case situation.

S.E. is the standard error of the estimate.

INFIT is an information-weighted fit statistic, which is more sensitive to unexpected behavior affecting responses to items near the person's measure level.

MNSQ is the mean-square infit statistic with expectation 1. Values substantially below 1 indicate dependency in your data; values substantially above 1 indicate noise.

ZSTD is the infit mean-square fit statistic, t-standardized to approximate a theoretical distribution with mean 0 and variance 1. ZSTD (standardized as a z-score) is used for a t-test result when either the t-test value has effectively infinite degrees of freedom (i.e., approximates a unit normal value) or the Student's t-statistic distribution value has been adjusted to a unit normal value. When LOCAL=Y, then EMP is shown, indicating a local {0,1} standardization. When LOCAL=L, then LOG is shown, and the natural logarithms of the mean-squares are reported.

OUTFIT is an outlier-sensitive fit statistic, more sensitive to unexpected behavior by persons on items far from the person's measure level.

MNSQ is the mean-square outfit statistic with expectation 1. Values substantially less than 1 indicate dependency in your data; values substantially greater than 1 indicate the presence of unexpected outliers.

ZSTD is the outfit mean-square fit statistic, t-standardized to approximate a theoretical distribution with mean 0 and variance 1. ZSTD (standardized as a z-score) is used for a t-test result when either the t-test value has effectively infinite degrees of freedom (i.e., approximates a unit normal value) or the Student's t-statistic distribution value has been adjusted to a unit normal value. When LOCAL=Y, then EMP is shown, indicating a local {0,1} standardization. When LOCAL=L, then LOG is shown, and the natural logarithms of the mean-squares are reported.
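The contrast between INFIT and OUTFIT can be illustrated with a minimal sketch. It assumes dichotomous data and the standard Rasch mean-square definitions (infit = sum of squared residuals divided by sum of model variances; outfit = average squared standardized residual). The function names and the example person are hypothetical, and this is not Winsteps' own code.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def person_mean_squares(ability, difficulties, responses):
    """Return (infit MNSQ, outfit MNSQ) for one person's 0/1 responses."""
    sum_sq_resid = 0.0   # numerator of infit
    sum_variance = 0.0   # denominator of infit (information weights)
    sum_z_sq = 0.0       # sum of squared standardized residuals for outfit
    for difficulty, x in zip(difficulties, responses):
        p = rasch_probability(ability, difficulty)
        variance = p * (1.0 - p)
        sq_resid = (x - p) ** 2
        sum_sq_resid += sq_resid
        sum_variance += variance
        sum_z_sq += sq_resid / variance
    return sum_sq_resid / sum_variance, sum_z_sq / len(responses)

# Hypothetical person at 0 logits who fails a very easy item (-4 logits).
difficulties = [-4.0, -1.0, 0.0, 1.0, 4.0]
responses = [0, 1, 1, 0, 0]
infit, outfit = person_mean_squares(0.0, difficulties, responses)
print(round(infit, 2), round(outfit, 2))
```

The single unexpected failure on an item far from the person's level raises both mean-squares above 1, but raises the outfit far more than the infit, which is the outlier sensitivity described above.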

MEAN is the average value of the statistic.

P.SD is its standard deviation assuming that this sample of the statistic is the entire population. If it is not, the corrected sample S.D. is S.SD = P.SD × √(Count of statistic / (Count of statistic - 1)).

P.SD = Population standard deviation (when the sample is the entire population)

S.SD = Sample standard deviation (when the sample represents the population)
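The relationship between P.SD and S.SD can be checked with a short sketch. The list of measures is hypothetical and only for illustration; the last lines apply the conversion to the P.SD of 1.94 reported for the 34 non-extreme KID measures in the table above.

```python
import math
from statistics import pstdev, stdev

# Hypothetical measures, for illustration only (not taken from the table).
measures = [-1.2, -0.4, 0.0, 0.7, 1.9]
n = len(measures)

p_sd = pstdev(measures)  # population S.D.: divides by n
s_sd = stdev(measures)   # sample S.D.: divides by n - 1

# Converting P.SD to S.SD with the factor sqrt(n / (n - 1)).
converted = p_sd * math.sqrt(n / (n - 1))
print(round(p_sd, 3), round(s_sd, 3), round(converted, 3))

# Same conversion applied to the KID table: P.SD 1.94 with 34 persons.
kid_s_sd = 1.94 * math.sqrt(34 / 33)
print(round(kid_s_sd, 2))
```

The converted value for the KID measures rounds to 1.97, which agrees with the S.SD row of that table.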

MAX. is its maximum value.

MIN. is its minimum value.

MODEL RMSE is computed on the basis that the data fit the model, and that all misfit in the data is merely a reflection of the stochastic nature of the model. This gives a "best case" reliability, which reports an upper limit to the reliability of measures based on this set of items for this sample. This RMSE for the person sample is equivalent to the "Test SEM (Standard Error of Measurement)" of Classical Test Theory.

REAL RMSE is computed on the basis that misfit in the data is due to departures in the data from model specifications. This gives a "worst case" reliability, which reports a lower limit to the reliability of measures based on this set of items for this sample.

RMSE is the square-root of the average error variance. It is the Root Mean Square standard Error computed over the persons or over the items. Here is how RMSE is calculated in Winsteps:
George ability measure = 2.34 logits. Standard error of the ability measure = 0.40 logits.
Mary ability measure = 3.62 logits. Standard error of the ability measure = 0.30 logits.
Error = 0.40 and 0.30 logits.
Square error = 0.40*0.40 = 0.16 and 0.30*0.30 = 0.09
Mean (average) square error = (0.16+0.09) / 2 = 0.25 / 2 = 0.125
RMSE = Root mean square error = sqrt (0.125) = 0.354 logits
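
The George and Mary arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch; the function name rmse is an illustrative choice, not a Winsteps identifier.

```python
import math

def rmse(standard_errors):
    """Root mean square of a list of standard errors (in logits)."""
    mean_square_error = sum(se * se for se in standard_errors) / len(standard_errors)
    return math.sqrt(mean_square_error)

# Standard errors for George (0.40) and Mary (0.30) from the example above.
result = rmse([0.40, 0.30])
print(round(result, 3))
```

Note that the ability measures themselves (2.34 and 3.62 logits) do not enter the computation; only their standard errors do.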

TRUE P.SD is the population standard deviation of the estimates (assumed to be the population) after subtracting the error variance (attributable to their standard errors of measurement) from their observed variance.
(TRUE P.SD)² = (P.SD of MEASURE)² - (RMSE)²
The TRUE P.SD is an estimate of the unobservable exact standard deviation, obtained by removing the bias caused by measurement error.

SEPARATION coefficient is the ratio of the PERSON (or ITEM) TRUE P.SD, the "true" standard deviation, to RMSE, the error standard deviation. It provides a ratio measure of separation in RMSE units, which is easier to interpret than the reliability correlation. (SEPARATION coefficient)² is the signal-to-noise ratio, the ratio of "true" variance to error variance.

RELIABILITY is a separation reliability (separation index). The PERSON (or ITEM) reliability is equivalent to KR-20, Cronbach Alpha, and the Generalizability Coefficient. See much more at Reliability.
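
The chain from observed S.D. and RMSE to TRUE P.SD, SEPARATION, RELIABILITY and Strata can be sketched as follows, using the formulas given on this page. The function name is hypothetical, and clamping a negative "true" variance to zero is an assumption of this sketch rather than documented Winsteps behavior.

```python
import math

def separation_summary(observed_sd, rmse):
    """Return (true S.D., separation, reliability, strata)."""
    true_variance = observed_sd ** 2 - rmse ** 2
    # Assumption of this sketch: clamp at zero if error exceeds observed variance.
    true_sd = math.sqrt(max(true_variance, 0.0))
    separation = true_sd / rmse                       # G = TRUE P.SD / RMSE
    reliability = separation ** 2 / (1 + separation ** 2)
    strata = (4 * separation + 1) / 3
    return true_sd, separation, reliability, strata

# Non-extreme KID table: P.SD of MEASURE = 1.94, REAL RMSE = 1.18.
true_sd, separation, reliability, strata = separation_summary(1.94, 1.18)
print(round(true_sd, 2), round(separation, 2), round(reliability, 2), round(strata, 2))
```

With these rounded inputs the first three values round to 1.54, 1.30 and .63, matching the REAL row of the non-extreme KID summary.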

Use Real reliability while you are improving your results. This assumes misfit contradicts the Rasch model.

Use Model reliability when your results are as good as they can be. This assumes misfit is the randomness predicted by the Rasch model.

S.E. OF MEAN is the standard error of the mean of the person (or item) measures for this sample.
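
A quick check of S.E. OF MEAN against the tables above, assuming the usual formula, sample S.D. divided by the square root of the count. This formula is an assumption of the sketch; it is consistent with the reported values but is not stated on this page.

```python
import math

def se_of_mean(sample_sd, count):
    """Standard error of a mean from the sample S.D. (S.SD) and the count."""
    return sample_sd / math.sqrt(count)

# S.SD of MEASURE and counts taken from the non-extreme summaries above.
kid = se_of_mean(1.97, 34)   # 34 non-extreme KID
tap = se_of_mean(3.60, 14)   # 14 non-extreme TAP
print(round(kid, 2), round(tap, 2))
```

Both results round to the reported values, .34 for KID and .96 for TAP.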

MEDIAN is the median measure of the sample (in Tables 27, 28).

Message	Meaning for Persons or Items
MAXIMUM EXTREME SCORE	All non-missing responses are scored correct (perfect score) or in the top categories. Measures are estimated.
MINIMUM EXTREME SCORE	All non-missing responses are scored incorrect (zero score) or in the bottom categories. Measures are estimated.
LACKING RESPONSES	All responses are missing. No measures are estimated.
DELETED	Persons deleted with PDFILE= or PDELETE=. Items deleted with IDFILE= or IDELETE=
IGNORED	Entry numbers higher than highest reported entry number are deleted and not reported
CUTLO= CUTHI=	CUTLO= and CUTHI= values if these are active. They reduce the number of valid responses.

PERSON RAW SCORE-TO-MEASURE CORRELATION is the Pearson correlation between raw scores and measures, including extreme scores. When data are complete, this correlation is expected to be near 1.0 for persons.

CRONBACH ALPHA (KR-20) KID RAW SCORE "TEST" RELIABILITY is the conventional "test" reliability index. It reports an approximate test reliability based on the raw scores of this sample. It is reported for all the data, including persons and items with missing data. For incomplete data, the same formula applies exactly in Winsteps: when computing each person and item raw-score variance separately, Winsteps uses only the observed data for that person or item. Winsteps does not check whether persons or items are complete or incomplete. See more at Reliability. Cronbach Alpha is an estimate of the person-sample reliability (= person-score-order reproducibility). Classical Test Theory does not usually compute an estimate of the item reliability (= item-value-order reproducibility), but it could. Winsteps reports both person-sample reliability (= person-measure-order reproducibility) and item reliability (= item-measure-order reproducibility). Cronbach Alpha is computed for both dichotomous and polytomous data. Cronbach Alpha is the same as KR-20 when the data are dichotomous and complete. KR-20 is not defined for polytomous data. Cronbach Alpha is influenced by missing data. Incomplete data are usually less reliable than complete data. Confusingly, CTT "item reliability" is a different reliability. It is the reliability of the person scores based on one item.
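
For orientation, the textbook Cronbach Alpha computation for complete data can be sketched as below. This is the conventional formula, not Winsteps' internal code; it does not handle missing data, and the small response matrix is hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach Alpha for complete data.

    scores: list of person rows, each a list of item scores of equal length.
    """
    k = len(scores[0])  # number of items
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical complete dichotomous data: 4 persons by 3 items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(data)
print(round(alpha, 2))
```

Because these data are dichotomous and complete, the same value would be obtained from KR-20. Population variances are used throughout; using sample variances for both the items and the totals leaves the ratio, and therefore Alpha, unchanged.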

High Cronbach Alpha with low Rasch reliability indicates that the raw scores reproduce reliably, but their meaning on the latent variable does not. Example: a survey in which 65% of the persons respond "Agree" to every question. The score of "Agree" is reliably predictable. The meaning of "Agree" on the latent variable is fuzzily unreliable.

Lee J. Cronbach in "Essentials of Psychological Testing" (1970) is explicit that his focus is on "persons" and "test scores". Classical Test Theory follows his lead and only computes Cronbach Alpha for persons (or their equivalent). Alpha was originally incorporated into Winsteps following requests by CTT-orientated analysts and reviewers. But there is no reason why the same computation cannot be applied to items. In Winsteps, transpose the data (use the Output Files menu), analyze the transposed dataset, and report Table 3. Its Cronbach Alpha applies to the original items. If Alpha for the items is too low, then the likely reason is that the person sample size is too small. More persons -> higher item Alpha. More items -> higher person Alpha.

SEM is the "standard error of measurement" (the averaged S.E. of the person raw-scores) reported by Classical Test Theory = raw score S.D. * √(1 - Cronbach Alpha)
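
The Classical Test Theory SEM formula can be tried on the rounded values in the KID summary above. This is a minimal sketch; the function name is hypothetical, and the choice of the P.SD of the raw scores (2.3) as the raw score S.D. is an assumption.

```python
import math

def ctt_sem(raw_score_sd, alpha):
    """CTT standard error of measurement: S.D. * sqrt(1 - reliability)."""
    return raw_score_sd * math.sqrt(1 - alpha)

# Rounded values from the 35-KID summary: raw score P.SD 2.3, Alpha .75.
sem = ctt_sem(2.3, 0.75)
print(round(sem, 2))
```

With these rounded inputs the result is about 1.15, close to the reported SEM of 1.17; the small difference presumably comes from rounding of the displayed S.D. and Alpha.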

STANDARDIZED (50 ITEM) RELIABILITY is the reliability of this test for this sample if the test had 50 items, according to the Spearman-Brown Prediction Formula.
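
The Spearman-Brown Prediction Formula mentioned above can be sketched as follows. The function and parameter names are illustrative, and the inputs are the rounded values shown in the KID summary (Alpha .75 for the 18-item test).

```python
def spearman_brown(reliability, current_items, target_items):
    """Predicted reliability when test length changes from current to target."""
    k = target_items / current_items  # lengthening factor
    return k * reliability / (1 + (k - 1) * reliability)

# Project the 18-item reliability of .75 to a 50-item test.
projected = spearman_brown(0.75, 18, 50)
print(round(projected, 3))
```

From the rounded Alpha of .75 this gives about .89, whereas the table reports .90; the difference is presumably due to Winsteps working from the unrounded reliability.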

ITEM RAW SCORE-TO-MEASURE CORRELATION is the Pearson correlation between raw scores and measures, including extreme scores. When data are complete, this correlation is expected to be near -1.0 for items. This is because higher measure implies lower probability of success and so lower item scores.

Global fit: please see Table 44.

UMEAN=.000 USCALE=1.000 are the current settings of UMEAN= and USCALE=.

Help for Winsteps Rasch Measurement and Rasch Analysis Software: www.winsteps.com. Author: John Michael Linacre
