﻿ Table 3.2+ Summary of dichotomous, rating scale or partial credit structures

# Table 3.2+ Summary of dichotomous, rating scale or partial credit structures

(controlled by STEPT3=, STKEEP=, MRANGE=, MTICK=, DISCRIM=)

The average measures and category fit statistics are how the response structure worked "for this sample" (which might have high or low performers etc.). For each observation in category k, there is a person of measure Bn and an item of measure Di. Then:

average measure = sum( Bn - Di ) / count of observations in category. These are not estimates of parameters.

The probability curves are how the response structure is predicted to work for any future sample, provided it worked satisfactorily for this sample.

Our logic is that if the average measures and fit statistics don't look reasonable for this sample, why should they in any future sample? If they look OK for this sample, then the probability curves tell us about future samples. If they don't look right now, then we can anticipate problems in the future.

Persons and items with extreme scores (maximum possible and minimum possible) are omitted from Table 3.2 because they provide no information about the relative difficulty of the categories. See Table 14.3 for their details

a) For dichotomies,

SUMMARY OF CATEGORY STRUCTURE.  Model="R"

FOR GROUPING "0" ACT NUMBER: 12  Go to museum

ACT DIFFICULTY MEASURE OF -1.14 ADDED TO MEASURES

-----------------------------------------------------------------------

|CATEGORY   OBSERVED|OBSVD SAMPLE|INFIT OUTFIT| COHERENCE       |ESTIM|

|LABEL SCORE COUNT %|AVRGE EXPECT|  MNSQ  MNSQ| M->C C->M  RMSR |DISCR|

|-------------------+------------+------------+-----------------+-----|

|  1   1      13  17|  -.45  -.06|   .83   .52| 100%  23%  .6686|     | 1 Neutral

|  2   2      62  83|  1.06   .98|   .78   .85|  86% 100%  .1725| 1.23| 2 Like

-----------------------------------------------------------------------

|MISSING      51  40|  -.54      |            |                 |     |

-----------------------------------------------------------------------

OBSERVED AVERAGE is mean of measures in category. It is not a parameter estimate.

M->C = Does Measure imply Category?

C->M = Does Category imply Measure?

ITEM MEASURE OF -1.07 ADDED TO MEASURES

When there is only one item in a grouping (the Partial Credit model), the item measure is added to the reported measures.

CATEGORY LABEL is the number of the category in your data set after scoring/keying.

CATEGORY SCORE is the ordinal value of the category used in computing raw scores - and in Table 20.
MISSING are missing responses.

OBSERVED COUNT and % is the count of occurrences of this category. Counts by data code are given in the distractor Tables, e.g., Table 14.3. % is percentage of non-missing data, except for MISSING% which is of all data.

OBSVD AVERGE is the average of the (person measures - item difficulties) that are modeled to produce the responses observed in the category. The average measure is expected to increase with category value. Disordering is marked by "*". This is a description of the sample, not a Rasch parameter. Only observations used for estimating the Andrich thresholds are included in this average (not observations in extreme scores.) For all observations, see Table 14.3 For each observation in category k, there is a person of measure Bn and an item of measure Di. Then: average measure = sum( Bn - Di ) / count of observations in category.

Disordered categories: since extreme scores are theoretically at infinity, their inclusion in the observed-average computation would tend to force the top and bottom categories to be ordered. For evaluating category disordering, we need response strings in which categories can be compared with each other, i.e., non-extreme response strings.

SAMPLE EXPECT is the expected value of the average measure for this sample. These values always advance with category. This is a description of the sample, not a Rasch parameter.

INFIT MNSQ is the average of the INFIT mean-squares associated with the responses in each category. The expected values for all categories are 1.0.

OUTFIT MNSQ is the average of the OUTFIT mean-squares associated with the responses in each category. The expected values for all categories are 1.0. This statistic is sensitive to grossly unexpected responses.

Note: Winsteps reports the MNSQ values in Table 3.2. An approximation to their standardized values can be obtained by using the number of observations in the category as the degrees of freedom, and then looking at the plot.

COHERENCE: the usefulness of M->C and C->M depends on the inferences you intend to make from your data.

M->C shows what percentage of the measures that were expected to produce observations in this category actually did. Do the measures imply the category?

Some users of Winsteps, particularly in medical applications, need to know

1) how accurately patient measure is predicted from single observations. This is C->M.
C->M is 10% means that we can predict the estimated measure from the observed category about 1 time in 10.

2) how accurately patient measure predicts single observations. This is M->C.
M->C is 50% means that we can predict the observed category from the estimated measure half the time.

In general, the wider the categories on the latent variable (more dispersed thresholds), the higher these percents will be.

Guttman's Coefficient of Reproducibility is the count-weighted average of the M->C, i.e.,  Reproducibility = sum across categories (COUNT * M->C) / sum(COUNT * 100)

C->M shows what percentage of the observations in this category were produced by measures corresponding to the category. Does the category imply the measures?

Computation: For each observation in the XFILE=,

"expected response value" - round this to the nearest category number = expected average category

if "expected average category" = "response value (after scoring and recounting)" then MC = 1, else, MC = 0

Compute average of MC for each observed category across all the relevant data for C->M

Compute average of MC for each expected category across all the relevant data for M->C

RMSR is the root-mean-square residual, summarizing the differences between observations in this category and their expectations (excluding observations in extreme scores).

ESTIM DISCR is an estimate of the local discrimination when the model is parameterized in the form: log-odds = aj (Bn - Di - Fj). Its expected value is 1.0.

RESIDUAL (when shown) is the residual difference between the observed and expected counts of observations in the category. Shown as % of expected, unless observed count is zero. Then residual count is shown. Only shown if residual count is >= 1.0. Indicates lack of convergence, structure anchoring, or large data set.

CATEGORY CODES are shown to the right from on CODES=. The original data code is shown. If several data codes are scored to the same value, only the first data code found in CODES= is shown. Change the reported data code by changing the order of the data codes on CODES= with matching changes in NEWSCORE= and IVALUE=.

CATEGORY LABELS are shown to the extreme right based on CFILE= and CLFILE=. The label from CFILE= is reported, if any. If none, then the label from CLFILE= is reported, if any.

Measures corresponding to the dichotomous categories are not shown, but can be computed using the Table at "What is a Logit?" and LOWADJ= and HIADJ=.

DICHOTOMOUS CURVES

P      -+--------------+--------------+--------------+--------------+-

R  1.0 +                                                             +

O      |                                                             |

B      |0                                                           1|

A      | 000000                                               111111 |

B   .8 +       00000                                     11111       +

I      |            0000                             1111            |

L      |                0000                     1111                |

I      |                    000               111                    |

T   .6 +                       000         111                       +

Y      |                          000   111                          |

.5 +                             ***                             +

O      |                          111   000                          |

F   .4 +                       111         000                       +

|                    111               000                    |

R      |                1111                     0000                |

E      |            1111                             0000            |

S   .2 +       11111                                     00000       +

P      | 111111                                               000000 |

O      |1                                                           0|

N      |                                                             |

S   .0 +                                                             +

E      -+--------------+--------------+--------------+--------------+-

-2             -1              0              1              2

KID [MINUS] TAP MEASURE

Dichotomous category probability curves. The curve for "1" is also the model Item Characteristic Curve (ICC), also called the Item Response Function (IRF). Rasch model dichotomous curves are all the same.

b) For rating scales and partial credit items, the structure calibration table lists:

SUMMARY OF CATEGORY STRUCTURE.  Model="R"

FOR GROUPING "0" ACT NUMBER: 1  Watch birds

ACT DIFFICULTY MEASURE OF -.89 ADDED TO MEASURES

-------------------------------------------------------------------

|CATEGORY   OBSERVED|OBSVD SAMPLE|INFIT OUTFIT|| ANDRICH |CATEGORY|

|LABEL SCORE COUNT %|AVRGE EXPECT|  MNSQ  MNSQ||THRESHOLD| MEASURE|

|-------------------+------------+------------++---------+--------|

|  0   0       3   4|  -.96  -.44|   .74   .73||  NONE   |( -3.66)| 0 Dislike

|  1   1      35  47|   .12   .30|   .70   .52||   -1.64 |   -.89 | 1 Neutral

|  2   2      37  49|  1.60  1.38|   .75   .77||    1.64 |(  1.88)| 2 Like

-------------------------------------------------------------------

OBSERVED AVERAGE is mean of measures in category. It is not a parameter estimate.

ITEM MEASURE OF -.89 ADDED TO MEASURES

When there is only one item in a grouping (the Partial Credit model), the item measure is added to the reported measures.

CATEGORY LABEL, the number of the category in your data set after scoring/keying.

CATEGORY SCORE is the value of the category in computing raw scores - and in Table 20.

OBSERVED COUNT and %, the count of occurrences of this category. Counts by data code are given in the distractor Tables, e.g., Table 14.3.

OBSVD AVERGE is the average of the measures that are model led to produce the responses observed in the category. The average measure is expected to increase with category value. Disordering is marked by "*". This is a description of the sample, not the estimate of a parameter. For each observation in category k, there is a person of measure Bn and an item of measure Di. Then: average measure = sum( Bn - Di ) / count of observations in category.

SAMPLE EXPECT is the expected value of the average measure for this sample. These values always advance with category. This is a description of the sample, not a Rasch parameter.

INFIT MNSQ is the average of the INFIT mean-squares associated with the responses in each category. The expected values for all categories are 1.0. Only values greater than 1.5 are problematic.

OUTFIT MNSQ is the average of the OUTFIT mean-squares associated with the responses in each category. The expected values for all categories are 1.0. This statistic is sensitive to grossly unexpected responses. Only values greater than 1.5 are problematic.

Note: Winsteps reports the MNSQ values in Table 3.2. An approximation to their standardized values can be obtained by using the number of observations in the category as the degrees of freedom, and then looking at the plot.

ANDRICH THRESHOLD, the calibrated measure of the transition from the category below to this category. This is an estimate of the Rasch-Andrich model parameter, Fj. Use this for anchoring in Winsteps. (This corresponds to Fj in the Di+Fj parametrization of the "Rating Scale" model, and is similarly applied as the Fij of the Dij=Di+Fij of the "Partial Credit" model.) The bottom category has no prior transition, and so that the measure is shown as NONE. This parameter, sometimes called the Step Difficulty, Step Calibration, Rasch-Andrich threshold, Tau or Delta, indicates how difficult it is to observe a category, not how difficult it is to perform it. The Rasch-Andrich threshold is expected to increase with category value. Disordering of these estimates (so that they do not ascend in value up the rating scale), sometimes called "disordered deltas", indicates that the category is relatively rarely observed, i.e., occupies a narrow interval on the latent variable, and so may indicate substantive problems with the rating (or partial credit) scale category definitions. These Rasch-Andrich thresholds are relative pair-wise measures of the transitions between categories. They are the points at which adjacent category probability curves intersect. They are not the measures of the categories. See plot below.

CATEGORY MEASURE, the sample-free measure corresponding to this category. ( ) is printed where the matching calibration is infinite. The value shown corresponds to the measure .25 score points (or LOWADJ= and HIADJ=) away from the extreme. This is the best basis for the inference: "ratings averaging x imply measures of y" or "measures of y imply ratings averaging x". This is implied by the Rasch model parameters. These are plotted in Table 2.2

"Category measures" answer the question "If there were a thousand people at the same location on the latent variable and their average rating was the category value, e.g., 2.0, then where would that location be, relative to the item?" This seems to be what people mean when they say "a performance at level 2.0". It is estimated from the Rasch expectation

.

To discover this ability location, we start with the Rasch model, log (Pnij / Pni(j-1) ) = Bn - Di - Fj, For known Di, Fj and trial Bn. This produces a set of {Pnij}.

Compute the expected rating score: Eni = sum (jPnij)  across the categories.

Adjust Bn' = Bn + (desired category - Eni) / (large divisor), until Eni = desired category, when Bn is the desired category measure.

Here is how to apply this formula. The purpose of this formula is to find the person ability measure corresponding to an expected score on an item.

a) Usually Winsteps has done this for us. Look at ISFILE= - the columns labeled "AT CAT", "BOT+0.25", "Top-0.25" show the person abilities, Bn, corresponding to an expected score equal to each category value of the rating scale, or a reasonable proxy for the extreme categories.

b) If you want to apply the formula yourself to find the expected value corresponding to an expected value of, say, 2.3 on an item with a 1-5 rating scale. Here's what to do.

1) Select the item, i, and the expected value, E, that you want.

2) Look at the Winsteps EFILE= output

If the expected value, E, is not there, then ...

c) set Bn = the ability with the Eni closest to the E you want. Note down Eni, the expected value corresponding to Bn in the EFILE=

3) Look up the item difficulty Di in Table 14, and the Andrich thresholds, {Fj}, in Table 3.2

4) Now start to apply the formula:

Bn' = Bn + (desired category - Eni) / (large divisor)

desired category = your value E

Bn and Eni are from step 2)

(large divisor) = 10 - this produces a small increment along the latent variable. Its size doesn't really matter. If this iterative process does not converge, try again with a ten times larger divisor.

We now have Bn'

5) Using Excel or similar, apply the Rasch model with Bn', Di, {Fj}, to obtain Eni' correspond to Bn'

6) If Eni' is not close enough to your desired E, then back to step 4)

with Bn = value of Bn', and Eni = value of Eni'

----------------------------------------------------------------------------------

|CATEGORY    STRUCTURE   |  SCORE-TO-MEASURE   | 50% CUM.| COHERENCE       |ESTIM|

| LABEL    MEASURE  S.E. | AT CAT. ----ZONE----|PROBABLTY| M->C C->M  RMSR |DISCR|

|------------------------+---------------------+---------+-----------------+-----|

|   0      NONE          |( -3.66) -INF   -2.63|         |   0%   0%  .9950|     | 0 Dislike

|   1       -2.54    .61 |   -.89  -2.63    .85|   -2.57 |  69%  83%  .3440| 1.11| 1 Neutral

|   2         .75    .26 |(  1.88)   .85  +INF |     .79 |  81%  72%  .4222| 1.69| 2 Like

----------------------------------------------------------------------------------

M->C = Does Measure imply Category?

C->M = Does Category imply Measure?

CATEGORY LABEL, the number of the category in your data set after scoring/keying.

STRUCTURE MEASURE, is the Rasch-Andrich threshold, the item measure add to the calibrated measure of this transition from the category below to this category. The bottom category has no prior transition, and so that the measure is shown as NONE. The Rasch-Andrich threshold is expected to increase with category value, but these can be disordered. "Dgi + Fgj" locations are plotted in Table 2.4, where "g" refers to the ISGROUPS= assignment. See Rating scale conceptualization.

For structures with only a single polytomous item, the Partial Credit Model, this is is the Rasch model parameter δ ij in the δ ij parametrization of the "Partial Credit" model.)
Item difficulty: Di = average(δ ij) for all thresholds j of item i.
Andrich threshold relative to the item difficulty: Fij = δ ij - Di
Andrich threshold relative to the latent variable: STRUCTURE MEASURE = δ ij = Di + Fij
When we map the location of Di on the latent variable (or inspect the algebra of the PCM model), we discover that Di is at the location on the latent variable where the highest and lowest categories of the item are equally probable. Conveniently, the sufficient statistic for Di is the item's raw score.

STRUCTURE S.E. is an approximate standard error of the Rasch-Andrich threshold measure.

SCORE-TO-MEASURE

These values are plotted in Table 21, "Expected Score" ogives. They are useful for quantifying category measures. This is implied by the Rasch model parameters. See Rating scale conceptualization.

AT CAT is the Rasch-full-point-threshold, the measure (on an item of 0 logit measure) corresponding to an expected score equal to the category label, which, for the rating (or partial credit) scale model, is where this category has the highest probability. See plot below.

( ) is printed where the matching calibration is infinite. The value shown corresponds to the measure .25 score points (or LOWADJ= and HIADJ=) away from the extreme.

--ZONE-- is the Rasch-half-point threshold, the range of measures from an expected score from 1/2 score-point below to the category to 1/2 score-point above it, the Rasch-half-point thresholds. Measures in this range (on an item of 0 measure) are expected to be observed, on average, with the category value. See plot below.

50% CUMULATIVE PROBABILITY gives the location of median probabilities, i.e. these are Rasch-Thurstone thresholds, similar to those estimated in the "Graded Response" or "Proportional odds" models. At these calibrations, the probability of observing the categories below equals the probability of observing the categories equal or above. The .5 or 50% cumulative probability is the point on the variable at which the category interval begins. This is implied by the Rasch model parameters. See Rating scale conceptualization.

COHERENCE

M->C shows what percentage of the measures that were expected to produce observations in this category actually did. Do the measures imply the category?
For RSM, everything is relative to each target item in turn. So,  the 69% for category 1 is a summary across the persons with abilities in the range, "zone", from -2.63  to  .85 logits relative to each item difficulty, which are all the observations where (-2.63 ≤ (Bn-Di)  ≤  .85). Of these, 69% are in category 1. Depending on the targeting of the persons and the items, there could be many eligible observations or only a few.

Guttman's Coefficient of Reproducibility is the count-weighted average of the M->C, i.e., Reproducibility = sum across categories (COUNT * M->C) / sum(COUNT * 100)

C->M shows what percentage of the observations in this category were produced by measures corresponding to the category. Does the category imply the measures?

RMSR is the root-mean-square residual, summarizing the differences between observations in this category and their expectations (excluding observations in extreme scores).

ESTIM DISCR (when DISCRIM=Y) is an estimate of the local discrimination when the model is parameterized in the form: log-odds = aj (Bn - Di - Fj)

OBSERVED - EXPECTED RESIDUAL DIFFERENCE (when shown) is the residual difference between the observed and expected counts of observations in the category. This indicates that the Rasch estimates have not converged to their maximum-likelihood values. These are shown if at least one residual percent >=1%.

residual difference % = (observed count - expected count) * 100 / (expected count)

residual difference value = observed count - expected count

1. Unanchored analyses: These numbers indicate the degree to which the reported estimates have not converged. Usually performing more estimation iterations reduces the numbers.

2. Anchored analyses: These numbers indicate the degree to which the anchor values do not match the current data.

For example,

(a) iteration was stopped early using Ctrl+F or the pull-down menu option.

(b) iteration was stopped when the maximum number of iterations was reached MJMLE=

(c) the convergence criteria LCONV= and RCONV= are not set small enough for this data set.

(d) anchor values (PAFILE=, IAFILE= and/or SAFILE=) are in force which do not allow maximum likelihood estimates to be obtained.

ITEM MEASURE ADDED TO MEASURES, is shown when the rating (or partial credit) scale applies to only one item, e.g., when ISGROUPS=0. Then all measures in these tables are adjusted by the estimated item measure.

CATEGORY PROBABILITIES: MODES - Andrich Thresholds at intersections

P      ++---------+---------+---------+---------+---------+---------++

R  1.0 +                                                             +

O      |                                                             |

B      |00                                                         22|

A      |  0000                                                 2222  |

B   .8 +      000                                           222      +

I      |         000                                     222         |

L      |            00                                 22            |

I      |              00                             22              |

T   .6 +                00                         22                +

Y      |                  00       1111111       22                  |

.5 +                    0  1111       1111  2                    +

O      |                    1**               **1                    |

F   .4 +                  11   00           22   11                  +

|               111       00       22       111               |

R      |             11            00   22            11             |

E      |          111                0*2                111          |

S   .2 +       111                  22 00                  111       +

P      |   1111                  222     000                  1111   |

O      |111                  2222           0000                  111|

N      |              2222222                   0000000              |

S   .0 +22222222222222                                 00000000000000+

E      ++---------+---------+---------+---------+---------+---------++

-3        -2        -1         0         1         2         3

PUPIL  [MINUS] ACT    MEASURE

Curves showing how probable is the observation of each category for measures relative to the item measure. Ordinarily, 0 logits on the plot corresponds to the item measure, and is the point at which the highest and lowest categories are equally likely to be observed. The plot should look like a range of hills. Categories which never emerge as peaks correspond to disordered Rasch-Andrich thresholds. These contradict the usual interpretation of categories as a being sequence of most likely outcomes.

Null, Zero, Unobserved Categories

STKEEP=YES and Category 2 has no observations:

+------------------------------------------------------------------

|CATEGORY   OBSERVED|OBSVD SAMPLE|INFIT OUTFIT|| ANDRICH |CATEGORY|

|LABEL SCORE COUNT %|AVRGE EXPECT|  MNSQ  MNSQ||THRESHOLD| MEASURE|

|-------------------+------------+------------++---------+--------+

|  0   0     378  20|  -.67  -.73|   .96  1.16||  NONE   |( -2.01)|

|  1   1     620  34|  -.11  -.06|   .81   .57||    -.89 |   -.23 |

|  2   2       0   0|            |   .00   .00||  NULL   |    .63 |

|  3   3     852  46|  1.34  1.33|  1.00  1.64||     .89 |(  1.49)|

|  4          20   1|            |            ||  NULL   |        |

+------------------------------------------------------------------

Category 2 is an incidental (sampling)zero. The category is maintained in the response structure.

Category 4 has been dropped from the analysis because it is only observed in extreme scores.

(1) Add one or more dummy person records to the data file with non-extreme scores, but with category 4 for the relevant item or rating scale. Rasch-analysis your data + dummy records, and save the Andrich thresholds: Winsteps "Output Files menu", SFILE=sf.txt

(2) Reanalyze your data. Omit the dummy records. Include SAFILE=sf.txt to anchor the Andrich thresholds at their (1) values.

This does an analysis with reasonable Andrich Thresholds for the the items, but only uses the actual data for the sufficient statistics used to estimate the Di and the theta values.

STKEEP=NO and Category 2 has no observations:

+------------------------------------------------------------------

|CATEGORY   OBSERVED|OBSVD SAMPLE|INFIT OUTFIT|| ANDRICH |CATEGORY|

|LABEL SCORE COUNT %|AVRGE EXPECT|  MNSQ  MNSQ||THRESHOLD| MEASURE|

|-------------------+------------+------------++---------+--------+

|  0   0     378  20|  -.87 -1.03|  1.08  1.20||  NONE   |( -2.07)|

|  1   1     620  34|   .13   .33|   .85   .69||    -.86 |    .00 |

|  3   2     852  46|  2.24  2.16|  1.00  1.47||     .86 |(  2.07)|

+------------------------------------------------------------------

Category 2 is a structural (unobservable) zero. The category is eliminated from the response structure.

Category Matrix : Confusion Matrix : Matching Matrix

-------------------------------------------------------------------------------

| Category Matrix : Confusion Matrix : Matching Matrix                        |

|             Predicted Scored-Category Frequency                             |

|Obs Cat Freq|           0           1           2           3 |        Total |

|------------+-------------------------------------------------+--------------|

|          0 |       12.19       10.22        2.57         .02 |        25.00 |

|          1 |       10.53       13.68        6.67         .11 |        31.00 |

|          2 |        1.93        6.57        8.63         .87 |        18.00 |

|          3 |         .33         .52         .15         .00 |         1.00 |

|------------+-------------------------------------------------+--------------|

|      Total |       24.98       30.99       18.03        1.01 |        75.00 |

-------------------------------------------------------------------------------

When CMATRIX=Yes, a category matrix (confusion matrix, matching matrix) is produced for each rating scale group defined in ISGROUPS=. In this example, there are four categories, 0, 1, 2, 3 observed in the data for the item. The row totals show the observed frequency of each category. According to the Rasch model, there is a probability that each observation will be in any category. These probabilities are summed for each category, and the model-predicted counts are shown in each column. When the data are strongly deterministic (Guttman-like) , then the major diagonal of the matrix will dominate.

In this matrix, there is one observation of category 3 in the data (see Row Total for category 3). Since category frequencies are sufficient statistics for the maximum likelihood estimates, one observation of category 3 is predicted in these data (see Column Total for category 3). However, the Category Matrix tells us that the observation of 3 is predicted to have been a 1 (Row 3, Column 1). And an observation of category 3 would be predicted to replace an observation of 2 (Row 2, Column 3). We can compare this finding with the report in Table 10.6. The observation of 3 has an expected value near 1.

TABLE 10.6

MOST UNEXPECTED RESPONSES

----------------------------------------------------

| DATA |OBSERVED|EXPECTED|RESIDUAL|ST. RES.|MEASDIFF|

|------+--------+--------+--------+--------+--------+

|    3 |      3 |    .83 |   2.17 |   3.23 |  -2.05 |

Category Misfit

Usually any MNSQ (mean-square, red box in figure) less than 2.0 is OK for practical purposes. A stricter rule would be 1.5. Overfit  (values less than 1.0) are almost never a problem.

A bigger problem than category MNSQ misfit is the disordering of the "observed averages" (blue box in Figure). These contradict the Rasch axiom that "higher measure -> higher score on the rating scale".

Also large differences between the "observed average" and the "expected average" (green box in Figure). These indicate that the misfit in the category is systematic in some way.

In principle, an "expected value" is what we would see if the data fit the Rasch model. The "observed" value is what we did see. When the "observed" and "expected" are considerably misaligned, then the validity of the data (as a basis for constructing additive measures) is threatened. However, we can usually take some simple, immediate actions to remedy these defects in the data.

Usually, a 3-stage analysis suffices:

1. Analyze the data. Identify problem areas.

2. Delete problem areas from the data  (PDFILE=, IDFILE=, EDFILE=, CUTLO=, etc.). Reanalyze the data. Produce item and rating-scale anchor files (IFILE=if.txt, SFILE=sf.txt) which contain the item difficulties from the good data.

3. Reanalyze all the data using the item and rating-scale anchor files (IAFILE=if.txt, SAFILE=sf.txt) to force the "good" item difficulties to dominate the problematic data when estimating the person abilities from all the data.

Category MnSq fit statistics

For all observations in the data:

Xni is the observed value

Eni is the expected value of Xni

Wni is the model variance of the observation around its expectation

Pnik is the probability of observing Xni=k

Category Outfit statistic for category k:

[sum ((k-Eni)²/Wni) for all Xni=k] / [sum (Pnik * (k-Eni)²/Wni) for all Xni]

Category Infit statistic for category k:

[sum ((k-Eni)²) for all Xni=k] / [sum (Pnik * (k-Eni)²) for all Xni]

Where does category 1 begin?

When describing a rating-scale to our audience, we may want to show the latent variable segmented into rating scale categories:

0-----------------01--------------------12------------------------2

There are 3 widely-used ways to do this:

1. "1" is the segment on the latent variable from where categories "0" and "1" are equally probable to where categories "1" and "2" are equally probably. These are the Rasch-Andrich thresholds (ANDRICH THRESHOLD) for categories 1 and 2.

2. "1" is the segment on the latent variable from where categories "0" and "1+2" are equally probable to where categories "0+1" and "2" are equally probably. These are the Rasch-Thurstone thresholds (50%  CUM. PROBABILITY) for categories 1 and 2.

3. "1" is the segment on the latent variable from where the expected score on the item is "0.5" to where the expected score is 1.5. These are the Rasch-half-point thresholds (ZONE) for category 1.

Alternatively, we may want a point on the latent variable correspond to the category:

----------0-------------------1------------------------2-----------

4. "1" is the point on the latent variable where the expected average score is 1.0. This is the Rasch-Full-Point threshold (AT CAT.) for category 1.

Help for Winsteps Rasch Measurement Software: www.winsteps.com. Author: John Michael Linacre

The Languages of Love: draw a map of yours!

 Forum Rasch Measurement Forum to discuss any Rasch-related topic

Rasch Publications
Rasch Measurement Transactions (free, online) Rasch Measurement research papers (free, online) Probabilistic Models for Some Intelligence and Attainment Tests, Georg Rasch Applying the Rasch Model 3rd. Ed., Bond & Fox Best Test Design, Wright & Stone
Rating Scale Analysis, Wright & Masters Introduction to Rasch Measurement, E. Smith & R. Smith Introduction to Many-Facet Rasch Measurement, Thomas Eckes Invariant Measurement with Raters and Rating Scales: Rasch Models for Rater-Mediated Assessments, George Engelhard, Jr. & Stefanie Wind Statistical Analyses for Language Testers, Rita Green
Rasch Models: Foundations, Recent Developments, and Applications, Fischer & Molenaar Journal of Applied Measurement Rasch models for measurement, David Andrich Constructing Measures, Mark Wilson Rasch Analysis in the Human Sciences, Boone, Stave, Yale
in Spanish: Análisis de Rasch para todos, Agustín Tristán Mediciones, Posicionamientos y Diagnósticos Competitivos, Juan Ramón Oreja Rodríguez
Winsteps Tutorials Facets Tutorials Rasch Discussion Groups

Coming Winsteps & Facets Events
May 22 - 24, 2018, Tues.-Thur. EALTA 2018 pre-conference workshop (Introduction to Rasch measurement using WINSTEPS and FACETS, Thomas Eckes & Frank Weiss-Motz), https://ealta2018.testdaf.de
May 25 - June 22, 2018, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
June 27 - 29, 2018, Wed.-Fri. Measurement at the Crossroads: History, philosophy and sociology of measurement, Paris, France., https://measurement2018.sciencesconf.org
June 29 - July 27, 2018, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Further Topics (E. Smith, Winsteps), www.statistics.com
July 25 - July 27, 2018, Wed.-Fri. Pacific-Rim Objective Measurement Symposium (PROMS), (Preconference workshops July 23-24, 2018) Fudan University, Shanghai, China "Applying Rasch Measurement in Language Assessment and across the Human Sciences" www.promsociety.org
Aug. 10 - Sept. 7, 2018, Fri.-Fri. On-line workshop: Many-Facet Rasch Measurement (E. Smith, Facets), www.statistics.com
Oct. 12 - Nov. 9, 2018, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com

Our current URL is www.winsteps.com