Disordered rating or partial credit structures (Categories or Rasch-Andrich thresholds)

Summary: If the average measures for the person sample advance across the categories of an item or rating scale, then the categories support the Rasch axiom: "higher scores on the item <-> higher measures for the sample".

 

The chief purpose for collapsing categories is to enable inferences to be made at the item-category level:

"as person measures increase, each category in turn is more probable than any one of the others to be observed".

If you want to be able to make this statement, then collapse the categories adjacent to disordered thresholds (and lose some measurement information). Otherwise, do not.

 

"Disordering" is a contentious issue in Rasch analysis. It is often misunderstood by novice analysts, the type of analyst eager to get papers published and a foot firmly on the academic/professional ladder. More experienced analysts tend to produce more ambivalent findings, suggesting alternative interpretations and actions. This type of finding is less likely to be submitted or accepted for publication because it appears wishy-washy. My own recommendation is usually that "threshold disordering" is a minor problem (relevant only if category-level inferences about individuals are to be drawn from the data), provided that "category disordering" (disordering of the substantive meanings of the categories) is not observed in the data. Unfortunately, novice analysts may confuse "threshold disordering" with "category disordering" and so make incorrect statements about the data and the rating scales that generate it.

 

In my experience, category disordering is observed when

(1) raters are asked to rate in more categories than they can discriminate, e.g., "on a scale from 0-100, rate the person's cheerfulness".

(2) category definitions are not clearly ordered, e.g.,

1=never, 2=rarely, 3=occasionally, 4=sometimes, 5=often, 6=frequently, 7=always

(3) arbitrary rules distort the rating-scale, e.g., "if the behavior is not observed by the rater or not allowed or not possible, then rate the person 1=never". So that "does the person use the stairs" is rated "1" for all persons in a facility without stairs.

 

"Threshold disordering" occurs when a category corresponds to a narrow interval of the latent variable, e.g., "almost always" in

1=never, 2=sometimes, 3=often, 4=almost always, 5=always

Here the threshold between categories 3 and 4 will be disordered, even if the raters can clearly discriminate the 5 different levels.

 


 

Effect of collapsing categories on measurement

 

In general, reducing the number of categories tends to improve the fit of the data to the model at the expense of losing some of the statistical information in the ratings. You will probably see the impact of the loss of information in the Person Reliability value.

 

For instance, if the person reliability drops from 0.9 to 0.8, we can use the Spearman-Brown Prophecy Formula to tell us what the loss of information is equivalent to in terms of items lost. If the original number of items is, say, 10, then reducing the number of categories in the rating scale is equivalent to reducing the number of items to:

Items = 10 * 0.8 * (1-0.9) / ((1-0.8) * 0.9) = 4.4, so items lost = 10 - 4.4 = 5.6

 

In general, we expect: items lost = (original item count)*(original category count - new category count) / (original category count - 1)
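The Spearman-Brown arithmetic above can be sketched in Python. The function names are mine, for illustration only, not Winsteps output:

```python
def equivalent_items(items, rel_before, rel_after):
    """Spearman-Brown: the test length (in items) whose reliability
    would be rel_after, given `items` items with reliability rel_before."""
    return items * rel_after * (1 - rel_before) / ((1 - rel_after) * rel_before)

def items_lost_rule(items, cats_before, cats_after):
    """Rule-of-thumb from the text: items lost when categories are merged."""
    return items * (cats_before - cats_after) / (cats_before - 1)

equiv = equivalent_items(10, 0.9, 0.8)
print(round(equiv, 1), round(10 - equiv, 1))   # 4.4 5.6, as in the example
print(items_lost_rule(10, 5, 3))               # 5.0 items lost, collapsing 5 categories to 3
```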

 

This only matters if the effect of the reduction in categories is to make the Person Reliability too small to discriminate the desired number of ability strata in the target person population. See www.rasch.org/rmt/rmt63i.htm
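As a sketch of that caveat, using the usual formulation of strata (separation G = sqrt(R/(1-R)), strata = (4G+1)/3), the drop in reliability from 0.9 to 0.8 in the example above reduces the number of statistically distinguishable strata from about 4.3 to 3:

```python
import math

def strata(reliability):
    """Number of statistically distinct ability strata.
    Separation G = sqrt(R / (1 - R)); strata = (4G + 1) / 3."""
    g = math.sqrt(reliability / (1.0 - reliability))
    return (4.0 * g + 1.0) / 3.0

print(round(strata(0.9), 2))  # 4.33
print(round(strata(0.8), 2))  # 3.0
```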

 


 

There is considerable debate in the Rasch community about the meaning of rating (or partial credit) scales and polytomies which exhibit "disorder". Look at Table 3.2, the distractor/option analysis. Two types of disorder have been noticed:

 

(i) Disorder in the "average measures" of the categories can imply disorder in the category definitions.

 

 

In this example, from Linacre, J.M. (1999) Category Disordering vs. Step Disordering, Rasch Measurement Transactions 13:1 p. 675, "FIM™ Level" categories have been deliberately disordered in the data. This results in disordering of the "average measures" or "observed averages" (the average abilities of the people observed in each category), and also in large mean-square fit statistics. The "Rasch-Andrich thresholds", also called "step calibrations", "step difficulties", "step measures", "deltas", "taus", etc., remain ordered.

 

(ii) Disordered Rasch-Andrich thresholds imply less frequently observed intermediate categories, i.e., categories that correspond to narrow intervals on the latent variable.

 

 

In this example, the FIM categories are correctly ordered, but the frequency of level 2 has been reduced by removing some observations from the data. Average measures and fit statistics remain well behaved. The disordering in the Andrich thresholds now reflects the relative infrequency of category 2. This infrequency is pictured in the plot of probability curves, which shows that category 2 is never a modal category in these data. The Andrich threshold values do not indicate whether measurement would be improved by collapsing levels 1 and 2, or collapsing levels 2 and 3, relative to leaving the categories as they stand.

 

 

Example: Here are the category probability curves for a Likert rating scale: 1=Strongly disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly agree. 3="Neutral" is relatively rarely observed, and it has disordered thresholds: the point of equal probability between categories 3 and 4 (the Rasch-Andrich threshold for category 4) is less than the point of equal probability between categories 2 and 3 (the Rasch-Andrich threshold for category 3).

 

        CATEGORY PROBABILITIES: MODES - Andrich Thresholds at intersections
P      -+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-
R  1.0 +                                                             +
O      |111111                                                5555555|
B      |      111                                          555       |
A      |         111                                     55          |
B   .8 +            1                                  55            +
I      |             11                               5              |
L      |               1                            55               |
I      |                1                          5                 |
T   .6 +                 11                       5                  +
Y      |                   1                     5                   |
    .5 +                    1 22222        444  5                    +
O      |                    2*     22    44   4*                     |
F   .4 +                  22  1      2  4     5 44                   +
       |                 2     1      24     5    4                  |
R      |               22       1     42    5      44                |
E      |              2          1   4  2  5         44              |
S   .2 +            22            13*3333*5            44            +
P      |         222             33*1   55*33            44          |
O      |      222             33344  1 5   2 33            444       |
N      |222222           33333444   55*11   22233333          4444444|
S   .0 +**********************555555     111111**********************+
E      -+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-
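For illustration, curves like those above can be computed directly from the Andrich rating-scale model. The threshold values below are hypothetical, chosen only to mimic a rarely-observed middle category via the disordered pair F3 > F4; they are not the estimates behind the plot:

```python
import math

def rsm_probs(theta, thresholds):
    """Andrich rating-scale model category probabilities at ability theta.
    thresholds: Rasch-Andrich thresholds F2..Fm, one per step up."""
    logits = [0.0]  # log-numerator for the bottom category
    for f in thresholds:
        logits.append(logits[-1] + theta - f)
    top = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical thresholds F2..F5 with F3 = 0.6 > F4 = -0.4 (disordered):
F = [-2.0, 0.6, -0.4, 1.8]

# Scan abilities and record which category is modal at each point:
modal = set()
for i in range(-60, 61):
    p = rsm_probs(i / 10.0, F)
    modal.add(p.index(max(p)))
print(2 in modal)  # False: category 3 (index 2) is never modal
```

The disordered pair makes category 3's probability curve everywhere below one of its neighbours, exactly as in the plot.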

 

There are several options:

1. Do not change the rating scale. We do this when recoding the categories in any way would confuse our audience, or when the threshold disordering is not important for our inferences. For example: scoring Olympic Ice-Skating in Exam15.txt. This has a very long rating scale for a small sample of skaters, so there are many disordered thresholds and unobserved categories.

2. Rewrite the rating scale in the attitude survey without category 3: 1=Strongly disagree, 2=Disagree, 3=Agree, 4=Strongly Agree. This changes the meaning of all the categories, so we must then re-administer the survey.

3. Recode "Neutral" as missing (not administered) data. We do this when we think "Neutral" means "I do not want to tell you what I really think."

4. Recode "Neutral" as "Disagree". We do this when we want to be sure that "Agree" means that respondents truly agree.

5. Recode "Neutral" as "Agree". We do this when we want to be sure that "Disagree" means that respondents truly disagree.

6. Recode "Neutral" as the nearer of "Disagree" and "Agree" according to the empirical data. Look at the "Observed Average Measures" (OBSVD AVRGE) for each category, and recode "Neutral" accordingly. In this example, Neutral (.12) is nearer to Disagree (-.96) than to Agree (1.60):

---------------------------------------
|CATEGORY   OBSERVED|OBSVD  | ANDRICH |
|LABEL SCORE COUNT %|AVRGE  |THRESHOLD|
|-------------------+-------+---------+
...
|  2   2     453  32|  -.96 |   -1.37 | 2 Disagree
|  3   3      35   2|   .12 |    0.58 | 3 Neutral
|  4   4     537  37|  1.60 |   -0.42 | 4 Agree
...
---------------------------------------
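A minimal sketch of option 6 as a data-recoding step. The response list and the recode table here are illustrative; in Winsteps itself this kind of recode is specified with CODES= and NEWSCORE=:

```python
# Observed average measures from the table above:
obsvd_avge = {"Disagree": -0.96, "Neutral": 0.12, "Agree": 1.60}

# Neutral is nearer to Disagree (|0.12 - (-0.96)| = 1.08) than to
# Agree (|1.60 - 0.12| = 1.48), so merge 3 ("Neutral") into 2 ("Disagree"),
# then renumber so the remaining categories stay consecutive:
recode = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}

responses = [2, 3, 4, 5, 1, 3]            # illustrative raw ratings
rescored = [recode[r] for r in responses]
print(rescored)                           # [2, 2, 3, 4, 1, 2]
```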


Help for Winsteps Rasch Measurement Software: www.winsteps.com. Author: John Michael Linacre
