Rater misbehavior
This is Help for 32-bit Facets 3.87. Help for 64-bit Facets 4 is available separately.
The fit statistics in Facets help us to detect many types of misbehavior. For instance, central tendency usually makes raters too predictable, so that their infit and outfit mean-square statistics are noticeably less than 1.0. Also, if you model each rater to have a unique rating scale, by using # instead of ? for the rater facet in the Models= specification, then you will see in Table 8 that the rater has an unusually high number of ratings in the central categories.
Is there too much rater bias identified? Are there some persons with idiosyncratic profiles who should be eliminated before the rater bias analysis is taken too seriously?
You also need to identify how big the bias has to be before it makes a substantive difference. Perhaps "Obs-Exp Average Difference" needs to be at least 0.5 score-points.
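A minimal sketch of this screening idea, with hypothetical data: Facets reports "Obs-Exp Average Difference" from the Rasch model itself; here the expected rating is only approximated by the mean of the other raters' ratings, so this is a rough illustration, not Facets' computation.

```python
# Hypothetical ratings: ratings[rater][person]. Facets derives expected
# ratings from the Rasch model; this sketch approximates each expectation
# by the mean of the OTHER raters' ratings of the same person.
ratings = {
    "R1": {"P1": 4, "P2": 3, "P3": 4},
    "R2": {"P1": 4, "P2": 3, "P3": 4},
    "R3": {"P1": 4, "P2": 3, "P3": 4},
    "R4": {"P1": 3, "P2": 2, "P3": 3},  # one score-point more severe
}

def obs_exp_difference(ratings, rater):
    """Average (observed - expected) score-point difference for one rater."""
    diffs = []
    for person, observed in ratings[rater].items():
        others = [r[person] for name, r in ratings.items() if name != rater]
        expected = sum(others) / len(others)
        diffs.append(observed - expected)
    return sum(diffs) / len(diffs)

# Flag raters whose average difference reaches the 0.5 score-point criterion.
flagged = {r: round(obs_exp_difference(ratings, r), 2)
           for r in ratings
           if abs(obs_exp_difference(ratings, r)) >= 0.5}
```

With these data, only R4 crosses the 0.5 score-point threshold; whether 0.5 is the right substantive criterion depends on your rating scale and stakes.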
Then you have to decide what type of rater agreement you want.
Do you want the raters to agree exactly with each other on the ratings awarded? Monitor the "rater agreement %".
Do you want the raters to agree about which performances are better and which are worse? Examine the correlations between raters' ratings.
Do you want the raters to have the same leniency/severity? Examine "1 - Separation Reliability" or the "Fixed Chi-squared".
Do you want the raters to behave like independent experts? Examine the Rasch fit statistics.
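As a sketch of how these notions of agreement differ, here are simplified versions of the first three checks on two hypothetical raters. Facets reports its own versions of these statistics (agreement % in Table 7, reliability and chi-squared in the facet summary); this only illustrates the distinctions.

```python
# Two hypothetical raters rating the same six performances.
rater_a = [3, 4, 2, 5, 3, 4]
rater_b = [3, 4, 3, 5, 2, 4]

# 1. Exact agreement %: how often the identical rating is awarded.
exact_agreement = 100 * sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# 2. Correlation: do the raters rank the performances the same way,
#    even if the ratings themselves differ?
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

correlation = pearson(rater_a, rater_b)

# 3. Same leniency/severity: compare overall mean ratings. Facets tests
#    this formally via "1 - Separation Reliability" or the "Fixed Chi-squared".
mean_difference = sum(rater_a) / len(rater_a) - sum(rater_b) / len(rater_b)
```

Here the raters agree exactly on only 4 of 6 ratings, yet correlate strongly and have identical mean severity: the three criteria can disagree, which is why you must decide which kind of agreement you actually want.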
Numerous types of rater misbehavior are identified in the literature. Here are some approaches to identifying them. Please notify us if you discover useful ways to identify misbehavior.
A suggested procedure:
(a) Model all raters to share a common understanding of the rating scale:
Models = ?,?,?,R9 ; the model for your facets and rating scale
Inter-rater = 2 ; 2 or whatever is the number of your rater facet
In the rater facet report (Table 7):
How much difference in rater severity/leniency is reported? Are there outliers?
Are rater fit statistics homogeneous?
Does inter-rater agreement indicate "scoring machines" or "independent experts"?
In the rating scale report (Table 8):
Is overall usage of the categories as expected?
(b) Model each rater to have a personal understanding of the rating scale:
Models = ?,#,?,R9 ; # marks the rater facet
Inter-rater = 2 ; 2 or whatever is the number of your rater facet
In the rating scale report (Table 8):
For each rater: is overall usage of the categories as expected?
Are there specific problems, e.g., high- or low-frequency categories, unobserved categories, or disordered average category measures?
(c) Look for rater-item interactions, and rater-demographic interactions:
Models =
?,?B,?,?B,R9 ; Facet 4 is a demographic facet (e.g., gender): rater-demographic interaction (bias)
?,?B,?B,?,R9 ; Facet 3 is the items: rater-item interaction (bias)
*
In the bias/interaction report (Table 14):
Are any raters showing large and statistically significant interactions?
Known rater misbehaviors:
1. Leniency/Severity/Generosity.
This is usually parameterized directly in the "Rater" facet, and measures are automatically adjusted for it.
2. Extremism/Central Tendency.
Tending to award ratings in the extreme, or in the central, categories.
This can be identified by modeling each rater to have a separate rating scale (or partial credit). Those with very low central probabilities exhibit extremism. Those with very high central probabilities exhibit central tendency.
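A rough screen for this pattern, with hypothetical data: compare the share of each rater's ratings that fall in the central categories. Facets identifies this properly through the per-rater category probabilities of the # model; the thresholds below (0.8 and 0.2) are illustrative assumptions, not Facets criteria.

```python
from collections import Counter

def central_share(ratings, low=1, high=9):
    """Share of ratings falling in the middle third of a low..high scale."""
    span = high - low + 1
    mid_lo = low + span // 3   # category 4 on a 1-9 scale
    mid_hi = high - span // 3  # category 6 on a 1-9 scale
    counts = Counter(ratings)
    middle = sum(n for cat, n in counts.items() if mid_lo <= cat <= mid_hi)
    return middle / len(ratings)

def classify(ratings, hi_share=0.8, lo_share=0.2):
    """Flag raters whose central-category usage is extreme (thresholds assumed)."""
    share = central_share(ratings)
    if share > hi_share:
        return "central tendency?"
    if share < lo_share:
        return "extremism?"
    return "no flag"
```

For example, a rater awarding only 4s, 5s, and 6s on a 9-category scale is a central-tendency suspect, while one awarding only 1s and 9s is an extremism suspect.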
3. Halo/"Carry Over" Effects.
One attribute biases ratings with respect to other attributes. This requires that we know the order in which ratings are assigned for each person. If we know this, then we can measure all the persons and raters using only the rating of the first item rated. Then anchor everything, including anchoring the other items at the difficulty of the first item. In this anchored analysis of all the data, the raters with the lowest mean-squares are the ones most likely to have a halo effect.
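A first-pass halo screen when the rating order is unknown, with hypothetical data: a halo-prone rater gives nearly the same rating across all items for each person, so the within-person spread of that rater's ratings is unusually small. The anchored Facets analysis described above is the principled method; this within-person-range check is only a quick substitute.

```python
def mean_within_person_range(ratings):
    """ratings: {person: [rating on item 1, item 2, ...]} for one rater.
    Returns the average (max - min) rating spread across that rater's persons."""
    ranges = [max(items) - min(items) for items in ratings.values()]
    return sum(ranges) / len(ranges)

# Hypothetical raters, three persons rated on three items each.
halo_rater = {"P1": [4, 4, 4], "P2": [2, 2, 3], "P3": [5, 5, 5]}
normal_rater = {"P1": [2, 4, 5], "P2": [1, 3, 4], "P3": [3, 5, 6]}
```

Here the halo suspect's average within-person spread is far below the other rater's; a spread near zero for every person, regardless of item, suggests one attribute is carrying over to the others.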
4. Response Sets.
The ratings are not related to the ability of the subjects.
Anchor all persons at the same ability, usually 0. Raters who best fit this situation are most likely to be exhibiting response sets.
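A rough proxy for that anchored analysis, with hypothetical data: a rater whose ratings are essentially uncorrelated with the consensus ordering of the persons is a response-set suspect. Facets does this properly by anchoring all persons at the same ability and inspecting fit; the 0.2 correlation threshold below is an illustrative assumption.

```python
# Hypothetical data: each rater's ratings of the same five persons, in order.
raters = {
    "R1": [2, 3, 4, 5, 6],
    "R2": [3, 3, 4, 6, 6],
    "R3": [4, 4, 4, 4, 4],  # same rating regardless of person: response set?
}

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0  # constant ratings -> 0

def response_set_suspects(raters, threshold=0.2):
    """Flag raters whose ratings do not track the other raters' consensus."""
    suspects = []
    for name, own in raters.items():
        others = [r for n, r in raters.items() if n != name]
        consensus = [sum(col) / len(col) for col in zip(*others)]
        if abs(pearson(own, consensus)) < threshold:
            suspects.append(name)
    return suspects
```

R3's ratings carry no information about the persons, so R3 is flagged while R1 and R2, whose ratings track the consensus, are not.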
5. Playing it safe.
The rater attempts to give a rating near the other raters, rather than independently.
Specify the Inter-rater= facet and monitor the "agreement" percentage. The rater also tends to overfit.
6. Instability.
Rater leniency changes from situation to situation.
Include the "situation" as a dummy facet (e.g., rating session), and investigate rater-situation interactions using "B" in the Models= statements.
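A quick descriptive look at the same question, with hypothetical sessions: compare one rater's mean rating per situation against that rater's overall mean. Facets tests this formally through the rater-situation bias terms ("B" in Models=); this sketch only shows what such drift looks like in the raw ratings.

```python
# Hypothetical: one rater's ratings in two scoring sessions.
sessions = {
    "morning":   [4, 5, 4, 5],
    "afternoon": [2, 3, 2, 3],  # same rater, noticeably harsher later on
}

def session_drift(sessions):
    """Per-session mean rating minus the rater's overall mean rating."""
    all_ratings = [r for s in sessions.values() for r in s]
    overall = sum(all_ratings) / len(all_ratings)
    return {name: round(sum(r) / len(r) - overall, 2)
            for name, r in sessions.items()}
```

A full score-point of drift between sessions, as here, would usually be worth a formal rater-session bias analysis.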
Help for Facets Rasch Measurement and Rasch Analysis Software: www.winsteps.com Author: John Michael Linacre.