Rater misbehavior

Has too much bias been identified? Are there some persons with idiosyncratic profiles who should be eliminated before the rater bias analysis is taken too seriously?

 

You also need to decide how large a bias must be before it makes a substantive difference. Perhaps the "Obs-Exp Average Difference" needs to be at least 0.5 score-points.
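As a rough check outside Facets, the idea behind the "Obs-Exp Average Difference" can be sketched descriptively: compare each rater's observed average rating with an expected average, and flag differences of 0.5 score-points or more. The sketch below uses made-up data and approximates the expectation with the grand mean; Facets itself uses the Rasch-model expected score, so treat this only as an illustration of the threshold logic.

```python
# Hypothetical ratings: rater -> list of scores awarded (0-9 scale).
# Flags raters whose observed average differs from the expected
# average by at least 0.5 score-points (the suggested threshold).
# The "expected" average here is crudely approximated by the grand
# mean; Facets uses the Rasch-model expectation instead.

def flag_biased_raters(ratings_by_rater, threshold=0.5):
    all_ratings = [r for scores in ratings_by_rater.values() for r in scores]
    expected = sum(all_ratings) / len(all_ratings)
    flagged = {}
    for rater, scores in ratings_by_rater.items():
        observed = sum(scores) / len(scores)
        diff = observed - expected
        if abs(diff) >= threshold:
            flagged[rater] = round(diff, 2)
    return flagged
```

With three hypothetical raters averaging 5.5, 4.5, and 7.5 against a grand mean near 5.8, only the second and third exceed the 0.5 threshold, one severe and one lenient.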

 

Then you have to decide what type of rater agreement you want.

 

Do you want the raters to agree exactly with each other on the ratings awarded? Look at the "rater agreement %".

 

Do you want the raters to agree about which performances are better and which are worse? Look at the correlations.

 

Do you want the raters to have the same leniency/severity? Look at "1 - Separation Reliability" or the "Fixed Chi-square".

 

Do you want the raters to behave like independent experts? Look at the Rasch fit statistics.
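The first two criteria above are easy to confuse, so a small sketch (hypothetical data, plain Python) may help: two raters can have 0% exact agreement yet rank the performances identically, which is a severity difference, not a disagreement about quality.

```python
# Contrast exact agreement % with rank-order agreement (correlation)
# for two raters scoring the same performances. Hypothetical data.

def exact_agreement_pct(a, b):
    """Percentage of performances given identical ratings."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def pearson_r(a, b):
    """Pearson correlation: do raters agree on better vs. worse?"""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)
```

If one rater is exactly one score-point more lenient than another, exact agreement is 0% while the correlation is 1.0: they agree perfectly on ordering, and Facets adjusts for the leniency difference.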

 

Numerous types of rater misbehavior are identified in the literature. Here are some approaches to identifying them. Please notify us if you discover useful ways to identify misbehavior.

 

A suggested procedure:

(a) Model all raters to share a common understanding of the rating scale:

Models = ?,?,?,R9 ; the model for your facets and rating scale

Interrater = 2  ; 2 or whatever is the number of your rater facet

In the rater facet report (Table 7):

 How much difference in rater severity/leniency is reported? Are there outliers?

 Are rater fit statistics homogeneous?

 Does inter-rater agreement indicate "scoring machines" or "independent experts"?

In the rating scale report (Table 8):

 Is overall usage of the categories as expected?

 

(b) Model each rater to have a personal understanding of the rating scale:

Models = ?,#,?,R9 ; # marks the rater facet

Interrater = 2  ; 2 or whatever is the number of your rater facet

In the rating scale report (Table 8):

 For each rater: is overall usage of the categories as expected?

Are there specific problems, e.g., high- or low-frequency categories, unobserved categories, or disordered average category measures?

 

(c) Look for rater-item interactions, and rater-demographic interactions:

Models =

?,?B,?,?B,R9 ; Facet 4 is a demographic facet (e.g., gender): rater-gender interaction (bias)

?,?B,?B,?,R9 ; Facet 3 is the items: rater-item interaction (bias)

*

In the bias/interaction report (Table 14):

Are any raters showing large and statistically significant interactions?

 

Known rater misbehaviors:

1. Leniency/Severity/Generosity.

This is usually parameterized directly in the "Rater" facet, and measures are automatically adjusted for it.

 

2. Extremism/Central Tendency.

Tending to award ratings in the extreme, or in the central, categories.

This can be identified by modeling each rater to have a separate rating scale (or partial credit). Those with very low central probabilities exhibit extremism. Those with very high central probabilities exhibit central tendency.
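The model-based check above (a separate rating scale per rater) is the sound one, but a crude descriptive screen can be run on the raw ratings: what share of each rater's ratings falls in the central categories versus the extreme ones? The sketch below assumes a 0-9 scale and invented data; it is only a first-pass proxy for the category probabilities Facets estimates.

```python
# Rough screen for central tendency vs. extremism on a 0-9 scale:
# share of each rater's ratings in the central categories (4-5)
# versus the extreme ones (0-1 and 8-9). Purely descriptive;
# Facets' rater-specific rating-scale models are the proper check.

def category_shares(scores, low=0, high=9):
    central = {(low + high) // 2, (low + high + 1) // 2}  # e.g., {4, 5}
    extreme = {low, low + 1, high - 1, high}              # e.g., {0, 1, 8, 9}
    n = len(scores)
    central_share = sum(s in central for s in scores) / n
    extreme_share = sum(s in extreme for s in scores) / n
    return central_share, extreme_share
```

A rater awarding only 4s and 5s (central share 1.0) is a central-tendency candidate; one awarding only 0s, 1s, 8s, and 9s (extreme share 1.0) is an extremism candidate.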

 

3. Halo/"Carry Over" Effects.

One attribute biases ratings with respect to other attributes.

Anchor all items at the same difficulty, usually 0. Raters who best fit this situation are most likely to be exhibiting "halo" effect.

 

4. Response Sets.

The ratings are not related to the ability of the subjects.

Anchor all persons at the same ability, usually 0. Raters who best fit this situation are most likely to be exhibiting response sets.
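A quick descriptive counterpart to the person-anchoring check: correlate each rater's ratings with an external ordering of the subjects, such as their mean rating over all raters. A near-zero correlation suggests ratings unrelated to subject ability. The sketch below uses hypothetical data; anchoring all persons at 0 in Facets and inspecting fit is the model-based version.

```python
# Screen for response sets: correlate each rater's ratings with the
# subjects' mean ratings across all raters. Near-zero correlation
# suggests the rater's ratings are unrelated to subject ability.
# Crude proxy only (each rater contributes to the mean); Facets'
# anchored analysis is the proper check.

def response_set_screen(ratings):  # ratings[rater] = list of scores by subject
    raters = list(ratings)
    n_subj = len(ratings[raters[0]])
    ability = [sum(ratings[r][s] for r in raters) / len(raters)
               for s in range(n_subj)]

    def r(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        den = (sum((x - ma) ** 2 for x in a) *
               sum((y - mb) ** 2 for y in b)) ** 0.5
        return num / den if den else 0.0  # constant ratings -> 0

    return {rt: round(r(ratings[rt], ability), 2) for rt in raters}
```

A rater who awards the same rating to every subject, or whose ratings are unrelated to the consensus ordering, shows a correlation near zero and deserves a closer look.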

 

5. Playing it safe.

The rater attempts to give a rating near the other raters, rather than independently.

Specify the Inter-rater= facet and monitor the "agreement" percentage. Playing-it-safe raters also tend to overfit (fit statistics noticeably below 1.0).
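A descriptive way to spot this pattern in the raw data is to compute each rater's average exact-agreement rate with every other rater: a rater agreeing far more often than the group norm may be tracking colleagues' ratings rather than rating independently. The sketch below uses hypothetical data; Facets' agreement report compares observed against model-expected agreement, which this sketch does not.

```python
# Flag "playing it safe": each rater's mean exact-agreement rate
# with every other rater. A rate well above the group norm suggests
# a rater shadowing colleagues instead of rating independently.
# Hypothetical data; no model-expected agreement is computed here.

from itertools import combinations

def mean_pairwise_agreement(ratings):  # ratings[rater] = list of scores
    totals = {r: [] for r in ratings}
    for a, b in combinations(ratings, 2):
        pct = (sum(x == y for x, y in zip(ratings[a], ratings[b]))
               / len(ratings[a]))
        totals[a].append(pct)
        totals[b].append(pct)
    return {r: round(sum(v) / len(v), 2) for r, v in totals.items()}
```

Two raters who match on every performance while a third matches on none stand out immediately in the resulting per-rater agreement rates.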

 

6. Instability.

Rater leniency changes from situation to situation.

Include the "situation" as a dummy facet (e.g., rating session), and investigate rater-situation interactions using "B" in the Models= statements.


Help for Facets Rasch Measurement Software: www.winsteps.com Author: John Michael Linacre.
 
