Table 7 Agreement Statistics

When inter-rater= specifies a rater facet, Facets counts the situations in which different raters give ratings under identical circumstances.

 

If exact inter-rater statistics are required, please do a special run of Facets in which all unwanted facets are Xed out, so that matching only occurs on facets relevant to agreement. For instance, if "rater gender" is irrelevant to agreement, then X out that facet in the Models= specifications.

 

The percent of times those ratings are identical is reported, along with its expected value. This supports an investigation as to whether raters are rating as "independent experts" or as "rating machines". The report is:

 

Table 7.3.1  Reader Measurement Report  (arranged by MN).

------------------------------------------------------------------------------------------------

| Obsvd   Obsvd  Obsvd  Fair-M|        Model | Infit      Outfit   | Exact Agree. |            |

| Score   Count Average Avrage|Measure  S.E. |MnSq ZStd  MnSq ZStd | Obs %  Exp % | Nu Reader  |

------------------------------------------------------------------------------------------------

|   1524    288     5.3   5.26|   -.30   .05 | 1.2   2    1.2   2  |  28.2   20.9 |  8 8       |

|   1455    288     5.1   5.00|   -.16   .05 |  .5  -7     .5  -7  |  30.8   21.6 |  4 4       |

....

------------------------------------------------------------------------------------------------

RMSE (Model)  .05 Adj S.D.  .19  Separation  4.02  Strata  5.69  Reliability  .94

......

Inter-Rater agreement opportunities: 60480  Exact agreements: 17838 = 29.5%  Expected: 13063.2 = 21.6%

------------------------------------------------------------------------------------------------

 

Exact Agree. counts exact agreements under identical rating conditions. Agreement is assessed on qualitative levels relative to the lowest observed qualitative level.

So, imagine all your ratings are 4,5,6 and all my ratings are 1,2,3.

If we use the (shared) Rating Scale model, then we will have no exact agreements.

But if we use the (individual) Partial Credit model, #, then we agree when you rate a 4 (your bottom observed category) and I rate a 1 (my bottom observed category). Similarly, your 5 agrees with my 2, and your 6 agrees with my 3.

If you want "exact agreement" to mean "exact agreement of data values", then please use the Rating Scale model statistics.
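The difference between the two matching rules can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the code Facets uses; the function name and the sample ratings are hypothetical, and it assumes each rater's categories are consecutive integers.

```python
def exact_agreement_count(ratings_a, ratings_b, relative_to_lowest=False):
    """Count paired ratings that agree.

    With relative_to_lowest=True, each rater's ratings are first re-expressed
    as levels above that rater's own lowest observed category (the Partial
    Credit situation). Otherwise the raw data values are compared (the shared
    Rating Scale situation).
    """
    if relative_to_lowest:
        low_a, low_b = min(ratings_a), min(ratings_b)
        ratings_a = [r - low_a for r in ratings_a]
        ratings_b = [r - low_b for r in ratings_b]
    return sum(a == b for a, b in zip(ratings_a, ratings_b))


# Hypothetical paired ratings of the same four performances.
yours = [4, 5, 6, 5]
mine = [1, 2, 3, 3]

shared = exact_agreement_count(yours, mine)                               # raw values
individual = exact_agreement_count(yours, mine, relative_to_lowest=True)  # relative levels
print(shared, individual)
```

With these ratings the raw values never match, while the relative levels match on the first three performances.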

 

Obs % = Observed % of exact agreements between raters on ratings under identical conditions.

Exp % = Expected % of exact agreements between raters on ratings under identical conditions, based on Rasch measures.
If Obs % ≈  Exp % then the raters may be behaving like independent experts.
If Obs % » Exp % then the raters may be behaving like "rating machines".
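As a check on the summary line of the report above, the two percentages follow directly from the three counts it prints. The sketch below is plain arithmetic in Python; the function name is hypothetical.

```python
def agreement_percentages(opportunities, exact_agreements, expected_agreements):
    """Return (Obs %, Exp %) from the counts in the report's summary line."""
    obs_pct = 100.0 * exact_agreements / opportunities
    exp_pct = 100.0 * expected_agreements / opportunities
    return obs_pct, exp_pct


# Counts taken from the example report:
# opportunities 60480, exact agreements 17838, expected 13063.2
obs_pct, exp_pct = agreement_percentages(60480, 17838, 13063.2)
print(f"Obs % = {obs_pct:.1f}, Exp % = {exp_pct:.1f}")  # Obs % = 29.5, Exp % = 21.6
```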

 

Here is the computation for "Expected Agreement %". We pair the target rater with another rater who rated the same ratee on the same item of the same task of the same ......, so that both raters rated the same performance under identical circumstances.

 

Then, for each rater we have an observed rating. They agree or not. The percentage of times raters agree with the target rater is the "Observed Agreement %".
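The pairing and counting can be sketched as follows. This is a minimal illustration, not Facets' implementation: the data layout (rater, context, rating), the function name, and the sample observations are all assumptions, where "context" stands for the combination of every other facet element (ratee, item, task, and so on). It also assumes each rater gives at most one rating per context.

```python
from collections import defaultdict


def observed_agreement_percent(observations, target):
    """Percentage of agreement opportunities on which another rater
    gave the same rating as the target rater in the same context."""
    by_context = defaultdict(dict)
    for rater, context, rating in observations:
        by_context[context][rater] = rating

    opportunities = 0
    agreements = 0
    for ratings in by_context.values():
        if target not in ratings:
            continue  # the target rater did not rate in this context
        for rater, rating in ratings.items():
            if rater == target:
                continue
            opportunities += 1
            if rating == ratings[target]:
                agreements += 1

    if opportunities == 0:
        return None  # no other rater shared a context with the target
    return 100.0 * agreements / opportunities


# Hypothetical data: (rater, (ratee, item), rating)
observations = [
    ("A", ("ratee1", "item1"), 2),
    ("B", ("ratee1", "item1"), 2),
    ("C", ("ratee1", "item1"), 3),
    ("A", ("ratee2", "item1"), 1),
    ("B", ("ratee2", "item1"), 3),
]
pct_a = observed_agreement_percent(observations, "A")
print(pct_a)
```

Here rater A has three agreement opportunities (two with B, one with C) and agrees on one of them.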

 

For each rater we also have an (average) expected rating based on the Rasch measures. The (average) expected ratings will not agree unless the raters have the same leniency/severity measure.

 

But we also have the Rasch-model-based probabilities for each category of the rating scale for each rater. Suppose this is a 1,2,3 (3-category) rating scale.

 

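-----------------------------------------------------------------------------------------------------
|                           | Rater A | Rater B | Expected agreement between Raters A and B         |
|                           |         |         | (assuming they are rating independently)          |
-----------------------------------------------------------------------------------------------------
| probability of category 1 |   10%   |   20%   | Category 1: 10% * 20% =  2%                       |
| probability of category 2 |   40%   |   60%   | Category 2: 40% * 60% = 24%                       |
| probability of category 3 |   50%   |   20%   | Category 3: 50% * 20% = 10%                       |
-----------------------------------------------------------------------------------------------------
| Expected agreement in any category = 2% + 24% + 10% = 36%                                         |
-----------------------------------------------------------------------------------------------------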

 

This expected-agreement computation is performed over all pairs of raters and averaged to obtain the reported "Expected Agreement %".
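For one pair of raters, the computation above is a sum of products of category probabilities. The Python sketch below reproduces the Rater A and Rater B example; the function name is hypothetical, and independence of the two raters is assumed, as in the text.

```python
def expected_agreement(probs_a, probs_b):
    """Probability that two independent raters choose the same category,
    given each rater's model-based category probabilities."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both raters must use the same number of categories")
    return sum(pa * pb for pa, pb in zip(probs_a, probs_b))


# Category probabilities for the 3-category example above.
rater_a = [0.10, 0.40, 0.50]
rater_b = [0.20, 0.60, 0.20]

pair_ab = expected_agreement(rater_a, rater_b)
print(f"{100 * pair_ab:.0f}%")  # 36%
```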

 

Higher than expected agreement indicates statistical local dependence among the raters. This biases all the standard errors towards zero. An approximate guideline is:
"True" Standard error = "Reported Standard Error" * Maximum( 1, sqrt (Exact agreements / Expected)) for all elements.

In this example, the inflator for the S.E.'s of all elements of all facets approximates sqrt( 17838/13063.2) = 1.17.
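The guideline can be written as a small Python function. This is a sketch of the approximation stated above, with a hypothetical function name; the .05 is the Model S.E. reported for Reader 8 in the example table.

```python
import math


def se_inflation_factor(exact_agreements, expected_agreements):
    """Approximate multiplier for reported standard errors.
    It is never below 1, so standard errors are never deflated."""
    return max(1.0, math.sqrt(exact_agreements / expected_agreements))


factor = se_inflation_factor(17838, 13063.2)
adjusted_se = 0.05 * factor  # reported Model S.E. of .05 for Reader 8
print(round(factor, 2))  # 1.17
```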

 

Alternatively, deflate the reported person-facet reliability, R, in accordance with the extent to which the raters are not independent. Based on the Spearman-Brown prophecy formula, an approximation is:
T = (100 - observed exact agreement%) / (100 - expected exact agreement%)
deflated reliability = T * R / ( (1-R) + T * R)
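The two formulas above translate directly into Python. In this sketch the function name is hypothetical, and the reliability of 0.90 is an assumed value for illustration only, because the example report does not show a person-facet reliability. The agreement percentages are those of the example report.

```python
def deflated_reliability(reliability, observed_pct, expected_pct):
    """Approximate deflation of a reported reliability R, using
    T = (100 - observed %) / (100 - expected %)."""
    t = (100.0 - observed_pct) / (100.0 - expected_pct)
    return t * reliability / ((1.0 - reliability) + t * reliability)


# Assumed R = 0.90 (hypothetical); Obs % = 29.5 and Exp % = 21.6 from the report.
deflated = deflated_reliability(0.90, 29.5, 21.6)
print(deflated)
```

When the observed and expected agreement percentages are equal, T = 1 and the reliability is returned unchanged.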

 

Example: 100 raters with a wide range of rater severity/leniency:

 

Exact agreements: 781 = 18.8%  Expected: 577.5 = 13.9%

 

With this large spread of rater severities, the prediction is that only 13.9% of the observations will show the raters giving the same rating under the same conditions. This accords with the wide range of severities.

There is somewhat more agreement than this in the data: 18.8%. This is typical of the psychology of rater behavior. We are conditioned from babyhood to agree with what we conceive to be the expectations of others. This behavior continues even for expert raters, who subconsciously remain under mental pressure to agree with the expectations of others. In this case, that pressure has increased observed agreement from 13.9% to 18.8%.

Whether you report this depends on the purpose of your paper. If it is an investigation into rater behavior, then this provides empirical evidence for a psychological conjecture. If your paper is a validity study of the instrument, then this aspect is probably too obscure to be meaningful for your audience.

 

See more at Inter-rater Reliability and Inter-rater correlations

 

 


Help for Facets Rasch Measurement Software: www.winsteps.com Author: John Michael Linacre.
 
