Table 7 Reliability and Chi-square Statistics


Table 7 also provides summary statistics by facet.

 

Table 7.3.1  Reader Measurement Report  (arranged by MN).

----------------------------------------------------------------------------------------------------------------

|  Obsvd  Obsvd  Obsvd  Fair-M|        Model | Infit      Outfit   |Estim.| Exact Agree. |                     |

|  Score  Count Average Avrage|Measure  S.E. | MnSq ZStd  MnSq ZStd|Discrm| Obs %  Exp % | Nu Reader           |

----------------------------------------------------------------------------------------------------------------

|   460.8    96.0   4.8   4.73|    .00   .08 | 1.00  -.1   .99  -.2|      |              | Mean (Count: 12)    |

|    29.5      .0    .3    .32|    .19   .00 |  .23  1.8   .22  1.7|      |              | S.D. (Population)   |

|    30.8      .0    .3    .33|    .20   .00 |  .24  1.9   .23  1.8|      |              | S.D. (Sample)       |

----------------------------------------------------------------------------------------------------------------

Model, Populn: RMSE .08  Adj (True) S.D. .17  Separation 2.17  Reliability (not inter-rater) .82

Model, Sample: RMSE .08  Adj (True) S.D. .18  Separation 2.28  Reliability (not inter-rater) .84

Model, Fixed (all same) chi-square: 66.2  d.f.: 11  significance (probability): .00

Model,  Random (normal) chi-square: 9.4  d.f.: 10  significance (probability): .49

Rater agreement opportunities: 384  Exact agreements: 108 = 28.1%  Expected: 82.6 = 21.5%

----------------------------------------------------------------------------------------------------------------

 

Mean =        arithmetic average
Count =        number of elements reported
S.D. (Populn)        is the standard deviation when this sample comprises the entire population.

S.D. (Sample)        is the standard deviation when this sample is a random sample from the population.

If there are "more like this" elements in a facet beyond the current elements: use the Sample statistics, e.g., candidates, items (usually), tasks, ....

If the element list includes every possible element for the facet: use the Population statistics, e.g., grade levels, genders, ...

With extremes        including elements with extreme (zero and perfect) scores
Without extremes        excluding elements with extreme (zero and perfect) scores
Model        Estimated as though all noise is due to model-predicted stochasticity (i.e., the best-case situation)
Real        Estimated as though all unpredicted noise contradicts model expectations (i.e., the worst-case situation)
RMSE        root mean square standard error for all non-extreme measures.
Adj (True) S.D.        sample standard deviation of the estimates after adjusting for measurement error
Separation        Adj "true" S.D. / RMSE, a measure of the spread of the estimates relative to their precision. The signal-to-noise ratio is the "true" variance / error variance = Separation². See also Separation.
Reliability (not inter-rater)        Rasch equivalent to the KR-20 or Cronbach Alpha statistic, i.e., the ratio of "True variance" to "Observed variance". This shows how different the measures are, which may or may not indicate how "good" the test is. High (near 1.0) person and item reliabilities are preferred. This reliability is somewhat the opposite of an inter-rater reliability, so low (near 0.0) judge and rater reliabilities are preferred. See also Reliability.
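
The definitions above can be tied together in a short Python sketch. It is an illustration under stated assumptions, not the Facets implementation: it assumes that the "true" variance is the observed variance of the measures minus the mean-square standard error (RMSE²), and the function name separation_stats and its sample argument are hypothetical.

```python
import math

def separation_stats(measures, std_errors, sample=False):
    """Return (RMSE, adjusted "true" S.D., separation, reliability).

    Assumption: "true" variance = observed variance - RMSE**2.
    """
    n = len(measures)
    mean = sum(measures) / n
    # Population statistics divide by n, sample statistics by n - 1.
    divisor = n - 1 if sample else n
    observed_var = sum((m - mean) ** 2 for m in measures) / divisor
    # Mean of the squared standard errors; its square root is the RMSE.
    error_var = sum(se ** 2 for se in std_errors) / n
    rmse = math.sqrt(error_var)
    # Floor at zero so that the square root is always defined.
    true_var = max(observed_var - error_var, 0.0)
    true_sd = math.sqrt(true_var)
    separation = true_sd / rmse
    reliability = true_var / observed_var if observed_var > 0 else 0.0
    return rmse, true_sd, separation, reliability

# Hypothetical measures and standard errors, for illustration only.
rmse, true_sd, separation, reliability = separation_stats(
    [-0.3, -0.1, 0.1, 0.3], [0.1, 0.1, 0.1, 0.1]
)
```

Under that assumption, Reliability = Separation² / (1 + Separation²). This agrees with the report above: Separation 2.17 gives 4.71 / 5.71 = .82.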

 

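Fixed (all same) chi-square: A test of the "fixed effect" hypothesis: "Can this set of elements be regarded as sharing the same measure after allowing for measurement error?" The chi-square value and degrees of freedom (d.f.) are shown. The significance is the probability of observing a chi-square at least this large if this "fixed" hypothesis is true, so a small significance value is evidence against the hypothesis. In Table 7.3.1, the chi-square of 66.2 with 11 d.f. has significance .00, so the hypothesis that the 12 readers share the same measure is rejected. Depending on the sub-Table, this tests the hypothesis: "Can these items be thought of as equally difficult?" The precise statistical formulation is: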

       wi = 1/SEi²        for i = 1,...,L, where L is the number of items, Di is the difficulty/easiness of item i, and SEi is its standard error.

       chi-square = Σ(wi·Di²) - (Σ wi·Di)² / Σwi        with d.f. = L-1

Or this tests the hypothesis: "Can these raters be thought of as equally lenient?" Is there a statistically significant rater effect?

       The precise statistical formulation is:

       wj = 1/SEj²        for j = 1,...,J, where J is the number of raters, Cj is the leniency/severity of rater j, and SEj is its standard error.

       chi-square = Σ(wj·Cj²) - (Σ wj·Cj)² / Σwj        with d.f. = J-1

And so on ....
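
The fixed-effect formula above can be sketched in Python as follows. This is an illustrative transcription of the formula, not code from Facets; the function name fixed_chi_square and the example values are hypothetical.

```python
def fixed_chi_square(measures, std_errors):
    """Return (chi-square, d.f.) for the "fixed (all same)" hypothesis."""
    # Information weights: w = 1 / SE^2.
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    sum_wd = sum(w * d for w, d in zip(weights, measures))
    sum_wd2 = sum(w * d * d for w, d in zip(weights, measures))
    # Weighted sum of squares about the information-weighted mean.
    chi_square = sum_wd2 - sum_wd ** 2 / sum_w
    return chi_square, len(measures) - 1

# Hypothetical measures and standard errors, for illustration only.
chi_square, df = fixed_chi_square([-0.3, -0.1, 0.1, 0.3], [0.1, 0.1, 0.1, 0.1])
```

A significance probability could then be obtained from a chi-square survival function, for example scipy.stats.chi2.sf(chi_square, df). It is left out here to keep the sketch free of third-party dependencies.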

 

Random (normal) chi-square: A test of the "random effects" hypothesis: "Can this set of elements be regarded as a random sample from a normal distribution?" The significance is the probability of observing a chi-square at least this large if this "random" hypothesis is true. In Table 7.3.1, the chi-square of 9.4 with 10 d.f. has significance .49, so the hypothesis is not rejected. This tests the hypothesis: "Can these persons (items, raters, etc.) be thought of as sampled at random from a normally distributed population?" The precise statistical formulation is:

       var(D) = Σ(Di-Dmean)²/(L-1) - (ΣSEi²)/L

       wi = 1/(var(D)+SEi²)

       chi-square = Σ(wi·Di²) - (Σ wi·Di)² / Σwi        with d.f. = L-2
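
A minimal Python sketch of the random-effects formula above. It is illustrative only; the function name random_chi_square is hypothetical. The guard against non-positive weight denominators is an assumption added here, because this page does not say how that case is handled.

```python
def random_chi_square(measures, std_errors):
    """Return (chi-square, d.f.) for the "random (normal)" hypothesis."""
    n = len(measures)
    mean = sum(measures) / n
    # Sample variance of the measures, less the mean error variance.
    var_d = (sum((d - mean) ** 2 for d in measures) / (n - 1)
             - sum(se ** 2 for se in std_errors) / n)
    denominators = [var_d + se ** 2 for se in std_errors]
    # Assumption: treat a non-positive denominator as an error.
    if any(den <= 0 for den in denominators):
        raise ValueError("var(D) + SE^2 must be positive for every element")
    weights = [1.0 / den for den in denominators]
    sum_w = sum(weights)
    sum_wd = sum(w * d for w, d in zip(weights, measures))
    sum_wd2 = sum(w * d * d for w, d in zip(weights, measures))
    chi_square = sum_wd2 - sum_wd ** 2 / sum_w
    return chi_square, n - 2

# Hypothetical measures and standard errors, for illustration only.
chi_square_r, df_r = random_chi_square([-0.3, -0.1, 0.1, 0.3], [0.1, 0.1, 0.1, 0.1])
```

When every standard error is the same, each weight equals the reciprocal of the sample variance of the measures, so the chi-square reduces to L-1.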

 

Rater agreement opportunities: see Table 7 Agreement statistics


Help for Facets Rasch Measurement Software: www.winsteps.com.