Table 30.1 Differential item functioning DIF pairwise

Table 30 supports the investigation of item bias, Differential Item Functioning (DIF), i.e., interactions between individual items and types of persons.



30.1 is best for pairwise comparisons, e.g., Females vs. Males.

30.2 DIF report (measure list: person class within item)

30.3 DIF report (measure list: item within person class)

30.4 DIF report (item-by-person class chi-squares)

30.5 Within-class fit report (person class within item)

30.6 Within-class fit report (item within person class)

30.7 Item measure profiles for classes of persons

Excel DIF Plots


In Table 30.1, the hypothesis is "this item has the same difficulty for two groups".
In Tables 30.2 and 30.3, the hypothesis is "this item has the same difficulty as its average difficulty for all groups".

In Table 30.4, the hypothesis is "this item has no overall DIF across all groups".


Table 30.1 reports a probability and a size for DIF statistics. Usually we want:

1. probability so small that it is unlikely that the DIF effect is merely a random accident

2. size so large that the DIF effect has a substantive impact on scores/measures on the test


A general thought: Significance tests, such as DIF tests, are always of doubtful value in a Rasch context, because differences can be statistically significant, but far too small to have any impact on the meaning, or practical use, of the measures. So we need both statistical significance and substantive difference before we take action regarding bias, etc.


Table 30.1 is a pairwise DIF (bias) analysis: this is testing "item difficulty for Group A vs. item difficulty for Group B". Table 30.1 makes sense if there are only two groups, or there is one majority reference group.


Tables 30.2 and 30.3 are a global DIF (bias) analysis: this is testing "item difficulty for Group A vs. item difficulty for all groups combined." Tables 30.2 and 30.3 make sense when there are many small groups, e.g., age-groups in 5 year increments from 0 to 100.


DIF results are considerably influenced by sample size, so if you have only two person-groups, go to Table 30.1. If you have many person-groups, go to Table 30.2.


Specify DIF= for person classifying indicators in person labels. Item bias and DIF are the same thing. The widespread use of "item bias" dates to the 1960s, "DIF" to the 1980s. The reported DIF is corrected to test impact, i.e., differential average performance on the whole test. Use ability stratification, with the selection rules, to look for non-uniform DIF. Tables 30.1 and 30.2 present the same information from different perspectives.


Selecting Table 30 from the Output Tables menu displays the DIF/DPF dialog.


Table 31 supports person bias, Differential Person Functioning (DPF), i.e., interactions between individual persons and classifications of items.


Table 33 reports bias or interactions between classifications of items and classifications of persons.


In these analyses, persons with extreme scores are excluded, because they do not exhibit differential ability across items. For background discussion, see DIF and DPF considerations.


Example output:

You want to examine item bias (DIF) between Females and Males. You need a column in your Winsteps person label that has two (or more) demographic codes, say "F" for female and "M" for male (or "0" and "1" if you like dummy variables) in column 9.


Table 30.1 is best for pairwise comparisons, e.g., Females vs. Males.

DIF class specification is: DIF=@GENDER



| KID   Obs-Exp   DIF   DIF   KID   Obs-Exp   DIF   DIF      DIF    JOINT  Rasch-Welch   Mantel-Haenszel Size Active TAP          |
| CLASS Average MEASURE S.E.  CLASS Average MEASURE S.E.  CONTRAST  S.E.   t  d.f. Prob. Chi-squ Prob. CUMLOR Slices Number  Name |
| F        .00   -6.59E  .00  M        .00   -6.59E  .00       .00   .00   .00   0 1.000                                  1 1-4   |
| F        .04   -5.24> 1.90  M       -.04   -3.87   .90     -1.37  2.10  -.65  28 .5194   .0000 1.000             7      4 1-3-4 |
| F        .01   -1.67   .68  M       -.01   -1.48   .70      -.19   .97  -.19  31 .8468   .1316 .7167   -.06      7     10 2-4-3-|
|---------------------------------------------------------------------------------------------------------------------------------|
| M        .00   -6.59E  .00  F        .00   -6.59E  .00       .00   .00   .00  30 1.000                                  1 1-4   |

Width of Mantel-Haenszel slice: MHSLICE = .010 logits


The most important numbers in Table 30.1 are the DIF CONTRAST and its Prob. The DIF CONTRAST is the difference in difficulty of the item between the two groups. This should be at least 0.5 logits for DIF to be noticeable. "Prob." shows the probability of observing this amount of contrast by chance, when there is no systematic item bias effect. For statistically significant DIF on an item, Prob. ≤ .05.
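The size-plus-probability rule above can be expressed as a small check. This is a minimal sketch: the 0.5-logit and .05 thresholds come from the text, while the function name `is_actionable_dif` and its parameters are hypothetical, not Winsteps settings.

```python
def is_actionable_dif(dif_contrast: float, prob: float,
                      min_size: float = 0.5, alpha: float = 0.05) -> bool:
    """True when a DIF contrast is both noticeable and statistically significant.

    min_size is in logits and alpha is the significance level; both defaults
    follow the rules of thumb in the text (hypothetical helper, not a Winsteps option).
    """
    noticeable = abs(dif_contrast) >= min_size  # substantive size
    significant = prob <= alpha                 # unlikely to be a random accident
    return noticeable and significant


# Rows from the example Table 30.1: (DIF CONTRAST, Rasch-Welch Prob.)
for contrast, prob in [(-1.37, 0.5194), (-0.19, 0.8468)]:
    print(contrast, prob, is_actionable_dif(contrast, prob))  # both False
```

Item 4 has a contrast well above 0.5 logits, but its large standard errors make it non-significant, so neither condition alone is enough.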


DIF class specification defines the columns used to identify DIF classifications, using DIF= and the selection rules.

For summary statistics on each class, use Table 28.


Reading across the Table 30.1 columns:

PERSON CLASS identifies the CLASS of persons. PERSON is specified with PERSON=, e.g., the first CLASS is "A".


Obs-Exp Average is the average difference between the observed and expected responses for the Class on the Item. When this is positive, the Class has higher ability than expected or the item is easier than expected.
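As an illustration of Obs-Exp Average, here is a minimal sketch assuming dichotomous (0/1) items and the Rasch model, with person abilities and the item difficulty taken from the main analysis in logits. The function name is hypothetical; Winsteps computes this internally.

```python
import math

def obs_minus_exp_average(abilities, responses, item_difficulty):
    """Average (observed - expected) response for one person class on one item.

    Assumes 0/1 responses and the dichotomous Rasch model. Per the text,
    persons with extreme scores should be excluded before calling this.
    """
    residuals = []
    for b, x in zip(abilities, responses):
        expected = 1.0 / (1.0 + math.exp(-(b - item_difficulty)))  # Rasch P(x = 1)
        residuals.append(x - expected)
    return sum(residuals) / len(residuals)
```

A positive result, e.g. two persons at the item's difficulty who both succeed (average +.5), means the class did better on the item than the model expected.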


DIF estimates with the iterative-logit (Rasch-Welch) method:

DIF MEASURE is the difficulty of this item for this class, with all else held constant, e.g., -.40 is the local difficulty for Class A of Item 1. The more difficult the item, the higher the DIF measure. The measures are conveniently listed in the Excel file for the DIF plots, or you can copy them from the Table into Excel.
For the raw scores corresponding to these measures, see Table 30.2.
-.52> reports that this measure corresponds to an extreme maximum person-class score. EXTRSCORE= controls the extreme score estimate.
1.97< reports that this measure corresponds to an extreme minimum person-class score. EXTRSCORE= controls the extreme score estimate.
-6.91E reports that this measure corresponds to an item with an extreme score, which cannot exhibit DIF.

DIF S.E. is the standard error of the DIF MEASURE. A value of ".00" indicates that DIF cannot be observed in these data.
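The idea of a DIF MEASURE ("difficulty of this item for this class, with all else held constant") and its S.E. can be sketched as a maximum-likelihood estimate of one item difficulty with the class's person measures anchored. This assumes dichotomous responses and uses a plain Newton-Raphson solver; it is not Winsteps' own iterative-logit (Rasch-Welch) code, and the function name is hypothetical.

```python
import math

def dif_measure(abilities, responses, start=0.0, tol=1e-6, max_iter=100):
    """Difficulty (logits) of one item for one person class, abilities held fixed.

    Sketch: dichotomous Rasch model, maximum likelihood via Newton-Raphson.
    Returns (measure, standard_error), or None for an extreme class score.
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        return None  # extreme score: no finite estimate (cf. the > and < flags)

    def probs_at(d):
        return [1.0 / (1.0 + math.exp(-(b - d))) for b in abilities]

    d = start
    for _ in range(max_iter):
        p = probs_at(d)
        info = sum(pi * (1.0 - pi) for pi in p)  # Fisher information
        step = (score - sum(p)) / info
        step = max(-1.0, min(1.0, step))  # damp large early steps
        d -= step  # more successes than expected -> lower difficulty
        if abs(step) < tol:
            break
    info = sum(pi * (1.0 - pi) for pi in probs_at(d))
    return d, 1.0 / math.sqrt(info)  # model S.E. = 1 / sqrt(information)
```

For four persons at 0 logits with three successes, the estimate is -ln 3 ≈ -1.10 logits, the difficulty at which the expected score equals the observed score of 3.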

PERSON CLASS identifies the CLASS of persons, e.g., the second CLASS is "D".

DIF MEASURE is the difficulty of this item for this class, with all else held constant, e.g., -.52 is the local difficulty for Class D of Item 1. > means "extreme maximum score".

DIF S.E. is the standard error of the second DIF MEASURE.

DIF CONTRAST is the "effect size" in logits (or USCALE= units): the difference between the two DIF MEASUREs, i.e., the size of the DIF across the two classifications of persons, e.g., -.40 - -.52 ≈ .11 (computed from the unrounded measures). A positive DIF contrast indicates that the item is more difficult for the first, left-hand-listed CLASS.
If you want a sample-based effect size, then
effect size = DIF CONTRAST / (person sample measure S.D.)
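Both the sample-based effect size above and the meta-analysis version described later (DIF Contrast / S.D. of a "control" CLASS or of the pooled CLASSes) divide the contrast by a person-measure S.D. The sketch below assumes one common pooled-S.D. definition, weighting each class variance by n - 1; the text does not say which pooling is intended, and the function names are hypothetical.

```python
import math

def pooled_sd(groups):
    """Pooled S.D. from (count, sd) pairs, weighting each variance by count - 1.

    One common pooling definition (an assumption; the text does not specify).
    """
    num = sum((n - 1) * sd ** 2 for n, sd in groups)
    den = sum(n - 1 for n, _ in groups)
    return math.sqrt(num / den)

def dif_effect_size(dif_contrast, person_sd):
    """Sample-based DIF effect size: DIF CONTRAST / person-measure S.D."""
    if person_sd <= 0:
        raise ValueError("person S.D. must be positive")
    return dif_contrast / person_sd
```

The per-class counts and S.D.s would come from Table 28.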

JOINT S.E. is the standard error of the DIF CONTRAST = sqrt(first DIF S.E.² + second DIF S.E.²), e.g., 2.50 ≈ sqrt(.11² + 2.49²).
Welch t gives the DIF significance as Welch's (Student's) t-statistic ≈ DIF CONTRAST / JOINT S.E. The t-test is a two-sided test for the difference between two means (i.e., the estimates) based on the standard error of the means (i.e., the standard error of the estimates). The null hypothesis is that the two estimates are the same, except for measurement error.
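The JOINT S.E. and Welch t arithmetic can be checked against the example Table 30.1 (item 4: F = -5.24, S.E. 1.90; M = -3.87, S.E. .90). In this sketch, the zero-S.E. case returns t = 0, mirroring the .00 shown for items with extreme scores; that handling is an assumption, and the function name is hypothetical.

```python
import math

def welch_t(measure1, se1, measure2, se2):
    """Return (DIF CONTRAST, JOINT S.E., t) for two DIF MEASUREs in logits."""
    contrast = measure1 - measure2  # positive: harder for the first CLASS
    joint_se = math.sqrt(se1 ** 2 + se2 ** 2)
    # Assumption: with no measurable error (S.E. .00) report t = 0.
    t = contrast / joint_se if joint_se > 0 else 0.0
    return contrast, joint_se, t

# Item 4 in the example Table 30.1: F = -5.24 (S.E. 1.90), M = -3.87 (S.E. .90)
c, se, t = welch_t(-5.24, 1.90, -3.87, 0.90)
print(f"{c:.2f} {se:.2f} {t:.2f}")  # -1.37 2.10 -0.65
```

The printed values match the DIF CONTRAST, JOINT S.E. and t columns of that row.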

d.f. is the joint degrees of freedom, computed according to Welch-Satterthwaite. When the d.f. are large, the t statistic can be interpreted as a unit-normal deviate, i.e., z-score.

INF means "the degrees of freedom are so large they can be treated as infinite", i.e., the reported t-value is a unit normal deviate.

Prob. is the two-sided probability of Student's t. See t-statistics.
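For the d.f. and Prob. columns, here is a sketch of the Welch-Satterthwaite formula and a two-sided Student's t probability, using the classical finite series for integer d.f. (as tabulated in Abramowitz & Stegun, section 26.7). Assumptions: the caller supplies the d.f. attached to each DIF S.E. (the text does not say how Winsteps derives them), non-integer d.f. are rounded, and d.f. above 1000 are treated as INF, i.e., t is read as a unit-normal deviate. Function names are hypothetical.

```python
import math

def welch_satterthwaite_df(se1, df1, se2, df2):
    """Welch-Satterthwaite d.f. for the difference of two estimates.

    df1 and df2 are the d.f. attached to each DIF S.E.; they must be supplied
    by the caller (assumption: the text does not say how Winsteps gets them).
    """
    v1, v2 = se1 ** 2, se2 ** 2
    return (v1 + v2) ** 2 / (v1 ** 2 / df1 + v2 ** 2 / df2)

def t_two_sided_prob(t, df):
    """Two-sided probability of Student's t with df degrees of freedom.

    Uses the finite series for integer df (non-integer df are rounded).
    df above 1000 is treated as infinite: t is read as a unit-normal deviate.
    """
    if df > 1000:
        return math.erfc(abs(t) / math.sqrt(2.0))
    df = int(round(df))
    if df < 1:
        raise ValueError("d.f. must be at least 1")
    theta = math.atan(abs(t) / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    if df == 1:
        a = 2.0 * theta / math.pi
    elif df % 2 == 0:
        # sin(theta) * [1 + (1/2)c2 + (1*3)/(2*4)c2^2 + ...]
        term = total = 1.0
        for k in range(1, (df - 2) // 2 + 1):
            term *= (2 * k - 1) / (2 * k) * c2
            total += term
        a = math.sin(theta) * total
    else:
        # (2/pi) * {theta + sin(theta) * [cos + (2/3)cos^3 + ...]}
        term = total = math.cos(theta)
        for k in range(1, (df - 3) // 2 + 1):
            term *= (2 * k) / (2 * k + 1) * c2
            total += term
        a = 2.0 / math.pi * (theta + math.sin(theta) * total)
    return 1.0 - a  # a = P(|T| < |t|)
```

As a sanity check, `t_two_sided_prob(2.048, 28)` is close to .05, the familiar 5% critical value for 28 d.f.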


Mantel-Haenszel reports the Mantel-Haenszel (1959) DIF test for dichotomies, or the Mantel (1963) test for polytomies, using MHSLICE=. Statistics are reported when computable.

Chi-squ. is the Mantel-Haenszel (dichotomies) or Mantel (polytomies) chi-square with 1 degree of freedom.

Prob. is the probability of observing these data (or worse) when there is no DIF based on a chi-square value with 1 d.f.

Size CUMLOR (cumulative log-odds ratio in logits) is an estimate of the DIF (scaled by USCALE=). When the size is not estimable, +. and -. indicate direction.
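To make the Mantel-Haenszel columns concrete, here is a sketch for a dichotomous item only (the 1959 test, not Mantel 1963 for polytomies): each slice is a 2x2 table of right/wrong counts for the reference and focal classes, the chi-square has 1 d.f., and the log of the Mantel-Haenszel common odds ratio stands in for the size. Assumptions, also labeled in the code: the optional continuity correction, which slices count as active, and the sign of the log-odds (positive means the reference class has higher odds of success). Winsteps' CUMLOR may use a different orientation or scaling, and the function name is hypothetical.

```python
import math

def mantel_haenszel(strata, continuity=False):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    strata: one 2x2 table per ability slice, given as
        (ref_right, ref_wrong, focal_right, focal_wrong).
    Returns (chi_square, prob, log_odds_ratio, active_slices), or None
    when no slice is estimable.
    """
    dev = var = num = den = 0.0
    active = 0
    for a, b, c, d in strata:
        n_ref, n_foc = a + b, c + d
        right, wrong = a + c, b + d
        total = n_ref + n_foc
        # Assumption: a slice is "active" only if it has both classes
        # and both outcomes, so it contributes nonzero variance.
        if n_ref == 0 or n_foc == 0 or right == 0 or wrong == 0:
            continue
        active += 1
        dev += a - n_ref * right / total  # observed minus expected ref_right
        var += n_ref * n_foc * right * wrong / (total ** 2 * (total - 1))
        num += a * d / total
        den += b * c / total
    if active == 0:
        return None
    diff = abs(dev)
    if continuity:  # optional Yates-style correction (assumption)
        diff = max(0.0, diff - 0.5)
    chi_sq = diff ** 2 / var
    prob = math.erfc(math.sqrt(chi_sq / 2.0))  # upper tail, chi-square 1 d.f.
    # Log of the MH common odds ratio; positive = reference class has
    # higher odds of success (sign convention is an assumption).
    log_or = math.log(num / den) if num > 0 and den > 0 else None
    return chi_sq, prob, log_or, active
```

A slice such as (15, 5, 5, 15) gives an odds ratio of 9, i.e., a log-odds of about 2.2 logits, and a chi-square of 9.75.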

Active Slices is a count of the estimable stratified cross-tabulations used to compute MH. MH is sensitive to score frequencies. If you have missing data, or only small or zero counts for some raw scores, the MH statistic can go wild or not be estimable. Please try different values of MHSLICE= (thin and thick slicing) to see how robust the MH estimates are.
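Slice construction for MHSLICE= can be sketched by grouping persons into measure bands of a fixed width and cross-tabulating each band. This banding rule (floor of measure / width) is a hypothetical stand-in; the text does not describe exactly how Winsteps forms its slices, and the function name is also hypothetical.

```python
import math
from collections import defaultdict

def build_slices(records, width=0.010):
    """Cross-tabulate person responses to one item into MH slices.

    records: (person_measure, is_reference, correct) triples.
    Hypothetical banding: each person falls in slice floor(measure / width),
    so width plays the role of MHSLICE= (in logits).
    Returns 2x2 tables (ref_right, ref_wrong, focal_right, focal_wrong),
    ordered from lowest to highest measure band.
    """
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for measure, is_reference, correct in records:
        band = math.floor(measure / width)
        column = (0 if is_reference else 2) + (0 if correct else 1)
        cells[band][column] += 1
    return [tuple(cells[band]) for band in sorted(cells)]
```

Re-running with thin and thick widths, as suggested above, shows how sensitive the MH results are to slicing: thin slices tend to produce many sparse, non-estimable tables, while thick slices pool persons of quite different ability.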


ITEM Number is the item entry number. ITEM is specified by ITEM=, e.g., TAP in the example above.

Name is the item label.


Below "----", each line in the Table is repeated with the CLASSes reversed.


ETS DIF Category, with DIF Contrast and DIF Statistical Significance:

C = moderate to large: |DIF| ≥ 0.64 logits and prob( |DIF| ≤ 0.43 logits ) ≤ .05 (2-sided); approximately: |DIF| > 0.43 logits + 2 * DIF S.E.

B = slight to moderate: |DIF| ≥ 0.43 logits and prob( |DIF| = 0 logits ) ≤ .05 (2-sided); approximately: |DIF| > 2 * DIF S.E.

A = negligible (neither B nor C)

C-, B- = DIF against focal group; C+, B+ = DIF against reference group

ETS (Educational Testing Service) uses Delta (δ) units.

1 logit = 2.35 Delta δ units. 1 Delta δ unit = 0.426 logits.

Zwick, R., Thayer, D.T., & Lewis, C. (1999). An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis. Journal of Educational Measurement, 36(1), 1-28.
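The approximate ETS rules above translate directly into a classifier. This sketch uses the rounded logit thresholds from the text (0.43 and 0.64) and takes the DIF S.E. to be the JOINT S.E. of the contrast, which is an assumption. For the sign, it assumes the value is a Table 30.1 DIF CONTRAST with the reference group as the first-listed CLASS, so a negative contrast (item harder for the focal group) is labeled "-", DIF against the focal group. Function names are hypothetical.

```python
# Rounded logit thresholds from the ETS table in the text.
B_MIN = 0.43  # about 1 Delta unit
C_MIN = 0.64  # about 1.5 Delta units

def logits_to_delta(logits: float) -> float:
    """Convert logits to ETS Delta units (1 logit = 2.35 Delta)."""
    return 2.35 * logits

def ets_category(dif_contrast: float, dif_se: float) -> str:
    """Approximate ETS A/B/C category for a DIF contrast in logits.

    Assumes dif_se is the JOINT S.E. of the contrast and that the reference
    group is the first-listed CLASS, so negative = against the focal group.
    """
    size = abs(dif_contrast)
    sign = "-" if dif_contrast < 0 else "+"
    if size >= C_MIN and size > B_MIN + 2 * dif_se:
        return "C" + sign
    if size >= B_MIN and size > 2 * dif_se:
        return "B" + sign
    return "A"
```

With the example item 4 (contrast -1.37, JOINT S.E. 2.10), the large standard error puts the item in category A despite its large contrast, consistent with its non-significant Prob.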


For meta-analysis, the DIF Effect Size = DIF Contrast / S.D. of the "control" CLASS (or the pooled CLASSes). The S.D. for each CLASS is shown in Table 28.


Example: The estimated item difficulty for Females, the DIF MEASURE, is 2.85 logits, and for males the DIF MEASURE is 1.24 logits. So the DIF CONTRAST, the apparent bias of the item against Females, is 1.61 logits. An alternative interpretation is that the Females are 1.61 logits less able on the item than the males.


                             Males          Females
Item 13: +---------+---------+-+-------+-------+-+>> difficulty increases
         -1        0          1.24     +2     2.85   DIF measure
                               +---------------> = 1.61 DIF contrast

Help for Winsteps Rasch Measurement Software: Author: John Michael Linacre
