Table 30.1 Differential item functioning DIF pairwise 
Table 30 supports the investigation of item bias, Differential Item Functioning (DIF), i.e., interactions between individual items and types of persons.
Table
30.1 is best for pairwise comparisons, e.g., Females vs. Males.
30.2 DIF report (measure list: person class within item)
30.3 DIF report (measure list: item within person class)
30.4 DIF report (item-by-person class chi-squares)
30.5 Within-class fit report (person class within item)
30.6 Within-class fit report (item within person class)
30.7 Item measure profiles for classes of persons
Excel DIF Plots
In Table 30.1, the hypothesis is "this item has the same difficulty for two groups".
In Tables 30.2 and 30.3, the hypothesis is "this item has the same difficulty as its average difficulty for all groups".
In Table 30.4, the hypothesis is "this item has no overall DIF across all groups".
Table 30.1 reports a probability and a size for DIF statistics. Usually we want:
1. probability so small that it is unlikely that the DIF effect is merely a random accident
2. size so large that the DIF effect has a substantive impact on scores/measures on the test
A general thought: Significance tests, such as DIF tests, are always of doubtful value in a Rasch context, because differences can be statistically significant, but far too small to have any impact on the meaning, or practical use, of the measures. So we need both statistical significance and substantive difference before we take action regarding bias, etc.
Table 30.1 is a pairwise DIF (bias) analysis: this is testing "item difficulty for Group A vs. item difficulty for Group B". Table 30.1 makes sense if there are only two groups, or there is one majority reference group.
Tables 30.2 and 30.3 are a global DIF (bias) analysis: this is testing "item difficulty for Group A vs. item difficulty for all groups combined." Tables 30.2 and 30.3 make sense when there are many small groups, e.g., age-groups in 5-year increments from 0 to 100.
DIF results are considerably influenced by sample size, so if you have only two person-groups, go to Table 30.1. If you have lots of person-groups, go to Table 30.2.
Specify DIF= for person classifying indicators in person labels. Item bias and DIF are the same thing. The widespread use of "item bias" dates to the 1960s, "DIF" to the 1980s. The reported DIF is corrected for test impact, i.e., differential average performance on the whole test. Use ability stratification to look for non-uniform DIF using the selection rules. Tables 30.1 and 30.2 present the same information from different perspectives.
From the Output Tables menu, the DIF/DPF dialog is displayed.
Table 31 supports person bias, Differential Person Functioning (DPF), i.e., interactions between individual persons and classifications of items.
Table 33 reports bias or interactions between classifications of items and classifications of persons.
In these analyses, persons with extreme scores are excluded, because they do not exhibit differential ability across items. For background discussion, see DIF and DPF considerations.
Example output:
You want to examine item bias (DIF) between Females and Males. You need a column in your Winsteps person label that has two (or more) demographic codes, say "F" for female and "M" for male (or "0" and "1" if you like dummy variables) in column 9.
Table 30.1 is best for pairwise comparisons, e.g., Females vs. Males.
DIF class specification is: DIF=@GENDER

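------------------------------------------------------------------------------------------------------------------------------------------------
KID    Obs-Exp      DIF   DIF  KID    Obs-Exp      DIF   DIF       DIF  JOINT  Rasch-Welch         Mantel-Haenszel    Size  Active     TAP
CLASS  Average  MEASURE  S.E.  CLASS  Average  MEASURE  S.E.  CONTRAST   S.E.     t  d.f.   Prob.  Chi-squ   Prob.  CUMLOR  Slices  Number  Name
------------------------------------------------------------------------------------------------------------------------------------------------
F          .00    6.59E   .00  M          .00    6.59E   .00       .00    .00   .00     0   1.000                                        1  14
F          .04    5.24>  1.90  M          .04    3.87    .90      1.37   2.10   .65    28   .5194    .0000   1.000               7       4  134
F          .01    1.67    .68  M          .01    1.48    .70       .19    .97   .19    31   .8468    .1316   .7167     .06       7      10  243
------------------------------------------------------------------------------------------------------------------------------------------------
M          .00    6.59E   .00  F          .00    6.59E   .00       .00    .00   .00    30   1.000                                        1  14
------------------------------------------------------------------------------------------------------------------------------------------------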

Width of Mantel-Haenszel slice: MHSLICE = .010 logits
The most important numbers in Table 30.1: The DIF CONTRAST is the difference in difficulty of the item between the two groups. This should be at least 0.5 logits for DIF to be noticeable. "Prob." shows the probability of observing this amount of contrast by chance, when there is no systematic item bias effect. For statistically significant DIF on an item, Prob. ≤ .05.
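The two screening thresholds just described (a contrast of at least 0.5 logits and Prob. ≤ .05) can be applied to a row of Table 30.1 with a few lines of Python. This is a minimal sketch and not part of Winsteps: the function name `flag_dif` and its parameters are illustrative assumptions, and the default thresholds are simply the ones quoted above.

```python
def flag_dif(contrast, prob, min_size=0.5, alpha=0.05):
    """Return True when a DIF contrast is both noticeable and statistically significant.

    contrast : DIF CONTRAST in logits (its sign only shows direction, so the magnitude is used)
    prob     : two-sided Prob. reported next to the Rasch-Welch t
    min_size, alpha : screening thresholds (assumed defaults taken from the text)
    """
    noticeable = abs(contrast) >= min_size
    significant = prob <= alpha
    return noticeable and significant


# Item 4 of the example output, values as printed: large contrast, but not significant.
print(flag_dif(1.37, 0.5194))   # False
# A hypothetical row (made-up numbers) that passes both checks.
print(flag_dif(0.80, 0.01))     # True
```

Requiring both conditions mirrors the advice earlier in this section: statistical significance alone is not enough, and neither is size alone.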
DIF class specification defines the columns used to identify DIF classifications, using DIF= and the selection rules.
For summary statistics on each class, use Table 28.
Reading across the Table 30.1 columns:
PERSON CLASS identifies the CLASS of persons. PERSON is specified with PERSON=, e.g., the first CLASS here is "A".
Obs-Exp Average is the average difference between the observed and expected responses for the Class on the Item. When this is positive, the Class has higher ability than expected or the item is easier than expected.
DIF estimates with the iterative-logit (Rasch-Welch) method:
DIF MEASURE is the difficulty of this item for this class, with all else held constant, e.g., -.40 is the local difficulty for Class A of Item 1. The more difficult, the higher the DIF measure. The measures are conveniently listed in the Excel file for the DIF plots, or copy them from the Table into Excel.
For the raw scores corresponding to these measures, see Table 30.2.
-.52> reports that this measure corresponds to an extreme maximum person-class score. EXTRSCORE= controls the extreme score estimate.
1.97< reports that this measure corresponds to an extreme minimum person-class score. EXTRSCORE= controls the extreme score estimate.
6.91E reports that this measure corresponds to an item with an extreme score, which cannot exhibit DIF.
DIF S.E. is the standard error of the DIF MEASURE. A value of ".00" indicates that DIF cannot be observed in these data.
PERSON CLASS identifies the CLASS of persons, e.g., the second CLASS is "D".
DIF MEASURE is the difficulty of this item for this class, with all else held constant, e.g., -.52 is the local difficulty for Class D of Item 1. > means "extreme maximum score".
DIF S.E. is the standard error of the second DIF MEASURE.
DIF CONTRAST is the "effect size" in logits (or USCALE= units), the difference between the two DIF MEASUREs, i.e., the size of the DIF across the two classifications of persons, e.g., -.40 - (-.52) = .11 (usually in logits). A positive DIF contrast indicates that the item is more difficult for the first, left-hand-listed CLASS.
If you want a sample-based effect size, then
effect size = DIF CONTRAST / (person sample measure S.D.)
JOINT S.E. is the standard error of the DIF CONTRAST = sqrt(first DIF S.E.² + second DIF S.E.²), e.g., 2.50 = sqrt(.11² + 2.49²)
Welch t gives the DIF significance as a Welch's (Student's) t-statistic ≈ DIF CONTRAST / JOINT S.E. The t-test is a two-sided test for the difference between two means (i.e., the estimates) based on the standard error of the means (i.e., the standard error of the estimates). The null hypothesis is that the two estimates are the same, except for measurement error.
d.f. is the joint degrees of freedom, computed according to Welch-Satterthwaite. When the d.f. are large, the t statistic can be interpreted as a unit-normal deviate, i.e., z-score.
INF means "the degrees of freedom are so large they can be treated as infinite", i.e., the reported t-value is a unit normal deviate.
Prob. is the two-sided probability of Student's t. See t-statistics.
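The Rasch-Welch columns can be reproduced by hand from the two DIF MEASUREs and their standard errors. The sketch below follows the formulas given above for DIF CONTRAST, JOINT S.E. and t. The Welch-Satterthwaite function is the textbook formula written in terms of standard errors, and it assumes that `n1` and `n2` are the numbers of persons in each class; which counts Winsteps actually uses is not stated in this section. The probability function uses the unit-normal approximation, so it is only appropriate when the d.f. are large. All function names are illustrative.

```python
import math


def welch_dif(measure1, se1, measure2, se2):
    """DIF CONTRAST, JOINT S.E. and Welch t from two DIF MEASUREs and their S.E.s."""
    contrast = measure1 - measure2
    joint_se = math.sqrt(se1 ** 2 + se2 ** 2)
    t = contrast / joint_se
    return contrast, joint_se, t


def welch_satterthwaite_df(se1, n1, se2, n2):
    """Textbook Welch-Satterthwaite d.f. (assumption: n1, n2 are the class counts)."""
    return (se1 ** 2 + se2 ** 2) ** 2 / (se1 ** 4 / (n1 - 1) + se2 ** 4 / (n2 - 1))


def two_sided_normal_prob(t):
    """Two-sided probability, treating t as a unit-normal deviate (large d.f. only)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Item 4 of the example output, values as printed there.
contrast, joint_se, t = welch_dif(5.24, 1.90, 3.87, 0.90)
print(round(contrast, 2), round(joint_se, 2), round(t, 2))   # 1.37 2.1 0.65
```

With small d.f., as in the example output, the exact Student's t probability is somewhat larger than the normal approximation, so the approximation should be read as a lower bound on Prob.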
Mantel-Haenszel reports the Mantel-Haenszel (1959) DIF test for dichotomies or the Mantel (1963) test for polytomies using MHSLICE=. Statistics are reported when computable.
Chi-squ. is the Mantel-Haenszel (for dichotomies) or Mantel (for polytomies) chi-square with 1 degree of freedom.
Prob. is the probability of observing these data (or worse) when there is no DIF, based on a chi-square value with 1 d.f.
Size CUMLOR (cumulative log-odds ratio in logits) is an estimate of the DIF (scaled by USCALE=). When the size is not estimable, +. and -. indicate direction.
Active Slices is a count of the estimable stratified cross-tabulations used to compute MH. MH is sensitive to score frequencies. If you have missing data, or only small or zero counts for some raw scores, the MH statistic can go wild or not be estimable. Please try different values of MHSLICE= (thin and thick slicing) to see how robust the MH estimates are.
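To see why sparse slices matter, it can help to compute the Mantel-Haenszel statistics directly. The sketch below implements the textbook Mantel-Haenszel test for one dichotomous item from a list of 2x2 cross-tabulations, one per ability slice. It is an illustration under stated assumptions, not the Winsteps implementation: the continuity correction, the rule for skipping uninformative slices, the sign convention of the log-odds ratio, and the function name are all assumptions, and the assignment of persons to slices (MHSLICE=) is outside its scope. The counts in the usage example are made up.

```python
import math


def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square, probability and log-odds ratio for one dichotomous item.

    strata: iterable of (ref_right, ref_wrong, focal_right, focal_wrong) counts,
    one tuple per ability slice.
    Returns (chi_square, prob, log_odds_ratio, active_slices); a statistic is None
    when it cannot be computed.
    """
    sum_a = sum_expected = sum_var = 0.0
    num = den = 0.0
    active = 0
    for a, b, c, d in strata:
        n_ref, n_focal = a + b, c + d
        right, wrong = a + c, b + d
        n = n_ref + n_focal
        # A slice is informative only if both groups and both outcomes are present.
        if n < 2 or min(n_ref, n_focal, right, wrong) == 0:
            continue
        active += 1
        sum_a += a
        sum_expected += n_ref * right / n
        sum_var += n_ref * n_focal * right * wrong / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    if sum_var > 0:
        # Continuity-corrected statistic with 1 d.f.; differences under 0.5 are clipped to 0.
        chi_sq = max(abs(sum_a - sum_expected) - 0.5, 0.0) ** 2 / sum_var
        prob = math.erfc(math.sqrt(chi_sq / 2.0))  # upper tail of chi-square with 1 d.f.
    else:
        chi_sq = prob = None
    log_odds = math.log(num / den) if num > 0 and den > 0 else None
    return chi_sq, prob, log_odds, active


# Three made-up slices: (reference right, reference wrong, focal right, focal wrong).
example = [(8, 2, 5, 5), (6, 4, 3, 7), (9, 1, 7, 3)]
chi_sq, prob, log_odds, active = mantel_haenszel(example)
print(active, round(chi_sq, 2), round(log_odds, 2))
```

Slices in which one group is absent, or in which everyone succeeds or everyone fails, contribute nothing, which is why the count of active slices can be much smaller than the number of slices when data are sparse.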
ITEM Number is the item entry number. ITEM is specified by ITEM=.
Name is the item label.
In the second block of the Table, each line is repeated with the CLASSes reversed.
ETS DIF Category         with DIF Contrast and DIF Statistical Significance
C = moderate to large    |DIF| ≥ 0.64 logits    prob( |DIF| ≤ 0.43 logits ) ≤ .05 (2-sided); approximately: |DIF| > 0.43 logits + 2 * DIF S.E.
B = slight to moderate   |DIF| ≥ 0.43 logits    prob( |DIF| = 0 logits ) ≤ .05 (2-sided); approximately: |DIF| > 2 * DIF S.E.
A = negligible           -                      -
C-, B- = DIF against focal group; C+, B+ = DIF against reference group

ETS (Educational Testing Service) use Delta δ units. 1 logit = 2.35 Delta δ units. 1 Delta δ unit = 0.426 logits. 
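The ETS categories and the unit conversion can be expressed compactly in code. The sketch below applies the approximate rules from the table above to the magnitude of a DIF contrast and its standard error (here the JOINT S.E. is assumed to play the role of "DIF S.E."), and converts logits to Delta units with the factor quoted in the text. The function names are illustrative, and the +/- suffix for direction is left out because it depends on which class is treated as the focal group.

```python
DELTA_PER_LOGIT = 2.35   # "1 logit = 2.35 Delta units", as stated in the text


def ets_category(dif, se):
    """Classify a DIF contrast (logits) as 'A', 'B' or 'C' using the approximate rules."""
    size = abs(dif)  # the rules are two-sided, so only the magnitude matters
    if size >= 0.64 and size > 0.43 + 2 * se:
        return "C"   # moderate to large
    if size >= 0.43 and size > 2 * se:
        return "B"   # slight to moderate
    return "A"       # negligible


def logits_to_delta(logits):
    """Convert a size in logits to ETS Delta units."""
    return logits * DELTA_PER_LOGIT


# Item 4 of the example output: a big contrast, but its S.E. is too large to leave category A.
print(ets_category(1.37, 2.10))   # A
# Hypothetical contrasts (made-up numbers) with small standard errors.
print(ets_category(0.80, 0.15))   # C
print(ets_category(0.50, 0.20))   # B
```

Checking category C before B matters, because every contrast that satisfies the C rule also satisfies the B rule.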

Zwick, R., Thayer, D.T., Lewis, C. (1999) An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis. Journal of Educational Measurement, 36, 1, 1-28
For meta-analysis, the DIF Effect Size = DIF Contrast / S.D. of the "control" CLASS (or the pooled CLASSes). The S.D. for each CLASS is shown in Table 28.
Example: The estimated item difficulty for Females, the DIF MEASURE, is 2.85 logits, and for Males the DIF MEASURE is 1.24 logits. So the DIF CONTRAST, the apparent bias of the item against Females, is 2.85 - 1.24 = 1.61 logits. An alternative interpretation is that the Females are 1.61 logits less able on the item than the Males.
                             Males          Females
Item 13: +---------+---------+-+-------+-------+-+>> difficulty increases
        -1         0         1.24     +2     2.85  DIF measure
                               +---------------> = 1.61 DIF contrast
