I am currently trying to implement a computerized adaptive test (CAT) for reasoning. After a lot of reading, I am still unsure about some of the steps in item bank calibration, so I thought I could get some advice here.
What I did:
- developed items using an item generator
- set up a (sequential) online test to collect initial data from the target population
- collected data in 10 blocks, each consisting of 16 different items and 4 anchor items
- cleaned the data (no missing values, no click-throughs, etc.)
--> for every block there is data from about 100 to 400 participants
What I plan on doing now:
- test for Rasch model fit in every block
- test for DIF and remove bad items from every block
- chain-link the remaining items using a common-item nonequivalent groups design
This procedure should result in a calibrated item bank that I could then feed to my CAT, right?
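To make the chain-linking step concrete, here is a minimal Python sketch of how it might look. It assumes that neighbouring blocks share their anchor items and that a simple mean-mean shift is enough under the Rasch model (only a location constant is needed, since all discriminations are fixed). The function names `link_constant` and `chain_link`, the item ids, and the difficulty values are made up for illustration; real difficulties would come from the per-block Rasch calibrations.

```python
from statistics import mean


def link_constant(ref, new, anchors):
    """Mean-mean shift that moves `new` difficulties onto the scale of `ref`.

    ref, new: dicts mapping item id -> Rasch difficulty.
    anchors:  item ids that appear in both calibrations.
    """
    return mean(ref[i] for i in anchors) - mean(new[i] for i in anchors)


def chain_link(blocks, anchors_between):
    """Put all blocks on the scale of blocks[0] by linking them one after another.

    blocks:             list of dicts (item id -> difficulty), one per block,
                        each calibrated on its own scale.
    anchors_between[k]: anchor item ids shared by blocks[k] and blocks[k + 1]
                        (assumption: anchors connect adjacent blocks).
    """
    bank = dict(blocks[0])
    linked_prev = dict(blocks[0])
    for k in range(1, len(blocks)):
        shift = link_constant(linked_prev, blocks[k], anchors_between[k - 1])
        linked = {item: b + shift for item, b in blocks[k].items()}
        for item, b in linked.items():
            # Anchors already in the bank keep their first linked value.
            bank.setdefault(item, b)
        linked_prev = linked
    return bank


# Toy example with invented difficulties (three small blocks, not real data).
block_a = {"a1": -0.5, "a2": 0.3, "x1": 0.0, "x2": 1.0}
block_b = {"x1": -0.4, "x2": 0.6, "b1": 0.1, "y1": 0.9}
block_c = {"y1": 0.2, "c1": -0.3}

bank = chain_link([block_a, block_b, block_c], [["x1", "x2"], ["y1"]])
for item in sorted(bank):
    print(item, round(bank[item], 2))
```

One thing the sketch makes visible: linking error accumulates along the chain, so with only 4 anchors per link the last blocks sit on the least certain part of the scale. A concurrent calibration of all blocks in one sparse data matrix would be an alternative to compare against.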
Would you do it the same way, or is something wrong or missing? What would be suitable grouping variables for testing for DIF? I thought about gender, age, and level of education.
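To show how such a grouping variable would enter a DIF check, here is a rough Python sketch using the Mantel-Haenszel procedure. This is a swapped-in, model-free screening method, not a Rasch-based test such as Andersen's likelihood ratio test. It assumes dichotomous (0/1) responses and uses the raw total score within a block as the matching variable. The function name `mantel_haenszel_dif` and the toy data are hypothetical.

```python
from collections import defaultdict
from math import log


def mantel_haenszel_dif(responses, groups, item, reference):
    """Mantel-Haenszel common odds ratio for one item.

    responses: list of dicts (item id -> 0/1), one per participant.
    groups:    list of group labels, e.g. gender, aligned with `responses`.
    item:      id of the studied item.
    reference: label of the reference group; everyone else is the focal group.
    Returns (alpha, delta), or None if the ratio is undefined.
    """
    # Per score stratum: [ref correct, ref wrong, focal correct, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for resp, grp in zip(responses, groups):
        stratum = sum(resp.values())  # total score as matching variable
        offset = 0 if grp == reference else 2
        tables[stratum][offset + (0 if resp[item] else 1)] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        return None
    alpha = num / den          # > 1: item favours the reference group
    delta = -2.35 * log(alpha)  # ETS delta metric (MH D-DIF)
    return alpha, delta


# Invented toy data: two items, eight participants with a total score of 1.
responses = (
    [{"i1": 1, "i2": 0}] * 3 + [{"i1": 0, "i2": 1}] * 1    # reference group
    + [{"i1": 1, "i2": 0}] * 1 + [{"i1": 0, "i2": 1}] * 3  # focal group
)
groups = ["f"] * 4 + ["m"] * 4

alpha, delta = mantel_haenszel_dif(responses, groups, "i1", reference="f")
print(round(alpha, 2), round(delta, 2))
```

Continuous variables such as age would have to be split into two groups (for example at the median) before they fit this scheme, and with 100 to 400 participants per block the score strata can get thin, so neighbouring score levels may need to be merged.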
Thank you very much for your advice!