FORMAT= reformat data
|
FORMAT= enables you to process awkwardly formatted data, but MFORMS= is easier to use.
FORMAT= is rarely needed when there is one data line per person.
Place the data in a separate file. The Winsteps screen file then shows the first record before and after FORMAT= is applied.
These control instructions pick out every other character of 25 two-character responses, then a blank, and then the person label:
XWIDE=1
data=datafile.txt
format=(T2,25(1A,1X),T90,1A,Tl1,30A)
This displays on the Winsteps screen:
Opening: datafile.txt
Input Data Record before FORMAT=:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
----------------------------------------------------------------------
01xx 1x1 10002000102020202000201010202000201000200ROSSNER, MARC DANIEL
Input Data Record after FORMAT=:
1x11102012222021122021020 L
^I                      ^N^P
^I is the Item1= column
^N is the last item according to NI=
^P is the Name1= column
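As a rough illustration of the response part of this FORMAT=, here is a small Python sketch. It is not Winsteps code, and the record is made up because datafile.txt is not shown in full. It assumes Winsteps column c corresponds to Python index c - 1, so T2,25(1A,1X) (read one column, skip one, 25 times, starting at column 2) becomes a stepped slice over columns 2, 4, ..., 50.

```python
# Sketch (not Winsteps code): the response part of
# format=(T2,25(1A,1X),...) reads columns 2, 4, ..., 50.
# Winsteps column c is Python index c - 1.

def every_other(record: str, start_col: int = 2, count: int = 25) -> str:
    """Read one column and skip the next, `count` times, from `start_col`."""
    first = start_col - 1
    picked = record[first:first + 2 * count:2]
    return picked.ljust(count)  # assume blanks for columns past the end

# Hypothetical record: 25 two-character fields "x0", "x1", ..., then a label.
record = "".join(f"x{i % 10}" for i in range(25)) + " SMITH"
print(every_other(record))  # 0123456789012345678901234
```

The slice step of 2 plays the role of the 1X skip; the `ljust` padding is an assumption about how columns beyond the end of a short record are treated.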
FORMAT= enables you to reformat one or more data record lines into one new line in which all the component parts of the person information are in one person-id field, and all the responses are put together into one continuous item-response string. A FORMAT= statement is required if:
1) each person's responses take up several lines in your data file;
2) a single line in your data file is longer than 10000 characters;
3) the person-id field or the item responses are not in one continuous string of characters;
4) you want to rearrange the order of the items in your data record, to pick out sub-tests, or to move a set of connected forms into one complete matrix;
5) you want to analyze only the responses of every second, or nth, person.
FORMAT= contains up to 512 characters of reformatting instructions, enclosed in (..), which follow special rules. The instructions are:
nA read in n characters starting with the current column, and then advance to the next column after them. Processing starts from column 1 of the first line, so that 5A reads in 5 characters and advances to the sixth column.
nX means skip over n columns. E.g. 5X means bypass this column and the next 4 columns.
Tc go to column c. T20 means get the next character from column 20. T55 means "tab" to column 55, not "tab" past 55 columns (which is TR55).
TLc go c columns to the left. TL20 means get the next character from the column which is 20 columns to the left of the current position.
TRc go c columns to the right. TR20 means get the next character from the column which is 20 columns to the right of the current position.
/ means go to column 1 of the next line in your data file.
n(..) repeat the string of instructions within the () exactly n times.
, a comma is used to separate the instructions.
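To make these rules concrete, here is a minimal Python sketch of a reader that walks one person's data lines the way the instructions describe. It is not how Winsteps is implemented, and the names `parse` and `apply_format` are hypothetical. Assumptions: XWIDE=1 only; only nA, nX, Tc, TLc, TRc, / and n(..) are handled; a column past the end of a line (or a line past the last one) is read as a blank, which is what Example 8's T100,15A relies on; and TLc stops at column 1.

```python
# Minimal sketch of the FORMAT= position rules (XWIDE=1 only, hypothetical).

def parse(spec: str) -> list:
    """Turn the text inside FORMAT=( ... ) into a nested list of instructions."""
    pos = 0

    def number():
        nonlocal pos
        start = pos
        while pos < len(spec) and spec[pos].isdigit():
            pos += 1
        return int(spec[start:pos]) if pos > start else None

    def group():
        nonlocal pos
        items = []
        while pos < len(spec):
            ch = spec[pos].upper()
            if ch in " ,":                      # separators
                pos += 1
            elif ch == ")":                     # end of a repeat group
                pos += 1
                return items
            elif ch == "/":                     # next line
                pos += 1
                items.append(("/", 0))
            elif ch == "T":                     # Tc, TLc or TRc
                pos += 1
                kind = "T"
                if pos < len(spec) and spec[pos].upper() in "LR":
                    kind += spec[pos].upper()
                    pos += 1
                items.append((kind, number()))
            else:
                n = number() or 1               # "A" is the same as "1A"
                ch = spec[pos].upper()
                pos += 1
                if ch == "(":
                    items.append(("REP", n, group()))
                elif ch in "AX":
                    items.append((ch, n))
                else:
                    raise ValueError(f"unsupported instruction {ch!r}")
        return items

    return group()


def apply_format(spec: str, lines: list[str]) -> str:
    """Apply FORMAT= instructions to one person's data lines."""
    line, col = 0, 1                            # current line index, 1-based column
    out = []

    def char_at(c):
        text = lines[line] if line < len(lines) else ""
        return text[c - 1] if c - 1 < len(text) else " "

    def run(items):
        nonlocal line, col
        for item in items:
            kind, n = item[0], item[1]
            if kind == "A":                     # read n columns, then move past them
                out.extend(char_at(col + i) for i in range(n))
                col += n
            elif kind == "X":                   # skip n columns
                col += n
            elif kind == "T":                   # go to column n
                col = n
            elif kind == "TL":                  # n columns left, not before column 1
                col = max(1, col - n)
            elif kind == "TR":                  # n columns right
                col += n
            elif kind == "/":                   # column 1 of the next line
                line, col = line + 1, 1
            elif kind == "REP":                 # repeat the group n times
                for _ in range(n):
                    run(item[2])

    run(parse(spec))
    return "".join(out)


print(apply_format("T2,3(1A,1X)", ["a1b2c3d4"]))     # 123
print(apply_format("T5,TL3,2A,TR1,A", ["abcdefgh"]))  # bce
print(apply_format("2A,/,T3,2A", ["abc", "xyzw"]))    # abzw
```

Treating Tc, TLc and TRc as pure moves of the read position, and nA as the only instruction that emits characters, is what makes the examples below compose: the formatted record is just the concatenation of every nA read.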
Set XWIDE=2 to reformat data whose original entries are 1 or 2 columns wide. All your data will then be analyzed as XWIDE=2, using:
nA2 read in n pairs of characters starting with the current column into n 2-character fields of the formatted record. (For responses with a width of 2 columns.)
nA1 read in n 1-character columns, starting with the current column, into n 2-character fields of the formatted record.
Always use nA1 for person-id information. Use nA1 for responses entered with a width of 1-character when there are also 2-character responses to be analyzed. When responses in 1-character format are converted into 2-character field format (compatible with XWIDE=2), the 1-character response is placed in the first, left, character position of the 2-character field, and the second, right, character position of the field is left blank. For example, the 1-character code of "A" becomes the 2-character field "A ". Valid 1-character responses of "A", "B", "C", "D" must be indicated by CODES="A B C D " with a blank following each letter.
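The 1-character to 2-character conversion can be mimicked in a few lines. This is a sketch under the assumption that the formatted record is a list of 2-character fields; `read_a1`, `read_a2` and `codes_value` are hypothetical helpers, not Winsteps functions.

```python
# Hypothetical helpers (not part of Winsteps) showing how nA1 and nA2 fill
# 2-character fields when XWIDE=2. Columns are 1-based, as in FORMAT=.

def read_a1(text: str, col: int, n: int) -> list[str]:
    """nA1: n single columns, each left-justified in a 2-character field."""
    chunk = text[col - 1:col - 1 + n].ljust(n)   # blanks past the end of the line
    return [ch + " " for ch in chunk]

def read_a2(text: str, col: int, n: int) -> list[str]:
    """nA2: n pairs of columns, each kept as one 2-character field."""
    chunk = text[col - 1:col - 1 + 2 * n].ljust(2 * n)
    return [chunk[i:i + 2] for i in range(0, 2 * n, 2)]

def codes_value(codes: list[str]) -> str:
    """CODES= text for XWIDE=2: each 1-character code gets a trailing blank."""
    return "".join(code.ljust(2) for code in codes)

print(read_a1("ABCD", 1, 4))                     # ['A ', 'B ', 'C ', 'D ']
print(read_a2("01990201", 1, 4))                 # ['01', '99', '02', '01']
print(repr(codes_value(["A", "B", "C", "D"])))   # 'A B C D '
```

Because "A" is stored as "A ", the CODES= string must contain the same blank after each letter for the field to be recognized.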
ITEM1= must be the column number of the first item response in the formatted record created by the FORMAT= statement. NAME1= must be the column number of the first character of the person-id in the formatted record.
Example 1: Each person's data record is 80 characters long and takes up one line in your data file. The person-id is in columns 61-80. The 56 item responses are in columns 5-60. Codes are "A", "B", "C", "D". No FORMAT= is needed. Data look like:
xxxxDCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADCZarathrustra-Xerxes
Without FORMAT=:
XWIDE=1      ; response width (the standard)
ITEM1=5      ; start of item responses
NI=56        ; number of items
NAME1=61     ; start of name
NAMLEN=20    ; length of name
CODES=ABCD   ; valid response codes
With FORMAT=, the reformatted record will look like:
DCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADCZarathrustra-Xerxes
XWIDE=1               ; response width (the standard)
FORMAT=(4X,56A,20A)   ; skip unused characters
ITEM1=1               ; start of item responses
NI=56                 ; number of items
NAME1=57              ; start of name
NAMLEN=20             ; length of name
CODES=ABCD            ; valid response codes
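As a check on the arithmetic, FORMAT=(4X,56A,20A) amounts to dropping the first 4 columns and keeping the next 76. The Python sketch below (an illustration only, assuming Winsteps column c is Python index c - 1) applies that to the Example 1 record and splits the result at ITEM1=1 and NAME1=57.

```python
# Sketch: FORMAT=(4X,56A,20A) on the Example 1 record.
record = "xxxxDCBDABCADCDBACDADABDADCDADDCCDADDCAABCADCCBBDADCACDBBADCZarathrustra-Xerxes"
record = record.ljust(80)                  # records are 80 columns; pad the short name

reformatted = record[4:4 + 56] + record[60:80]        # 4X, then 56A, then 20A
responses, name = reformatted[:56], reformatted[56:]  # ITEM1=1, NAME1=57
print(responses)
print(name.rstrip())                                  # Zarathrustra-Xerxes
```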
Example 2: Each data record is one line of 80 characters. The person-id is in columns 61-80. The 28 item responses are in columns 5-60, each 2 characters wide. Codes are " A", " B", " C", " D". No FORMAT= is necessary. Data look like:
xxxx C D B A C B C A A D D D D C D D C A C D C B A C C B A CZarathrustra-Xerxes
Without FORMAT=:
XWIDE=2            ; response width
ITEM1=5            ; start of item responses
NI=28              ; number of items
NAME1=61           ; start of name
NAMLEN=20          ; length of name
CODES=" A B C D"   ; valid response codes
With FORMAT=, the columns of the reformatted record are:
1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-7-8-90123456789012345678
 C D B A C B C A A D D D D C D D C A C D C B A C C B A CZarathrustra-Xerxes
XWIDE=2                 ; response width
FORMAT=(4X,28A2,20A1)   ; skip unused characters
ITEM1=1                 ; start of item responses in formatted record
NI=28                   ; number of items
NAME1=29                ; start of name in "columns"
NAMLEN=20               ; length of name
CODES=" A B C D"        ; valid response codes
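With XWIDE=2, 28A2 cuts columns 5-60 into 28 two-character fields. The Python sketch below (illustrative only, using 0-based slicing) shows that every field then matches one of the CODES=" A B C D" entries and that the name starts right after the last field.

```python
# Sketch: FORMAT=(4X,28A2,20A1) with XWIDE=2 on the Example 2 record.
record = "xxxx C D B A C B C A A D D D D C D D C A C D C B A C C B A CZarathrustra-Xerxes"

fields = [record[4 + 2 * i:6 + 2 * i] for i in range(28)]  # 4X, then 28A2
name = record[60:80]                                       # 20A1
valid = {" A", " B", " C", " D"}                           # CODES=" A B C D"
print(fields[:4])                          # [' C', ' D', ' B', ' A']
print(all(f in valid for f in fields), name)
```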
Example 3: Each person's data record is 80 characters long and takes one line in your data file. The person-id is in columns 61-80. 30 1-character item responses, "A", "B", "C" or "D", are in columns 5-34, and 13 2-character item responses, "01", "02" or "99", are in columns 35-60. Data look like:
xxxxDCBDABCADCDBACDADABDADCDADDCCA01990201019902010199020201Zarathrustra-Xerxes
which becomes, on reformatting:
Columns:
1234567890123456789012345678901-2-3-4-5-6-7-8-9-0-1-2-3-45678901234567890123
DCBDABCADCDBACDADABDADCDADDCCA01990201019902010199020201Zarathrustra-Xerxes
XWIDE=2                      ; analyzed response width
FORMAT=(4X,30A1,13A2,20A1)   ; skip unused
ITEM1=1                      ; start of item responses in formatted record
NI=43                        ; number of items
NAME1=44                     ; start of name
NAMLEN=20                    ; length of name
CODES="A B C D 010299"       ; valid responses: each 1-character code is followed by a blank
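The point of the mixed widths is that, after reformatting, all 43 responses are 2-character fields that can be checked against one CODES= string. A small Python sketch of that idea (an illustration, not Winsteps' own scoring):

```python
# Sketch: FORMAT=(4X,30A1,13A2,20A1) with XWIDE=2 on the Example 3 record.
record = "xxxxDCBDABCADCDBACDADABDADCDADDCCA01990201019902010199020201Zarathrustra-Xerxes"

one_char = [ch + " " for ch in record[4:34]]                   # 30A1: "A" -> "A "
two_char = [record[34 + 2 * i:36 + 2 * i] for i in range(13)]  # 13A2
responses = one_char + two_char                                # NI=43

codes = "A B C D 010299"                                       # CODES=
valid = {codes[i:i + 2] for i in range(0, len(codes), 2)}      # 2-character codes
print(len(responses), all(r in valid for r in responses))      # 43 True
```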
Example 4: The person-id is 10 columns wide, in columns 15-24, and the 50 1-column item responses, "A", "B", "C", "D", are in columns 4000-4019, then in columns 4021-4050. Data look like:
xxxxxxxxxxxxxxJohn-Smithxxxx....xxxDCBACDADABCADCBCDABDxBDCBDADCBDABDCDDADCDADBBDCDABB
which becomes, on reformatting:
John-SmithDCBACDADABCADCBCDABDBDCBDADCBDABDCDDADCDADBBDCDABB
FORMAT=(T15,10A,T4000,20A,1X,30A)
NAME1=1      ; start of person name in formatted record
NAMLEN=10    ; length of name (automatic)
ITEM1=11     ; start of items in formatted record
NI=50        ; 50 item responses
CODES=ABCD   ; valid response codes
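Example 4's data line is elided ("...."), so the Python sketch below builds a hypothetical 4050-column record with filler "x" everywhere except the stated fields, then applies the same moves with 1-based column arithmetic. The helper `col` is illustrative only.

```python
# Sketch: FORMAT=(T15,10A,T4000,20A,1X,30A) on a hypothetical 4050-column record.
cells = ["x"] * 4050
cells[14:24] = "John-Smith"                          # columns 15-24
cells[3999:4019] = "DCBACDADABCADCBCDABD"            # columns 4000-4019
cells[4020:4050] = "BDCBDADCBDABDCDDADCDADBBDCDABB"  # columns 4021-4050
record = "".join(cells)

def col(c: int, n: int) -> str:
    """n characters starting at 1-based column c."""
    return record[c - 1:c - 1 + n]

# T15,10A then T4000,20A; 1X skips column 4020; 30A reads 4021-4050.
reformatted = col(15, 10) + col(4000, 20) + col(4021, 30)
print(reformatted[:10], len(reformatted))            # John-Smith 60
```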
Example 5: There are five records, or lines, in your data file per person. There are 100 items. Items 1-20 are in columns 25-44 of the first record, items 21-40 are in columns 25-44 of the second record, and so on. The 10-character person-id is in columns 51-60 of the last (fifth) record. Codes are "A", "B", "C", "D". Data look like:
xxxxxxxxxxxxxxxxxxxxxxxxACDBACDBACDCABACDACD
xxxxxxxxxxxxxxxxxxxxxxxxDABCDBACDBACDCABACDA
xxxxxxxxxxxxxxxxxxxxxxxxACDBACDBACDCABACDACD
xxxxxxxxxxxxxxxxxxxxxxxxDABCDBACDBACDCABACDA
xxxxxxxxxxxxxxxxxxxxxxxxABCDBACDBACDCABACDADxxxxxxMary-Jones
which becomes:
ACDBACDBACDCABACDACDDABCDBACDBACDCABACDAACDBACDBACDCABACDACDDABCDBACDBACDCABACDAABCDBACDBACDCABACDADMary-Jones
FORMAT=(4(T25,20A,/),T25,20A,T51,10A)
ITEM1=1      ; start of item responses
NI=100       ; number of item responses
NAME1=101    ; start of person name in formatted record
NAMLEN=10    ; length of person name
CODES=ABCD   ; valid response codes
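The repeat group 4(T25,20A,/) followed by T25,20A reads the same 20 columns from each of the five lines. The Python sketch below rebuilds the Example 5 lines (24 filler columns, then the responses) and joins those slices, which is one way to check the expected formatted record.

```python
# Sketch: FORMAT=(4(T25,20A,/),T25,20A,T51,10A) on the Example 5 lines.
rows = ["ACDBACDBACDCABACDACD", "DABCDBACDBACDCABACDA",
        "ACDBACDBACDCABACDACD", "DABCDBACDBACDCABACDA",
        "ABCDBACDBACDCABACDAD"]
lines = ["x" * 24 + r for r in rows]        # items in columns 25-44
lines[-1] += "x" * 6 + "Mary-Jones"         # person-id in columns 51-60 of line 5

items = "".join(line[24:44] for line in lines)  # T25,20A on each of the 5 lines
name = lines[-1][50:60]                         # T51,10A on the last line
record = items + name                           # ITEM1=1, NAME1=101
print(len(items), name)                         # 100 Mary-Jones
```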
Example 6: There are three lines per person. In the first line, columns 31 to 50 hold 10 item responses, each 2 columns wide. The person-id is in the second line, in columns 5 to 17. The third line is to be skipped. Codes are "A ", "B ", "C ", "D ". Data look like:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx A C B D A D C B A Dxxxxxxxx
xxxxJoseph-Carlosxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
which becomes:
Columns:
1-2-3-4-5-6-7-8-9-0-1234567890123
 A C B D A D C B A DJoseph-Carlos
FORMAT=(T31,10A2,/,T5,13A1,/)
ITEM1=1            ; start of item responses
NI=10              ; number of items
XWIDE=2            ; 2 columns per response
NAME1=11           ; start of person name in "columns"
NAMLEN=13          ; length of person name
CODES='A B C D '   ; valid response codes
If the third line is not skipped, read a redundant extra column from that last line. Replace the FORMAT= instruction above with:
FORMAT=(T31,10A2,/,T5,13A1,/,A1)   ; last A1 unused
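Each "/" moves to the next line, so with two slashes each person consumes three lines. The Python sketch below groups the data into blocks of three and takes the fields the FORMAT= reads; `persons` is a hypothetical helper, and the sample lines are rebuilt to match the layout described above.

```python
# Sketch: FORMAT=(T31,10A2,/,T5,13A1,/) consumes 3 lines per person.

def persons(lines, per_person=3):
    """Yield (responses, name) for each complete block of three lines."""
    for i in range(0, len(lines) - per_person + 1, per_person):
        first, second = lines[i], lines[i + 1]   # the third line is not read
        responses = [first[30 + 2 * k:32 + 2 * k] for k in range(10)]  # T31,10A2
        name = second[4:17]                                            # T5,13A1
        yield responses, name

data = [
    "x" * 30 + " A C B D A D C B A D" + "x" * 8,
    "xxxx" + "Joseph-Carlos" + "x" * 41,
    "x" * 58,
]
for responses, name in persons(data):
    print(name, [r.strip() for r in responses])
```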
Example 7: Pseudo-random data selection. You have a file with 1,000 person records. This time you want to analyze every 10th record, beginning with the 3rd person in the file, i.e., skip two records, analyze one record, skip seven records, and so on. The data records are 500 characters long.
XWIDE = 1
FORMAT = (/,/,500A,/,/,/,/,/,/,/)
or
XWIDE = 2
FORMAT = (/,/,100A2,300A1,/,/,/,/,/,/,/) ; 100 2-character responses, 300 other columns
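Each pass through this FORMAT= uses ten lines: two skipped, one read, then seven more slashes (lines 4-10), after which the next person starts on line 11. On an in-memory list of records that is the same as a stepped slice, as this small Python sketch with a hypothetical file shows. Note that the selection is systematic (every 10th record), not random.

```python
# Sketch: FORMAT=(/,/,500A,/,/,/,/,/,/,/) keeps records 3, 13, 23, ...
records = [f"person {n}" for n in range(1, 1001)]   # hypothetical 1,000-record file
selected = records[2::10]                          # skip 2, read 1, skip 7, repeat
print(len(selected), selected[0], selected[1])      # 100 person 3 person 13

# The XWIDE=2 version reads the same 500 columns: 100A2 + 300A1.
assert 100 * 2 + 300 == 500
```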
Example 8: Test A, in file EXAM10A.TXT, and Test B, in EXAM10B.TXT, are both 20-item tests. They have 5 items in common, but the distractors are not necessarily in the same order. The responses must be scored on an individual test basis. Also, the validity of each test is to be examined separately. Then one combined analysis is wanted to equate the tests and obtain bankable item difficulties. For each file of original test responses, the person information is in columns 1-25 and the item responses are in columns 41-60.
The combined data file specified in EXAM10C.TXT is to be in RFILE= format. It contains:
Person information: 30 characters (always)
Item responses: columns 31-65
The identification of the common items is:
Test    Item Number (= location in item string)
Both:   1   2   3   4   5    6-20           21-35
A:      3   1   7   8   9    2,4-6,10-20
B:      4   5   6   2   11                  1,3,7-10,12-20
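The FORMAT= strings used in parts I and II follow mechanically from this table: consecutive original items become one "Tc,nA" run, and dummy items become a run of blanks read from beyond the end of the record. The Python sketch below derives them; `format_for` is a hypothetical helper, and it assumes original item k sits in column 40 + k, a 25-column person label (25A), and blanks borrowed from column 100.

```python
# Hypothetical helper: derive a FORMAT= string from the bank-item order.

def format_for(order, first_col=41, blank_col=100, label="25A"):
    """order[i] is the original item number for bank item i + 1, or None."""
    parts = [label]
    i = 0
    while i < len(order):
        j = i + 1
        if order[i] is None:                      # a run of dummy (blank) items
            while j < len(order) and order[j] is None:
                j += 1
            col = blank_col
        else:                                     # a run of consecutive items
            while j < len(order) and order[j] is not None and order[j] == order[j - 1] + 1:
                j += 1
            col = first_col + order[i] - 1
        n = j - i
        width = "A" if n == 1 else f"{n}A"
        parts.append(f"T{col},{width}")
        i = j
    return "(" + ",".join(parts) + ")"

ORDER_A = [3, 1, 7, 8, 9, 2, 4, 5, 6] + list(range(10, 21))
ORDER_B = [4, 5, 6, 2, 11] + [None] * 15 + [1, 3, 7, 8, 9, 10] + list(range(12, 21))
print("FORMAT=" + format_for(ORDER_A))
print("FORMAT=" + format_for(ORDER_B))
```

Generating the string from the order list makes it easy to re-check a long FORMAT= against the common-item table instead of counting columns by hand.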
I. From Test A, make a response (RFILE=) file rearranging the items with FORMAT=.
; This file is EXAM10A.TXT
&INST
TITLE="Analysis of Test A"
RFILE=EXAM10AR.TXT   ; The constructed response file for Test A
NI=20
FORMAT=(25A,T43,A,T41,A,T47,3A,T42,A,T44,3A,T50,11A)
ITEM1=26             ; Items start in column 26 of reformatted record
CODES=ABCD#          ; Beware of blanks meaning wrong!
; Use your editor to convert all "wrong" blanks into another code,
; e.g., #, so that they will be scored wrong and not ignored as missing.
KEYFRM=1             ; Key in data record format
&END
Key 1 Record CCBDACABDADCBDCABBCA
BANK 1 TEST A 3      ; first item name
.
BANK 20 TEST A 20
END NAMES
Person 01 A BDABCDBDDACDBCACBDBA
.
Person 12 A BADCACADCDABDDDCBACA
The RFILE= file, EXAM10AR.TXT, is:
Person 01 A 00001000010010001001
Person 02 A 00000100001110100111
.
Person 12 A 00100001100001001011
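The RFILE= rows can be reproduced from the key and responses in part I. Because KEYFRM=1 puts the key in data-record format, the key is rearranged by the same FORMAT= as the responses, so scoring each original item and then reordering gives the same result. The Python sketch below assumes plain key matching (1 = matches the key, 0 = otherwise); `score` and `ORDER_A` are illustrative names.

```python
# Sketch: reproduce Test A RFILE= rows for Person 01 and Person 12 (part I).
# ORDER_A gives, for bank items 1-20, the original Test A item placed there.
ORDER_A = [3, 1, 7, 8, 9, 2, 4, 5, 6] + list(range(10, 21))

def score(responses: str, key: str, order: list) -> str:
    """1 if the response matches the key for that original item, else 0."""
    return "".join("1" if responses[i - 1] == key[i - 1] else "0" for i in order)

key_a = "CCBDACABDADCBDCABBCA"
print(score("BDABCDBDDACDBCACBDBA", key_a, ORDER_A))  # 00001000010010001001
print(score("BADCACADCDABDDDCBACA", key_a, ORDER_A))  # 00100001100001001011
```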
II. From Test B, make a response (RFILE=) file rearranging the items with FORMAT=. Responses unique to Test A are filled with 15 blank responses to dummy items.
; This file is EXAM10B.TXT
&INST
TITLE="Analysis of Test B"
RFILE=EXAM10BR.TXT   ; The constructed response file for Test B
NI=35
FORMAT=(25A,T44,3A,T42,A,T51,A,T100,15A,T41,A,T43,A,T47,4A,T52,9A)
; Blanks are imported from an unused part of the data record to the right!
; T100 means "go beyond the end of the data record"
; 15A means "get 15 blank spaces"
ITEM1=26             ; Items start in column 26 of reformatted record
CODES=ABCD#          ; Beware of blanks meaning wrong!
KEYFRM=1             ; Key in data record format
&END
Key 1 Record CDABCDBDABCADCBDBCAD
BANK 1 TEST B 4
.
BANK 5 TEST B 11
BANK 6 TEST A 2
.
BANK 20 TEST A 20
BANK 21 TEST B 1
.
BANK 35 TEST B 20
END NAMES
Person 01 B BDABDDCDBBCCCCDAACBC
.
Person 12 B BADABBADCBADBDBBBBBB
The RFILE= file, EXAM10BR.TXT, is:
Person 01 B 10111 010101001000100
Person 02 B 00000 010000000001000
.
Person 11 B 00010 001000000000100
Person 12 B 00000 000101000101000
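The same check works for Test B, with the 15 dummy items coming out blank. In this Python sketch (illustrative names, plain key matching assumed), `None` in `ORDER_B` marks a bank item that Test B persons did not take.

```python
# Sketch: reproduce Test B RFILE= rows for Person 01 and Person 12 (part II).
# ORDER_B gives, for bank items 1-35, the original Test B item placed there;
# None marks the 15 dummy items filled with blanks.
ORDER_B = [4, 5, 6, 2, 11] + [None] * 15 + [1, 3, 7, 8, 9, 10] + list(range(12, 21))

def score(responses: str, key: str, order: list) -> str:
    """1 = matches the key, 0 = does not, blank = dummy item (not in Test B)."""
    out = []
    for item in order:
        if item is None:
            out.append(" ")
        else:
            out.append("1" if responses[item - 1] == key[item - 1] else "0")
    return "".join(out)

key_b = "CDABCDBDABCADCBDBCAD"
for label, responses in [("Person 01 B", "BDABDDCDBBCCCCDAACBC"),
                         ("Person 12 B", "BADABBADCBADBDBBBBBB")]:
    row = score(responses, key_b, ORDER_B)
    print(label, row[:5], row[20:])
# Person 01 B 10111 010101001000100
# Person 12 B 00000 000101000101000
```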
III. Analyze the RFILE= files of Test A and Test B together:
; This file is EXAM10C.TXT
&INST
TITLE="Analysis of Tests A & B (already scored)"
NI=35
ITEM1=31                        ; Items start in column 31 of RFILE=
CODES=01                        ; Blanks mean "not in this test"
DATA=EXAM10AR.TXT+EXAM10BR.TXT  ; Combine data files
; or, first, at the DOS prompt,
; C:> COPY EXAM10AR.TXT+EXAM10BR.TXT EXAM10AB.TXT(Enter)
; then, in EXAM10C.TXT,
; DATA=EXAM10AB.TXT
PFILE=EXAM10CP.TXT   ; Person measures for combined tests
IFILE=EXAM10CI.TXT   ; Item calibrations for combined tests
tfile=*              ; List of desired tables
3                    ; Table 3.1 for summary statistics, 3.2, ...
10                   ; Table 10 for item structure
*
PRCOMP=S             ; Principal components/contrast analysis with standardized residuals
&END
BANK 1 TEST A 3 B 4
.
BANK 35 TEST B 20
END NAMES
Shortening FORMAT= statements
If the required FORMAT= statement exceeds 512 characters, consider using this technique:
Relocate an entire item response string, but use an IDFILE= to delete the duplicate items, i.e., replace them by blanks. E.g., for Test B, instead of
FORMAT=(25A, T44,3A,T42,A,T51,A, T100,15A, T41,A,T43,A,T47,4A,T52,9A)
NI=35
put Test B as items 21-40, in columns 51 through 70:
FORMAT=(25A, T44,3A,T42,A,T51,A, T100,15A, T41,20A)
NI=40
Blank out (delete) the 5 duplicated items with an IDFILE= containing:
24-26
22
31
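The IDFILE= entries follow from the common-item table: in the shortened layout, original Test B item b becomes bank item 20 + b, and the five Test B items already placed as bank items 1-5 (4, 5, 6, 2 and 11) are the duplicates. A short Python sketch of that arithmetic:

```python
# Sketch: which bank items the IDFILE= must delete in the shortened Test B layout.
common_b = [4, 5, 6, 2, 11]           # Test B items already used as bank items 1-5
duplicates = [20 + b for b in common_b]
print(duplicates)                     # [24, 25, 26, 22, 31]  i.e. 24-26, 22, 31
```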
Help for WINSTEPS® Rasch Measurement Software: www.winsteps.com.