Note that all of these programs have now been rewritten in R and are available as part of the psych package at the personality-project.org/r repository. VSS-ALPHA-ICLUST-IRT is a set of routines meant for the practical problems of scale construction and analysis. These routines are designed to identify the optimal number of factors (Very Simple Structure -- VSS) and the item clusters that best define these dimensions (Item Cluster Analysis -- ICLUST). The resulting clusters are meant to be as independent and reliable as possible given the particular set of items. Once item clusters or scales have been defined, the program will find scale scores as unweighted linear sums of items and report basic scale and item statistics (ALPHA). The quality of these scales and the characteristics of the items can also be examined using rudimentary (1 and 2 parameter) Item Response Theory procedures. These are under development and are included in the current release. Additional routines for basic data manipulation are included to facilitate the practical problems of data cleaning, merging and editing.
Other programs are more appropriate for complex analyses such as factor analysis or principal component analysis (SYSTAT or SPSS) or structural equation modeling (EQS or LISREL). SPSS reliability offers many of the features of the Alpha procedures. Excel is recommended for complex data manipulation.
The purpose of this package of routines is to facilitate the construction of personality inventories for the typical problem encountered by most personality researchers: a set of items has been given that are thought to measure a number of independent constructs. The question then becomes one of identifying the most interpretable number of constructs and constructing scales from the items to measure them. Although standard factor analytic techniques are available to solve these problems, factor analysis of items is problematic. VSS and ICLUST are designed for the practical problem of deciding how many factors/clusters to extract from a set of items. ALPHA scores scales and reports the standard item and scale statistics. Analyses of scales using item response theory (IRT) techniques take into account subject as well as item parameters.
Warning. This program is semi "bullet-proof" in that most error conditions will lead to a graceful exit with a brief explanation of the problem. However, unexpected errors will just cause a system crash rather than gracefully ask for help. Important files should be saved before using this program.
For help using these programs, email William Revelle at Northwestern University. The interested user may download a Mac version of VSS-Alpha-ICLUST-IRT. The source code for these procedures is written in Lightspeed Pascal for the Macintosh. I am happy to send it to anyone who is truly interested. Although development and testing are done on a G3 running Mac OS 8.1, the code is not optimized for Power PCs, and the programs will run on all Macs with System 7.1 or better that have at least a 68030 chip.
These notes are meant to be an introduction to the program and are not a tutorial in the process of scale construction or item analysis. The assumption is made that the user is moderately familiar with data analytic concepts and terms and has a passing familiarity with computer terminology.
Options and new procedures are added in response to requests (and suggestions) from users and are done when time allows. Suggestions for new options are welcome.
(This 250 item limit is actually because of the 256 character limit of string manipulations on the Mac. For very large data sets, make sure that you don't have more than 256 characters/line. It is possible to break longer lines into parts using BBEDIT.)
Save the file as a TEXT file: (SAVE AS if using Excel or Word).
Two sample lines:
01 1 1211213414231411131332211312111144121111112112111433
01 2 1111411411331111111111112414114144141121124111311441

Two important points: every subject's data should have a carriage return at the end and there should be no blank lines.
Although not necessary for the program, it is useful to enter the data in a fixed pitch font (e.g., MONACO 9). That way, the columns will line up and it is possible to do some checking of the data while you are entering them.
Note that the program is column sensitive. Blanks are treated as informative (they take up one column). Although some people find it useful to include spaces while entering the data, spaces technically count as items and should be removed before analysis proceeds.
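As a minimal sketch of what column-sensitive reading implies (Python is used here only for illustration; the function name and its arguments are hypothetical and are not part of the program), each character from the first item column onward is taken as one response, so a stray blank occupies an item position:

```python
def parse_fixed_record(line, first_item_col, n_items):
    """Split one subject's line into single-character item responses.

    Columns are counted from 1, as in these notes. Every character
    counts as one item, including blanks.
    """
    start = first_item_col - 1
    field = line.rstrip("\r\n")[start:start + n_items]
    if len(field) < n_items:
        raise ValueError("line is shorter than the requested item range")
    return list(field)


# Hypothetical short record: two id fields, then items starting in column 6.
clean = parse_fixed_record("01 1 1211213414", first_item_col=6, n_items=10)
print(clean)

# A blank typed between responses is read as if it were an item.
shifted = parse_fixed_record("01 1 12 1", first_item_col=6, n_items=4)
print(shifted)
```

The second call shows why blanks inside the item field should be removed before analysis: the blank takes the place of the third item and pushes the real third response into the fourth position.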
Note that the data files to be read should not include extraneous blank lines.
It is possible to include free field input (rather than string oriented input) by checking the free-field option. In other words, if your data are more than a single digit per item you should have spaces (or tabs) between your fields. Such a case might be if data were entered in Excel or some other spreadsheet program and then saved as either space or tab delimited text. Use the "free field" option in the options menu.
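For comparison, free-field input can be sketched as splitting each line on spaces or tabs. This is only an illustration under the assumption that the leading fields are identifiers and the rest are numeric items; `parse_free_field` and `n_id_fields` are hypothetical names:

```python
def parse_free_field(line, n_id_fields):
    """Split a space- or tab-delimited line into id fields and numeric items."""
    fields = line.split()  # any run of spaces or tabs separates fields
    ids = fields[:n_id_fields]
    items = [float(value) for value in fields[n_id_fields:]]
    return ids, items


# A tab-delimited line as it might be saved from a spreadsheet.
ids, items = parse_free_field("01\t1\t12\t7\t10", n_id_fields=2)
print(ids, items)
```

Because fields are separated by delimiters rather than by column position, responses of more than one digit (12, 10) are read correctly.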
Data checking can be done using the Compare file option which compares two files line by line for identical input (i.e., if the data have been entered twice).
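The idea behind a line-by-line comparison of doubly entered data can be sketched as follows. This is not the program's own code; `compare_lines` is a hypothetical helper that reports every line on which the two entries disagree:

```python
from itertools import zip_longest


def compare_lines(lines_a, lines_b):
    """Return (line_number, text_a, text_b) for every line that differs.

    If one file is longer, the missing lines of the other appear as None.
    """
    mismatches = []
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            mismatches.append((number, a, b))
    return mismatches


first_entry = ["01 1 1211", "02 1 3412"]
second_entry = ["01 1 1211", "02 1 3422"]
for number, a, b in compare_lines(first_entry, second_entry):
    print(f"line {number}: {a!r} vs {b!r}")
```

To compare two files on disk, the lines could be obtained with `open(path).read().splitlines()` for each file.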
To run the scoring routine, select Score_Alpha from the FILE menu. A series of dialog boxes will follow.
In general, names for the following files are requested:
1 I like to go to lively parties
2 I am nervous in the presence of others.
...
87 I sometimes talk about things without thinking.

impulsivity scale (with five items, one scored negatively) 5 1 2 -3 4 5

Up to 50 scales can be scored at the same time. Sample keys for several scales are:

Energetic Arousal 10 17 29 -59 -51 -28 23 20 64 18 55
Tense Arousal 10 11 3 32 69 27 -50 -26 -57 -43 -24
Positive Affect 10 12 17 41 53 40 63 15 14 34 52
negative affect 10 3 45 13 62 35 70 37 48 4 25
energetic arousal positive items 7 17 29 23 20 64 18 55
energetic arousal negative items 3 -59 -51 -28
tense arousal positive items 5 11 3 32 69 27
tense arousal negative items 5 -50 -26 -57 -43 -24
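The sample keys follow the pattern scale name, number of items, then the item numbers, with a minus sign marking a negatively keyed item. A rough sketch of reading one key and forming an unweighted sum is given below. Two things here are assumptions rather than facts about the program: that each key sits on a single line, and that a negatively keyed response is reversed as (lowest + highest) minus the response. The function names are hypothetical.

```python
def _is_int(token):
    try:
        int(token)
        return True
    except ValueError:
        return False


def parse_key(line):
    """Return (scale_name, signed_item_numbers) from one key line."""
    tokens = line.split()
    numbers = []
    # Collect the run of integers at the end of the line.
    while tokens and _is_int(tokens[-1]):
        numbers.append(int(tokens.pop()))
    numbers.reverse()
    # The first integer is the item count; the rest are the items.
    if not numbers or numbers[0] != len(numbers) - 1:
        raise ValueError("item count does not match the number of items")
    return " ".join(tokens), numbers[1:]


def score_scale(responses, key, low, high):
    """Unweighted sum of the keyed items (item numbers count from 1).

    Assumption: a negatively keyed response x is reversed as low + high - x.
    """
    total = 0
    for item in key:
        value = responses[abs(item) - 1]
        if item < 0:
            value = low + high - value
        total += value
    return total


name, key = parse_key(
    "impulsivity scale (with five items, one scored negatively) 5 1 2 -3 4 5"
)
print(name, key)
print(score_scale([1, 2, 4, 3, 2], key, low=1, high=4))
```

With responses on a 1 to 4 scale, the third response (4) is reversed to 1 before summing, so the example subject scores 1 + 2 + 1 + 3 + 2.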
To allow for convenient keying of multiple inventories from the same data file, the first column for a set of items is requested as a parameter.
Thus, in a file containing 3 columns of identification, 36 columns of one questionnaire, and 50 columns of another questionnaire, it is possible to ask to score the second questionnaire by specifying that the first item starts at item (column) 40. To score items that are written over several lines, select the multi-line option in the options menu. Note that this is only appropriate if carriage returns were entered within each subject's data.
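The arithmetic behind "column 40" can be checked with a small sketch (the helper name is hypothetical): the first column of each block is one more than the total width of everything before it.

```python
def first_columns(widths):
    """Return the first column (counting from 1) of each block of columns."""
    starts = []
    column = 1
    for width in widths:
        starts.append(column)
        column += width
    return starts


# 3 columns of identification, then questionnaires of 36 and 50 columns.
print(first_columns([3, 36, 50]))
```

The second questionnaire follows 3 + 36 = 39 columns, so its first item is in column 40.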
Missing data are replaced with the mean for the item. This can lead to strange estimates if subjects have a great deal of missing data. An option allows for reports of the number of missing values/scale/subject. Only one missing value code is allowed. Typical values are "." or "9".
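A minimal sketch of item-mean replacement, assuming the data have already been read into numeric rows and that a single numeric code (9 here) marks a missing response. The function name is hypothetical, and the count returned is per subject rather than per scale:

```python
def replace_missing_with_item_means(data, missing=9):
    """Replace the missing code with the mean of the observed values per item.

    data: list of subjects, each a list of numeric responses.
    Returns (filled_data, number_of_missing_values_per_subject).
    """
    n_items = len(data[0])
    means = []
    for j in range(n_items):
        observed = [row[j] for row in data if row[j] != missing]
        if not observed:
            raise ValueError(f"item {j + 1} has no observed values")
        means.append(sum(observed) / len(observed))
    filled = [
        [means[j] if value == missing else value for j, value in enumerate(row)]
        for row in data
    ]
    counts = [sum(value == missing for value in row) for row in data]
    return filled, counts


data = [[1, 2], [3, 9], [5, 4]]
filled, counts = replace_missing_with_item_means(data)
print(filled, counts)
```

The second subject's missing response is replaced by the mean of the two observed values for that item (2 and 4). A subject with many missing values ends up with a score made mostly of item means, which is why the count of missing values is worth reporting.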
Because some investigators are concerned about between subject differences in the use of scale extremities, there is an option at input to standardize items within subjects. This has the effect of removing any general between subject factor as well.
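Standardizing within subjects can be sketched as converting each subject's responses to deviations from that subject's own mean, divided by that subject's own standard deviation. Whether the program divides by n or n - 1 is not stated; the sketch below assumes n, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def standardize_within_subject(responses):
    """Return one subject's responses as z-scores around that subject's mean."""
    m = mean(responses)
    s = pstdev(responses)  # assumption: population (n) standard deviation
    if s == 0:
        return [0.0] * len(responses)  # subject gave the same answer throughout
    return [(x - m) / s for x in responses]


# Two subjects with the same profile but different overall elevation.
low_user = standardize_within_subject([1, 2, 3])
high_user = standardize_within_subject([3, 4, 5])
print(low_user)
print(high_user)
```

Both subjects receive the same standardized profile, which illustrates how a general between-subject difference in elevation is removed.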
Options include having data spread across several lines (multi-lines), reverse scoring of the items (reverse items), and free field input. (Reverse scoring is convenient if the data were entered as yes=0 and no=1 and the scales are to be scored in the direction of total agreement with the scale.) Output from the scoring program is stored in a file in a tab delimited format compatible with Excel or Word for further processing (e.g., to sort item-scale correlations into rank order).
To just describe a set of data without bothering to find scale scores, select DESCRIBE_ITEMS in the ALPHA menu. If you then want to find scores, select the Score option in the ALPHA menu. (Score_Alpha does both describe and score).
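One of the scale statistics a routine of this kind would normally report is coefficient alpha. The sketch below uses the standard textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score); it is not taken from the program's source, and it assumes that negatively keyed items have already been reversed.

```python
from statistics import pvariance


def coefficient_alpha(data):
    """Coefficient alpha for a list of subjects, each a list of item scores."""
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([row[j] for row in data]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Two items that order the subjects identically give the maximum value.
print(coefficient_alpha([[1, 1], [2, 2], [3, 3]]))

# Items that agree less well give a lower value.
print(coefficient_alpha([[1, 2], [2, 1], [3, 3], [4, 3]]))
```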
Utility programs allow you to:
Also note that if item-scale correlations are corrected for scale unreliability, then for poor scales this will sometimes lead to item-scale correlations > 1.0.
The VSS criterion is calculated based upon the inter-item correlation matrix and a set of factor pattern matrices. The input procedures are similar to those for Score_Alpha (i.e., the basic data files and data structures are the same).
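To see how a corrected correlation can exceed 1.0, consider the classical correction in which the observed item-scale correlation is divided by the square root of the scale reliability. That this is the exact correction the program applies is an assumption; the function name is hypothetical.

```python
from math import sqrt


def correct_for_scale_unreliability(r, reliability):
    """Divide an item-scale correlation by the square root of the reliability."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be greater than 0 and at most 1")
    return r / sqrt(reliability)


# A reasonably reliable scale: the correction is modest.
print(correct_for_scale_unreliability(0.6, 0.9))

# A poor scale: the same observed correlation is corrected to more than 1.
print(correct_for_scale_unreliability(0.6, 0.3))
```

Whenever the observed correlation is larger than the square root of the reliability, the corrected value is greater than 1.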
To find the VSS criterion, it is first necessary to run Systat or some other equivalently powerful stats program to generate a factor pattern matrix. This matrix should be stored in an Editor File (e.g., a Word file saved as text). Several different Systat Factor output files can be combined into one file, in the following format:
COL1 0.538 -0.196 -0.403 0.354
COL2 -0.145 -0.050 0.548 -0.333
COL3 -0.098 0.168 0.014 -0.835
COL4 0.191 0.216 -0.351 -0.148
COL5 0.645 -0.127 -0.284 0.292
....
COL35 0.106 0.617 -0.317 -0.182
COL36 0.726 0.186 -0.046 0.019
COL1 0.644 -0.148 -0.407
COL2 -0.299 -0.020 0.571
COL3 -0.534 0.380 0.116
...
COL34 -0.491
COL35 0.100
COL36 -0.495

(This is taken directly from Systat. Using Word as the text editor, each output file was edited down until just the factor patterns were left.)
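A combined file of this kind can be separated back into its solutions. The sketch below is only an illustration: it assumes one variable per line and starts a new solution whenever the number of loadings on a line changes, whereas the program itself asks how many factors to read. The function name is hypothetical.

```python
def read_factor_patterns(lines):
    """Group rows of 'label loading loading ...' into successive solutions."""
    solutions = []
    current = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # blank lines and markers such as "..."
        try:
            loadings = [float(token) for token in tokens[1:]]
        except ValueError:
            continue  # anything that is not a row of loadings
        # A change in the number of factors marks the next solution.
        if current and len(loadings) != len(current[-1][1]):
            solutions.append(current)
            current = []
        current.append((tokens[0], loadings))
    if current:
        solutions.append(current)
    return solutions


sample = [
    "COL1 0.538 -0.196",
    "COL2 -0.145 -0.050",
    "....",
    "COL1 0.644",
    "COL2 -0.299",
]
for solution in read_factor_patterns(sample):
    print(len(solution[0][1]), "factor(s):", solution)
```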
VSS will first prompt for four different files:
The program will read the first set of factor patterns from the file, after prompting for how many factors to read. VSS will continue reading more patterns until told to stop by entering 0 factors. This allows for a comparison of different solutions and/or different rotations.
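The comparison of solutions rests on the VSS criterion. As a rough sketch of the idea described by Revelle and Rocklin (1979), each item's pattern is simplified to its largest loading (or its c largest), the correlation matrix implied by that simplified pattern is subtracted from the observed one, and the criterion is 1 minus the ratio of the squared off-diagonal residuals to the squared off-diagonal correlations. The sketch assumes orthogonal factors and is not the program's own code; the function names are hypothetical.

```python
def simplify_pattern(loadings, complexity=1):
    """Keep the `complexity` largest absolute loadings per item, zero the rest."""
    simple = []
    for row in loadings:
        order = sorted(range(len(row)), key=lambda k: abs(row[k]), reverse=True)
        keep = set(order[:complexity])
        simple.append([value if k in keep else 0.0 for k, value in enumerate(row)])
    return simple


def vss(corr, loadings, complexity=1):
    """VSS fit of a simplified pattern to the off-diagonal correlations.

    Assumption: factors are orthogonal, so the model correlation between
    two items is the sum of products of their simplified loadings.
    """
    simple = simplify_pattern(loadings, complexity)
    residual_ss = 0.0
    original_ss = 0.0
    for i in range(len(corr)):
        for j in range(len(corr)):
            if i == j:
                continue
            model = sum(a * b for a, b in zip(simple[i], simple[j]))
            residual_ss += (corr[i][j] - model) ** 2
            original_ss += corr[i][j] ** 2
    return 1 - residual_ss / original_ss


# Artificial example: items 1-2 form one cluster, items 3-4 another.
corr = [
    [1.00, 0.64, 0.00, 0.00],
    [0.64, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.49],
    [0.00, 0.00, 0.49, 1.00],
]
one_factor = [[0.8], [0.8], [0.0], [0.0]]
two_factors = [[0.8, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.7]]
print(vss(corr, one_factor), vss(corr, two_factors))
```

In this artificial case the two factor solution reproduces the correlations and the one factor solution does not, so the criterion is higher for two factors. Feeding several solutions or rotations through the same function gives the kind of comparison described above.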
1 I like to go to lively parties
2 I am nervous in the presence of others.
...
87 I sometimes talk about things without thinking.
The algorithm is the conventional hierarchical algorithm, with the addition of multiple stopping rules:
The output includes item-cluster correlations (sorted by the absolute magnitude of the correlation), cluster intercorrelations, cluster reliabilities, and the goodness of fit of alternative solutions. In addition, scoring keys suitable for cutting and pasting into a "keys" file for use by Alpha are included.
Note. There is one bug that is currently eluding me that leads to incorrect estimates of beta when solutions are constrained to be limited to clusters of a certain size. Stay tuned.
An item response package is currently being developed and tested. Input is identical to Alpha (including scoring keys for estimating multiple scales). IRT statistics are reported for multilevel responses (from 0-9) in either a positive or negative keyed direction. One parameter estimates seem to work quite well; two parameter estimates are still under test.
The output from the IRT routines includes estimates of item difficulty for each level of the multiple responses, estimates of item discrimination, and estimates of person location (ability). These routines are still under development and the output is perhaps less clear than desired.
The original ICLUST/ALPHA programs were developed in Fortran at the University of Michigan in the late 1960s using an IBM 360 (?). They were subsequently converted to CDC Fortran for the Northwestern Cyber 6600/7600 series machines. VSS was written in Fortran for the Cyber.
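How difficulty, discrimination, and person location fit together can be illustrated with the usual logistic response function. This is the standard textbook form and only a sketch; the notes do not say which parameterization or estimation method the routines use, and the function name is hypothetical.

```python
from math import exp


def response_probability(location, difficulty, discrimination=1.0):
    """Probability of passing a level of an item under a logistic model.

    With discrimination fixed at 1 this is the one parameter form;
    letting it vary by item gives the two parameter form.
    """
    return 1.0 / (1.0 + exp(-discrimination * (location - difficulty)))


# A person located exactly at the difficulty has an even chance.
print(response_probability(0.0, 0.0))

# Higher discrimination makes the probability change faster around the difficulty.
print(response_probability(1.0, 0.0), response_probability(1.0, 0.0, discrimination=2.0))
```

For multilevel responses, one difficulty per response level would be used in place of the single difficulty shown here.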
The current program is a complete rewrite of all of those programs to take advantage of the power of the Macintosh. The code is written in Lightspeed Pascal to be consistent with other programs developed at the Personality, Motivation and Cognition Laboratory at NU.
The program was developed under System 7.1 and subsequent revisions, and comes in two versions (for Mac+/SEs and for 68030/68040 machines such as the SE30, Mac IIci, or Quadra). Although not PPC native, it will run on Power Macs. (Note that the version on the server is called VSS-Alpha-Iclust-IRT and is meant for Mac IIs, SE 30s, and newer machines.) A 350 variable by 260 subject cluster problem takes about 90 seconds on a G3/292. The distribution version of the program uses 10 Meg and works on problems of at least 400 variables. This should be increased to 12 Meg for a 600 variable problem. Memory can be reduced to 6 Meg for small (<100 variable) problems. To do this, go to the Information window.
Comments and additions to this set of brief notes are appreciated. Comments and suggestions for improvements to the program are also appreciated.
Source code is available to the interested user. Send me a note describing your needs and plans on using the code. revelle@northwestern.edu
Revelle, W. (1978). ICLUST: A cluster analytic approach for exploratory and confirmatory scale construction. Behavior Research Methods & Instrumentation, 10, 739-742. (Mac version available)
Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14, 57-74.
Revelle, W., & Rocklin, T. (1979). Very Simple Structure: an alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14, 403-414.