Item Response Theory
In psychometrics, item response theory (IRT), also known as latent trait theory, strong true score theory, or modern mental test theory, is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is based on the application of related mathematical models to testing data. Because it is generally regarded as superior to classical test theory, it is the preferred method for the development of high-stakes tests such as the Graduate Record Examination and the Graduate Management Admission Test.
The name item response theory reflects the theory's focus on the item, as opposed to the test-level focus of classical test theory: it models the response of an examinee of given ability to each item in the test. The term item is used because many test questions are not actually questions; they might be multiple-choice questions that have incorrect and correct responses, but they are also commonly statements on questionnaires that allow respondents to indicate a level of agreement (a rating or Likert scale), or patient symptoms scored as present/absent. IRT is based on the idea that the probability of a correct/keyed response to an item is a mathematical function of person and item parameters. The person parameter is called the latent trait or ability; it may, for example, represent a person's intelligence or the strength of an attitude. Item parameters include difficulty (location), discrimination (slope or correlation), and pseudoguessing (lower asymptote).
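As a rough illustration of how the person and item parameters combine, the sketch below computes the response probability under the commonly used three-parameter logistic form. The text above does not state a functional form, so the logistic curve is an assumption here, and the names irf_3pl, theta, a, b, and c are chosen only for this example.

```python
import math

def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct/keyed response (assumed logistic 3PL form).

    theta: person parameter (latent trait or ability)
    a:     discrimination (slope)
    b:     difficulty (location)
    c:     pseudoguessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A person whose ability equals the item difficulty, with no guessing,
# has an even chance of giving the keyed response.
p_no_guessing = irf_3pl(theta=0.0, a=1.0, b=0.0, c=0.0)

# With a lower asymptote of 0.2, the same person lands halfway
# between 0.2 and 1.0.
p_with_guessing = irf_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)

# Even a very low-ability person keeps roughly a chance c of success.
p_low_ability = irf_3pl(theta=-30.0, a=1.0, b=0.0, c=0.2)
```

Fixing a = 1 and c = 0 in this sketch leaves difficulty as the only item parameter, which corresponds to the Rasch form.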
A major point of contention is the use of the guessing, or pseudo-chance, parameter. The IRT approach recognizes that guessing is present in multiple-choice examinations and therefore typically employs a guessing parameter to account for it. In contrast, the Rasch approach assumes that guessing adds random noise to the data. As the noise is randomly distributed, provided sufficient items are tested, the rank-ordering of persons along the latent trait by raw score will not change, but will simply undergo a linear rescaling. The presence of random guessing will therefore not affect the relationships between Rasch person measures, although a larger number of items may be needed to achieve the desired level of reliability and separation. A form of guessing correction is available within Rasch measurement by excluding all responses where person ability and item difficulty differ by preset amounts, so that persons are not tested on items where guessing or unlucky mistakes are likely to affect results. However, if guessing is not random, arising for example through poorly written distractors that address an irrelevant trait, then more sophisticated identification of pseudo-chance responses is needed to correct for guessing. Rasch fit statistics allow identification of unlikely responses, which may be excluded from the analysis if they are attributed to guessing. This assumes that the researcher can identify whether a student guessed simply by examining the patterns of responses in the data. The technique is therefore typically used in the analysis of distractor effectiveness in pilot administrations of operational tests, or in the validation of research instruments, where exclusion of outlying persons is normal practice, rather than in operational testing, where legal concerns typically dictate the use of rescaled raw scores without correction for guessing or misfit.
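The exclusion-based guessing correction described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular Rasch software: the function name trim_responses, the cut-offs of -2.0 and 2.0 logits, and the use of None to mark an excluded response are all assumptions made for the example, and the abilities and difficulties are taken as already estimated.

```python
def trim_responses(responses, abilities, difficulties, low=-2.0, high=2.0):
    """Blank out responses where ability and difficulty are far apart.

    responses:    list of rows, one per person, of scored responses (0/1)
    abilities:    one ability estimate per person (logits)
    difficulties: one difficulty estimate per item (logits)
    low, high:    hypothetical preset limits on (ability - difficulty)

    Returns a new matrix in which out-of-range responses are None,
    so they can be treated as missing in a later re-estimation.
    """
    trimmed = []
    for row, theta in zip(responses, abilities):
        new_row = []
        for score, b in zip(row, difficulties):
            gap = theta - b
            # Keep the response only when the item is neither far too
            # hard (guessing likely) nor far too easy (careless slips
            # likely) for this person.
            new_row.append(score if low <= gap <= high else None)
        trimmed.append(new_row)
    return trimmed

# One person of ability 0.0 facing an easy, a matched, and a hard item.
example = trim_responses(
    responses=[[1, 0, 1]],
    abilities=[0.0],
    difficulties=[-3.0, 0.5, 2.5],
)
# Gaps are 3.0, -0.5 and -2.5, so only the middle response is kept.
```

In practice the limits would be chosen by the analyst, and the measures would be re-estimated from the trimmed data.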
If misfitting responses are retained, the Rasch model typically results in some items misfitting the model, and, if the number of misfitting items is excessive, there is a data-model mismatch, which has been a major criticism of the approach for decades. Three-parameter IRT, by contrast, achieves data-model fit by selecting a model that fits the data. Unsurprisingly, such methods result in better data-model fit, but, because a model is not specified in advance for confirmation, such an exploratory approach sacrifices the use of fit statistics as a diagnostic tool to confirm whether the theorized model is an acceptable description of the latent trait. Two- and three-parameter models will still report fit statistics, but the exploratory nature of the analysis means that they are irrelevant as a tool for confirmatory analysis.
See also: http://en.wikipedia.org/wiki/Item_response_theory
Source: Wikipedia (All text is available under the terms of the GNU Free Documentation License and Creative Commons Attribution-ShareAlike License.)