Psychological - W2 - Chapter 5 - Reliability (DN) Part 1
A deck of 25 flashcards covering key concepts in the field of reliability, specifically focusing on Classical Test Theory and Domain Sampling Theory.
Key Terms
| Term | Definition |
|---|---|
| alternate forms | DIFFERENT VERSIONS of a TEST constructed to be as similar as possible to the original (e.g., hard copy, online, oral); provide a measure of reliability across time; unlike parallel forms, they need not have the same mean & variance as the original test, so they are not as good as parallel forms. p. 151 |
| alternate-forms reliability | an estimate of the extent to which the ALTERNATE (different) FORMS of a test have been affected by ITEM SAMPLING ERROR or OTHER ERROR; also an estimate of a test’s reliability across time. p. 151-152, 161 |
| average proportional distance (APD) | a measure used to evaluate the INTERNAL CONSISTENCY of a test; focuses on the DEGREE of DIFFERENCE that exists between ITEM SCORES; typically calculated for a GROUP of TESTTAKERS. p. 157-158 |
| classical test theory (CTT) | also known as ‘true score theory’ & ‘true score model’; a system of assumptions about measurement; the composition of a TEST SCORE is a relatively stable component (what the test/individual item is designed to measure) PLUS a component that is ERROR. p. 123 (164-166, 280-281) |
| coefficient α (alpha) | developed by Cronbach (1951) and elaborated on by others; also referred to as CRONBACH’S ALPHA and ALPHA; a statistic widely employed in TEST CONSTRUCTION and the preferred statistic for obtaining INTERNAL CONSISTENCY RELIABILITY; requires only ONE administration of the test; assists in deriving an ESTIMATE of RELIABILITY (more technically, it is equal to the MEAN of ALL SPLIT-HALF RELIABILITIES); suitable for use on tests with NON-DICHOTOMOUS ITEMS; unlike Pearson r (−1 to +1), COEFFICIENT ALPHA ranges from 0 to 1 because it is used to gauge SIMILARITY of data sets, so 0 = absolutely NO SIMILARITY. p. 157 |
| coefficient of equivalence | the estimate of the degree of relationship that exists BETWEEN various FORMS of a TEST; can be obtained with an alternate-forms or parallel-forms reliability estimate (both are known as the COEFFICIENT OF EQUIVALENCE). p. 151 |
| coefficient of generalisability | represents an estimate of the INFLUENCE of particular FACETS on the test score, e.g., is the score affected by group as opposed to one-on-one administration? Is the score affected by the time of day the test is administered? p. 168 |
| coefficient of inter-scorer reliability | the estimate of the degree of CONSISTENCY AMONG SCORERS in the scoring of a test; this is the COEFFICIENT of CORRELATION for inter-scorer consistency (reliability). p. 159 |
| coefficient of stability | an estimate of test-retest reliability obtained when the interval between tests is GREATER than SIX MONTHS; a significant estimate because the passage of time can be a source of ERROR VARIANCE, i.e., the more time that has passed, the greater the likelihood of a lower reliability coefficient. p. 151 |
| confidence interval | a RANGE or BAND of test scores that is likely to contain the ‘TRUE SCORE’. p. 177 |
| content sampling | the VARIETY of SUBJECT MATTER contained in the test ITEMS; one source of variance in the measurement process is the VARIATION among items WITHIN a test or BETWEEN tests, i.e., the way in which a test is CONSTRUCTED is a source of ERROR VARIANCE; also referred to as ITEM SAMPLING. p. 147 |
| criterion-referenced test | a way of DERIVING MEANING from test scores by evaluating an individual’s score with reference to a SET STANDARD (CRITERION); also referred to as “domain-referenced testing” & “content-referenced testing and assessment”. DISTINCTION: CONTENT-REFERENCED interpretations are those where the score is directly interpreted in terms of performance AT EACH POINT on the achievement continuum being measured, while CRITERION-REFERENCED interpretations are those where the score is DIRECTLY INTERPRETED in terms of performance at ANY GIVEN POINT on the continuum of an EXTERNAL VARIABLE. p. 139-141 (163-164, 243) |
| decision study | conducted at the conclusion of a generalizability study; designed to EXPLORE the UTILITY & VALUE of TEST SCORES in making DECISIONS. p. 168 |
| dichotomous test item | a TEST ITEM or QUESTION that can be answered with ONLY one of two responses, e.g., true/false or yes/no. p. 169 |
| discrimination | in IRT, the DEGREE to which an ITEM DIFFERENTIATES among people with HIGHER or LOWER levels of the TRAIT, ABILITY, or whatever is being measured by a test. p. 169 |
| domain sampling theory | while Classical Test Theory seeks to estimate the proportion of a test score that is due to ERROR, Domain Sampling Theory seeks to estimate the proportion of a test score that is due to specific sources of variation under defined conditions (i.e., context/domain); in DST, a test’s RELIABILITY is viewed as an OBJECTIVE MEASURE of how precisely the test score assesses the DOMAIN from which the test DRAWS a SAMPLE; of the three TYPES of ESTIMATES of RELIABILITY, measures of INTERNAL CONSISTENCY are the most compatible with DST. p. 166-167 |
| dynamic characteristic | a TRAIT, STATE, or ABILITY presumed to be EVER-CHANGING as a function of SITUATIONAL and COGNITIVE EXPERIENCES; contrast with static characteristic. p. 162 |
| error variance | variance from IRRELEVANT, RANDOM sources; ERROR VARIANCE plus TRUE VARIANCE = TOTAL VARIANCE. p. 126, 146 |
| estimate of inter-item consistency | the degree of correlation among ALL items on a scale; the CONSISTENCY or HOMOGENEITY of ALL items on a test; estimated by techniques such as the SPLIT-HALF RELIABILITY method. p. 152-154 |
| facet | in generalizability theory, facets include things like the number of items on a test, the amount of training the test scorers have had, & the purpose of the test administration. p. 167 |
| generalizability study | examines how GENERALIZABLE SCORES from a PARTICULAR test are if the test is administered in DIFFERENT SITUATIONS, i.e., it examines how much of an IMPACT DIFFERENT FACETS of the UNIVERSE have on a test score. p. 167-168 |
| generalizability theory | based on the idea that a person’s test scores VARY from testing to testing because of variables in the TESTING SITUATION, i.e., a test score in its context; encourages test users to describe the details of a particular test situation (or UNIVERSE) leading to a particular test score; a ‘UNIVERSE SCORE’ replaces a ‘TRUE SCORE’; Cronbach (1970) & colleagues. p. 167 |
| heterogeneity | the degree to which a test measures DIFFERENT FACTORS, i.e., the test contains items that measure MORE THAN ONE TRAIT (FACTOR) (also NONHOMOGENEOUS). p. 154 |
| homogeneity | when a test contains ITEMS that MEASURE a SINGLE TRAIT, i.e., the DEGREE to which a test measures a SINGLE FACTOR; the extent to which items in a scale are UNIFACTORIAL; the more HOMOGENEOUS a test, the more INTER-ITEM CONSISTENCY it is expected to have, and the higher its INTERNAL CONSISTENCY compared with a HETEROGENEOUS TEST; homogeneity is desirable as it allows straightforward INTERPRETATION (i.e., similar scores = similar abilities on the variable of interest). p. 154-155 |
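The coefficient α card can be made concrete with a small calculation. This is a minimal sketch using the standard formula α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores); the item scores below are invented for illustration:

```python
def cronbach_alpha(scores):
    """Coefficient alpha for a list of testtakers, each a list of k item scores."""
    k = len(scores[0])

    def variance(xs):
        # Sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([person[i] for person in scores]) for i in range(k)]
    total_var = variance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Five testtakers answering four non-dichotomous (Likert-style) items
data = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]
print(round(cronbach_alpha(data), 3))  # → 0.973
```

Note that, as the card says, only one administration of the test is needed: all the information comes from one score matrix.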
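The split-half method mentioned under "estimate of inter-item consistency" correlates two halves of a test and then adjusts the result upward with the Spearman-Brown formula, r = 2r_hh / (1 + r_hh), since each half is shorter than the full test. A minimal sketch with invented odd-even half scores:

```python
def pearson_r(xs, ys):
    """Pearson correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman_brown(half_r):
    """Adjust a half-test correlation to estimate full-test reliability."""
    return 2 * half_r / (1 + half_r)

# Invented odd-item and even-item half scores for five testtakers
odd = [9, 5, 9, 3, 6]
even = [9, 5, 8, 3, 7]
print(round(spearman_brown(pearson_r(odd, even)), 3))  # → 0.981
```

Coefficient alpha generalizes this idea: per the card above, it equals the mean of all possible split-half reliabilities rather than depending on one arbitrary split.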
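The confidence interval card can be illustrated with the standard error of measurement, SEM = SD × √(1 − r_xx): the band likely to contain the true score is the observed score ± z × SEM. The score, SD, and reliability below are invented example values:

```python
import math

def true_score_interval(observed, sd, reliability, z=1.96):
    """Band likely to contain the 'true score': observed +/- z * SEM,
    where SEM = sd * sqrt(1 - reliability); z=1.96 gives a ~95% band."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - z * sem, observed + z * sem

# Invented example: observed score 100, SD 15, reliability .91 -> SEM = 4.5
lo, hi = true_score_interval(100, 15, 0.91)
print(round(lo, 1), round(hi, 1))  # → 91.2 108.8
```

Notice how the band shrinks as reliability rises: with a perfectly reliable test (r = 1), SEM is 0 and the interval collapses onto the observed score.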
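The CTT and error-variance cards (observed score = true component + error; error variance + true variance = total variance) can be checked with a small simulation. All distribution parameters here are invented; the point is only that when error is random and independent of the true score, the variances add:

```python
import random

random.seed(0)

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

n = 100_000
true_scores = [random.gauss(50, 10) for _ in range(n)]   # stable component, var ~ 100
errors = [random.gauss(0, 5) for _ in range(n)]          # random error, var ~ 25
observed = [t + e for t, e in zip(true_scores, errors)]  # CTT: observed = true + error

# Total variance comes out close to 100 + 25 = 125
print(round(variance(observed), 1))
```

The ratio of true variance to total variance (here roughly 100 / 125 = 0.8) is the test's reliability under CTT, which is why more error variance always means a less reliable test.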