Psychological Testing: Chapter 5: Reliability
This flashcard set explains the concept of reliability in psychological measurement, including how test scores consist of true scores and error, and introduces the reliability coefficient as a measure of consistency in test results. It emphasizes the importance of understanding variance in observed scores to ensure accurate assessment.
Concept of Reliability
X = T + E
X = Observed score
T = True score
E = Error
True Score Model: the magnitude of the presence of a certain psychological trait as measured by a test of that trait will be due to the true amount of that trait and other factors.
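The X = T + E decomposition can be illustrated with a small simulation (a hedged sketch: the distributions, seed, and variable names are illustrative assumptions, not from the text):

```python
# Minimal simulation of the true score model: X = T + E.
# Assumed for illustration: true scores ~ N(100, 15), error ~ N(0, 5).
import random

random.seed(42)
N = 10_000

true_scores = [random.gauss(100, 15) for _ in range(N)]  # T
errors = [random.gauss(0, 5) for _ in range(N)]          # E
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability coefficient: ratio of true variance to total observed variance.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 3))  # close to the theoretical 15**2 / (15**2 + 5**2) = 0.90
```

Because the error is random and uncorrelated with the true scores, the sample estimate lands near the theoretical reliability of .90; shrinking the error variance pushes it toward 1.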
| Term | Definition |
|---|---|
| Reliability | Consistency in measurement; the total variance in an observed distribution of test scores equals the sum of the true variance plus the error variance |
| Reliability Coefficient | Index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance |
| Concept of Reliability | X = T + E, where X = observed score, T = true score, and E = error |
| True Score Model | The magnitude of the presence of a certain psychological trait as measured by a test of that trait will be due to the true amount of that trait and other factors |
| Variance | Statistic useful in describing sources of test score variability; useful because it can be broken down into components |
| True Variance | Variance from true differences |
| Error Variance | Variance from irrelevant, random sources |
| Reliability of a Test | The greater the proportion of the total variance attributed to true variance, the more reliable the test |
| Sources of Error Variance | Test construction, test administration, and test scoring and interpretation |
| Item/Content Sampling | Terms that refer to variation among items within a test as well as to variation among items between tests |
| Challenge in Test Development | To maximize the proportion of the total variance that is true variance and to minimize the proportion of the total variance that is error variance |
| Factors Related to the Test Environment | Room temperature |
| Factors Related to Testtaker Variables | Pressing emotional problems |
| Factors Related to Examiner Variables | Examiner's physical appearance and demeanor; presence or absence of an examiner |
| Factors Related to Test Scoring | Technical glitches may contaminate data |
| Test-Retest Method | Using the same instrument to measure the same thing at two points in time |
| Test-Retest Reliability | Result of a reliability evaluation; estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test |
| Test-Retest Measure | Appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time |
| Coefficient of Stability | Estimate of test-retest reliability when the interval between testings is greater than six months |
| Coefficient of Equivalence | The degree of relationship between various forms of a test; an alternate-forms or parallel-forms coefficient of reliability |
| Parallel Forms | Exist when, for each form of the test, the means and the variances of observed test scores are equal; scores obtained on parallel forms correlate equally with the true score; scores obtained on parallel tests correlate equally with other measures |
| Alternate Forms | Different versions of a test that have been constructed so as to be parallel; designed to be equivalent with respect to variables such as content and level of difficulty |
| Similarity between Alternate-/Parallel-Forms Reliability and Test-Retest Reliability | Two test administrations with the same group are required |
| Item Sampling | Inherent in the computation of an alternate- or parallel-forms reliability coefficient; testtakers may do better or worse on a specific form of the test not as a function of their true ability but simply because of the particular items that were selected for inclusion in the test |
| Internal Consistency Estimate of Reliability / Estimate of Inter-Item Consistency | Obtaining an estimate of the reliability of a test without developing an alternate form and without having to administer the test twice to the same people |
| Split-Half Reliability | Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once; a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice |
| Steps to Compute a Coefficient of Split-Half Reliability | 1. Divide the test into equivalent halves; 2. Calculate a Pearson r between scores on the two halves; 3. Adjust the half-test reliability using the Spearman-Brown formula |
| To Split a Test | Randomly assign items to one or the other half of the test, or assign odd-numbered items to one half and even-numbered items to the other half |
| Odd-Even Reliability | Assign odd-numbered items to one half of the test and even-numbered items to the other half |
| Mini Parallel Forms | Each half equal to the other in format, stylistic, statistical, and related aspects |
| Spearman-Brown Formula | Allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test; specific application to estimate the reliability of a test that is lengthened or shortened by any number of items; used to determine the number of items needed to attain a desired level of reliability |
| In Adding Items to Increase Test Reliability to a Desired Level | The rule is that new items must be equivalent in content and difficulty so that the longer test still measures what the original test measured |
| When Internal Consistency Estimates of Reliability Are Inappropriate | When measuring the reliability of heterogeneous tests and speed tests |
| Inter-Item Consistency | Refers to the degree of correlation among all the items on a scale; calculated from a single administration of a single form of a test; useful in assessing the homogeneity of a test |
| Homogeneity | Degree to which a test measures a single factor; extent to which items in a scale are unifactorial |
| Heterogeneity | Degree to which a test measures different factors; composed of items that measure more than one trait |
| Nature of a Homogeneous Test | The more homogeneous a test is, the more inter-item consistency it can be expected to have |
| Testtakers with the Same Score on a Homogeneous Test | Have similar abilities in the area tested |
| Testtakers with the Same Score on a Heterogeneous Test | May have different abilities |
| Homogeneous Test | Insufficient tool for measuring multifaceted psychological variables such as intelligence or personality |
| G. Frederic Kuder & M. W. Richardson | Developed their own measures for estimating reliability, including the Kuder-Richardson Formula 20 (KR-20) |
| Kuder-Richardson Formula 20 (KR-20) | Their most popular formula |
| Where Test Items Are Highly Homogeneous | KR-20 and split-half reliability estimates will be similar |
| Where Test Items Are Highly Heterogeneous | KR-20 will yield lower reliability estimates than the split-half method |
| Dichotomous Items | Items that can be scored right or wrong, such as multiple-choice items |
| Test Battery | A selected assortment of tests and assessment procedures used in the process of evaluation; typically composed of tests designed to measure different variables |
| r KR20 | The Kuder-Richardson Formula 20 reliability coefficient |
| KR-21 | Used if there is reason to assume that all the test items have approximately the same degree of difficulty; outdated in an era of calculators and computers |
| Coefficient Alpha | Variant of the KR-20 that has received the most acceptance and is in widest use today; the mean of all possible split-half correlations, corrected by the Spearman-Brown formula; appropriate for use on tests containing nondichotomous items; the preferred statistic for obtaining an estimate of internal consistency reliability; yields an estimate of the mean of all possible split-half coefficients; widely used as a measure of reliability in part because it requires only one administration of the test; gives information about the test scores, not the test itself |
| Coefficient Alpha Result | Calculated to help answer questions about how similar sets of data are; ranges in value from 0 to 1; if a calculation yields a negative value, report it as zero |
| Scale of Coefficient Alpha | 0 indicates absolutely no similarity; 1 indicates perfect similarity |
| Inter-Scorer Reliability | Degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure |
| Coefficient of Inter-Scorer Reliability | A way to determine the degree of consistency among scorers |
| Approaches to the Estimation of Reliability | Test-retest, alternate- or parallel-forms, internal consistency, and inter-scorer methods |
| How High a Coefficient of Reliability Should Be | Falls on a continuum relative to the purpose and importance of the decisions to be made on the basis of scores on the test |
| Considerations of the Nature of the Test Itself | Whether test items are homogeneous or heterogeneous in nature |
| Sources of Variance in a Hypothetical Test | True variance and error variance |
| Homogeneity of Test Items | A test is homogeneous in items if it is functionally uniform throughout |
| Heterogeneity of Test Items | An estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability |
| Dynamic Characteristic | A trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences |
| Static Characteristic | A trait, state, or ability presumed to be relatively unchanging; an obtained measurement would not be expected to vary significantly as a function of time, so either the test-retest or the alternate-forms method would be appropriate |
| Restriction of Range/Variance | If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower; if the variance of either variable is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher |
| Power Test | When a time limit is long enough to allow testtakers to attempt all items and some items are so difficult that no testtaker is able to obtain a perfect score |
| Speed Test | Generally contains items of uniform level of difficulty so that, when given generous time limits, all testtakers should be able to complete all test items correctly; based on performance speed; the time limit is established so that few, if any, of the testtakers will be able to complete the entire test |
| Reliability Estimate of a Speed Test | Based on performance from two independent testing periods, using test-retest reliability, alternate-forms reliability, or split-half reliability from two separately timed half tests |
| If the Split-Half Procedure Is Used for a Speed Test | The obtained reliability coefficient is for a half test and should be adjusted using the Spearman-Brown formula |
| Speed Test Administered Once with a Measure of Internal Consistency Calculated | The result will be a spuriously high reliability coefficient; given two people, one who completes 82 items of a speed test and another who completes 61 items of the same test, the correlation of the two will be close to 1 but will say nothing about response consistency |
| Criterion-Referenced Test | Designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective; tends to contain material that has been mastered in hierarchical fashion; tends to be interpreted in pass-fail terms, and any scrutiny of performance on individual items tends to be for diagnostic and remedial purposes |
| Test-Retest Reliability Estimate | Based on the correlation between the total scores on two administrations of the same test |
| Alternate-Forms Reliability Estimate | Based on the correlation between total scores on two forms of the test |
| Split-Half Reliability Estimate | Based on the correlation between scores on two halves of the test, adjusted using the Spearman-Brown formula to obtain a reliability estimate of the whole test |
| Generalizability Theory / Domain Sampling Theory | Seeks to estimate the extent to which specific sources of variation under defined conditions contribute to the test score; a test's reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample |
| Domain of Behavior | Universe of items that could conceivably measure a behavior; a hypothetical construct, one that shares certain characteristics with (and is measured by) the sample of items that make up the test |
| Generalizability Theory | May be viewed as an extension of true score theory wherein the concept of a universe score replaces that of a true score; developed by Lee J. Cronbach; given the same conditions of all the facets in the universe, the exact same test score should be obtained |
| Lee J. Cronbach | Encouraged test developers and researchers to describe the details of the particular test situation (universe) leading to a specific test score |
| Universe | Described in terms of its facets |
| Facets | Include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration |
| Universe Score | The test score obtained under a given universe of conditions; analogous to a true score in the true score model |
| Generalizability Study | Examines how generalizable scores from a particular test are if the test is administered in different situations; examines how much of an impact different facets of the universe have on the test score |
| Coefficients of Generalizability | Express the influence of particular facets on the test score; similar to reliability coefficients in the true score model |
| Decision Study | Developers examine the usefulness of test scores in helping the test user make decisions; designed to tell the test user how test scores should be used and how dependable those scores are as a basis for decisions, depending on the context of their use |
| Item Response Theory | Provides a way to model the probability that a person with X ability will be able to perform at a level of Y; stated in terms of personality assessment, it models the probability that a person with X amount of a particular personality trait will exhibit Y amount of that trait on a personality test designed to measure it; not a term used to refer to a single theory or method |
| Latent | Physically unobservable |
| Latent-Trait Theory | Synonym for IRT; proposes models that describe how the latent trait influences performance on each test item; the latent trait theoretically can take on values from -infinity to +infinity |
| Characteristics of Items within an IRT Framework | Difficulty level of an item; item's level of discrimination |
| Difficulty | Refers to the attribute of not being easily accomplished, solved, or comprehended; may also refer to physical difficulty |
| Physical Difficulty | How hard or easy it is for a person to engage in a particular activity |
| Discrimination | Signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured |
| Dichotomous Test Items | Test items or questions that can be answered with only one of two alternative responses, such as true-false, yes-no, or correct-incorrect questions |
| Polytomous Test Items | Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct |
| Georg Rasch | Developed a group of IRT models in which each item on the test is assumed to have an equivalent relationship with the construct being measured by the test |
| Reliability Coefficient | Helps the test developer build an adequate measuring instrument |
| Standard Error of Measurement (SEM) | Provides a measure of the precision of an observed test score; provides an estimate of the amount of error inherent in an observed score or measurement; there is an inverse relationship between the SEM and the reliability of a test: the higher the reliability of a test (or individual subtest within a test), the lower the SEM; a tool used to estimate or infer the extent to which an observed score deviates from a true score; the standard deviation of a theoretically normal distribution of test scores obtained by one person on equivalent tests |
| Standard Error of a Score | Another term for standard error of measurement; index of the extent to which an individual's scores vary over tests presumed to be parallel |
| Confidence Interval | Range or band of test scores that is likely to contain the true score |
| Standard Error of the Difference | A statistical measure that can aid a test user in determining how large a difference between two scores should be before it is considered statistically significant |
| Questions that the Standard Error of the Difference Between Two Scores Can Answer | How did this individual's performance on test 1 compare with his or her performance on test 2? |
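Several of the quantities defined above (the Spearman-Brown adjustment, the standard error of measurement, and the confidence interval around an observed score) can be sketched as small helper functions. This is a hedged illustration: the function names are ours, and the standard textbook formulas r_adjusted = n·r / (1 + (n − 1)·r) and SEM = s·√(1 − r) are assumed.

```python
import math

def spearman_brown(r_half: float, n: float = 2.0) -> float:
    """Estimate the reliability of a test lengthened by a factor n
    from the correlation r_half between comparable parts (e.g. halves)."""
    return (n * r_half) / (1 + (n - 1) * r_half)

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = sd * sqrt(1 - r): the higher the reliability, the lower the SEM."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed: float, sem: float, z: float = 1.96):
    """Band around an observed score likely to contain the true score
    (about 95% confidence for z = 1.96, assuming normally distributed error)."""
    return observed - z * sem, observed + z * sem

# Split-half r of .80 adjusted to full-test length:
print(round(spearman_brown(0.80), 2))  # 0.89
# SEM for sd = 15 and reliability = .90:
sem = standard_error_of_measurement(15, 0.90)
print(round(sem, 2))  # 4.74
# 95% confidence band around an observed score of 100:
lo, hi = confidence_interval(100, sem)
print(round(lo, 1), round(hi, 1))  # 90.7 109.3
```

The demo values trace the inverse relationship noted above: raising the reliability toward 1 shrinks the SEM, which in turn narrows the confidence band around the observed score.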