
Psychological - W3 - Chapter 8 - Test Development - DN

Education · 58 Cards · Created 18 days ago

Anchor protocol is a standardized test answer sheet created by the test publisher to assess and ensure the consistency and accuracy of examiners' scoring. It serves as a benchmark to compare scoring practices across different raters.


Key Terms

Term
Definition


anchor protocol

a test answer sheet

developed by a test publisher

to test the accuracy of examiners’ scoring

p.280

biased test item

an item that favours one group in relation to another

when differences in group ability are controlled

p.271

binary-choice item

a multiple-choice item

that contains only two possible responses (e.g., true-false)

p.254

categorical scaling

system of scaling

stimuli placed in one of two or more alternative categories that differ quantitatively with respect to some continuum

p.249

categorical scoring

a method of evaluation

where test responses earn credit toward placement in a particular class/category

sometimes testtakers must meet a set number of responses corresponding to a particular criterion to be placed in a specific category

also called class scoring

contrast with cumulative scoring & ipsative scoring

p.260

ceiling effect

diminished utility of a tool of assessment in distinguishing testtakers at the high end of the ability, trait, or other attribute being measured

p. 259, 307

class scoring

a method of evaluation

where test responses earn credit toward placement in a particular class/category

sometimes testtakers must meet a set number of responses corresponding to a particular criterion to be placed in a specific category

contrast with cumulative scoring & ipsative scoring

p.260

comparative scaling

in test development

a method of developing ordinal scales

through the use of a sorting task

entails judging a stimulus in comparison with every other stimulus used on the test

p.249

completion item

requires an examinee to provide a word or phrase that completes a sentence

p. 254

computerized adaptive testing (CAT)

an interactive, computer-administered testtaking process

items are presented to the testtaker based in part on the testtaker's performance on previous items

p.15, 255-256

co-norming

the test norming process conducted on two or more tests

using the same sample of testtakers

when used to validate all of the tests being normed, this process may also be referred to as co-validation

p.138n4, 278

constructed-response format

a form of test item requiring a testtaker to construct or create a response

as opposed to simply selecting a response

contrast with selected-response format

p.252

co-validation

when co-norming is used to validate all of the tests being normed

this process may also be referred to as co-validation

p.278

cross-validation

a revalidation on a sample of testtakers

other than the testtakers on whom test performance was originally found to be a valid predictor of some criterion

p.278

essay item

a test item that requires a testtaker to write a composition

typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation

p.255

expert panel

in test development process

a group of people knowledgeable about the subject matter being tested and/or the population for whom the test is being designed

they can provide input to improve the test's content, fairness, etc.

p.274-275

floor effect

a phenomenon arising from the diminished utility of a tool of assessment in distinguishing testtakers at the low end of the ability, trait, or other attribute being measured

p. 256-259

giveaway item

a test item, usually near the beginning of a test of ability or achievement

designed to be relatively easy

usually for the purpose of building the testtaker's confidence or reducing test-related anxiety

p.263n4

What three criteria must be met when correcting for the impact of guessing?

must recognize that guesses are not normally totally random

must deal with the problem of omitted items

must recognize that some testtakers are lucky and others unlucky

p.269-271

Guttman scale

a scale in which items range sequentially from weaker to stronger expressions of the attitude or belief being measured

constructed so that a testtaker who endorses a stronger item is presumed to endorse all of the weaker items as well

named after its developer

p.249

ipsative scoring

approach to scoring & interpretation

responses & presumed strength of measured trait are interpreted relative to the measured strength of other traits for that testtaker

contrast with class scoring & cumulative scoring

p.260

item analysis

general term used to describe various procedures

usually statistical, designed to explore how individual items work compared to others in the test & in the context of the whole test

e.g., to explore the level of difficulty of individual items on an achievement test

e.g., to explore the reliability of a personality test

contrast with qualitative item analysis

p.262-275

item bank

a collection of questions to be used in the construction of a test

p. 255, 257-259, 282-284

item branching

in computerised adaptive testing (CAT)

the individualised presentation of test items drawn from an item bank based on the testtaker's previous responses

p.260

item-characteristic curve (ICC)

graphic representation of the probabilistic relationship between a person's level of the trait (ability, characteristic) being measured and the probability of responding to an item in a predicted way

also known as a category response curve or an item trace line

p.177, 268, 281

item-difficulty index

items cannot be too easy or too hard if they are to differentiate between testtakers' knowledge of the subject matter

a statistic obtained by calculating the proportion of the total number of testtakers who answered an item correctly

p is used to denote item difficulty

a subscript refers to the item number (e.g., p1 for item 1)

can range from 0 to 1

the larger the item-difficulty index, the easier the item

(i.e., the higher the p, the easier the item, because p represents the proportion of people passing the item)

p.263-264
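
The item-difficulty index described above can be sketched in a few lines of Python. This is only an illustration: the function name `item_difficulty` and the response data are made up for the example, and items are assumed to be scored 1 (correct) or 0 (incorrect).

```python
def item_difficulty(responses):
    """Proportion of testtakers who answered the item correctly.

    Assumes each response is scored 1 (correct) or 0 (incorrect).
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# Hypothetical responses of 10 testtakers to two items
item_1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # 8 of 10 correct
item_2 = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 of 10 correct

p1 = item_difficulty(item_1)
p2 = item_difficulty(item_2)
print(p1, p2)  # 0.8 0.3
```

Here item 1 has the larger p, so it is the easier item.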

item-discrimination index

measure of item discrimination

symbolised by d

p.264-268

item-endorsement index

the name given to an item-difficulty index (which is used in achievement testing) when used in other contexts (e.g., personality testing)

p. 263

item fairness

a reference to the degree of bias, if any, in a test item

p. 271-272

item format

a reference to the form, plan, structure, arrangement, or layout of individual test items

including whether the test items require testtakers to select or create a response

p.252-255

item pool

the reservoir or well from which items will or will not be drawn for the final version of the test

the collection of items to be further evaluated for possible selection for use in an item bank

p.251

item-reliability index

provides an indication of the internal consistency of a test

the higher the index, the greater the internal consistency

index is equal to

the product of the item-score standard deviation (s) and

the correlation (r) between the item score and the total test score

p.264
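
As a rough sketch, the product described above (item-score standard deviation times the item-total correlation) could be computed as follows. The helper names and the data are hypothetical, and the example assumes dichotomous (0/1) item scores so that the item-score standard deviation can be taken as the square root of p(1 - p).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_sd(item_scores):
    """Item-score standard deviation for a 0/1 item: sqrt(p * (1 - p))."""
    p = sum(item_scores) / len(item_scores)
    return sqrt(p * (1 - p))


def item_reliability_index(item_scores, total_scores):
    """Product of the item-score SD (s) and the item-total correlation (r)."""
    return item_sd(item_scores) * pearson_r(item_scores, total_scores)


# Hypothetical data: five testtakers' scores on one item and on the whole test
item_scores = [1, 1, 0, 1, 0]
total_scores = [9, 8, 4, 7, 3]

reliability_index = item_reliability_index(item_scores, total_scores)
print(round(reliability_index, 3))
```

An item that correlates more strongly with the total score, or that has more score variance, yields a higher index.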

item-validity index

a statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure

important when a test developer's goal is to maximise the criterion-related validity of a test

the higher the item-validity index, the greater the test's criterion-related validity

to calculate we must first know

the item-score standard deviation (symbolised as s1, s2, s3 etc.)

and the correlation between the item score and the criterion score

the item-score standard deviation is obtained from the item-difficulty index p1 with the following formula

s1 = √[p1(1 − p1)]

the correlation between the score on item 1 and a score on a criterion measure (r1c) is multiplied by item 1's item-score standard deviation (s1)

the product is an index of the item's validity (s1 × r1c)

p.264
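
The calculation on this card can be sketched in Python as below. The function names, the 0/1 item scores, and the criterion scores are all invented for illustration; the only steps taken from the card are s1 = sqrt(p1(1 - p1)) and the product s1 × r1c.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_validity_index(item_scores, criterion_scores):
    """s1 * r1c for one dichotomously scored (0/1) item."""
    p1 = sum(item_scores) / len(item_scores)       # item-difficulty index
    s1 = sqrt(p1 * (1 - p1))                       # item-score standard deviation
    r1c = pearson_r(item_scores, criterion_scores) # item-criterion correlation
    return s1 * r1c


# Hypothetical data: five testtakers' scores on item 1 and on a criterion measure
item_1_scores = [1, 1, 0, 1, 0]
criterion_scores = [3.5, 3.0, 2.0, 3.8, 2.4]

validity_index = item_validity_index(item_1_scores, criterion_scores)
print(round(validity_index, 3))
```

Because s1 can be at most .5 for a 0/1 item (when p1 = .5), the index for such an item cannot exceed .5 in absolute value.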

Likert scale

summative rating scale with 5 alternative responses

ranging on a continuum from e.g., "strongly agree" to "strongly disagree"

p.247

matching item

the testtaker is presented with two columns

premises on the left & responses on the right

task is to determine which response is best matched to which premise

young testtakers may be asked to draw a line from each premise to its response

others typically asked to write a letter/number as a response

p.253

method of paired comparisons

a scaling method

testtakers are presented with pairs of stimuli (e.g., photos) and asked to select one stimulus from each pair according to a rule

(e.g., "select the one that is more appealing")

p.248

multiple-choice format

one of the three types of selected-response item formats

three elements

a stem

a correct alternative or option

and several incorrect alternatives (referred to as distractors or foils)

p.252

pilot work

also referred to as pilot study & pilot research

preliminary research surrounding the creation of a prototype test

general objective is to determine how best to

gauge

assess, or

evaluate the targeted construct(s)

p.243-244

qualitative item analysis

non-statistical procedures designed to explore how individual test items work

both compared to other items in the test & in the context of the whole test

unlike statistical measures, they involve exploration of the issues by verbal means

(e.g., interviews & group discussions with testtakers & other relevant parties)

p.272-275

qualitative methods

techniques of data generation & analysis

rely primarily on verbal rather than mathematical or statistical procedures

p.272

rating scale

a system of ordered numerical or verbal descriptors

used to make judgements about the presence, absence, or magnitude of a particular trait, attitude, emotion, or other variable

p.205, 247, 371

scaling

1) in test construction

the process of setting rules for assigning numbers in measurement

2) the process by which a measuring device

is designed and calibrated &

the way numbers (or other indices) are assigned to different amounts of a trait, attribute, or characteristic being measured

p.244-251

scalogram analysis

an item-analysis procedure

entails graphic mapping of a testtaker's responses

p.250

scoring drift

a discrepancy between the scoring in an anchor protocol and the scoring of another protocol

p. 280

selected-response format

a form of test item

requiring testtakers to select a response

(e.g., true/false, multiple choice, and matching items)

as opposed to creating one

contrast with constructed-response format

p.252

sensitivity review

a study of test items

usually during test development

items are examined for fairness to all prospective testtakers

for the presence of offensive language, stereotypes, or situations

p.274

short-answer item

may also be referred to as a completion item

a word, term, sentence or a paragraph may qualify

anything beyond this is an essay item

p.254

summative scale

an index derived from the summing of selected scores on a test or sub-test

p. 247

test conceptualization

an early stage of the test development process

when an idea for a particular test or test revision is conceived

p.240, 241-244

test construction

a stage in the process of test development

entails writing test items (or rewriting/revising existing items)

as well as formatting items, setting scoring rules, and otherwise designing and building a test

p.240

test development

an umbrella term for all that goes into the process of creating a test

p. 240-284

test revision

action taken to modify a test's content or format

for the purpose of improving the test's effectiveness as a tool of measurement

p.240

test tryout

a stage in the process of test development that entails administering a preliminary version of a test to a representative sample of testtakers

under conditions that simulate the conditions under which the final version of the test will be administered

p.240, 261-262

"think aloud" test administration

a method of qualitative item analysis

examinees verbalize their thoughts as they take the test

useful in understanding how

individual items function in a test

testtakers interpret or misinterpret the meaning of the individual items

p.274

true-false item

a binary-choice item

i.e., allows only one of two possible responses

requires testtaker to indicate whether a statement is or is not a fact

p.254

validity shrinkage

the decrease in item validities that inevitably occurs after cross-validation

p. 278

What is the optimal item difficulty?

usually midpoint between 1.0 and the probability of answering correctly by guessing

which is called the chance success proportion

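true-false item (50% chance of getting it right by guessing): (.50 + 1.00) / 2 = 1.50 / 2 = .75

five-option multiple-choice item (20% chance of getting it right by guessing): (.20 + 1.00) / 2 = 1.20 / 2 = .60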

p.263
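
The midpoint rule on this card can be checked with a short Python sketch. The function name `optimal_item_difficulty` is made up for the example, and it assumes that a pure guess is equally likely to land on any of the item's response options.

```python
def optimal_item_difficulty(n_options):
    """Midpoint between the chance success proportion and 1.0.

    Assumes every one of the n_options responses is equally likely
    to be picked by a guessing testtaker.
    """
    if n_options < 2:
        raise ValueError("an item needs at least two response options")
    chance_success = 1 / n_options
    return (chance_success + 1.0) / 2


true_false = optimal_item_difficulty(2)   # chance success proportion = .50
five_option = optimal_item_difficulty(5)  # chance success proportion = .20
print(true_false, five_option)  # 0.75 0.6
```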

How can you create a visual representation of the best items on a test

(i.e., if the objective is to maximise criterion-related validity)?

this can be achieved by plotting each item's

item-validity index and

item-reliability index

p.265

Fig 8-5