A Compendium of Neuropsychological Tests: Fundamentals of Neuropsychological Assessment and Test Reviews for Clinical Practice (2022)






A COMPENDIUM OF NEUROPSYCHOLOGICAL TESTS
Fundamentals of Neuropsychological Assessment and Test Reviews for Clinical Practice
FOURTH EDITION
Elisabeth M. S. Sherman, Jing Ee Tan, and Marianne Hrabok


Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2022

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above. You must not circulate this work in any other form and you must impose this same condition on any acquirer.

CIP data is on file at the Library of Congress

ISBN 978–0–19–985618–3

This material is not intended to be, and should not be considered, a substitute for medical or other professional advice. Treatment for the conditions described in this material is highly dependent on the individual circumstances. And, while this material is designed to offer accurate information with respect to the subject matter covered and to be current as of the time it was written, research and knowledge about medical and health issues is constantly evolving and dose schedules for medications are being revised continually, with new side effects recognized and accounted for regularly. Readers must therefore always check the product information and clinical procedures with the most up-to-date published product information and data sheets provided by the manufacturers and the most recent codes of conduct and safety regulation.

The publisher and the authors make no representations or warranties to readers, express or implied, as to the accuracy or completeness of this material. Without limiting the foregoing, the publisher and the authors make no representations or warranties as to the accuracy or efficacy of the drug dosages mentioned in the material. The authors and the publisher do not accept, and expressly disclaim, any responsibility for any liability, loss, or risk that may be claimed or incurred as a consequence of the use and/or application of any of the contents of this material.

Printed by Integrated Books International, United States of America


This book is dedicated to the memory of Dr. Esther Strauss, mentor, role model, and friend. Esther was one of the first female neuropsychologists whom we saw gracefully mix science, scholarship, and family. She was humble and hard-working; she taught us that the most daunting tasks of scholarship don’t require innate stores of superlative brilliance or rarified knowledge; they simply require putting one’s head down and getting to work. Over the years, we saw her navigate life with warmth, humor, and intelligence, and witnessed her dedication to and love of neuropsychology. She died too soon, in 2009, three years after the last edition of this book was published; her imprint is still there in the words of this book. She is deeply missed.

We also want to acknowledge and remember Dr. Otfried Spreen. Otfried was a pioneer in neuropsychology who helped shape neuropsychology as we know it today through successive generations of students, academics, and clinicians who relied on his writings and scholarly work as roadmaps on how to understand and best practice neuropsychology. The very first edition of this book was a compilation of tests used at the University of Victoria Neuropsychology Laboratory at a time when few commercial tests existed and neuropsychologists relied on researchers for normative data. We hope that the current edition lives up to Otfried’s initial vision of a useful compilation of tests for practicing clinicians.


CONTENTS

Preface
1. PSYCHOMETRICS IN NEUROPSYCHOLOGICAL ASSESSMENT
2. VALIDITY AND RELIABILITY IN NEUROPSYCHOLOGICAL ASSESSMENT: NEW PERSPECTIVES
3. PERFORMANCE VALIDITY, SYMPTOM VALIDITY, AND MALINGERING CRITERIA
4. PREMORBID ESTIMATION
   - National Adult Reading Test (NART)
   - Oklahoma Premorbid Intelligence Estimate-IV (OPIE-IV)
   - Test of Premorbid Functioning (TOPF)
5. INTELLIGENCE
   - Kaufman Brief Intelligence Test, Second Edition (KBIT-2)
   - Raven’s Progressive Matrices
   - Reynolds Intellectual Assessment Scales, Second Edition (RIAS-2) and Reynolds Intellectual Screening Test, Second Edition (RIST-2)
   - Test of Nonverbal Intelligence, Fourth Edition (TONI-4)
   - Wechsler Abbreviated Scale of Intelligence, Second Edition (WASI-II)
   - Wechsler Adult Intelligence Scale—Fourth Edition (WAIS-IV)
   - Woodcock-Johnson IV Tests of Cognitive Abilities (WJ IV COG)
6. NEUROPSYCHOLOGICAL BATTERIES AND RELATED SCALES
   - CNS Vital Signs (CNSVS)
   - Kaplan Baycrest Neurocognitive Assessment (KBNA)
   - Neuropsychological Assessment Battery (NAB)
   - Repeatable Battery for the Assessment of Neuropsychological Status (RBANS Update)
   - Ruff Neurobehavioral Inventory (RNBI)
7. DEMENTIA SCREENING
   - 7 Minute Screen (7MS)
   - Alzheimer’s Disease Assessment Scale-Cognitive (ADAS-Cog)
   - Clinical Dementia Rating (CDR)
   - Dementia Rating Scale-2 (DRS-2)
   - General Practitioner Assessment of Cognition (GPCOG)
   - Mini-Mental State Examination (MMSE), Mini-Mental State Examination, 2nd Edition (MMSE-2), and Modified Mini-Mental State Examination (3MS)
   - Montreal Cognitive Assessment (MoCA)
8. ATTENTION
   - Brief Test of Attention (BTA)
   - Conners Continuous Performance Test 3rd Edition (CPT3)
   - Integrated Visual and Auditory Continuous Performance Test, Second Edition (IVA-2)
   - Paced Auditory Serial Addition Test (PASAT)
   - Ruff 2 & 7 Selective Attention Test (2 & 7 Test)
   - Symbol Digit Modalities Test (SDMT)
   - Test of Everyday Attention (TEA)
   - Test of Variables of Attention (T.O.V.A.)
9. EXECUTIVE FUNCTIONING
   - Behavior Rating Inventory of Executive Function—Adult Version (BRIEF-A)
   - Behavioural Assessment of the Dysexecutive Syndrome (BADS)
   - Category Test (CAT)
   - Clock Drawing Test (CDT)
   - Cognitive Estimation Test (CET)
   - Delis-Kaplan Executive Function System (D-KEFS)
   - Design Fluency Test
   - Dysexecutive Questionnaire (DEX)
   - Five-Point Test
   - Frontal Systems Behavior Scale (FrSBe)
   - Hayling and Brixton Tests
   - Ruff Figural Fluency Test (RFFT)
   - Stroop Test (Stroop)
   - Trail Making Test (TMT)
   - Verbal Fluency Test
   - Wisconsin Card Sorting Test (WCST)


10. MEMORY
   - Benton Visual Retention Test Fifth Edition (BVRT-5)
   - Brief Visuospatial Memory Test—Revised (BVMT-R)
   - California Verbal Learning Test—Second Edition (CVLT-II)
   - Continuous Visual Memory Test (CVMT)
   - Hopkins Verbal Learning Test—Revised (HVLT-R)
   - Rey Auditory Verbal Learning Test (RAVLT)
   - Rey-Osterrieth Complex Figure Test (RCFT)
   - Rivermead Behavioural Memory Test—Third Edition (RBMT-3)
   - Selective Reminding Test (SRT)
   - Tactual Performance Test (TPT)
   - Warrington Recognition Memory Test (WRMT)
   - Wechsler Memory Scale—Fourth Edition (WMS-IV)
11. LANGUAGE
   - Boston Diagnostic Aphasia Examination Third Edition (BDAE-3)
   - Boston Naming Test, Second Edition (BNT-2)
   - Multilingual Aphasia Examination Third Edition (MAE)
   - Token Test
12. VISUAL-SPATIAL SKILLS
   - Benton Facial Recognition Test (FRT)
   - Hooper Visual Organization Test (HVOT)
   - Judgment of Line Orientation (JLO)
13. SENSORY FUNCTION
   - Bells Cancellation Test
   - Finger Localization
   - University of Pennsylvania Smell Identification Test (UPSIT)
14. MOTOR FUNCTION
   - Finger Tapping Test (FTT)
   - Grip Strength
   - Grooved Pegboard Test
   - Purdue Pegboard Test
15. PERFORMANCE VALIDITY
   - b Test
   - Dot Counting Test (DCT)
   - Medical Symptom Validity Test (MSVT)
   - Non-Verbal Medical Symptom Validity Test (NV-MSVT)
   - Rey Fifteen-Item Test (FIT)
   - Test of Memory Malingering (TOMM)
   - Victoria Symptom Validity Test (VSVT)
   - Word Choice
   - Word Memory Test (WMT)
16. SYMPTOM VALIDITY
   - Minnesota Multiphasic Personality Inventory-2 (MMPI-2)
   - Minnesota Multiphasic Personality Inventory-2 Restructured Form (MMPI-2-RF)
   - Personality Assessment Inventory (PAI)
   - Structured Inventory of Malingered Symptomatology (SIMS)

Credits
List of Acronyms
Test Index
Subject Index


PREFACE

KNOW YOUR TOOLS

How well do you know your tools? Although most of us have a fairly good grasp of the main advantages and limitations of the tests we use, if we dig below the surface, we see that this knowledge can at times be quite shallow. For example, how many neuropsychologists know the test-retest reliability coefficients for all the tests in their battery or can describe the sensitivity and specificity of their tests? This is not because the information is lacking (although this is also at times a problem), and it isn’t because the information is difficult to find. Indeed, most of the information one could ever want on neuropsychological tests can be found on the office shelves of practicing neuropsychologists, in the test manuals of the tests we most frequently use. The rest can be easily obtained via literature searches or online. A working knowledge of neuropsychological tests is hampered by the most common of modern-day afflictions: lack of time, too many priorities, and, for want of a better term, information overload.

Understanding the tests we use requires enough time to read test manuals and to regularly survey the research literature for pertinent information as it arises. However, there are simply too many manuals and too many studies for the average neuropsychologist to stay up to date on the strengths and weaknesses of every test used. The reality is that many tests have lengthy manuals several hundred pages long, and some tests are associated with literally hundreds, even thousands, of research studies. The longer the neuropsychological battery, the higher the stack of manuals and the more voluminous the research. A thorough understanding of every test’s psychometric properties and research base, in addition to expert competency in administration, scoring, and interpretation, requires hours and hours of time, which for most practicing neuropsychologists is simply not feasible.

Our own experience bears this out. As is always the case prior to launching a revision of the Compendium, there was a large number of tests to review since the previous edition, and this was compounded by the release of several major test batteries and complex scales such as the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV), Wechsler Memory Scale, Fourth Edition (WMS-IV), Advanced Clinical Solutions (ACS), and Minnesota Multiphasic Personality Inventory-2 Restructured Form (MMPI-2-RF). As an example, the ACS has an online manual that is almost 400 pages long, in addition to an administration and scoring manual of more than 150 pages; the MMPI-2-RF has multiple test manuals and entire books dedicated to its use. In parallel, since the previous edition of this book, there has been an exponential increase in the number of research studies involving neuropsychological tests. As authors and practicing clinicians, we were elated at the amount of new scholarship on neuropsychological assessment, yet dismayed as our offices became stacked with paperwork and our virtual libraries and online cloud storage repeatedly reached maximum storage capacity. The sheer volume of literature that we reviewed for this book was staggering, and completing this book was the most challenging professional task we have encountered. Our wish for this book is that our efforts will have been worth it. At the very least, we hope that the time we spent on this book will save the readers some time of their own.

The essential goal for this book was to create a clinical reference that would provide, in a relatively easy-to-read, searchable format, major highlights of the most commonly used neuropsychological tests in the form of comprehensive, empirically based critical reviews. To do this, we balanced between acting as clinicians and acting as researchers: we were researchers when we reviewed the details of the scientific literature for each test, and we were clinicians when providing commentary on tests, focusing as much on the practicalities of the test as on the scientific literature. As every neuropsychologist knows, there are some exquisitely researched tests that are terrible to use in clinical practice because they are too long, too cumbersome, or too complicated, and this was essential to convey to the readership so that the book could be of practical utility to everyday clinicians like ourselves.

In addition to the core focus on test reviews, the book was also designed to provide an overview of foundational psychometric concepts relevant to neuropsychological practice, including overviews of models of test validity and basics of reliability, which have been updated since the previous edition. As well, woven throughout the text is a greater emphasis on performance validity and symptom


validity in each review, as well as updated criteria for malingered neurocognitive dysfunction. The current edition of this book presents a needed updating based on the past several years of research on malingering and performance validity in neuropsychology.

“Know Your Tools” continues to be the guiding principle behind this edition of the Compendium of Neuropsychological Tests. We hope that after reading this book, users will gain a greater understanding of critical issues relevant to the broader practice of neuropsychological assessment, a strong working knowledge of the specific strengths and weaknesses of the tests they use, and, most importantly, an enhanced understanding of clinical neuropsychological assessment grounded in clinical practice and research evidence.

CHANGES COMPARED TO PRIOR EDITIONS

Users will notice several changes from the previous edition. Arguably the biggest change is the exclusive focus on adult tests and norms. Excluding pediatric tests and norms was necessary to keep the book from ballooning to absurd proportions. As some of us have combined adult and pediatric practices, this was a painful albeit necessary decision. Fortunately, pediatric neuropsychological tests are already well covered elsewhere (e.g., Baron, 2018).

Since its first publication in 1991, the Compendium of Neuropsychological Tests has been an essential reference text to guide the reader through the maze of literature on tests and to inform clinicians and researchers of the psychometric properties of their instruments so that they can make informed choices and sound interpretations. The goals of the fourth edition of the Compendium remain the same, although admittedly, given the continued expansion of the field, our coverage is necessarily selective; in the end, we had to make very hard decisions about which tests to include and which tests to omit. Ultimately, the choice of which tests to include rested on practice surveys indicating the tests most commonly used in the field; we selectively chose those with at least a 10% utilization rate based on surveys. Several surveys were key in making these decisions (Dandachi-FitzGerald, Ponds, & Merten, 2013; LaDuke, Barr, Brodale, & Rabin, 2017; Martin, Schroeder, & Odland, 2015; Rabin, Paolillo, & Barr, 2016; Young, Roper, & Arentsen, 2016). As well, a small number of personal or sentimental favorites made it to the final edition, including some dear to Esther and Otfried. All the reviews were extensively revised and updated, and many new tests were added, in particular a number of new cognitive screening tests for dementia, as well as additional performance and symptom validity tests not covered in the prior edition. We can therefore say fairly confidently that the book does indeed include most of the neuropsychological tests used by most neuropsychologists.

Nevertheless, we acknowledge that some readers may find their favorite test missing from the book. For example, we did not cover computerized concussion assessment batteries or some specialized computerized batteries such as the Cambridge Neuropsychological Test Automated Battery (CANTAB). To our great regret, this was impossible for both practical and logistical reasons. These reasons included but were not limited to a lower rate of usage in the field according to survey data, but also the need to avoid more weekday evenings, early mornings, weekends, and holidays with research papers to review for this book, a regular albeit inconvenient habit in our lives for the last several years. Hopefully the reviews of computerized assessment batteries already in the literature will compensate for this necessary omission; a few did manage to slip into the book as well, such as the review of the CNS Vital Signs (CNSVS).

Because of the massive expansion of research studies on tests, most reviews also had to be expanded. To make room for these longer reviews, some of the general introductory chapters were not carried over from the prior edition, as most of the information is available in other books and resources (e.g., Lezak, Howieson, Bigler, & Tranel, 2012). We retained the chapter on psychometrics and gave validity and reliability their own chapter to better cover changing models in the field. We also retained the chapter on performance validity, symptom validity, and malingering given their critical importance in assessment.

In this edition, we also elected not to include any scales covering the assessment of psychopathology, unless they also functioned as symptom validity scales. Psychopathology scales are not specific to neuropsychological assessment and are reviewed in multiple other sources, including several books. We retained some scales and questionnaires measuring neuropsychological constructs such as executive function, however. Last, for this edition, we included a look-up box at the beginning of each review outlining the main features of each test. We hope that this change will make it easier for readers to locate critical information and to compare characteristics across measures.

ORGANIZATION OF THE BOOK

The first chapter in this volume presents basic psychometric concepts in neuropsychological assessment and provides an overview of critical issues to consider in evaluating tests for clinical use. The second chapter presents new ways of looking at validity and reliability as well as psychometric and practical principles involved in evaluating validity and reliability evidence. (Note the important table in this chapter entitled “Top 10 Reasons for Not Using Tests,” a personal favorite courtesy of Susan Urbina [2014].) Chapter 3 presents an overview of malingering, including updated malingering criteria.


Chapters 4 to 16 address the specific domains of premorbid estimation, intelligence, neuropsychological batteries and related scales, dementia screening, attention, executive functioning, memory, language, visual-spatial skills, sensory function, motor function, performance validity, and symptom validity. Tests are assigned in a rational manner to each of the separate domains, with the implicit understanding that there exists considerable commonality and overlap across tests measuring purportedly discrete domains. This is especially true of tests measuring attention and of those measuring executive functioning.

To promote clarity, each test review follows a fixed format and includes Domain, Age Range, Administration Time, Scoring Format, Reference, Description, Administration, Scoring, Demographic Effects, Normative Data, Evidence for Reliability, Evidence for Validity, Performance/Symptom Validity, and Comment. In each review, we take the bird’s-eye view while grounding our impressions in the nitty-gritty of the scientific research; we have also tried to highlight clinical issues relevant to a wide variety of examinees and settings, with emphasis on diversity.

CAUTIONS AND CAVEATS

First, a book of this scope and complexity will unfortunately, and necessarily, contain errors. As well, it is possible that in shining a spotlight on a test’s limitations, we have inadvertently omitted or distorted some information supportive of its strengths and assets. For that, we apologize in advance. We encourage readers to inform us of omissions, misinterpretations, typographical errors, and inadvertent scientific or clinical blunders so that we can correct them in the next edition.

Second, while this book presents relevant research on tests, it is not intended as an exhaustive survey of neuropsychological test research and, as such, will not include every relevant or most up-to-date research study for each test profiled. Our aim is to provide a general overview of research studies while retaining mention of some older studies as historical background, particularly for some of the older measures included in the book. The reader is encouraged to use the book as a jumping-off point for more detailed reading and exploration of research relevant to neuropsychological tests.

Third, neuropsychology as a field still has a considerable way to go in terms of addressing inclusivity and diversity, particularly with regard to ethnicity and gender. Many older tests and references have ignored diversity altogether or have used outdated terms or ways of classifying and describing people. As much as possible we have attempted to address this, but our well-meaning efforts will necessarily fall short.

We also want to make it explicit that norms based on ethnicity/race, including the ones in this book, are not to be interpreted as reflecting physical/biological/genetic differences, and that the selection of which norms to use should be a decision based on what is best for the particular patient’s clinical situation. We acknowledge the Position Statement on Use of Race as a Factor in Neuropsychological Test Norming and Performance Prediction by the American Academy of Clinical Neuropsychology (AACN), as follows:

The field of neuropsychology recognizes that environmental influences play the predominant role in creating racial disparities in test performance. Rather than attributing racial differences in neuropsychological test scores to genetic or biological predispositions, neuropsychology highlights environmental factors to explain group differences, including underlying socioeconomic influences; access to nutritional, preventative healthcare, and educational resources; the psychological and medical impact of racism and discrimination; the likelihood of exposure to environmental toxins and pollutants; as well as measurement error due to biased expectations about the performance of historically marginalized groups and enculturation into the groups on which tests were validated. The above is only a partial list of factors leading to differences in performance among so-called racial groups, but none of these factors, including those not enumerated here, is thought to reflect any biological predisposition that is inherent to the group in question. Race, therefore, is often a proxy for factors that are attributable to inequity, injustice, bias, and discrimination. (https://theaacn.org/wp-content/uploads/2021/11/AACN-Position-Statement-on-Race-Norms.pdf)

ACKNOWLEDGMENTS

We first acknowledge the immense contribution to the field of neuropsychology by Otfried Spreen and Esther Strauss, who first had the idea that neuropsychology needed a compendium for its tests and norms. They created the first Compendium in 1991 and were authors for the subsequent editions in 1998, with Elisabeth Sherman joining them as an additional author in the 2006 edition. Both Otfried and Esther sadly passed away after the 2006 edition was published, leaving a large void in the field. We hope that this book does justice to their aim in creating the Compendium and that the fourth edition continues their legacy of providing the field of neuropsychology with the essential reference text on neuropsychological tests and testing.

We express our gratitude to the numerous authors whose published work has provided the basis for our reviews and who provided additional information, clarification, and helpful comments. Thank you to Travis White at Psychological Assessment Resources, David Shafer at Pearson, Jamie Whitaker at Houghton Mifflin Harcourt,


and Paul Green for graciously providing us with test materials for review, and to all the other test authors and publishers who kindly provided us with materials. We are indebted to them for their generous support.

We also wish to thank those who served as ad hoc reviewers for some test reviews. Special thanks to Glenn Larrabee, Jim Holdnack, and Brian Brooks, who provided practical and scholarly feedback on some of the reviews, and to Kevin Bianchini and Grant Iverson for some spirited discussions and resultant soul-searching on malingering. Thanks also to Amy Kovacs at Psychological Assessment Resources and Joseph Sandford at BrainTrain for checking some of the reviews for factual errors. An immense debt of gratitude is owed to Shauna Thompson, M.Ed., for her invaluable help at almost every stage of this book and especially for the heavy lifting at the very end that got this book to print.

Finally, we thank our families for their love and understanding during the many hours, days, months, and years it took to write this book. Elisabeth wishes to thank Michael Brenner, who held up the fort while the book went on, and on, and on; she also dedicates this book to her three reasons: Madeleine, Tessa, and Lucas. Special thanks to Tessa in particular for her flawless editing and reference work.

Jing wishes to thank Sheldon Tay, who showered her with love and encouragement through the evenings and weekends she spent writing, and for rearranging his life around her writing schedule.

Marianne extends gratitude to Jagjit, for support, love, dedication, humor, and his “can do” attitude that sustained her during this book; to their children Avani, Saheli, and Jorah, for continuous light and inspiration; to her Mom, who spent many hours of loving, quality time with her grandkids so Marianne could focus on writing; and to her family for support and believing in her always.

REFERENCES

Baron, I. S. (2018). Neuropsychological evaluation of the child: Domains, methods, and case studies (2nd ed.). New York: Oxford University Press.

Dandachi-FitzGerald, B., Ponds, R. W. H. M., & Merten, T. (2013). Symptom validity and neuropsychological assessment: A survey of practices and beliefs of neuropsychologists in six European countries. Archives of Clinical Neuropsychology, 28(8), 771–783. https://doi.org/10.1093/arclin/act073

LaDuke, C., Barr, W., Brodale, D. L., & Rabin, L. A. (2017). Toward generally accepted forensic assessment practices among clinical neuropsychologists: A survey of professional practice and common test use. Clinical Neuropsychologist, 1–20. https://doi.org/10.1080/13854046.2017.1346711

Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012). Neuropsychological assessment (5th ed.). New York: Oxford University Press.

Martin, P. K., Schroeder, R. W., & Odland, A. P. (2015). Neuropsychologists’ validity testing beliefs and practices: A survey of North American professionals. Clinical Neuropsychologist, 29(6), 741–776. https://doi.org/10.1080/13854046.2015.1087597

Rabin, L. A., Paolillo, E., & Barr, W. B. (2016). Stability in test-usage practices of clinical neuropsychologists in the United States and Canada over a 10-year period: A follow-up survey of INS and NAN members. Archives of Clinical Neuropsychology, 31(3), 206–230. https://doi.org/10.1093/arclin/acw007

Rabin, L., Spadaccini, A., Brodale, D., Charcape, M., & Barr, W. (2014). Utilization rates of computerized tests and test batteries among clinical neuropsychologists in the US and Canada. Professional Psychology: Research and Practice, 45, 368–377.

Young, J. C., Roper, B. L., & Arentsen, T. J. (2016). Validity testing and neuropsychology practice in the VA healthcare system: Results from recent practitioner survey. Clinical Neuropsychologist, 30(4), 497–514. https://doi.org/10.1080/13854046.2016.1159730


1 | PSYCHOMETRICS IN NEUROPSYCHOLOGICAL ASSESSMENT

Daniel J. Slick and Elisabeth M. S. Sherman

OVERVIEW

The process of neuropsychological assessment depends to a large extent on the reliability and validity of neuropsychological tests. Unfortunately, not all neuropsychological tests are created equal, and, like any other product, published tests vary in terms of their “quality,” as defined in psychometric terms such as reliability, measurement error, temporal stability, sensitivity, specificity, and predictive validity, and with respect to the care with which test items are derived and normative data are obtained. In addition to commercially available tests, numerous tests developed primarily for research purposes have found their way into clinical usage; these vary considerably with regard to psychometric properties. With few exceptions, when tests originate from clinical research contexts, there is often validity data but little else, which makes estimating measurement precision and stability of test scores a challenge.

Regardless of the origins of neuropsychological tests, their competent use in clinical practice demands a good working knowledge of test standards and of the specific psychometric characteristics of each test used. This includes familiarity with the Standards for Educational and Psychological Testing (American Educational Research Association [AERA] et al., 2014) and a working knowledge of basic psychometrics. Texts such as those by Nunnally and Bernstein (1994) and Urbina (2014) outline some of the fundamental psychometric prerequisites for competent selection of tests and interpretation of obtained scores. Other neuropsychologically focused texts such as Mitrushina et al. (2005), Lezak et al. (2012), Baron (2018), and Morgan and Ricker (2018) also provide guidance.
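Several of the psychometric quantities named above, such as sensitivity, specificity, and predictive value, reduce to simple ratios over classification counts. As a minimal sketch (the counts below are invented for illustration and are not drawn from any test reviewed in this book):

```python
# Hypothetical validation counts for a screening cutoff.
# "Positive" = score falls below the cutoff; "condition" = clinical diagnosis.
true_pos, false_neg = 85, 15   # 100 examinees with the condition
true_neg, false_pos = 180, 20  # 200 examinees without the condition

# Sensitivity: proportion of affected examinees the cutoff correctly flags.
sensitivity = true_pos / (true_pos + false_neg)

# Specificity: proportion of unaffected examinees the cutoff correctly clears.
specificity = true_neg / (true_neg + false_pos)

# Positive predictive value at this sample's base rate (1 in 3).
ppv = true_pos / (true_pos + false_pos)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} PPV={ppv:.2f}")
# → sensitivity=0.85 specificity=0.90 PPV=0.81
```

Note that sensitivity and specificity are properties of the cutoff itself, whereas predictive values shift with the base rate of the condition in the sample, which is one reason published classification statistics do not transfer automatically to a clinician's own setting.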
This chapter is intended to provide a broad overview of some important psychometric concepts and properties of neuropsychological tests that should be considered when critically evaluating tests for clinical usage.

THE NORMAL CURVE

Within general populations, the frequency distributions of a large number of physical, biological, and psychological attributes approximate a bell-shaped curve, as shown in Figure 1–1. This normal curve or normal distribution, so named by Karl Pearson, is also known as the Gaussian or Laplace-Gauss distribution, after the 18th-century mathematicians who first defined it. It should be noted that Pearson later stated that he regretted his choice of "normal" as a descriptor for the normal curve because it had "the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another 'abnormal.' That belief is, of course, not justifiable" (Pearson, 1920, p. 25).

The normal distribution is central to many commonly used statistical and psychometric models and analytic methods (e.g., classical test theory) and is very often the implicitly or explicitly assumed population distribution for psychological constructs and test scores, though this assumption is not always correct.

DEFINITION AND CHARACTERISTICS

The normal distribution has a number of specific properties. It is unimodal, perfectly symmetrical, and asymptotic at the tails. With respect to scores from measures that are normally distributed, the ordinate, or height of the curve at any point along the x (test score) axis, is the proportion of persons within the sample who obtained a given score. The ordinates for a range of scores (i.e., between two points on the x axis) may also be summed to give the proportion of persons who obtained a score within the specified range.
If a specified normal curve accurately reflects a population distribution, then ordinate values are also equivalent to the probability of observing a given score or range of scores when randomly sampling from the population. Thus, the normal curve may also be referred to as a probability distribution.

Figure 1–1 The normal curve.

The normal curve is mathematically defined as follows:

    f(x) = [1 / (σ√(2π))] e^(−(x − μ)² / (2σ²))        [1]

Where:
    x = measurement values (test scores)
    μ = the mean of the test score distribution
    σ = the standard deviation of the test score distribution
    π = the constant pi (3.14 . . . )
    e = the base of natural logarithms (2.71 . . . )
    f(x) = the height (ordinate) of the curve for any given test score

RELEVANCE FOR ASSESSMENT

As noted previously, because it is a frequency distribution, the area under any given segment of the normal curve indicates the frequency of observations or cases within that interval. From a practical standpoint, this provides psychologists with an estimate of the "normality" or "abnormality" of any given test score or range of scores (i.e., whether it falls in the center of the bell shape, where the majority of scores lie, or instead at either of the tail ends, where few scores can be found).

STANDARDIZED SCORES

An individual examinee's raw score on a test has little value on its own and only takes on clinical meaning by comparing it to the raw scores obtained by other examinees in appropriate normative or reference samples. When reference sample data are normally distributed, raw scores may be standardized, or converted to a metric that denotes rank relative to the participants comprising the reference sample. To convert raw scores to standardized scores, scores may be linearly transformed or "standardized" in several ways. The simplest standard score is the z score, which is obtained by subtracting the sample mean score from an obtained score and dividing the result by the sample standard deviation, as shown below:

    z = (x − X̄) / SD        [2]

Where:
    x = measurement value (test score)
    X̄ = the mean of the test score distribution
    SD = the standard deviation of the test score distribution

The resulting distribution of z scores has a mean of 0 and a standard deviation (SD) of 1, regardless of the metric of raw scores from which it was derived.
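Equations 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the chapter; the function names are ours, and the worked values follow the chapter's example of a raw score of 20 in a sample with a mean of 25 and an SD of 5.

```python
import math

def normal_pdf(x, mu, sigma):
    """Equation 1: height (ordinate) of the normal curve at score x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def z_score(raw, mean, sd):
    """Equation 2: standardize a raw score against its reference sample."""
    return (raw - mean) / sd

def rescale(z, new_mean, new_sd):
    """Express a z score on another standard metric (T, IQ, scaled)."""
    return new_mean + z * new_sd

z = z_score(20, mean=25, sd=5)    # chapter's example: z = -1.00
t = rescale(z, 50, 10)            # T score metric: 40
iq = rescale(z, 100, 15)          # standard score metric: 85
```

Note that `rescale` is just the inverse of `z_score`, which is why any standardized metric carries exactly the same information as the z score it was derived from.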
For example, given a mean of 25 and an SD of 5, a raw score of 20 translates into a z score of −1.00. In addition to the z score, linear transformation can be used to produce other standardized scores that have the same properties. The most common of these are T scores (mean [M] = 50, SD = 10) and the standardized scores used in most IQ tests (M = 10, SD = 3, and M = 100, SD = 15). It must be remembered that z scores, T scores, and all other standardized scores are derived from samples; although these are often treated as population values, any limitations of generalizability due to reference sample composition or testing circumstances must be taken into consideration when standardized scores are interpreted.

THE MEANING OF STANDARDIZED TEST SCORES

As well as facilitating translation of raw scores to estimated population ranks, standardization of test scores, by virtue of conversion to a common metric, facilitates comparison of scores across measures, as long as critical assumptions are met, including that raw score distributions of tests being compared are approximately normal. In addition, if standardized scores are to be compared, they should be derived from similar samples or, more ideally, from the same sample. A T score of 50 on a test normed on a population of university students does not have the same meaning as an "equivalent" T score on a test normed on a population of older adults. When comparing standardized scores, one must also take into consideration both the reliability of the two measures and their intercorrelation before determining if a significant difference exists (see Crawford & Garthwaite, 2002). In some cases (e.g., tests with low precision), relatively large disparities between standardized scores may not actually reflect reliable differences and therefore may not be clinically meaningful.
Furthermore, statistically significant or reliable differences between test scores may be common in a reference sample; therefore, the base rate of score differences in reference samples must also be considered. One should also keep in mind that when raw test scores are not normally distributed, standardized scores will not accurately reflect actual population rank, and differences between standardized scores will be misleading.

Note also that comparability across tests does not imply equality in meaning and relative importance of scores. For example, one may compare standardized scores on measures of pitch discrimination and intelligence, but it will rarely be the case that these scores are of equal clinical or practical significance.

STANDARDIZED PERCENTILES

The standardized scores just described are useful but also somewhat abstract. In comparison, a more easily understandable and clinically useful metric is the percentile, which denotes the percentage of scores that fall at or below
a given test score. It is critically important to distinguish between percentile scores that are derived directly from raw untransformed test score distributions and percentile scores that are derived from linear transformations of raw test scores, because the two types of percentile scores will only be equivalent when reference sample distributions are normally distributed, and they may diverge quite markedly when reference sample distributions are non-normal. Unfortunately, there is no widely used nomenclature to distinguish between the two types of percentiles, and so it may not always be clear which type is being referred to in test documentation and research publications. To ensure clarity within this chapter, percentile scores derived from linear transformations of raw test scores are always referred to as standardized percentiles.

When raw scores have been transformed into standardized scores, the corresponding standardized percentile rank can be easily looked up in tables available in most statistical texts or quickly obtained via online calculators. Z score conversions to percentiles are shown in Table 1–1. Note that this method for deriving percentiles should only be used when raw score distributions are normally distributed. When raw score distributions are substantially non-normal, percentiles derived via linear transformation will not accurately correspond to actual percentile ranks within the reference samples from which they were derived.

INTERPRETATION OF STANDARDIZED PERCENTILES

An important property of the normal curve is that the relationship between raw or z scores (which for purposes of this discussion are equivalent, since they are linear transformations of each other) and percentiles is not linear.
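The two kinds of percentile just distinguished can be sketched in Python. This is our own illustration: `std_percentile` converts a z score through the normal curve (via the error function, which is how the normal cumulative area is computed without tables), while `empirical_percentile` reads rank directly from a raw reference sample.

```python
import math

def std_percentile(z):
    """Standardized percentile: area under the normal curve at or below z."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

def empirical_percentile(score, sample):
    """Percentile derived directly from the raw reference distribution."""
    return 100.0 * sum(1 for s in sample if s <= score) / len(sample)

# For a normally distributed reference sample the two agree;
# for a skewed or truncated sample they can diverge sharply.
print(round(std_percentile(0.0)))   # 50
print(round(std_percentile(1.0)))   # 84
```

Running `std_percentile` at z = 0, +1, +2, and +3 reproduces the nonlinearity discussed below: roughly the 50th, 84th, 98th, and 99.9th percentiles.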
That is, a constant difference between raw or z scores will be associated with a variable difference in percentile scores as a function of the distance of the two scores from the mean. This is due to the fact that there are proportionally more observations (scores) near the mean than there are farther from the mean; otherwise, the distribution would be rectangular, or non-normal. This can readily be seen in Figure 1–2, which shows the normal distribution with demarcation of z scores and corresponding percentile ranges. Because percentiles have a nonlinear relationship with raw scores, they cannot be used for some arithmetic procedures such as calculation of average scores; standardized scores must be used instead.

The nonlinear relation between z scores and percentiles has important interpretive implications. For example, a one-point difference between two z scores may be interpreted differently depending on where the two scores fall on the normal curve. As can be seen, the difference between a z score of 0 and a z score of +1.00 is 34 percentile points, because 34% of scores fall between these two z scores (i.e., the scores being compared are at the 50th and 84th percentiles). However, the difference between a z score of +2.00 and a z score of +3.00 is less than three percentile points, because only 2.5% of the distribution falls between these two points (i.e., the scores being compared are at the 98th and 99.9th percentiles). On the other hand, interpretation of percentile score differences is also not straightforward, in that an equivalent "difference" between two percentile rankings may entail different clinical implications depending on whether the scores occur at the tail end of the curve or near the middle of the distribution.
For example, the 30 percentile point difference between scores at the 1st and 31st percentiles will be more clinically meaningful than the same 30 percentile point difference between scores at the 35th and 65th percentiles.

INTERPRETING EXTREME STANDARDIZED SCORES

A final critical issue with respect to the meaning of standardized scores has to do with extreme observations. In clinical practice, one may encounter standardized scores that are either extremely low or extremely high. The meaning and comparability of such scores will depend critically on the characteristics of the normative samples from which they are derived.

For example, consider a hypothetical case in which an examinee obtains a raw score that is below the range of scores found in a normative sample. Suppose further that the examinee's raw score translates to a z score of −5.00, nominally indicating that the probability of encountering this score in the normative sample would be 3 in 10 million (i.e., a percentile ranking of .00003). This represents a considerable extrapolation from the actual normative data, as (1) the normative sample did not include 10 million individuals, and (2) not a single individual in the normative sample obtained a score anywhere close to the examinee's score. The percentile value is therefore an extrapolation and confers a false sense of precision. While one may be confident that it indicates impairment, there may be no basis to assume that it represents a meaningfully "worse" performance than a z score of −3.00, or of −4.00.

The estimated prevalence value of an obtained standard score can be calculated to determine whether interpretation of extreme scores may be appropriate. This is simply accomplished by inverting the percentile score corresponding to the z score (i.e., dividing 1 by the percentile score). For example, a z score of −4 is associated with an estimated frequency of occurrence, or prevalence, of approximately 0.00003.
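The inversion just described can be sketched as follows. This is a rough illustration of ours, not the chapter's procedure: `norm_cdf` computes the left-tail proportion exactly via the error function, whereas the chapter's 1-in-33,333 figure is obtained by first rounding the proportion to .00003.

```python
import math

def norm_cdf(z):
    """Proportion of the normal curve at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def estimated_prevalence(z):
    """Estimated '1 in N' prevalence of a left-tail z score,
    obtained by inverting its percentile proportion."""
    return 1.0 / norm_cdf(z)

# z = -4: exact proportion ~.0000317, i.e., roughly 1 in 31,600;
# rounding the proportion to .00003 first gives 1 in 33,333.
```

Either way, the point is the same: the implied denominator dwarfs the size of any realistic normative sample.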
Dividing 1 by this value gives a rounded result of 33,333. Thus, the estimated prevalence value of this score in the population is 1 in 33,333. If the normative sample from which a z score is derived is considerably smaller than the denominator of the estimated prevalence value (i.e., 33,333
TABLE 1–1 Score Conversion Table

Table 1–1 presents a two-sided conversion table with mirrored columns for scores below and above the mean: standard scores (M = 100, SD = 15), T scores (M = 50, SD = 10), scaled scores (M = 10, SD = 3), percentiles, and −z/+z values. For example, z ≤ −3.00 corresponds to standard scores ≤ 55, T scores ≤ 20, scaled scores ≤ 1, and percentiles ≤ 0.1, while z ≥ +3.00 corresponds to standard scores ≥ 145, T scores ≥ 80, scaled scores ≥ 19, and percentiles ≥ 99.9.

ps yC H O M E T R I CsI NNE U R Op s yC H O L OgI C A LA s sEs sM E N T|55in the example), then some caution may be warranted ininterpreting the percentile. In addition, whenever such ex-treme scores are being interpreted, examiners should alsoverify that the examinee’s raw score falls within the range ofraw scores in the normative sample. If the normative samplesize is substantially smaller than the estimated prevalencesample sizeandthe examinee’s score falls outside the samplerange, then standardized scores and associated percentilesshould be interpreted with considerable caution. Regardlessof thezscore value, it must also be kept in mind that in-terpretation of the associated percentile value may not bejustifiable if the normative sample has a significantly non-normal distribution. In sum, the clinical interpretation ofextreme scores depends to a large extent on how extremethe score is and on the properties of the reference samplesinvolved. One can have more confidence that a percen-tile is reasonably accurate if (1) the score falls within therange of scores in the reference sample, (2) the referencesample is large and accurately reflects relevant populationparameters, and (3) the shape of the reference sample distri-bution is approximately normal, particularly in tail regionswhere extreme scores arefound.NON-NORMALITyAlthough ideal from a psychometric standpoint, normaldistributions appear to be the exception rather than therule when it comes to normative data for psychologicalmeasures, even for very large samples. In a landmark study,Micceri (1989) analyzed 400 reference samples for psycho-logical and education tests, including 30 national tests and131 regional tests. He found that extremes of asymmetryand multimodality were the norm rather than the exceptionand so concluded that the “widespread belief in the naïveassumption of normality” of score distributions for psycho-logical tests is not supported by the actual data (p. 
156).

The primary factors that lead to non-normal test score distributions have to do with test design, reference sample characteristics, and the constructs being measured. More concretely, these factors include (1) test item sets that do not cover a full range of difficulty, resulting in floor/ceiling effects; (2) the existence of distinct unseparated subpopulations within reference samples; and (3) abilities being measured that are not normally distributed in the population.

SKEW

As with the normal curve, some varieties of non-normality may be characterized mathematically. Skew is a formal measure of asymmetry in a frequency distribution that can be calculated using a specific formula (see Nunnally & Bernstein, 1994). It is also known as the third moment of a distribution (the mean and variance are the first and second moments, respectively). A true normal distribution is perfectly symmetrical about the mean and has a skew of zero. A non-normal but symmetric distribution will also have a skew value that is at or near zero. Negative skew values indicate that the left tail of the distribution is heavier (and often more elongated) than the right tail, which may be truncated, while positive skew values indicate that the opposite pattern is present (see Figure 1–3). When distributions are skewed, the mean and median are not identical; the mean will not be at the midpoint in rank, and z scores will not accurately translate into sample percentile rank values. The error in mapping of z scores to sample percentile ranks increases as skew increases.

TRUNCATED DISTRIBUTIONS

Significant skew often indicates the presence of a truncated distribution, characterized by restriction in the range of scores on one side of a distribution but not the other, as is the case, for example, with reaction time measures, which cannot be lower than several hundred milliseconds but can reach very high positive values in some individuals.
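Skew as the third standardized moment can be sketched as follows. This is a minimal, population-moment version of our own; textbook formulas such as the one in Nunnally and Bernstein (1994) may apply small-sample corrections, and the reaction-time values below are hypothetical.

```python
def skewness(scores):
    """Third standardized moment: m3 / m2**1.5 (zero for symmetric data)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # variance (2nd central moment)
    m3 = sum((x - mean) ** 3 for x in scores) / n   # 3rd central moment
    return m3 / m2 ** 1.5

# Hypothetical reaction times (ms): bounded below, with a long right tail,
# so the skew statistic comes out strongly positive.
rts = [310, 320, 330, 340, 350, 360, 380, 420, 500, 900]
```

For these values the mean (421 ms) sits well above the median (355 ms), the pattern of a positively skewed, truncated distribution described above.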
In fact, distributions of scores from reaction time measures, whether aggregated across trials on an individual level or across individuals, are often characterized by positive skew and positive outliers. Mean values may therefore be positively biased with respect to the "central tendency" of the distribution as defined by other indices, such as the median.

Figure 1–2 The normal curve demarcated by z scores.

Figure 1–3 Skewed distributions.

Truncated distributions are also commonly seen for error scores. A good example of this is failure to maintain set (FMS) scores on the Wisconsin Card Sorting Test (see

review in this volume). In a normative sample of 30- to 39-year-old persons, observed raw scores range from 0 to 21, but the majority of persons (84%) obtain scores of 0 or 1, and less than 1% obtain scores greater than 3.

FLOOR AND CEILING EFFECTS

Floor and ceiling effects may be defined as the presence of truncated tails in the context of limitations in range of item difficulty. For example, a test may be said to have a high floor when a large proportion of the examinees obtain raw scores at or near the lowest possible score. This may indicate that the test lacks a sufficient number and range of easier items. Conversely, a test may be said to have a low ceiling when the opposite pattern is present (i.e., when a high number of examinees obtain raw scores at or near the highest possible score). Floor and ceiling effects may significantly limit the usefulness of a measure. For example, a measure with a high floor may not be suitable for use with low-functioning examinees, particularly if one wishes to delineate level of impairment.

MULTIMODALITY AND OTHER TYPES OF NON-NORMALITY

Multimodality is the presence of more than one "peak" in a frequency distribution (see the histogram in Figure 1–4 for an example). Pronounced multimodality strongly suggests the presence of two or more distinct subpopulations within a reference sample, and test developers who are confronted with such data should strongly consider evaluating grouping variables (e.g., level of education) that might separate examinees into subgroups that have better shaped score distributions.
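A floor effect of the kind seen in the FMS example can be screened for with a one-line proportion. This sketch is ours: the tolerance cutoff and the sample below are illustrative, echoing the 84%-at-0-or-1 pattern rather than reproducing actual FMS data.

```python
def floor_proportion(scores, minimum, tolerance=1):
    """Proportion of examinees scoring at or within `tolerance` points of the floor."""
    return sum(1 for s in scores if s <= minimum + tolerance) / len(scores)

# Hypothetical error-score sample with a pronounced floor effect.
sample = [0] * 60 + [1] * 24 + [2] * 10 + [3] * 5 + [8] * 1
print(floor_proportion(sample, minimum=0))   # 0.84
```

When most of a sample piles up at the floor like this, ranking examinees at the low end of the construct is effectively impossible.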
Another form of non-normality is the uniform or near-uniform distribution (a distribution with no or minimal peak and relatively equal frequency across all scores), though this type of distribution is rarely seen in psychological data.

SUBGROUPS VERSUS LARGER REFERENCE SAMPLES

Score distributions for a general population and subpopulations may not share the same shape. Scores may be normally distributed within an entire population but not normally distributed within specific subgroups, and the converse may also be true. Scores from general populations and subgroups may even be non-normal in different ways (e.g., positively vs. negatively skewed). Therefore, test users should not assume that reference samples and subgroups from those samples share a common distribution shape but should carefully evaluate relevant data from test manuals or other sources to determine the characteristics of the distributions of any samples or subsamples they may utilize to obtain standardized scores. It should also be noted that even when an ability being measured is normally distributed within a subgroup, distributions of scores from such subgroups may nevertheless be non-normal if tests do not include sufficient numbers of items covering a wide enough range of difficulty, particularly at very low and high levels. For example, score distributions from intelligence tests may be truncated and/or skewed within subpopulations with very low or high levels of education. Within such subgroups, test scores may be of limited utility for ranking individuals because of ceiling and floor effects.

SAMPLE SIZE AND NON-NORMALITY

The degree to which a given distribution approximates the underlying population distribution increases as the number of observations (N) increases and becomes less accurate as N decreases. This has important implications for norms derived from small samples.
A larger sample will produce a more normal distribution, but only if the underlying population distribution from which the sample is obtained is normal. In other words, a large N does not "correct" for non-normality of an underlying population distribution. However, small samples may yield non-normal test score distributions due to random sampling errors, even when the construct being measured is normally distributed within the population from which the sample is drawn. That is, one may not automatically assume, given a non-normal distribution in a small sample, that the population distribution is in fact non-normal (note that the converse may also be true).

NON-NORMALITY AS A FUNDAMENTAL CHARACTERISTIC OF CONSTRUCTS BEING MEASURED

Depending on the characteristics of the construct being measured and the purpose for which a test is being designed, a normal distribution of reference sample scores may not be expected or even desirable. In some cases, the population distribution of the construct being measured may not be normally distributed (e.g., reaction time). Alternatively, test developers may want to identify and/or discriminate between persons at only one end of a continuum of abilities. For example, the executive functioning scales reviewed in this volume are designed to detect deficits and not executive functioning strengths; aphasia scales work the same way. These tests focus on the characteristics of only one side of the distribution of the general population (i.e., the lower end), while the characteristics of the other side of the distribution are less of a concern. In such cases, measures may even be deliberately designed to have floor or ceiling effects when administered to a general population. For example, if one is not interested in one tail (or even one-half) of the distribution, items that would provide discrimination in that region may be omitted to save administration time.
In this case, a test with a high floor or low ceiling in the general
population (and with positive or negative skew) may be more desirable than a test with a normal distribution. Nevertheless, all things being equal, a more normal-looking distribution of scores within the targeted subpopulation is usually desirable, particularly if tests are to be used across the range of abilities (e.g., intelligence tests).

IMPLICATIONS OF NON-NORMALITY

When reference sample distributions are substantially non-normal, any standardized scores derived by linear transformation, such as T scores and standardized percentiles, will not accurately correspond to actual percentile ranks within the reference sample (and, by inference, the reference population). Depending on the degree of non-normality, the degree of divergence between standardized scores and percentiles derived directly from reference sample raw scores can be quite large. For a concrete example of this problem, consider the histogram in Figure 1–4, which shows a hypothetical distribution (n = 1,000) of raw scores from a normative sample for a psychological test. To simplify the example, the raw scores have a mean of 50 and a standard deviation of 10, and therefore no linear transformation is required to obtain T scores. From a glance, it is readily apparent that the distribution of raw scores is grossly non-normal; it is bimodal with a truncated lower tail and significant positive skew, consistent with a significant floor effect and the likely existence of two distinct subpopulations within the normative sample.

Figure 1–4 A non-normal test score distribution.

A normal curve derived from the sample mean and standard deviation is overlaid on the histogram in Figure 1–4 for purposes of comparing the assumed distribution of raw scores corresponding to T scores with the actual distribution of raw scores.
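The divergence between standardized and raw-score percentiles in a distribution of this kind can be sketched with a small simulation. The bimodal sample below is our own invention (not the book's Figure 1–4 data), built to have a low-scoring subgroup and a floor effect.

```python
import math

def std_percentile(z):
    """Percentile implied by the normal curve for a given z score."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

def empirical_percentile(score, sample):
    """Percentile read directly from the raw reference distribution."""
    return 100.0 * sum(1 for s in sample if s <= score) / len(sample)

# Hypothetical bimodal reference sample: a small low-scoring subgroup
# (scores 40-45) plus a main mode in the low-to-mid 50s.
sample = [40] * 50 + [45] * 50 + [52] * 300 + [55] * 400 + [60] * 200
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)

low = mean - sd                              # nominally T = 40 (z = -1)
nominal = std_percentile(-1.0)               # ~16th percentile if normal
actual = empirical_percentile(low, sample)   # far lower in this sample
```

Here only the 100 low-subgroup examinees fall at or below one SD under the mean, so the raw-score percentile is 10, well under the nominal 16, and with stronger floor effects the gap widens to the below-1st-percentile case described next.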
As can be seen, the shapes of the assumed and actual distributions differ quite considerably. Percentile scores derived directly from the raw test scores are also shown for given T scores to further illustrate the degree of error that can be associated with standardized scores derived via linear transformation when reference sample distributions are non-normal. For example, a T score of 40 nominally corresponds to the 16th percentile, but, with respect to the hypothetical test being considered here, a T score of 40 actually corresponds to a level of performance that falls below the 1st percentile within the reference sample. Clearly, the difference between percentiles derived directly from the sample distribution as opposed to standardized percentiles is not trivial and has significant implications for clinical interpretation. Therefore, whenever reference sample distributions diverge substantially from normality, percentile scores derived directly from untransformed raw test scores must be used rather than scaled scores and percentiles derived from linear transformations, and tables with such data should be provided by test publishers as appropriate. Ultimately, regardless of what information test publishers provide, it is always incumbent on clinicians to evaluate the degree to which reference sample distributions depart from normality in order to determine which types of scores should be used.

CORRECTIONS FOR NON-NORMALITY

Although the normal curve is from many standpoints an ideal or even expected distribution for psychological data, reference sample scores do not always conform to a normal distribution. When a new test is constructed, non-normality can be "corrected" by examining the distribution of scores on the prototype test, adjusting test properties, and resampling until a normal distribution is reached. For example, when a test is first administered during a try-out phase and a positively skewed distribution is obtained (i.e.,
with most scores clustering at the tail end of the distribution), the test likely has too high a floor. Easy items can then be added so that the majority of scores fall in the middle of the distribution rather than at the lower end (Urbina, 2014). When this is successful, the greatest numbers of individuals obtain about 50% of items correct. This level of difficulty usually provides the best differentiation between individuals at all ability levels (Urbina, 2014).

When confronted with reference samples that are not normally distributed, some test developers resort to a variety of "normalizing" procedures, such as log transformations on the raw data, before deriving standardized scores. A discussion of these procedures is beyond the scope of this chapter, and interested readers are referred to Urbina (2014). Although they can be useful in some circumstances, normalization procedures are by no means a panacea because they often introduce problems of their own with respect to interpretation. Urbina (2014) states that scores should only be normalized if (1) they come from a large and representative sample, or (2) any deviation from normality arises from defects in the test rather than characteristics of the sample. Furthermore, it is preferable to modify test content and procedures during development (e.g., by adding or modifying items) to obtain a more normal distribution of scores rather than attempting to transform non-normal scores into a normal distribution. Whenever normalization procedures are used, test publishers should describe in detail the nature of any sample non-normality that is being corrected, the correction procedures used, and the degree of success of such procedures (i.e., the distribution of scores after application of normalizing procedures should be thoroughly described).
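A log transformation, one of the normalizing procedures mentioned above, can be sketched as follows. The scores are hypothetical, and `skewness` here is the plain third standardized moment; this is an illustration of the idea, not a recommended normalization workflow.

```python
import math

def skewness(scores):
    """Third standardized moment of a score distribution."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

raw = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 12, 20]   # positively skewed raw scores
logged = [math.log(x) for x in raw]            # log compresses the right tail
```

For these values the transform reduces, but does not eliminate, the positive skew, which illustrates why such corrections complicate interpretation: standardized scores now describe rank on the transformed metric, not the raw one.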
The reasons for correction should also be justified, and percentile conversions derived directly from un-normalized raw scores should also be provided as an option for users. Despite the limitations inherent in methods for correcting for non-normality, Urbina (2014) notes that most test developers will probably continue to use such procedures because normally distributed test scores are required for some statistical analyses. From a practical point of view, test users should be aware of the mathematical computations and transformations involved in deriving scores for their instruments. When all other things are equal, test users should choose tests that provide information on score distributions and any procedures that were undertaken to correct non-normality over those that provide partial or no information.

PERCENTILES DERIVED DIRECTLY FROM RAW SCORE DISTRIBUTIONS AS A PRIMARY METRIC FOR TEST RESULTS

Crawford and Garthwaite (2009) argue that, for clinical assessments, percentile scores derived directly from raw score distributions should always be obtained and should serve as the primary metric for interpretation and presentation of test results in reports. These researchers state that "percentile ranks express scores in a form that is of greater relevance to the neuropsychologist than any alternative metric because they tell us directly how common or uncommon such scores are in the normative population" (p. 194). They note that when reference sample distributions are normally distributed, standardized scores are also useful, particularly for certain arithmetical and psychometric procedures for which percentiles cannot be used, such as averaging scores. However, raw score percentiles must always be used instead of standardized scores whenever reference samples are non-normal, as the latter have minimal meaning in such cases.
Crawford, Garthwaite, and Slick (2009) also advance the preceding argument and, in addition, provide a proposed set of reporting standards for percentiles as well as detailed methods for calculating accurate confidence intervals for raw score percentiles, including a link to free software for performing the calculations on Dr. John Crawford's website (https://homepages.abdn.ac.uk/j.crawford/pages/dept/psychom.htm). It is good practice to include confidence intervals when percentiles are presented in reports, particularly in high-stakes assessments where major decisions rely on finite score differences (e.g., determination of intellectual disability for criminal-forensic or disability purposes).

EXTRAPOLATION AND INTERPOLATION

Despite the best efforts of test publishers to obtain optimum reference samples, there are times when such samples fall short with respect to score ranges or cell sizes for subgroups such as age categories. In these cases, test developers may turn to extrapolation and/or interpolation for purposes of obtaining a full range of scaled scores, using techniques such as multiple regression. For example, Heaton and colleagues have published sets of norms that use multiple regression to derive scaled scores that are adjusted for demographic characteristics, including some for which reference sample sizes are very small (Heaton et al., 2003). Although multiple regression is robust to slight violations of assumptions, substantial estimation errors may occur when model assumptions are violated.

Test publishers sometimes derive standardized score conversions by extrapolation beyond the bounds of variables such as age within a reference sample. Such norms should always be used with considerable caution due to the lack of actual reference data. Extrapolation methods, such as regression techniques, depend on trends in the reference data. Such trends can be complex and difficult to model, changing slope quite markedly across the range of predictor variables.
For example, in healthy individuals, vocabulary increases exponentially during the preschool years, but then the rate of acquisition begins to taper off during the early school years, slows considerably through early adulthood, remains relatively stable in middle age, and then shows a minor decrease with advancing age. Modeling such
complex curves in a way that allows for accurate extrapolation is certainly a challenge, and even a well-fitting model that is extended beyond actual data points provides only an educated guess that may not be accurate.

Interpolation, utilizing the same types of methods as are employed for extrapolation, is sometimes used for deriving standardized scores when there are gaps in reference samples with respect to variables such as age or years of education. When this is done, the same limitations and interpretive cautions apply. Whenever test publishers use extrapolation or interpolation to derive scaled scores, the methods employed should be adequately described, any violations of the underlying assumptions of the statistical models utilized should be noted, and estimation error metrics should be reported.

MEASUREMENT ERROR

A good working understanding of conceptual issues and methods of quantifying measurement error is essential for competent clinical practice. We start our discussion of this topic with concepts arising from classical test theory.

TRUE SCORES

A central element of classical test theory is the concept of a true score, or the score an examinee would obtain on a measure in the absence of any measurement error (Lord & Novick, 1968). True scores can never be known. Instead, they are estimated and are conceptually defined as the mean score an examinee would obtain across an infinite number of equivalent, randomly sampled parallel forms of a test, assuming that the examinee's scores were not systematically affected by test exposure, practice, or other time-related factors such as maturation (Lord & Novick, 1968). In contrast to true scores, obtained scores are the actual scores yielded by tests. Obtained scores include any measurement error associated with a given test. That is, they are the sum of true scores and error.
Note that measurement error in the classical model arises only from test characteristics; measurement error arising from particular characteristics of individual examinees or testing circumstances is not explicitly addressed or accounted for.

In the classical model, the relation between obtained and true scores is expressed in the following formula, where error (e) is random and all variables are assumed to be normally distributed:

x = t + e   [3]

Where:
x = obtained score
t = true score
e = error

When test reliability is less than perfect, as is always the case, the net effect of measurement error across examinees is to bias obtained scores outward from the population mean. That is, scores that are above the mean are most likely higher than true scores, while those that are below the mean are most likely lower than true scores (Lord & Novick, 1968). Estimated true scores correct this bias by regressing obtained scores toward the normative mean, with the amount of regression depending on test reliability and the deviation of the obtained score from the mean. The formula for estimated true scores (t′) is:

t′ = X̄ + r_xx(x − X̄)   [4]

Where:
X̄ = mean test score
r_xx = test reliability (internal consistency reliability)
x = obtained score

If working with z scores, the formula is simpler:

t′ = r_xx × z   [5]

Formula 4 shows that an examinee's estimated true score is the sum of the mean score of the group they belong to (i.e., the normative sample) and the deviation of their obtained score from the normative mean weighted by test reliability (as derived from the same normative sample). Furthermore, as test reliability approaches unity (i.e., r = 1.0), estimated true scores approach obtained scores (i.e., there is little measurement error, so estimated true scores and obtained scores are nearly equivalent). Conversely, as test reliability approaches zero (i.e., when a test is extremely unreliable), estimated true scores approach the mean test score.
That is, when a test is highly reliable, greater weight is given to obtained scores than to the normative mean score; but when a test is very unreliable, greater weight is given to the normative mean score than to obtained scores. Practically speaking, estimated true scores will always be closer to the mean than obtained scores (except, of course, when the obtained score is at the mean).

THE USE OF TRUE SCORES IN CLINICAL PRACTICE

Although the true score model is abstract, it has practical utility and important implications for test score interpretation. For example, what may not be immediately obvious from Formulas 4 and 5 is readily apparent in Table 1–2: estimated true scores translate test reliability (or lack thereof) into the same metric as actual test scores.

As can be seen in Table 1–2, the degree of regression to the mean of true scores is inversely related to test reliability and directly related to the degree of deviation from the
reference mean. This means that the more reliable a test is, the closer obtained scores are to true scores, and that the further the obtained score lies from the sample mean, the greater the discrepancy between true and obtained scores. For a highly reliable measure such as Test 1 (r = .95), true score regression is minimal even when an obtained score lies a considerable distance from the sample mean; in this example, a standard score of 130, or two SDs above the mean, is associated with an estimated true score of 129. In contrast, for a test with low reliability, such as Test 3 (r = .65), true score regression is quite substantial. For this test, an obtained score of 130 is associated with an estimated true score of 120; in this case, fully one-third of the observed deviation from the mean is "lost" to regression when the estimated true score is calculated.

Such information has important implications with respect to interpretation of test results. For example, as shown in Table 1–2, as a result of differences in reliability, obtained scores of 120 on Test 1 and 130 on Test 3 are associated with essentially equivalent estimated true scores (i.e., 119 and 120, respectively). If only obtained scores are considered, one might interpret scores from Test 1 and Test 3 as significantly different even though these "differences" actually disappear when measurement precision is taken into account. It should also be noted that this issue is not limited to comparisons of scores from the same individual across different tests but also applies to comparisons between scores from different individuals on the same test when the individuals come from different groups and the test in question has different reliability levels across those groups.

Regression to the mean may also manifest as pronounced asymmetry of confidence intervals centered on true scores, relative to obtained scores, as discussed in more detail later.
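The regression effect shown in Table 1–2 is easy to reproduce. A minimal sketch of Formula 4, using the table's illustrative values:

```python
def estimated_true_score(x, mean, rxx):
    """Formula 4: regress an obtained score toward the normative mean."""
    return mean + rxx * (x - mean)

# Obtained score of 130 (M = 100) under Table 1-2's three reliabilities
for rxx in (0.95, 0.80, 0.65):
    print(rxx, estimated_true_score(130, 100, rxx))
```

This yields 128.5, 124.0, and 119.5, which Table 1–2 rounds to whole values (129, 124, and 120): the lower the reliability, the more of the 30-point deviation is pulled back toward the mean.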
Although calculation of true scores is encouraged as a means of translating reliability coefficients into more concrete and useful values, it is important to consider that any significant difference between the characteristics of an examinee and the sample from which the mean sample score and reliability estimate were derived may invalidate the process. For example, it makes little sense to estimate true scores for severely brain-injured individuals on measures of cognition using test parameters from healthy normative samples, because mean scores within brain-injured populations are likely to be substantially different from those seen in healthy normative samples; reliabilities may differ substantially as well. Instead, one may be justified in deriving estimated true scores using data from a comparable clinical sample if this is available. These issues underscore the complexities inherent in comparing scores from different tests in different populations.

THE STANDARD ERROR OF MEASUREMENT

Examiners may wish to quantify the margin of error associated with using obtained scores as estimates of true scores. When the reference sample score SD and the internal consistency reliability of a test are known, an estimate of the SD of obtained scores about true scores may be calculated. This value is known as the standard error of measurement, or SEM (Lord & Novick, 1968). More simply, the SEM provides an estimate of the amount of error in a person's observed score. It is a function of the reliability of the test and of the variability of scores within the sample. The SEM is inversely related to the reliability of the test.
Thus, the greater the reliability of the test, the smaller the SEM, and the more confidence the examiner can have in the precision of the score.

The SEM is defined by the following formula:

SEM = SD √(1 − r_xx)   [6]

Where:
SD = the standard deviation of the test, as derived from an appropriate normative sample
r_xx = the reliability coefficient of the test (usually internal reliability)

CONFIDENCE INTERVALS

While the SEM can be considered on its own as an index of test precision, it is not necessarily intuitively interpretable, and there is often a tendency to focus excessively on test scores as point estimates at the expense of consideration of associated estimation error ranges. Such a tendency to disregard imprecision is particularly inappropriate when interpreting scores from tests with lower reliability. Clinically, it is therefore very important to report, in a concrete and easily understandable manner, the degree of precision associated with specific test scores. One method of doing this is to use confidence intervals.

The SEM is used to form a confidence interval (or range of scores) around estimated true scores within which obtained scores are most likely to fall. The distribution of obtained scores about the true score (the error distribution) is assumed to be normal, with a mean of zero and an SD equal to the SEM; therefore, the bounds of confidence intervals can be set to include any desired range of probabilities by multiplying by the appropriate z value. Thus, if an individual were to take a large number of randomly parallel versions of a test, the

TABLE 1–2  Estimated True Score Values for Three Observed Scores at Three Levels of Reliability

                             OBSERVED SCORES (M = 100, SD = 15)
              RELIABILITY      110      120      130
Test 1            .95          110      119      129
Test 2            .80          108      116      124
Test 3            .65          107      113      120

NOTE: Estimated true scores rounded to whole values.
resulting obtained scores would fall within an interval of ±1 SEM of the estimated true score 68% of the time and within ±1.96 SEM 95% of the time (see Table 1–1).

Obviously, confidence intervals for unreliable tests (i.e., with a large SEM) will be larger than those for highly reliable tests. For example, we may again use data from Table 1–2. For a highly reliable test such as Test 1, a 95% confidence interval for an obtained score of 110 ranges from 103 to 116. In contrast, the confidence interval for Test 3, a less reliable test, is considerably larger, ranging from 89 to 124.

It is important to bear in mind that confidence intervals for obtained scores that are based on the SEM are centered on estimated true scores and are based on a model that deals with performance across a large number of randomly parallel forms. Such confidence intervals will be symmetric around obtained scores only when obtained scores are at the test mean or when reliability is perfect. Confidence intervals will be asymmetric about obtained scores to the same degree that true scores diverge from obtained scores. Therefore, when a test is highly reliable, the degree of asymmetry will often be trivial, particularly for obtained scores within one SD of the mean. For tests of lesser reliability, the asymmetry may be marked. For example, in Table 1–2, consider the obtained score of 130 on Test 2. The estimated true score in this case is 124 (see Formulas 4 and 5). Using Formula 6 and a z-multiplier of 1.96, we find that a 95% confidence interval for the obtained scores spans ±13 points, or from 111 to 137. This confidence interval is substantially asymmetric about the obtained score.

It is also important to note that SEM-based confidence intervals should not be used for estimating the likelihood of obtaining a given score at retesting with the same measure, as effects of prior exposure are not accounted for.
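Table 1–2's Test 2 example can be reproduced directly from Formulas 4 and 6; a small sketch:

```python
from math import sqrt

def sem(sd, rxx):
    """Formula 6: standard error of measurement."""
    return sd * sqrt(1 - rxx)

def ci_95(x, mean, sd, rxx):
    """95% CI for obtained scores, centered on the estimated true score (Formula 4)."""
    t = mean + rxx * (x - mean)      # estimated true score
    half = 1.96 * sem(sd, rxx)
    return t - half, t + half

# Test 2 of Table 1-2: obtained score 130, M = 100, SD = 15, r = .80
lo, hi = ci_95(130, 100, 15, 0.80)
print(round(lo), round(hi))
```

The interval runs from about 111 to 137: symmetric about the estimated true score of 124, and therefore noticeably asymmetric about the obtained score of 130.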
In addition, Nunnally and Bernstein (1994) point out that the use of SEM-based confidence intervals assumes that error distributions are normally distributed and homoscedastic (i.e., equal in spread) across the range of scores obtainable for a given test. However, this assumption may often be violated. A number of alternate error models do not require these assumptions and may thus be more appropriate in some circumstances (see Nunnally & Bernstein, 1994, for a detailed discussion). In addition, there are quite a number of alternate methods for estimating error intervals and adjusting obtained scores for regression to the mean and other sources of measurement error (Glutting et al., 1987). There is no universally agreed-upon method for estimating measurement error, and the most appropriate methods may vary across different types of tests and interpretive uses, though the majority of methods will produce roughly similar results in many cases. In any case, a review of alternate methods for estimating and correcting for measurement error is beyond the scope of this book; the methods presented were chosen because they continue to be widely used and accepted, and they are relatively easy to grasp conceptually and mathematically. Ultimately, the choice of which specific method is used for estimating and correcting for measurement error is far less important than the issue of whether any such estimates and corrections are calculated and incorporated into test score interpretation. That is, test scores should never be interpreted in the absence of consideration of measurement error.

THE STANDARD ERROR OF ESTIMATION

In addition to estimating confidence intervals for obtained scores, one may also be interested in estimating confidence intervals for estimated true scores (i.e., the likely range of true scores about the estimated true score).
For this purpose, one may construct confidence intervals using the standard error of estimation (SEE; Lord & Novick, 1968). The formula for this is:

SEE = SD √(r_xx(1 − r_xx))   [7]

Where:
SD = the standard deviation of the variable being estimated
r_xx = the test reliability coefficient

The SEE, like the SEM, is an indication of test precision. As with the SEM, confidence intervals are formed around estimated true scores by multiplying the SEE by a desired z value. That is, one would expect that, over a large number of randomly parallel versions of a test, an individual's true score would fall within an interval of ±1 SEE of the estimated true score 68% of the time, and within ±1.96 SEE 95% of the time. As with confidence intervals based on the SEM, those based on the SEE will usually not be symmetric around obtained scores. All of the other caveats detailed previously regarding SEM-based confidence intervals also apply.

The choice of constructing confidence intervals based on the SEM versus the SEE will depend on whether one is more interested in true scores or obtained scores. That is, while the SEM is a gauge of test accuracy in that it is used to determine the expected range of obtained scores about true scores over parallel assessments (the range of error in measurement of the true score), the SEE is a gauge of estimation accuracy in that it is used to determine the likely range within which true scores fall (the range of error of estimation of the true score). Regardless, both SEM-based and SEE-based confidence intervals are symmetric with respect to estimated true scores rather than obtained scores, and the boundaries of both will be similar for any given level of confidence when a test is highly reliable.

THE STANDARD ERROR OF PREDICTION

When the standard deviation of obtained scores for an alternate form is known, one may calculate the likely range of obtained scores expected on retesting with a parallel
form. For this purpose, the standard error of prediction (SEP; Lord & Novick, 1968) may be used to construct confidence intervals. The formula for this is:

SEP = SD_y √(1 − r_xx²)   [8]

Where:
SD_y = the standard deviation of the parallel form administered at retest
r_xx = the reliability of the form used at initial testing

In this case, confidence intervals are formed around estimated true scores (derived from initial obtained scores) by multiplying the SEP by a desired z value. That is, one would expect that, when retested over a large number of randomly sampled parallel versions of a test, an individual's obtained score would fall within an interval of ±1 SEP of the estimated true score 68% of the time and within ±1.96 SEP 95% of the time. As with confidence intervals based on the SEM, those based on the SEP will generally not be symmetric around obtained scores. All of the other caveats detailed previously regarding SEM-based confidence intervals also apply. In addition, while it may be tempting to use SEP-based confidence intervals for evaluating the significance of change at retesting with the same measure, this practice violates the assumptions that a parallel form is used at retest and, particularly, that no prior exposure effects apply.

STANDARD ERRORS AND TRUE SCORES: PRACTICAL ISSUES

Nunnally and Bernstein (1994) note that most test manuals do "an exceptionally poor job of reporting estimated true scores and confidence intervals for expected obtained scores on alternative forms. For example, intervals are often erroneously centered about obtained scores rather than estimated true scores. Often the topic is not even discussed" (p. 260).
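The three standard errors defined in Formulas 6 to 8 can be compared directly. A sketch assuming SD = 15 and r_xx = .80 (with the same SD for both forms):

```python
from math import sqrt

def sem(sd, rxx):
    """Formula 6: SD of obtained scores about true scores."""
    return sd * sqrt(1 - rxx)

def see(sd, rxx):
    """Formula 7: SD of true scores about estimated true scores."""
    return sd * sqrt(rxx * (1 - rxx))

def sep(sd_y, rxx):
    """Formula 8: SD of retest obtained scores on a parallel form."""
    return sd_y * sqrt(1 - rxx ** 2)

print(sem(15, 0.8), see(15, 0.8), sep(15, 0.8))
```

For these values the SEM is about 6.71, the SEE is 6.0, and the SEP is 9.0; for any reliability strictly between 0 and 1, SEE ≤ SEM ≤ SEP, which is one reason prediction intervals for retest scores are wider than measurement-based intervals.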
As well, in general, confidence intervals based on age-specific SEMs are preferable to those based on the overall SEM (particularly at the extremes of the age distribution, where there is the most variability) and can be constructed using the age-based SEMs found in most manuals.

As outlined earlier, estimated true scores and their associated confidence intervals can contribute substantially to the process of interpreting test results, and an argument can certainly be made that these should be preferred to obtained scores for clinical purposes and also for research. Nevertheless, there are compelling practical reasons to focus primarily on obtained scores, the most important of which is that virtually all data in test manuals and independent research concerning psychometric properties of tests are presented in the metric of obtained scores. In addition, a particular problem with the use of the SEP for test-retest comparisons is that it is based on a psychometric model that typically does not apply: in most cases, retesting is carried out using the same test that was originally administered rather than a parallel form. Usually, obtained test-retest scores are interpreted rather than estimated true scores, and test-retest reliability coefficients for obtained scores are usually lower (and sometimes much lower) than internal consistency reliability coefficients. In addition, the SEP does not account for practice/exposure effects, which can be quite substantial when the same test is administered a second time. As a result, SEP-based confidence intervals will often be miscentered and too small, resulting in high false-positive rates when used to identify significant changes in performance over time.
For more discussion regarding the calculation and uses of the SEM, SEE, SEP, and alternative error models, see Dudek (1979), Lord and Novick (1968), and Nunnally and Bernstein (1994).

SCREENING, DIAGNOSIS, AND OUTCOME PREDICTION USES OF TESTS

In some cases, clinicians use tests to measure how much of an attribute (e.g., intelligence) an examinee has, while in other cases tests are used to help determine whether or not an examinee has a specific attribute, condition, or illness that may be either present or absent (e.g., Alzheimer's disease). In the latter case, a special distinction in test use may be made. Screening tests are those which are broadly or routinely used to detect a specific attribute or illness, often referred to as a condition of interest (COI), among persons who are not "symptomatic" but who may nonetheless have the COI (Streiner, 2003). Diagnostic tests are used to assist in ruling in or out a specific condition in persons who present with "symptoms" that suggest the diagnosis in question. Another related use of tests is for purposes of prediction of outcome. As with screening and diagnostic tests, the outcome of interest may be defined in binary terms: it will either occur or not occur (e.g., the examinee will be able to handle independent living or not). Thus, in all three cases, clinicians will be interested in the relation between a measure's distribution of scores and an attribute or outcome that is defined in binary terms. It should be noted that tests used for screening, diagnosis, and prediction may be used when the COI or outcome to be predicted consists of more than two categories (e.g., mild, moderate, and severe). However, only the binary case will be considered in this chapter.

Typically, data concerning screening or diagnostic accuracy are obtained by administering a test to a sample of persons who are also classified, with respect to the COI, by a so-called gold standard.
Those who have the condition according to the gold standard are labeled COI+, while those who do not have the condition are labeled COI−. In medicine, the gold standard may be a highly accurate diagnostic
test that is more expensive and/or has a higher level of associated risk of morbidity than some new diagnostic method that is being evaluated for use as a screening measure or as a possible replacement for the existing gold standard. In neuropsychology, the situation is often more complex, as the COI may be a psychological construct or behavior (e.g., cognitive impairment, malingering) for which consensus with respect to fundamental definitions is lacking or diagnostic gold standards may not exist.

The simplest way to relate test results to binary diagnoses or outcomes is to utilize a cutoff score. This is a single point along the continuum of possible scores for a given test. Scores at or above the cutoff classify examinees as belonging to one of two groups; scores below the cutoff classify examinees as belonging to the other group. Those who have the COI according to the test are labeled test positive (Test+), while those who do not have the COI are labeled test negative (Test−).

Table 1–3 shows the relation between examinee classifications based on test results versus classifications based on a gold standard measure. By convention, test classification is denoted by row membership and gold standard classification is denoted by column membership. Cell values represent the total number of persons from the sample falling into each of four possible outcomes with respect to agreement between a test and a respective gold standard. Agreements between gold standard and test classifications are referred to as true-positive and true-negative cases, while disagreements are referred to as false-positive and false-negative cases, with positive and negative referring to the presence or absence of a COI per classification by the gold standard. When considering outcome data, observed outcome is substituted for the gold standard.
It is important to keep in mind while reading the following section that while gold standard measures are often implicitly treated as 100% accurate, this may not always be the case. Any limitations in the accuracy or applicability of a gold standard or outcome measure need to be accounted for when interpreting classification accuracy statistics. See Mossman et al. (2012) and Mossman et al. (2015) for thorough discussions of this problem and methods to account for it when validating diagnostic measures.

SENSITIVITY, SPECIFICITY, AND LIKELIHOOD RATIOS

The general accuracy of a test with respect to a specific COI is reflected by data in the columns of a classification accuracy table (Streiner, 2003). The column-based indices include sensitivity, specificity, and the positive and negative likelihood ratios (LR+ and LR−). The formulas for calculation of the column-based classification accuracy statistics from data in Table 1–3 are given below:

Sensitivity = A / (A + C)   [9]

Specificity = D / (D + B)   [10]

LR+ = Sensitivity / (1 − Specificity)   [11]

LR− = Specificity / (1 − Sensitivity)   [12]

Sensitivity is defined as the proportion of COI+ examinees who are correctly classified as such by a test. Specificity is defined as the proportion of COI− examinees who are correctly classified as such by a test. The positive likelihood ratio (LR+) combines sensitivity and specificity into a single index of overall test accuracy indicating the odds (likelihood) that a positive test result has come from a COI+ examinee. For example, a likelihood ratio of 3.0 may be interpreted as indicating that a positive test result is three times as likely to have come from a COI+ examinee as from a COI− one. The LR− is interpreted conversely to the LR+. As the LR approaches 1, test classification approximates random assignment of examinees. That is, a person who is Test+ is equally likely to be COI+ or COI−.
For purposes of working examples, Table 1–4 presents hypothetical test and gold standard data.

Using Equations 9 to 12, the hypothetical test demonstrates moderate sensitivity (.75) and high specificity (.95), with an LR+ of 15 and an LR− of 3.8. Thus, for the hypothetical measure, a positive result is 15 times more likely to be obtained by an examinee who has the COI than by one who does not, while a negative result is 3.8 times more likely to be obtained by an examinee who does not have the COI than by one who does.

TABLE 1–3  Classification/Prediction Accuracy of a Test in Relation to a "Gold Standard" or Actual Outcome

                       GOLD STANDARD
TEST RESULT          COI+                  COI−                  ROW TOTAL
Test Positive        A (True Positive)     B (False Positive)    A + B
Test Negative        C (False Negative)    D (True Negative)     C + D
Column total         A + C                 B + D                 N = A + B + C + D

NOTE: COI = condition of interest.

TABLE 1–4  Classification/Prediction Accuracy of a Test in Relation to a "Gold Standard" or Actual Outcome (Hypothetical Data)

                       GOLD STANDARD
TEST RESULT          COI+      COI−      ROW TOTAL
Test Positive        30        2         32
Test Negative        10        38        48
Column total         40        40        N = 80

NOTE: COI = condition of interest.
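The column-based indices for Table 1–4 can be checked in a few lines; a sketch of Formulas 9 to 12:

```python
def classification_stats(tp, fp, fn, tn):
    """Column-based accuracy indices (Formulas 9-12)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = specificity / (1 - sensitivity)   # LR- as defined in Formula 12
    return sensitivity, specificity, lr_pos, lr_neg

# Table 1-4 cells: A = 30 (TP), B = 2 (FP), C = 10 (FN), D = 38 (TN)
sens, spec, lr_pos, lr_neg = classification_stats(tp=30, fp=2, fn=10, tn=38)
print(sens, spec, lr_pos, lr_neg)
```

This reproduces the values given in the text: sensitivity .75, specificity .95, LR+ of 15, and LR− of 3.8.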
Note that sensitivity, specificity, and LR+/− are parameter estimates that have associated errors of estimation that can be quantified. The magnitude of estimation error is inversely related to sample size and can be quite large when sample size is small. The formulas for calculating standard errors for sensitivity, specificity, and the LRs are complex and will not be presented here (see McKenzie et al., 1997). Fortunately, these values may also be easily calculated using a number of readily available computer programs. Using one of these (Mackinnon, 2000) with data from Table 1–4, the 95% confidence interval for sensitivity was found to be .59 to .87, while that for specificity was .83 to .99. The LR+ interval was 3.8 to 58.6, and the LR− interval was 2.2 to 6.5. Clearly, the range of measurement error is not trivial for this hypothetical study.

In addition to appreciating issues relating to estimation error, it is also important to understand that while column-based indices provide useful information about test validity and utility, a test may nevertheless have high sensitivity and specificity but still be of limited clinical value in some situations, as will be detailed later.

POSITIVE AND NEGATIVE PREDICTIVE VALUE

As opposed to being concerned with test accuracy at the group level, clinicians are typically more concerned with test accuracy in the context of diagnosis and other decision making at the level of individual examinees. That is, clinicians wish to determine whether or not an individual examinee does or does not have a given COI. In this scenario, clinicians must consider indices derived from the data in the rows of a classification accuracy table (Streiner, 2003). These row-based indices are positive predictive value (PPV) and negative predictive value (NPV).
The formulas for cal-culation of these from data in Table 1–3 are givenhere:PPV = A/ A + B()[13]NPV = D/ C + D()[14]PPV is defined as the probability that an individual witha positive test result has the COI. Conversely, NPV is de-fined as the probability that an individual with a negativetest result doesnothave the COI. For example, predictivepower estimates derived from the data presented in Table1–4 indicate that PPV = .94 and NPV = .79. Thus, in thehypothetical dataset, 94% of persons who obtain a positivetest result actually have the COI, while 79% of people whoobtain a negative test result do not in fact have the COI.When predictive power is close to .50, examinees are ap-proximately equally likely to be COI+as COI, regardlessof whether they are Test+or Test. When predictive poweris less than .50, test-based classifications or diagnoses will beincorrect more often than not. However, predictive powervalues at or below .50 may still be informative. For example,if the population prevalence of a COI is .05 and the PPVbased on test results is .45, a clinician can rightly concludethat an examinee is much more likely to have the COI thanmembers of the general population, which may be clinicallyrelevant.As with sensitivity and specificity, PPV and NPV areparameter estimates that should always be considered in thecontext of estimation error. Unfortunately, standard errorsor confidence intervals for estimates of predictive power arerarely listed when these values are reported; clinicians arethus left to their own devices to calculate them. Fortunately,these values may be easily calculated using a number of freelyavailable computer programs (see Crawford, Garthwaite, &Betkowska, 2009; Mackinnon, 2000). Using one of these(Mackinnon, 2000) with data from Table 1–4, the 95%confidence intervals for PPV and NPV given the base ratein the study were found to be .94 to .99 and .65 to .90, re-spectively. 
Clearly, the confidence interval range is not trivial for this small dataset.

BASE RATES

Of critical importance to clinical interpretation of test scores, PPV and NPV vary with the base rate or prevalence of a COI. The prevalence of a COI is defined with respect to Table 1–3 as:

Prevalence = (A + C)/N   [15]

As should be readily apparent from inspection of Table 1–4, the prevalence of the COI in the sample is 50%. Formulas for deriving predictive power for any level of sensitivity and specificity and a specified prevalence are given here:

PPV = (Prevalence × Sensitivity) / [(Prevalence × Sensitivity) + (1 − Prevalence) × (1 − Specificity)]   [16]

NPV = [(1 − Prevalence) × Specificity] / [(1 − Prevalence) × Specificity + Prevalence × (1 − Sensitivity)]   [17]

From inspection of these formulas, it should be apparent that, regardless of sensitivity and specificity, predictive power will vary between 0 and 1 as a function of prevalence. Application of Formulas 16 and 17 to the data presented in Table 1–4 across the range of possible base rates provides the range of possible PPV and NPV values depicted in Figure 1–5 (note that Figure 1–5 was produced by a spreadsheet, developed for analyzing the predictive power of tests, that is freely available from Daniel Slick at dslick@gmail.com). As can be seen in Figure 1–5, the relation between predictive power and prevalence is curvilinear and asymptotic,
with endpoints at 0 and 1. For any given test cutoff score, PPV will always increase with base rate, while NPV will simultaneously decrease. For the hypothetical test being considered, one can see that both PPV and NPV are moderately high (at or above .80) when the COI base rate ranges from 20% to 50%. The tradeoff between PPV and NPV at high and low base rate levels is also readily apparent: as the base rate increases above 50%, PPV exceeds .95 while NPV declines, falling below .50 as the base rate exceeds 80%. Conversely, as the base rate falls below 30%, NPV exceeds .95 while PPV rapidly drops off, falling below 50% as the base rate falls below 7%.

From the foregoing, it is apparent that the predictive power values derived from data presented in Table 1–4 would not be applicable in settings where base rates differ from the 50% value in the hypothetical dataset. This is important because, in practice, clinicians may often be presented with PPV values based on data where "prevalence" values are near 50%. This is due to the fact that, regardless of the prevalence of a COI in the population, some diagnostic validity studies employ equal-sized samples of COI+ and COI− individuals to facilitate statistical analyses. In contrast, the actual prevalence of COIs may differ substantially from 50% in various clinical settings and circumstances (e.g., screening vs. diagnostic use). For examples of differing PPV and NPV across different base rates, see Chapter 16, on the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) and the Minnesota Multiphasic Personality Inventory-2 Restructured Form (MMPI-2-RF).

For example, suppose that the data from Table 1–4 were from a validity trial of a neuropsychological measure designed for administration to young adults for purposes of predicting development of schizophrenia. The question then arises: Should the measure be used for broad screening given a lifetime schizophrenia prevalence of .008?
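Formulas 16 and 17 make such questions easy to answer numerically. The sketch below implements them directly, assuming the hypothetical test's sensitivity of .75 and specificity of .95 from Table 1–4:

```python
def ppv(prev, sens, spec):
    """Formula 16: positive predictive value at a given prevalence."""
    return (prev * sens) / (prev * sens + (1 - prev) * (1 - spec))

def npv(prev, sens, spec):
    """Formula 17: negative predictive value at a given prevalence."""
    return ((1 - prev) * spec) / ((1 - prev) * spec + prev * (1 - sens))

SENS, SPEC = 0.75, 0.95

# In the 50% base-rate validity study, PPV and NPV look strong...
print(round(ppv(0.50, SENS, SPEC), 2), round(npv(0.50, SENS, SPEC), 2))  # 0.94 0.79
# ...but screening at a prevalence of .008 collapses PPV to about .11
print(round(ppv(0.008, SENS, SPEC), 2))  # 0.11
```

Sweeping `prev` from 0 to 1 with these two functions reproduces the curvilinear curves shown in Figure 1–5.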
Using Formula 16, one can determine that for this purpose the measure's PPV is only .11, and thus the "positive" test results would be incorrect 89% of the time.

Conversely, the prevalence of a COI may in some settings be substantially higher than 50%. As an example of the other extreme, the base rate of head injuries among persons admitted to an acute hospital head injury rehabilitation service is essentially 100%, in which case the use of neuropsychological tests to determine whether or not examinees had sustained a head injury would not only be redundant, but would very likely lead to false-negative errors (such tests could, of course, be legitimately used for other purposes, such as grading injury severity). Clearly, clinicians need to carefully consider published data concerning sensitivity, specificity, and predictive power in light of intended test use and, if necessary, calculate PPV and NPV values and COI base rate estimates applicable to specific groups of examinees seen in their own practices. In addition, it must be kept in mind that PPV and NPV values calculated for individual examinees are estimates that have associated measurement errors that allow for construction of confidence intervals. Crawford, Garthwaite, and Betkowska (2009) provide details on the calculation of such confidence intervals and also a free computer program that performs the calculations.

DIFFICULTIES WITH ESTIMATING AND APPLYING BASE RATES

Prevalence or base rate estimates may be based on large-scale epidemiological studies that provide good data on the rate of occurrence of COIs in the general population or within specific subpopulations and settings (e.g., prevalence rates of various psychiatric disorders in inpatient psychiatric settings). However, in some cases, no prevalence data may be available, or reported prevalence data may not be applicable to specific settings or subpopulations.
In these cases, clinicians who wish to determine predictive power must develop their own base rate estimates. Ideally, these can be derived from data collected within the same setting in which the test will be employed, though this is typically time-consuming and many methodological challenges may be faced, including limitations associated with small sample sizes. Methods for estimating base rates in such contexts are beyond the scope of this chapter; interested readers are directed to Mossman (2003), Pepe (2003), and Rorer and Dawes (1982).

Figure 1–5. Relation of predictive power to prevalence (hypothetical data; sensitivity = .75, specificity = .95).

DETERMINING THE OPTIMUM CUTOFF SCORE: ROC ANALYSES AND OTHER METHODS

The foregoing discussion has focused on the diagnostic accuracy of tests using specific cutoff points, presumably
ones that are optimal for given tasks such as diagnosing dementia or detecting noncredible performance. A number of methods for determining an optimum cutoff point are available, and, although they may lead to similar results, the differences between them are not trivial. Many of these methods are mathematically complex and/or computationally demanding, thus requiring computer applications.

The determination of an optimum cutoff score for detection or diagnosis of a COI is often based on simultaneous evaluation of sensitivity and specificity or predictive power across a range of scores. In some cases, this information, in tabular or graphical form, is simply inspected and a score is chosen based on a researcher's or clinician's comfort with a particular error rate. For example, in malingering research, cutoffs that minimize false-positive errors or hold them below a low threshold are often explicitly chosen (i.e., by convention, a specificity of .90 or higher), even though such cutoffs are associated with relatively large false-negative error rates (i.e., lower detection of examinees with the COI, malingering).

A more formal, rigorous, and often very useful set of tools for choosing cutoff points and for evaluating and comparing test utility for diagnosis and decision making falls under the rubric of receiver operating characteristic (ROC) analyses. Clinicians who use tests for diagnostic or other decision-making purposes should be familiar with ROC procedures. The statistical procedures utilized in ROC analyses are closely related to and substantially overlap those of Bayesian analyses. The central graphic element of ROC analyses is the ROC graph, which is a plot of the true-positive proportion (y axis) against the false-positive proportion (x axis) associated with each specific score in a range of test scores. Figure 1–6 shows an example of an ROC graph.
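The construction just described can be sketched with a toy dataset: each candidate cutoff yields one (false-positive proportion, true-positive proportion) point, and the area under the resulting curve can be computed with the trapezoid rule. The score values below are invented purely for illustration.

```python
# Hypothetical test scores (higher = more impaired)
coi_pos = [2, 3, 4, 4, 5, 6, 6, 7]   # examinees with the COI
coi_neg = [0, 1, 1, 2, 2, 3, 3, 4]   # examinees without the COI

# One ROC point per candidate cutoff: "positive" means score >= cutoff
cutoffs = sorted(set(coi_pos + coi_neg), reverse=True)
points = [(0.0, 0.0)]
for cut in cutoffs:
    tpr = sum(s >= cut for s in coi_pos) / len(coi_pos)
    fpr = sum(s >= cut for s in coi_neg) / len(coi_neg)
    points.append((fpr, tpr))
points.append((1.0, 1.0))

# Area under the curve via the trapezoid rule
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

For these invented scores the AUC works out to about .89; an uninformative test would trace the diagonal and yield an AUC of .50.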
The area under the curve summarizes the overall discriminative accuracy of the test (it equals the probability that a randomly selected COI+ examinee obtains a more extreme score than a randomly selected COI− examinee), while the slope of the curve at any point is equivalent to the LR+ associated with a specific test score.

A number of ROC methods have been developed for determining cutoff points that consider not only accuracy, but also allow for factoring in quantifiable or quasi-quantifiable costs and benefits and the relative importance of specific costs and benefits associated with any given cutoff score. ROC methods may also be used to compare the diagnostic utility of two or more measures, which may be very useful for purposes of test selection. Although ROC methods can be very useful clinically, they have not yet made broad inroads into most of the clinical neuropsychological literature, with the exception of some research on dementia screening and research on performance validity and symptom validity (see reviews in this volume). A detailed discussion of ROC methods is beyond the scope of this chapter; interested readers are referred to Mossman and Somoza (1992), Pepe (2003), Somoza and Mossman (1992), and Swets, Dawes, and Monahan (2000).

EVALUATION OF PREDICTIVE POWER ACROSS A RANGE OF CUTOFF SCORES AND BASE RATES

As noted earlier, it is important to recognize that positive and negative predictive power are not properties of tests but rather are properties of specific test scores in specific contexts. The foregoing sections describing the calculation and interpretation of predictive power have focused on methods for evaluating the value of a single cutoff point for a given test for purposes of classifying examinees as COI+ or COI−. However, by focusing exclusively on single cutoff points, clinicians are essentially transforming continuous test scores into binary scores, thus discarding much potentially useful information, particularly when scores are considerably above or below a cutoff.
Figure 1–6. An ROC graph.

Lindeboom (1989) proposed an alternative approach in which predictive power across a range of test scores and base rates can be displayed in a single Bayesian probability table. In this approach, test scores define the rows and base rates define the columns of a table; individual table cells contain the associated PPV and NPV for a specific score and specific base rate. Such tables have rarely been constructed for standardized measures, but examples can be found in some test manuals (e.g., the Victoria Symptom Validity Test; Slick et al., 1997). The advantage of this approach is that it allows clinicians to consider the diagnostic confidence associated with an examinee's specific score, leading to more accurate assessments. A limiting factor for the use of Bayesian probability tables is that they can only be constructed when sensitivity and specificity values for an entire range of scores are available, which is rarely the case for most tests. In addition, predictive power values in such tables are subject to any validity limitations of the underlying
data and should include associated standard errors or confidence intervals.

COMBINING RESULTS OF MULTIPLE SCREENING/DIAGNOSTIC TESTS

Often, more than one test that provides data relevant to a specific diagnosis is administered. In these cases, clinicians may wish to integrate predictive power estimates across measures. There may be a temptation to use the PPV associated with a score on one measure as the "base rate" when the PPV for a score from a second measure is calculated. For example, suppose that the base rate of a COI is 15%. When a test designed to detect the COI is administered, an examinee's score translates to a PPV of 65%. The examiner then administers a second test designed to detect the COI, but when PPV for the examinee's score on the second test is calculated, a "base rate" of 65% is used rather than 15%, because the former is now the assumed prior probability that the examinee has the COI given their score on the first test administered. The resulting PPV for the examinee's score on the second measure is now 99%, and the examiner concludes that the examinee has the COI. While this procedure may seem logical, it will produce an inflated PPV estimate for the second test score whenever the two measures are correlated, which will almost always be the case when both measures are designed to screen for or diagnose the same COI.

A more defensible method for combining results of multiple diagnostic tests is to derive empirical classification rules based on the number of positive findings from a set of screening/diagnostic tests. While this approach to combining test results can produce more accurate classifications, its use of binary data (positive or negative findings) as inputs does not capitalize on the full range of data available from each test, and so accuracy may not be optimized.
To date, this approach to combining test results has primarily been used with performance/symptom validity tests, and there have been some interesting debates in the literature concerning the accuracy and clinical utility of the derived classification rules; see Larrabee (2014a, 2014b), Bilder et al. (2014), and Davis and Millis (2014). A preferred psychometric method for integrating scores from multiple screening/diagnostic measures, one that utilizes the full range of data from each test, is to construct group membership (i.e., COI+ vs. COI−) prediction equations using methods such as logistic regression or multiway frequency analyses. These methods can be used clinically to generate binary classifications or classification probabilities, with the latter being preferred because they provide a better gauge of accuracy. Ideally, the derived classification formulas should be well validated before being utilized clinically. More details on methods for combining classification data across measures may be found in Franklin and Krueger (2003) and Pepe (2003).

WHY ARE CLASSIFICATION ACCURACY STATISTICS NOT UBIQUITOUS IN NEUROPSYCHOLOGICAL RESEARCH AND CLINICAL PRACTICE?

Of note, the mathematical relations between sensitivity, specificity, base rates, and predictive power were first elucidated by Thomas Bayes and published in 1763; methods for deriving predictive power and other related indices of confidence in decision making are thus often referred to as Bayesian statistics. Note that in Bayesian terminology, the prevalence or base rate of a COI is known as the prior probability, while PPV and NPV are known as posterior probabilities. Conceptually, the difference between the prior and posterior probabilities associated with information added by a test score is an index of the diagnostic utility of a test.
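The prior-to-posterior framing can be made concrete in odds form, where a positive result multiplies the prior odds by LR+. The sketch below assumes the hypothetical test's sensitivity of .75 and specificity of .95 (so LR+ = 15) and an assumed prior of .15; it also illustrates why the chaining shortcut criticized earlier inflates PPV: reusing the posterior as the prior for a second, correlated test applies a full LR+ that overstates the new evidence.

```python
def odds(p):
    return p / (1 - p)

def prob(o):
    return o / (1 + o)

SENS, SPEC = 0.75, 0.95
LR_POS = SENS / (1 - SPEC)               # positive likelihood ratio = 15.0

prior = 0.15                             # base rate (prior probability)
posterior = prob(odds(prior) * LR_POS)   # ~0.73: PPV, the posterior probability
gain = posterior - prior                 # ~0.58: information added by the test

# Naive chaining: treating the posterior as the prior for a second positive
# result applies LR+ again, which is only valid if the two tests are
# conditionally independent; for correlated tests this value is inflated.
chained = prob(odds(posterior) * LR_POS)  # ~0.98
```

The gap between `prior` and `posterior` is the single-test diagnostic gain described above; `chained` shows how quickly naive sequential updating can overstate certainty.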
There is an entire literature concerning Bayesian methods for statistical analysis of test utility. These will not be covered here, and interested readers are referred to Pepe (2003).

Needless to say, Bayes's work predated the first diagnostic applications of psychological tests as we know them today. However, although neuropsychological tests are routinely used for diagnostic decision making, information on the predictive power of most tests is often absent from both test manuals and the applicable research literature. This is so despite the fact that the importance and relevance of Bayesian approaches to the practice of clinical psychology was well described 60 years ago by Meehl and Rosen (1955). Bayesian statistics are finally making major inroads into the mainstream of neuropsychology, particularly in the research literature concerning symptom/performance validity measures, in which estimates of predictive power have become de rigueur, although these are still typically presented without associated standard errors, thus greatly reducing the utility of the data.

ASSESSING CHANGE OVER TIME

Neuropsychologists are often interested in tracking changes in function over time. In these contexts, three interrelated questions arise:

1. To what degree do changes in examinee test scores reflect "real" changes in function as opposed to measurement error?
2. To what degree do real changes in examinee test scores reflect clinically significant changes in function as opposed to clinically trivial changes?
3. To what degree do changes in examinee test scores conform to expectations, given the application of treatments or the occurrence of other events or processes occurring between test and retest, such as head injury, dementia, or brain surgery?
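The first of these questions is classically addressed with a reliable change index (RCI; e.g., Jacobson & Truax, 1991), which scales an observed difference by the standard error of the difference implied by the test's reliability. A minimal sketch with invented numbers:

```python
import math

def rci(x1, x2, sd, r_xx):
    """Jacobson-Truax reliable change index: observed change divided by
    the standard error of the difference between two administrations."""
    sem = sd * math.sqrt(1 - r_xx)       # standard error of measurement
    se_diff = math.sqrt(2) * sem         # SE of a test-retest difference
    return (x2 - x1) / se_diff

# Hypothetical retest: an IQ-metric score falls from 100 to 88,
# with test SD = 15 and retest reliability = .90
value = rci(100, 88, 15, 0.90)   # ~ -1.79, beyond the conventional +/-1.645
```

On these assumed numbers the decline exceeds the conventional 90% reliable-change band, suggesting the change is unlikely to reflect measurement error alone; clinical significance and expected trajectory (questions 2 and 3) require separate consideration.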