
GRE® Psychology Measurement, Methodology and Other: Measurement and Methodology Part 2

Psychology · 82 Cards · Created 2 months ago

Covers key concepts from GRE® Psychology Measurement & Methodology (Part 2). Focuses on research design, data analysis, and statistical tools such as correlation, regression, and the line of best fit.


Fill in the blank:

The ________ ___ _____ ____ is the line one draws on the scatterplot to best represent the relationship between the two values.

line of best fit
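As an illustration of how such a line can be found, here is a minimal sketch using ordinary least squares, one common way to define "best" fit. The function name and the data are hypothetical and not part of the deck.

```python
def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-products of deviations / sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied vs. quiz score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
slope, intercept = line_of_best_fit(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```

With these made-up numbers the fitted line rises about 4.1 points per extra hour of study.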

Define:

factor analysis

It uses multiple sets of correlations to see which variable correlations cluster together to create a factor or group of variables which are presumed to be measuring the same value, based on their high rates of correlation.

Describe the difference between the null hypothesis and the research hypothesis.

  • The null hypothesis states that there is no relationship between the two values tested.

  • The research hypothesis states that there is a statistically significant relationship between the two values in our experiment.

Fill in the blank:

The ______ _____ is the level of certainty we wish to have that there is an actual relationship between the two values in an experiment.

alpha level

This is usually set at a 1 in 20 chance or an alpha level of 0.05.

Sandy rejected the null hypothesis and believed there was a relationship between phone numbers and math ability, when in reality, it was proved that there was not a relationship. What kind of statistical error did Sandy commit?

type I error

Bobby decided to accept the null hypothesis and decided there was no relationship between IQ and a healthy diet, even though there statistically was proof that there was a relationship. What kind of error did he commit?

type II error

Fill in the blank:

The probability of making a type II error is measured by the ________ level.

beta

Which statistical test should I use if I am trying to compare three different groups or more?

analysis of variance (ANOVA)

If I only have two groups to compare, which statistical test should I use?

t-test
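A minimal sketch of the arithmetic behind an independent-samples t-test, assuming equal variances in the two groups (Student's version). The function name and the scores are hypothetical. Turning t into a p value requires the t distribution, which is not computed here.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Student's t for two independent groups (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, n_a + n_b - 2  # t statistic and its degrees of freedom

# Hypothetical scores for two groups of five participants each.
control = [4, 5, 6, 5, 5]
treatment = [7, 8, 6, 7, 7]
t_value, df = independent_t(treatment, control)
print(f"t({df}) = {t_value:.2f}")
```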

Fill in the blank:

Chi-square tests are used for data that is _______ rather than numerical.

categorical
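To show what the test does with categorical counts, here is a rough sketch of the chi-square statistic for a contingency table. The table and the function name are made up for illustration.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = two groups, columns = "yes" / "no" answers.
observed = [[30, 10], [20, 40]]
chi2, df = chi_square_statistic(observed)
print(f"chi-square({df}) = {chi2:.2f}")
```

The larger the statistic, the further the observed counts are from what independence would predict.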

What is the most common way to perform a meta-analysis?

Gather as many studies on the topic as possible, statistically combine their results (for example, by pooling effect sizes), and publish the results of the meta-analysis for the larger community.

Define:

norm-referenced testing

A test in which one's score is compared to that of all of the other test-takers, such as "Brian's score is in the 66th percentile."

Fill in the blank:

________-__________ testing, rather than norm-referenced testing, determines how much information the test-taker knows about a certain subject, such as a history final.

Domain-referenced

What are three things a test must have to be reliable?

  1. dependability

  2. consistency

  3. repeatability

What aspect of a test are split-half reliability, alternate-form, and test-retest methods used to establish?

a test's reliability


Define:

validity

How much a test measures what it claims to measure.

What would be the best way to test content validity?

Examining the actual content of the test to make sure that it accurately and completely meets all of the facets of the construct that are being tested.

What does the face validity of the test show?

That the questions on the test appear, on the surface, to be about the subject of the test; this is the least objective form of validity.

What would be one way to determine the criterion validity of the SAT?

Determine whether high scores on the SAT predict high GPAs in college.

Define:

construct validity

How well the test addresses what you were trying to measure.


Name two kinds of construct validity.

  1. convergent validity

  2. divergent validity

What is the difference between aptitude and achievement tests?

  • Someone's score on an aptitude test predicts future ability with training and growth.

  • Someone's score on an achievement test shows how much s/he knows right now.

What would a personality inventory be likely to contain?

  • statements about personality

  • questions that assess likes and dislikes

  • self-selected ideals

Fill in the blank:

The ________ is an intelligence test specially designed for children.

WISC (Wechsler Intelligence Scale for Children)

What are some special features of the Minnesota Multiphasic Personality Inventory?

It has 10 clinical scales, along with validity scales that flag carelessness, faking, and distorted responding.

Define:

empirical criterion-keying approach

This is a process for creating test questions in which the developers choose from thousands of test questions placed in groups to differentiate between sick and healthy people with a variety of scores.

Which test is the California Psychological Inventory (CPI) the most like and why?

The CPI is most like the MMPI, but is especially intended for test takers ages 13 to young adult.


What is a projective test?

A test with ambiguous stimuli that has a subjective scoring system because there are limitless responses that the patient can give to the presented stimuli.

Projective tests are highly controversial. Critics point out research demonstrating projective tests' lack of reliability and validity. Yet projective tests remain in use in clinical settings and in legal and clinical decision making.

The Rorschach Ink Blot Test is a widely used projective test. Why is using the Rorschach Ink Blot Test a problematic practice?

Projective tests are highly controversial. Unfortunately, projective tests, such as the Rorschach, have been and continue to be used in making legal determinations (e.g., custody) despite evidence that such tests lack validity for assessing mental health (e.g., the Rorschach overpathologizes, frequently mistakenly identifying people as having mental illness when they do not).

For an in-depth discussion of the problems with using the Rorschach Ink Blot Test to assess mental health, please read this resource

To view the ink blot images, please see this resource.

Fill in the blank:

The ________ ________ _____ is a projective test in which the patient is given a series of pictures of scenes involving different people and is instructed to tell a spontaneous story about each scene.

Thematic Apperception Test (TAT)

The TAT was developed at Harvard in the 1930s by Murray and Morgan. Murray and Morgan used ambiguous images selected from magazines. Participants construct stories based on individually presented images. The test was developed to assess personality.

In addition to personality, the TAT has been (and continues to be) used to assess personal growth and mental health. However, the TAT, like other projective tests, lacks both reliability and validity. Including the TAT in a test battery can, in some circumstances, introduce enough error that it reduces the battery's overall reliability and validity.

Which projective test was especially designed for children?

Blacky Pictures


Define:

Rotter Incomplete Sentences Blank

Forty sentence stems that the test-taker fills out with whatever comes to mind.


What are some advantages of using projective tests?

  • Good for breaking the ice

  • Some skilled clinicians may be able to use them to get information not captured in other types of tests. (maybe)

What are some disadvantages of using projective tests?

  • Validity evidence is scarce; psychologists cannot be sure about what responses mean.

  • Expensive and time-consuming.

  • Other less expensive tests work as well or better.

What is the theme of the Strong-Campbell Interest Inventory?

It is a career placement test based around the test-taker's interests.

What were Holland's six types of interests and occupational themes?

  1. realistic

  2. investigative

  3. artistic

  4. social

  5. enterprising

  6. conventional

What did Arthur Jensen propose?

Racial differences in IQ are genetically related.

Important critique: Jensen did not adequately address other factors, including the lack of culture-fair tests, epigenetic effects, and the impact of socioeconomic status (SES) on educational opportunities and achievement. In addition, critics of Jensen's perspective note that he ignored research that was inconsistent with his hypotheses and misunderstood the nuances of heritability, which led him to deeply flawed conclusions.

What are four factors that can undermine data quality?

  • low precision of measurement

  • the state of the participant

  • the state of the experimenter

  • variation in the environment

What is an a priori hypothesis?

It occurs if one has a predicted hypothesis about a relationship (and the direction of relationship) between variables prior to collecting data.

Findings based on an a priori hypothesis are considered stronger/more persuasive than findings based on a post hoc (after the fact) analysis. This is because a finding based on an a priori hypothesis is less likely to be the result of chance.

What are some strategies to help improve the quality of data you collect?

  • Be careful!

  • Use a standardized procedure or protocol

  • Measure something that is important and engages participants

  • When using multiple measures, be aware of order effects (Does doing A before asking B influence the answers for B?)

  • Note anything unusual about the data collection. For instance, if a fire alarm goes off during data collection, or if the participant reports being in an unusual mood or unwell, make a note of it. Similarly, if you were collecting data on mood states the day after 9/11/2001, your data would likely have been impacted by participants' reactions to current events.

Name three things that can introduce error into our research.

Culture, Biases, and Situation strongly influence our Observations, Responses, and Behaviors.

Here is a helpful way of thinking about this issue: “…the assumptions you end up making as you try to bridge the imaginative gap are, of course, your own, and the most misleading assumptions are the ones you don't even know you're making.”

Douglas Adams & Mark Carwardine, "On Meeting a Gorilla." from Last Chance to See (writing about when they went to see gorillas in the wild)

Try, inasmuch as you are able, to be aware of the effects of these on you.

What is the primary aim of statistics?

To rule out randomness or chance as an explanation.

Human brains have evolved to detect patterns. A by-product of being very good at pattern detection is that human beings are prone to sometimes perceive patterns, even when there are no patterns.

What is measurement error?

  • A threat to research validity; it is the cumulative effect of extraneous variables.

  • Often referred to as noise in the data or as error variance.

What are four different types of data frequently used in psychological research?

  1. Self-Report

  2. Life Outcomes

  3. Behavioral Observations

  4. Informant

Self-Report - the participant's perceptions of himself or herself (e.g., data collected from surveys or interviews).

Life Outcomes - real life verifiable facts (e.g., criminal record/history of incarceration).

Behavioral Observations - observing a person's behavior (e.g., how a participant performs on a task, such as a Stroop test or an IQ test).

Informant - asking someone who knows the person to share their perceptions (e.g., asking a parent to describe his or her child's strengths and interests).

Shows vs No Shows (and others who refuse to participate)

In voluntary research, typically some potential participants refuse to participate. Other potential participants agree to participate then do not do so (no-shows).

Why is this a problem for voluntary research?

No-shows do not provide data, so they are not represented in the data and subsequent findings.

As a group, non-participants/no-shows probably differ meaningfully from participants. There may be relevant, important personality or demographic differences between these groups.

Thus, no-shows are a threat to study validity and the generalizability of findings.

(This is not an issue in animal research; lab mice do not have the option of deciding not to participate.)

What are “WEIRD” countries; why is this an issue?

Western, Educated, Industrialized, Rich, and Democratic.

Most psychological research is conducted in WEIRD countries (such as the U.S., Canada, and the U.K.), so findings from such research may or may not generalize to other, non-WEIRD populations.

What is the law of large numbers?

The larger the sample size, the more reliable and valid the findings, assuming there is no significant sampling error.

What is the difference between a Type I error and a Type II error?

  • Type I error occurs when a researcher incorrectly concludes that a result is significant when it is not (a false positive).

  • Type II error occurs when a researcher fails to detect a significant result that actually exists (a false negative).

Psychological research tends to focus on working to avoid making Type I errors, although both are harmful.

What is a response set or response bias, and why is it problematic for researchers?

A response set is the tendency for a participant to have a pattern in how she or he responds to questionnaire items or interview questions, and this pattern or tendency occurs independently of the content of the items. Response sets are a problem because they introduce systematic bias/error into the data set.

What are examples? Some participants tend to say yes to researchers conducting an interview (an acquiescence bias), even when the answer is unknown, ambiguous, or even no. Other participants tend to give extreme answers. In some instances, cultural differences can lead to response sets.

What is an effect size?

It measures the strength of a relationship or finding, indicating how large the observed effect is (as opposed to whether it is statistically significant). It can be categorized as small, moderate, or large, depending on its magnitude.

One widely used and effective measure of effect size is Cohen's d, which helps quantify the difference between two group means.

What does it mean to have multiple outcome measures, and why is it important to design studies with them when possible?

It means using more than one method to assess a dependent variable. As long as all the measures are valid, employing multiple measures significantly enhances your ability to detect effects or differences in the study, providing a more robust evaluation of the findings.

If you want to test an intervention to treat postpartum depression, then you could use multiple measures, such as the BDI, a rating from a family member, and a structured clinical interview. If there is any problem collecting or interpreting a measure, having multiple outcome measures reduces the problem's impact. E.g., what if you used only the rating from family members, and it turned out that not all of the participants have a relative close enough to them to provide a valid rating?

What is a p value?

What is an effect size?

Whereas a p value conveys how likely a result at least as extreme as the one observed would be if only chance were at work (i.e., if the null hypothesis were true), an effect size conveys how big or strong the difference between the groups is.

What are some arguments against using deception in psychological experiments?

  • Informed consent for deception is not possible.

  • When does the deception stop?

  • Harms the credibility of psychology

Why is deception sometimes used in psychological research, and what safeguards exist to protect participants?

Researchers sometimes use deception when collecting data to prevent participants' awareness from influencing the results. Deception is typically employed only when being direct could significantly bias the data. Its use must be pre-approved by an Institutional Review Board (IRB), ensuring that the potential harm does not outweigh the anticipated benefits, and participants must be fully debriefed afterward.

What is a standard deviation?

A measure of how closely the data in a sample or population cluster around the mean.

The standard deviation is equal to the square root of the variance.

For a more in-depth explanation of standard deviations, see this resource.
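As a quick worked example of "square root of the variance", the sketch below uses made-up scores and the population formula (dividing by n). The sample version divides by n - 1 instead.

```python
import math
import statistics

# Hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)
# Population variance: the average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
sd = math.sqrt(variance)

print(mean, variance, sd)
# The standard library's population formula gives the same value.
print(statistics.pstdev(scores))
```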

What does item difficulty refer to in item analysis?

The proportion of test-takers who answer an item correctly.

Item difficulty ranges from 0 to 1. A higher value indicates an easier question, as more test-takers answer it correctly.
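A small sketch of that proportion, assuming responses are stored as 1 (correct) or 0 (incorrect) with one row per test-taker. The data and names are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of test-takers answering each item correctly."""
    n = len(responses)
    # zip(*responses) walks the table column by column, i.e. item by item.
    return [sum(item) / n for item in zip(*responses)]

# Hypothetical data: 4 test-takers (rows) x 3 items (columns).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
difficulty = item_difficulty(responses)
print(difficulty)
```

Here the first item is the easiest (everyone got it right) and the third is the hardest.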

True or False:

A high item discrimination index indicates a question is effective at distinguishing between high and low performers.

True

The item discrimination index assesses how well an item can differentiate between test-takers who perform well overall and those who do not. Values closer to 1 suggest better discrimination.
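One common way to compute such an index, sketched here as an assumption rather than the only definition, is to subtract the proportion correct in the lowest-scoring group from the proportion correct in the highest-scoring group. The `fraction` parameter and the data are hypothetical.

```python
def discrimination_index(responses, item, fraction=0.5):
    """Upper-group minus lower-group proportion correct for one item."""
    # Rank test-takers by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, int(len(ranked) * fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    p_upper = sum(person[item] for person in upper) / group_size
    p_lower = sum(person[item] for person in lower) / group_size
    return p_upper - p_lower

# Hypothetical data: 6 test-takers (rows) x 3 items, scored 1/0.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
d_item0 = discrimination_index(responses, item=0)
d_item1 = discrimination_index(responses, item=1)
print(round(d_item0, 2), round(d_item1, 2))
```

In this toy data the first item separates high and low scorers better than the second.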

Fill in the blank:

Cronbach's α is used to measure the ________ ________ of a test.

internal consistency

Cronbach's α assesses the reliability of a test by examining the average correlation among items. Higher values indicate greater internal consistency.
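A minimal sketch of the usual variance-based formula for Cronbach's α, which compares the sum of the item variances with the variance of the total scores. The ratings below are invented for illustration.

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(scores[0])  # number of items
    item_variances = [statistics.variance(item) for item in zip(*scores)]
    total_variance = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical 5-point ratings: 4 respondents (rows) x 3 items (columns).
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

The items here rise and fall together across respondents, so α comes out high.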

What is the main difference between Cronbach’s α and KR-20?

  • KR-20 is specific to dichotomous items.

  • Cronbach’s α is used for continuous or ordinal data.

Both are measures of internal consistency, but KR-20 is used for tests with binary (right/wrong) scoring.
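For right/wrong items, KR-20 can be sketched as below. This version assumes population variance for the total scores; some textbooks use the sample variance instead, which changes the value slightly. Data and names are hypothetical.

```python
import statistics

def kr20(responses):
    """KR-20 for dichotomous (1/0) items; rows are test-takers."""
    n = len(responses)
    k = len(responses[0])
    # p = proportion correct per item; p * (1 - p) is that item's variance.
    p = [sum(item) / n for item in zip(*responses)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_variance = statistics.pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical data: 6 test-takers (rows) x 3 items, scored 1/0.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(responses)
print(round(reliability, 3))
```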

Define:

Classical Test Theory (CTT)

A framework for understanding test scores based on the idea that each score is composed of a true score and error.

CTT assumes that every observed score is the sum of a true score and random error, emphasizing the importance of reliability and validity.
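The idea can be illustrated with a small simulation. This is a sketch under assumed values: true scores with a standard deviation of 10, random error with a standard deviation of 5, and reliability taken as the share of observed-score variance that comes from true scores.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed values: true scores ~ N(100, 10), random error ~ N(0, 5).
true_scores = [random.gauss(100, 10) for _ in range(5000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

# Reliability = true-score variance / observed-score variance.
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
print(round(reliability, 2))
```

Under these assumptions the expected reliability is 100 / (100 + 25) = 0.80, and the simulated value should land close to that.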

How does modern test theory differ from classical test theory?

Modern test theory focuses on item-level data and models the probability of a response given various item and person parameters.

Also known as item response theory (IRT), it allows for more precise measurement and analysis across different populations and test forms.

What are norms in psychological testing?

Standards derived from a large group used to interpret individual test scores.

Norms provide a context for understanding where an individual's score falls relative to a representative sample, aiding in meaningful interpretation.

Why is standardization important in psychological testing?

It ensures that testing conditions are consistent and results are comparable across different administrations.

Standardization reduces variability unrelated to the construct being measured, enhancing the reliability and validity of test results.

Name one key component that should be included in a test manual.

  • Administration instructions

  • Scoring procedures

  • Normative data

  • Reliability and validity evidence

A comprehensive test manual helps ensure standardized administration and accurate interpretation of test results.

What is test bias and how does it differ from fairness?

Test bias occurs when a test systematically disadvantages certain groups, whereas fairness involves equitable treatment and outcomes for all examinees.

Bias is a statistical property, while fairness is a broader social concept. A fair test minimizes bias and ensures valid results for all demographic groups.

What is the main difference between factorial and simple designs in psychological research?

  • Factorial designs involve more than one independent variable.

  • Simple designs involve only one independent variable.

Factorial designs allow researchers to investigate the interaction effects between multiple variables, providing a more comprehensive understanding of complex phenomena.

How do longitudinal and cross-sectional studies differ?

  • Longitudinal studies track the same participants over time.

  • Cross-sectional studies analyze data from participants at a single point in time.

Longitudinal studies are valuable for observing developmental changes and causality, while cross-sectional studies are efficient for examining differences across age groups or demographics.

True or False:

Mixed-methods research combines qualitative and quantitative approaches.

True

Mixed-methods research integrates both qualitative and quantitative data to provide a more complete understanding of research questions, leveraging the strengths of both methodologies.

Fill in the blank:

Single-case designs focus on the detailed examination of a __________ __________.

single subject (or case)

Single-case designs are often used in clinical and applied settings to observe the effects of an intervention on an individual, allowing for detailed analysis and customization of treatment.

What is a primary threat to internal validity concerning historical events?

History

History refers to external events that occur during the course of a study that could influence participants' behavior or responses, potentially confounding the results.

What does the maturation threat to internal validity entail?

  • changes within participants over time

  • Maturation involves natural changes that occur within participants over the course of a study, such as aging or learning, which can affect the outcomes independently of the experimental treatment.

Maturation can be controlled by including a control group, which helps differentiate changes due to the experimental manipulation from those occurring naturally.

What are the key components of informed consent in psychological research?

  • Purpose of the research

  • Procedures involved

  • Risks and benefits

  • Confidentiality details

  • Voluntary participation

  • Contact information for questions

Informed consent is essential to respect participants' autonomy and ensure they understand what participation entails, allowing them to make an informed decision about their involvement.

True or False:

Anonymity means that even the researchers cannot identify the participants.

True

Anonymity ensures that participants' identities are not linked to their data, enhancing the privacy and security of sensitive information.

What is the primary difference between confidentiality and anonymity in research?

  • Confidentiality means the researcher knows the participants' identities but keeps them private.

  • Anonymity means even the researcher does not know the participants' identities.

Confidentiality requires robust data protection measures to prevent unauthorized access, maintaining trust between researchers and participants.

Fill in the blank:

Debriefing should include a(n) _________ of the study's purpose and methods.

explanation

Debriefing provides participants with comprehensive information about the study, helping to alleviate any potential misconceptions and offering closure regarding their involvement.

What are best practices for maintaining test security in psychological assessments?

  • Secure storage of test materials

  • Controlled access to tests

  • Regular monitoring of test use

  • Training for test administrators

Test security is crucial to uphold the validity and reliability of assessments, preventing unauthorized access and misuse that could compromise results.

List two potential consequences of test misuse in psychology.

  • Inaccurate diagnoses

  • Unfair treatment decisions

Test misuse can lead to harmful outcomes for individuals, including misinformed clinical decisions and biased employment or educational opportunities.

What is a confidence interval in the context of statistical analysis?

A range of values derived from sample data that is likely to contain the true population parameter.

Confidence intervals provide an estimated range of values that is believed to contain the population parameter with a certain level of confidence, usually 95% or 99%.
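A rough sketch of a confidence interval for a mean, assuming a normal approximation. For small samples a t-based interval would be wider and more appropriate. The sample and the function name are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z is about 1.96 for 95% confidence and about 2.58 for 99%.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
low95, high95 = mean_confidence_interval(sample, 0.95)
low99, high99 = mean_confidence_interval(sample, 0.99)
print(f"95% CI: ({low95:.2f}, {high95:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
```

Asking for more confidence (99% rather than 95%) widens the interval.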

Fill in the blanks:

In an ANOVA report, the notation 'F(2, 27) = 5.12, p < .05' indicates that there are ___ degrees of freedom for the effect and ___ degrees of freedom for the error.

2; 27

In an ANOVA report, the numbers in parentheses represent the degrees of freedom for the effect (first number) and the degrees of freedom for the error (second number).
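To see where two such degrees-of-freedom values come from, here is a sketch of a one-way ANOVA on made-up data with 3 groups of 10 scores, which yields df = (2, 27). The F value it produces is for this invented data and is not the 5.12 in the card.

```python
def one_way_anova(groups):
    """Return (F, df_effect, df_error) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (error): spread of scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_effect = len(groups) - 1               # number of groups minus 1
    df_error = len(all_scores) - len(groups)  # total N minus number of groups
    f = (ss_between / df_effect) / (ss_within / df_error)
    return f, df_effect, df_error

# Hypothetical data: 3 groups of 10 participants each.
base = [4, 5, 6, 5, 5, 4, 6, 5, 5, 5]
groups = [base, [x + 1 for x in base], [x + 2 for x in base]]
f, df_effect, df_error = one_way_anova(groups)
print(f"F({df_effect}, {df_error}) = {f:.2f}")
```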

True or False:

In regression analysis, the coefficient of determination (R²) indicates the proportion of variance in the dependent variable that is predictable from the independent variable(s).

True

R² values range from 0 to 1, where 0 indicates no explanatory power and 1 indicates perfect explanation of the variability of the dependent variable by the independent variables.
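A minimal sketch of the calculation for a single predictor: fit a least-squares line, then compare the leftover (residual) variation with the total variation. The data and function name are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: variation the line fails to explain.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: variation of y around its own mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: hours studied vs. quiz score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
r2 = r_squared(hours, scores)
print(round(r2, 3))
```

In this made-up example almost all of the variation in scores is predictable from hours studied.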

What does Cohen’s d measure in psychological research?

  • The size of the effect

  • The difference between two means in terms of standard deviation

Cohen's d is a measure of effect size used to indicate the standardized difference between two means. It is important for understanding the practical significance of research findings.
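A short sketch of Cohen's d using the pooled standard deviation, one common formulation. The scores and names are invented for illustration.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical scores.
treatment = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]
d = cohens_d(treatment, control)
print(round(d, 2))
```

The group means differ by 2 points and the pooled standard deviation is about 3.16, so the groups differ by roughly 0.63 standard deviations.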

How is η² (eta squared) used in the context of ANOVA?

It measures the proportion of total variance that is attributable to an effect.

Eta squared is a measure of effect size for ANOVA that indicates the proportion of the total variability in the dependent variable that is associated with the factor under consideration.
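As a sketch, eta squared for a one-way design is the between-groups sum of squares divided by the total sum of squares. The groups below are made up.

```python
def eta_squared(groups):
    """Proportion of total variability associated with group membership."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical data: three groups of three scores.
groups = [[4, 5, 6], [5, 6, 7], [6, 7, 8]]
eta2 = eta_squared(groups)
print(eta2)
```

In this toy data half of the total variability is associated with which group a score belongs to.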