GRE® Psychology Measurement, Methodology and Other: Measurement and Methodology Part 2
Covers key concepts from GRE® Psychology Measurement & Methodology (Part 2). Focuses on research design, data analysis, and statistical tools such as correlation, regression, and the line of best fit.
Key Terms
Describe the difference between the null hypothesis and the research hypothesis.
The null hypothesis states that there is no relationship between the two values tested.
Th...
| Term | Definition |
|---|---|
Fill in the blank: The ________ ___ _____ ____ is the line one draws on the scatterplot to best represent the relationship between the two values. | line of best fit |
Define: factor analysis | It uses multiple sets of correlations to see which variable correlations cluster together to create a factor or group of variables which are presumed to be measuring the same value, based on their high rates of correlation. |
Describe the difference between the null hypothesis and the research hypothesis. | The null hypothesis states that there is no relationship between the two values tested. |
Fill in the blank: The ______ _____ is the level of certainty we wish to have that there is an actual relationship between the two values in an experiment. | alpha level. This is usually set at a 1 in 20 chance, or an alpha level of 0.05. |
Sandy rejected the null hypothesis and believed there was a relationship between phone numbers and math ability, when in reality, it was proved that there was not a relationship. What kind of statistical error did Sandy commit? | type I error |
Bobby failed to reject the null hypothesis and concluded there was no relationship between IQ and a healthy diet, even though a relationship actually existed. What kind of error did he commit? | type II error |
Fill in the blank: The probability of making a type II error is measured by the ________ level. | beta |
Which statistical test should I use if I am trying to compare three different groups or more? | analysis of variance (ANOVA) |
If I only have two groups to compare, which statistical test should I use? | t-test |
Fill in the blank: Chi-square tests are used for data that is _______ rather than numerical. | categorical |
What is the most common way to perform a meta-analysis? | Gather as many studies on the topic as possible, statistically combine their results (typically by pooling effect sizes), and publish the findings of the meta-analysis for the larger community. |
Define: norm-referenced testing | A test in which one's score is compared to that of all of the other test-takers, such as "Brian's score is in the 66th percentile." |
Fill in the blank: ________-__________ testing, rather than norm-referenced testing, determines how much information the test-taker knows about a certain subject, such as a history final. | Domain-referenced |
What are three things a test must have to be reliable? | |
What aspect of a test are split-half reliability, alternate-form, and test-retest methods used to establish? | a test's reliability |
Define: validity | How much a test measures what it claims to measure. |
What would be the best way to test content validity? | Examining the actual content of the test to make sure that it accurately and completely meets all of the facets of the construct that are being tested. |
What does the face validity of the test show? | Whether the test's questions appear, on their face, to address the subject of the test; this is the least objective form of validity. |
What would be one way to determine the criterion validity of the SAT? | Determine whether high scores on the SAT predict high GPAs in college. |
Define: construct validity | How well the test measures the theoretical construct it is intended to measure. |
Name two kinds of construct validity. | |
What is the difference between aptitude and achievement tests? | |
What would a personality inventory be likely to contain? | |
Fill in the blank: The ________ is an intelligence test specially designed for children. | WISC (Wechsler Intelligence Scale for Children) |
What are some special features of the Minnesota Multiphasic Personality Inventory? | It has 10 clinical scales, plus validity scales that detect carelessness, faking, and distortion. |
Define: empirical criterion-keying approach | A process for creating a test in which the developers start with a very large pool of candidate questions and keep those whose answers empirically differentiate between groups, such as sick and healthy people. |
Which test is the California Psychological Inventory (CPI) most like, and why? | The CPI is most like the MMPI, but is especially intended for test-takers ages 13 to young adult. |
What is a projective test? | A test with ambiguous stimuli and a subjective scoring system, because the patient can give limitless responses to the presented stimuli. Projective tests are highly controversial. Critics point to research demonstrating projective tests' lack of reliability and validity. Yet projective tests remain in use in clinical settings and in legal and clinical decision making. |
The Rorschach Ink Blot Test is a widely used projective test. Why is using the Rorschach Ink Blot Test a problematic practice? | Projective tests are highly controversial. Unfortunately, projective tests such as the Rorschach have been and continue to be used in making legal determinations (e.g., custody) despite evidence that such tests lack validity for assessing mental health (e.g., the Rorschach overpathologizes, frequently identifying people as having mental illness when they do not). |
Fill in the blank: The ________ ________ _____ is a projective test in which the patient is given a series of pictures of scenes involving different people and is instructed to tell a spontaneous story about each scene. | Thematic Apperception Test (TAT). The TAT was developed at Harvard in the 1930s by Murray and Morgan, who used ambiguous images selected from magazines. Participants construct stories based on individually presented images. The test was developed to assess personality. In addition to personality, the TAT has been (and continues to be) used to assess personal growth and mental health. However, the TAT, like other projective tests, lacks both reliability and validity. Including the TAT in a test battery can, in some circumstances, introduce enough error that it reduces the battery's overall reliability and validity. |
Which projective test was especially designed for children? | Blacky Pictures |
Define: Rotter Incomplete Sentences Blank | Forty sentence stems that the test-taker fills out with whatever comes to mind. |
What are some advantages of using projective tests? | |
What are some disadvantages of using projective tests? | |
What is the theme of the Strong-Campbell Interest Inventory? | It is a career placement test based around the test-taker's interests. |
What were Holland's six types of interests and occupational themes? | |
What did Arthur Jensen propose? | Racial differences in IQ are genetically related. Important critique: Jensen did not adequately address other factors, including the lack of culture-fair tests, epigenetic effects, and the impact of socioeconomic status (SES) on educational opportunities and achievement. In addition, critics of Jensen's perspective note that he ignored research that was inconsistent with his hypotheses and Jensen misunderstood the nuances of heritability, resulting in Jensen making deeply flawed conclusions. |
What are four factors that can undermine data quality? | |
What is an a priori hypothesis? | It occurs if one has a predicted hypothesis about a relationship (and the direction of relationship) between variables prior to collecting data. Findings based on an a priori hypothesis are considered stronger/more persuasive than findings based on a post hoc (after the fact) analysis. This is because a finding based on an a priori hypothesis is less likely to be the result of chance. |
What are some strategies to help improve the quality of data you collect? | |
Name three things that can introduce error into our research. | Culture, Biases, and Situation strongly influence our Observations, Responses, and Behaviors. Here is a helpful way of thinking about this issue: “…the assumptions you end up making as you try to bridge the imaginative gap are, of course, your own, and the most misleading assumptions are the ones you don't even know you're making.” Douglas Adams & Mark Carwardine, "On Meeting a Gorilla." from Last Chance to See (writing about when they went to see gorillas in the wild) Try, in as much as you are able, to be aware of the effects of these on you. |
What is the primary aim of statistics? | To rule out randomness or chance as an explanation. Human brains have evolved to detect patterns. A by-product of being very good at pattern detection is that human beings are prone to sometimes perceive patterns, even when there are no patterns. |
What is measurement error? | |
What are four different types of data frequently used in psychological research? | Self-Report - the participant's perceptions of himself or herself (e.g., data collected from surveys or interviews). Life Outcomes - real-life verifiable facts (e.g., criminal record/history of incarceration). Behavioral Observations - observing a person's behavior (e.g., how a participant performs on a task, such as a Stroop test or an IQ test). Informant - asking someone who knows the person to share their perceptions (e.g., asking a parent to describe his or her child's strengths and interests). |
Shows vs. No-Shows (and others who refuse to participate): In voluntary research, typically some potential participants refuse to participate. Other potential participants agree to participate but then do not do so (no-shows). Why is this a problem for voluntary research? | No-shows do not provide data, so they are not represented in the data and subsequent findings. As a group, non-participants/no-shows probably differ meaningfully from participants. There may be relevant, important personality or demographic differences between these groups. Thus, no-shows are a threat to study validity and the generalizability of findings. (This is not an issue in animal research; lab mice do not have the option of deciding not to participate.) |
What are “WEIRD” countries; why is this an issue? | Western, Educated, Industrialized, Rich, and Democratic. Most psychological research is conducted in WEIRD countries (such as the U.S., Canada, and the U.K.), so findings from such research may or may not generalize to other, non-WEIRD populations. |
What is the law of large numbers? | The larger the sample size, the closer sample statistics (such as the mean) tend to be to the true population values, and so the more reliable the findings, assuming there is no significant sampling error. |
What is the difference between a Type I error and a Type II error? | Psychological research tends to focus on working to avoid making Type I errors, although both are harmful. |
What is a response set or response bias, and why is it problematic for researchers? | A response set is the tendency for a participant to have a pattern in how she or he responds to questionnaire items or interview questions, and this pattern or tendency occurs independently of the content of the items. Response sets are a problem because they introduce systematic bias/error into the data set. What are examples? Some participants tend to say yes to researchers conducting an interview (an acquiescence bias), even when the answer is unknown, ambiguous, or even no. Other participants tend to give extreme answers. In some instances, cultural differences can lead to response sets. |
What is an effect size? | It measures the strength of a relationship or finding, indicating how large the observed effect is (as distinct from whether it is statistically significant). It can be categorized as small, moderate, or large, depending on its magnitude. One widely used measure of effect size is Cohen's d, which quantifies the standardized difference between two group means. |
What does it mean to have multiple outcome measures, and why is it important to design studies with them when possible? | It means using more than one method to assess a dependent variable. As long as all the measures are valid, employing multiple measures enhances your ability to detect effects or differences in the study, providing a more robust evaluation of the findings. If you want to test an intervention to treat postpartum depression, then you could use multiple measures, such as the BDI, a rating from a family member, and a structured clinical interview. If there is any problem collecting or interpreting a measure, having multiple outcome measures reduces the problem's impact. E.g., what if you used only the rating from family members, and it turned out that not all of the participants have a relative close enough to them to provide a valid rating? |
What is a p value? What is an effect size? | Whereas a p value conveys how likely a result at least as extreme as the one observed would be if chance alone were operating (i.e., if the null hypothesis were true), an effect size conveys how big or strong the difference between the groups is. |
What are some arguments against using deception in psychological experiments? | |
Why is deception sometimes used in psychological research, and what safeguards exist to protect participants? | Researchers sometimes use deception when collecting data to prevent participants' awareness from influencing the results. Deception is typically employed only when being direct could significantly bias the data. Its use must be pre-approved by an Institutional Review Board (IRB), ensuring that the potential harm does not outweigh the anticipated benefits, and participants must be fully debriefed afterward. |
What is a standard deviation? | A measure of how closely the data in a sample or population cluster around the mean. The standard deviation is equal to the square root of the variance. |
What does item difficulty refer to in item analysis? | The proportion of test-takers who answer an item correctly. Item difficulty ranges from 0 to 1. A higher value indicates an easier question, as more test-takers answer it correctly. |
True or False: A high item discrimination index indicates a question is effective at distinguishing between high and low performers. | True. The item discrimination index assesses how well an item can differentiate between test-takers who perform well overall and those who do not. Values closer to 1 suggest better discrimination. |
Fill in the blank: Cronbach's α is used to measure the ________ ________ of a test. | internal consistency. Cronbach's α assesses the reliability of a test by examining the average correlation among items. Higher values indicate greater internal consistency. |
What is the main difference between Cronbach's α and KR-20? | Both are measures of internal consistency, but KR-20 is used for tests with binary (right/wrong) scoring. |
Define: Classical Test Theory (CTT) | A framework for understanding test scores based on the idea that each score is composed of a true score and error. CTT assumes that every observed score is the sum of a true score and random error, emphasizing the importance of reliability and validity. |
How does modern test theory differ from classical test theory? | Modern test theory focuses on item-level data and models the probability of a response given various item and person parameters. Also known as item response theory (IRT), it allows for more precise measurement and analysis across different populations and test forms. |
What are norms in psychological testing? | Standards derived from a large group used to interpret individual test scores. Norms provide a context for understanding where an individual's score falls relative to a representative sample, aiding in meaningful interpretation. |
Why is standardization important in psychological testing? | It ensures that testing conditions are consistent and results are comparable across different administrations. Standardization reduces variability unrelated to the construct being measured, enhancing the reliability and validity of test results. |
Name one key component that should be included in a test manual. | A comprehensive test manual helps ensure standardized administration and accurate interpretation of test results. |
What is test bias and how does it differ from fairness? | Test bias occurs when a test systematically disadvantages certain groups, whereas fairness involves equitable treatment and outcomes for all examinees. Bias is a statistical property, while fairness is a broader social concept. A fair test minimizes bias and ensures valid results for all demographic groups. |
What is the main difference between factorial and simple designs in psychological research? | Factorial designs allow researchers to investigate the interaction effects between multiple variables, providing a more comprehensive understanding of complex phenomena. |
How do longitudinal and cross-sectional studies differ? | Longitudinal studies are valuable for observing developmental changes and causality, while cross-sectional studies are efficient for examining differences across age groups or demographics. |
True or False: Mixed-methods research combines qualitative and quantitative approaches. | True. Mixed-methods research integrates both qualitative and quantitative data to provide a more complete understanding of research questions, leveraging the strengths of both methodologies. |
Fill in the blank: Single-case designs focus on the detailed examination of a __________ __________. | single subject (or case). Single-case designs are often used in clinical and applied settings to observe the effects of an intervention on an individual, allowing for detailed analysis and customization of treatment. |
What is a primary threat to internal validity concerning historical events? | History. History refers to external events that occur during the course of a study that could influence participants' behavior or responses, potentially confounding the results. |
What does the maturation threat to internal validity entail? | Maturation can be controlled by including a control group, which helps differentiate changes due to the experimental manipulation from those occurring naturally. |
What are the key components of informed consent in psychological research? | Informed consent is essential to respect participants' autonomy and ensure they understand what participation entails, allowing them to make an informed decision about their involvement. |
True or False: Anonymity means that even the researchers cannot identify the participants. | True. Anonymity ensures that participants' identities are not linked to their data, enhancing the privacy and security of sensitive information. |
What is the primary difference between confidentiality and anonymity in research? | Confidentiality requires robust data protection measures to prevent unauthorized access, maintaining trust between researchers and participants. |
Fill in the blank: Debriefing should include a(n) _________ of the study's purpose and methods. | explanation. Debriefing provides participants with comprehensive information about the study, helping to alleviate any potential misconceptions and offering closure regarding their involvement. |
What are best practices for maintaining test security in psychological assessments? | Test security is crucial to uphold the validity and reliability of assessments, preventing unauthorized access and misuse that could compromise results. |
List two potential consequences of test misuse in psychology. | Test misuse can lead to harmful outcomes for individuals, including misinformed clinical decisions and biased employment or educational opportunities. |
What is a confidence interval in the context of statistical analysis? | A range of values derived from sample data that is likely to contain the true population parameter. Confidence intervals provide an estimated range of values that is believed to contain the population parameter with a certain level of confidence, usually 95% or 99%. |
Fill in the blanks: In an ANOVA report, the notation 'F(2, 27) = 5.12, p < .05' indicates that there are ___ degrees of freedom for the effect and ___ degrees of freedom for the error. | 2; 27. In an ANOVA report, the numbers in parentheses represent the degrees of freedom for the effect (first number) and the degrees of freedom for the error (second number). |
True or False: In regression analysis, the coefficient of determination (R²) indicates the proportion of variance in the dependent variable that is predictable from the independent variable(s). | True. R² values range from 0 to 1, where 0 indicates no explanatory power and 1 indicates perfect explanation of the variability of the dependent variable by the independent variables. |
What does Cohen's d measure in psychological research? | Cohen's d is a measure of effect size used to indicate the standardized difference between two means. It is important for understanding the practical significance of research findings. |
How is η² (eta squared) used in the context of ANOVA? | It measures the proportion of total variance that is attributable to an effect. Eta squared is a measure of effect size for ANOVA that indicates the proportion of the total variability in the dependent variable that is associated with the factor under consideration. |
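
Worked example: several cards in this deck (line of best fit, R², Cohen's d, ANOVA degrees of freedom, and η²) describe quantities that can be computed by hand. The sketch below is not part of the original deck. It is a minimal illustration using only Python's standard library, with made-up scores and hypothetical function names. Cohen's d is computed here with a pooled standard deviation, which is one common convention and an assumption on our part. The three groups of 10 scores were chosen so that the degrees of freedom match the F(2, 27) notation used in the ANOVA card.

```python
from statistics import mean, variance


def line_of_best_fit(x, y):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Proportion of variance in y explained by the line of best fit."""
    slope, intercept = line_of_best_fit(x, y)
    my = mean(y)
    ss_residual = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - my) ** 2 for b in y)
    return 1 - ss_residual / ss_total


def cohens_d(group1, group2):
    """Difference between two means, in pooled-standard-deviation units."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5


def one_way_anova(groups):
    """Return (F, df_effect, df_error, eta_squared) for a one-way ANOVA."""
    scores = [s for g in groups for s in g]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    df_effect = len(groups) - 1            # number of groups minus 1
    df_error = len(scores) - len(groups)   # total N minus number of groups
    f_ratio = (ss_between / df_effect) / (ss_within / df_error)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_ratio, df_effect, df_error, eta_squared


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
group_a = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]
group_b = [6, 7, 8, 7, 6, 8, 7, 7, 6, 8]
group_c = [5, 6, 7, 6, 5, 7, 6, 6, 5, 7]

slope, intercept = line_of_best_fit(x, y)
print("slope, intercept:", round(slope, 3), round(intercept, 3))
print("R squared:", round(r_squared(x, y), 3))
print("Cohen's d (A vs. B):", round(cohens_d(group_a, group_b), 3))

f_ratio, df_effect, df_error, eta_sq = one_way_anova([group_a, group_b, group_c])
print(f"F({df_effect}, {df_error}) = {f_ratio:.2f}, eta squared = {eta_sq:.3f}")
```

Note that the p value is deliberately left out: obtaining it requires the F distribution, which the standard library does not provide. In practice a statistics package would report it alongside F.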