Ways of Knowing
- Tenacity: whatever belief one firmly adheres to is truth: something is true because it has always been true.
- Authority: if someone with authority says it's true, it must be true
- A priori: intuition - common sense says it's true
- Scientific method: empirical testing
- Personal experience
- Above by Charles Peirce, a 19th-century mathematician
- Discuss these ways of knowing: what are their strengths and weaknesses?
Ways to Make Informed Decisions
- Rely on personal experiences
- Consult an expert authority
- Review the literature
- Conduct research
Notes:
Say you want to make a decision--
ex.- adopt a new method of teaching reading,
modify the behavior of a depressed adolescent
try a new articulation therapy
use a new/better teacher evaluation form that will result in improved
instruction, etc.
What could we base this decision on?
need some info to be able to make a sound, informed decision
1. personal experience-
can be biased by personal feelings,
may not have knowledge- decision is beyond scope of experience,
could be quick & easy way to decide, but memory is faulty and subject to
errors in recall
2. expert authority-
even authorities are not always right,
also dealing with someone else's personal experiences, also biased,
authorities don't agree on everything
3. search the literature-
much work, time consuming,
can review past research of all authorities and get broader perspective
than from just one
4. do research-
also much work, time consuming
BUT can sometimes get a specific answer to a specific local problem by
doing action research
difference between library research and RESEARCH
The Scientific Method
1. identification of problem
2. form a hypothesis
3. deductive reasoning- decide on procedure
4. data collection and analysis
5. derive conclusion
Note: We never prove a hypothesis -- confirm or fail to confirm
Notes:
approach to acquiring knowledge in the natural sciences
- identification of problem
- hypothesis
- deductive reasoning- decide on procedure
- what would be observed if hypothesis was true? how can it be tested?
- data collection and analysis
- derive conclusion
never prove a hypothesis-- confirm or fail to confirm
- Scientific theory compare to law
- what is a theory? A coherent set of propositions that explain & predict human behavior
- characteristics of theories
explains observed phenomena [explain why]
should be consistent with previously established knowledge
should be verifiable [can we test it?]
should stimulate further research
- Why should we as counselors or social workers be interested in scientific methods of investigation? How will you benefit from learning this stuff?
- become better practitioners
- be able to evaluate your performance as a professional
- be able to identify "bad practices"
give examples
- be able to critically review the literature
- Some of you may continue your education
put in scientist-practitioner model
- Scientist-practitioner model
- this is how you and most graduates are being trained
- Began almost 50 years ago at the Boulder Conference, where a model for training psychologists was set up
- Recognizes the importance of integrating theory, research, and practice
Lesson 2
- Review the scientific method
The Scientific Method
1. identification of problem
2. form a hypothesis
3. deductive reasoning- decide on procedure
4. data collection and analysis
5. derive conclusion
Note: We never prove a hypothesis -- confirm or fail to confirm
Stages in the Research Process
Select problem area
Derive hypothesis(es)
Review the literature
Develop methodology
Data collection
Data analysis
Interpretation of results
Select a Problem or Research Topic
- What makes a topic a "meaningful contribution" according to Heppner?
- contributes to the professional bases.
- something that will stimulate and motivate you.
- Start reading the literature
- Find out what resources are available to you at Oswego including faculty
- Keep in mind
- your interests
- building on previous research
- the role of theory
- the utility of your research (meaningful)
Questions and Hypotheses
Question: Questions about the relationship between or among constructs.
(constructs): "an informed, scientific idea 'constructed' to describe or explain a behavior," e.g., intelligence or anxiety
Hypothesis: States the expected relationship between constructs.
- Problems with setting up hypotheses
- lack of relevant and extensive knowledge about a topic.
- can't measure and test constructs
- unable or unwilling to be specific enough
General Types of Questions or Hypotheses
- Descriptive (what are things like) e.g., survey to assess counselor attitudes towards licensure
- mostly descriptive analysis; can include factor or cluster analysis
- Difference questions (differences between or within groups; comparison), e.g., do MS counselors do better than PhD counselors in dealing with client depression?
- between or within group designs
- Relationship questions (how 2 or more constructs are related), e.g., how does therapist age affect client outcome?
- Must have a testable question
- Kerlinger said a research question
- asks a question about
- the relationship between 2 or more constructs that
- can be measured and tested
- A testable question identifies
- the topic
- specific constructs of interest
Types of Hypotheses
Non-directional hypothesis
Directional hypothesis
Null hypothesis
Notes:
- non-directional = there is a difference or there is a relationship
- covers 2 of 3 possible outcomes
- directional
- usually preferred to non-directional
- 2 possible outcomes
- null = hypothesis of no difference
- statistical hypothesis-- tests of statistical significance make a decision to reject or fail to reject the null hypothesis -- usually not stated in a research article
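As a rough illustration of how the three hypothesis types relate to a significance test, the sketch below runs a simple permutation test against the null hypothesis of no difference and reports one p-value for the directional hypothesis and one for the non-directional hypothesis. The scores, the function name `permutation_p_values`, and the choice of a permutation test are all assumptions made for illustration; they are not from the course materials.

```python
import random
from statistics import mean


def permutation_p_values(group_a, group_b, n_permutations=5000, seed=0):
    """Return (one_sided_p, two_sided_p) for the difference in means (a - b).

    one_sided_p corresponds to the directional hypothesis "a scores higher than b".
    two_sided_p corresponds to the non-directional hypothesis "a and b differ".
    Both are evaluated against the null hypothesis of no difference.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count_greater = 0
    count_extreme = 0
    for _ in range(n_permutations):
        # Under the null, group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            count_greater += 1
        if abs(diff) >= abs(observed):
            count_extreme += 1
    return count_greater / n_permutations, count_extreme / n_permutations


# Hypothetical classroom-behavior scores (higher = more appropriate behavior).
play_therapy = [14, 17, 15, 19, 16, 18, 20, 15]
non_play_therapy = [12, 15, 13, 14, 16, 11, 13, 14]

one_sided, two_sided = permutation_p_values(play_therapy, non_play_therapy)
print(f"directional p = {one_sided:.3f}, non-directional p = {two_sided:.3f}")
```

When the observed difference is in the predicted direction, the directional p-value is never larger than the non-directional one, which is one reason a directional hypothesis is usually preferred when theory supports it.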
Practice on hypotheses
- Will abused children who attend play therapy sessions for 6 months do better on measures of appropriate classroom behavior than children who attend 6 months of non-play therapy?
RESEARCH QUESTION
- At the end of 6 months, there will be a difference in appropriate classroom behaviors between those children who attend play therapy sessions and those who attend non-play therapy.
NON-DIRECTIONAL
- At the end of 6 months, there will be no difference in appropriate classroom behaviors between those children who attend play therapy sessions and those who attend non-play therapy.
NULL
- At the end of 6 months, children who attend play therapy sessions will do better on measures of appropriate classroom behavior than children who attend non-play therapy.
DIRECTIONAL
Selecting a Hypothesis
interest - what are you dying to know?
feasibility - is research problem "do-able"?
ethics - is research ethically tenable?
Where to get ideas for hypotheses
personal experience
textbooks
library books
journals
Operational Definitions
- after developing questions
- define all terms and constructs e.g., play therapy
- results of studies may differ depending on how terms are defined
Variables & Levels
- If some characteristic of a group is the same for all individuals it is a "constant"
- If the characteristic differs among individuals in the group it is a "variable"
- e.g., suppose we conduct a study of first year counseling students
- grade level is a constant
- experience, intelligence, introversion are examples of variables
- Types of variables
- Dependent Variable: depends on or is related to independent variable
- Independent Variable: variable that is being manipulated or changed to determine its effect on the DV
- IV and DV are terms for experimental research
- Often use predictor and criterion variable for non-experimental
- Other types of variables
- Organismic: preexisting and can't be assigned, e.g., sex
- Confounding: its existence is inferred but it can't be manipulated or measured, e.g., the effects of sex-role socialization on counselor empathy
- Control: an independent variable not of primary interest, e.g., reading achievement (DV) and sex & teaching method (IVs) > sex may be a control variable
- Describe differences between variables and levels of variables
- e.g., Counseling method >> client-centered, psychodynamic, behavioral
- sex >> male, female
- Population vs. sample
- Often use samples for efficiency
- a population is "the totality of all elements, subjects or members that possess a specified set of one or more common characteristics that define it."
- a sample is a subset of the population
- population of interest: the group we are interested in making generalizations about
Independent and Dependent Variables
- Practice - Identifying Independent and Dependent Variables
1. A survey of 1712 high school seniors in 421 high schools across the country revealed that 74% of students think that their teachers are doing a good or excellent job. Twenty-two percent of those surveyed said that they would like to become teachers. Although the seniors reported that their teachers generally have good content knowledge and are competent, the seniors also said that many of their teachers were not as interesting or creative as they should be.
Population of interest:
Independent variable:
Levels:
Dependent variable:
2. This study investigated whether absent students whose homes received phone calls via a computer-activated message device had a better school attendance record than students whose homes were not called. 150 students in three high schools in Pirogue Parish were randomly selected to receive the messages whenever they were absent. Another 150 randomly selected students served as the control group. They did not receive any absence notification. After 60 days, the attendance records for the groups were compared. Results revealed that students whose homes were called were absent significantly less than students whose homes were not called.
Population of interest:
Independent variable:
Levels:
Dependent variable:
3. The researcher hypothesized that peer evaluation as part of the writing process would lead to improved attitudes toward writing as measured by the Writing Attitude Scale for Students (WASS). Four intact classes of eighth graders were randomly assigned to the treatment group. This group received peer evaluation training and utilized peer evaluation during three writing assignments. Four randomly selected classes served as the control group. They received feedback from teachers only after their writing assignments. Both groups completed the WASS after the research period was over. Results indicate that the treatment group had significantly higher scores on the WASS than the control group.
Population of interest:
Independent variable:
Levels:
Dependent variable:
4. Children who entered kindergarten at age 5 were compared with children who entered kindergarten at age 6 on measures of academic achievement taken at grade 5. Results indicate that children who entered kindergarten at age 6 scored significantly higher on standardized tests measuring reading achievement and mathematics achievement.
Population of interest:
Independent variable:
Levels:
Dependent variable:
5. This study explored the relationships between measures of computer science aptitude, mathematics achievement, and writing achievement. For the 142 high school students in the sample, it was found that there is a moderately strong pattern of relationship (+.71) between computer science aptitude and mathematics achievement. However, there was no relationship between computer science aptitude and writing achievement.
Population of interest:
Independent variable:
Levels:
Dependent variable:
6. The effects of social skill training on a 29 year-old mentally retarded male adult were explored. The subject listened to typical social situations (such as getting a compliment or saying thank you) on audiotape and discussed the situations with a therapist. The subject's positive social verbal interactions were counted before the training, at three times during the training period, and at three times after the training had been completed. All counts were taken by observing the subject at an evening recreation time in the subject's group home. Results show that positive interactions increased during the training period but then rapidly decreased after training had stopped.
Population of interest:
Independent variable:
Levels:
Dependent variable:
7. A researcher wants to investigate the changing roles of working mothers and the pressures they and their children face. The researcher observes the
behaviors of 12 four year-old children in a day care setting for 6 to 8 hours per week for 10 months. The mothers are observed as they drop off and pick up their children. The mothers and the day care workers are interviewed. The researcher discovers several recurrent themes in the observations and interviews.
Population of interest:
Independent variable:
Levels:
Dependent variable:
Notes:
1. High school students
None --Trick question! Descriptive research doesn't have IVs and DVs
2. high school students in Pirogue Parish
computer-activated phone calls
got messages vs. did not get messages
attendance
3. eighth graders
peer evaluation
peer evaluation vs. teacher feedback
attitudes toward writing
4. school children
age of kindergarten entrance
age 5 vs. age 6
reading and math achievement
NOTE: cannot infer causality
5. high school students
None - Trick question
Correlational studies do not have independent and dependent variables
6. No population - single subject
social skill training
no training (before) vs audiotape/therapist (after)
social skills
7. working mothers, day care workers and preschoolers
None- ethnographic studies don't have IVs and DVs
Parts of a Research Article
ABSTRACT
INTRODUCTION
REVIEW OF THE RESEARCH
HYPOTHESES OR OBJECTIVES
Notes:
The organization of primary research articles follows the steps in the
scientific method
abstract --brief overview of the article-- usually 200-250 words maximum--
convenience to reader, not all journals require an abstract
1. introduction
states the problem in a general way.
cites important previous theory.
justify the importance of the study-- importance should be objectively clear
2. review of research
cite previous research -- what is the background in the field that leads to
your study?
should be evident where your research fits
--look for evidence of bias in prior research
who is the author?
what is the author's affiliation? does affiliation indicate bias
if author is strong proponent or opponent of certain theory, may be an
indication of bias
does author cite relevant research?
usually key studies will be mentioned over and over again, if these are
missing, may signify that author hasn't done a thorough review
is review of research biased toward a particular viewpoint?
are contradictory studies ignored?
is biased language used?
--how many articles in a review of research?
can't review everything, depends on journal space; general guideline, 5-10
key articles should be cited, if only briefly for articles - more in thesis
3. hypothesis -- research hypothesis is a statement of what we expect--
we make a guess about the relationships between variables or the differences
between two treatments, etc.
-may be a statement or in question form
-a good research hypothesis: 1. sets up a "testable" situation 2. gives
direction to research 3. identifies the variables of importance 4. is
grounded in theory 5. is brief but with clarity
Some studies use objectives instead of a hypothesis: descriptive study,
ethnography
ex.- do descriptive study of teacher salary -- look at salary schedules and
policies
objectives are to describe level of salary for state and education levels,
ex.- study sex-role related prejudices in kindergartners
observe sex-role related play, record instances of peer learning of sex-role
related behaviors, look at influence of teacher
METHODS
SAMPLING
INSTRUMENTATION
RESEARCH DESIGN
DATA ANALYSIS METHODS
RESULTS
DISCUSSION/IMPLICATIONS
Notes:
4. methods
sampling -- how was sample selected?
what does sample look like?
can't study entire population
want to get a sample that reflects the population
data collection - what data was collected
how was data collected
does data seem to be reliable and valid (Construct vs. Indicator)
statistical analysis
how was data analyzed?
5. results
data crunching results are given with level of statistical significance
6. conclusions
are conclusions warranted? or do they go beyond the results?
Look at article critique --website: Author makes statements about teaching
effectiveness, not warranted by what was investigated
do conclusions answer the research question?
do conclusions agree with previous research?
what is the future of research in this field?,
good research often generates more questions than you answer
Choosing a Design
- Research Design:
- An experimental design is the structure by which variables are positioned or arranged in the experiment (Wiersma, 1991)
- Research design is a set of plans and procedures that reduce error and simultaneously help the researcher obtain empirical evidence (data) about isolated variables of interest (Heppner et al.)
- Quantitative Research = Quantifying Variance
- We try to explain the variance in our dependent (response) variable
- Give example of "varying" scores measuring depression with the BDI using a new treatment (BDI scores vary within the group; we are trying to explain how much of the variance can be attributed to our IV [method])

- Note that not much of the variance is explained in this model. Since one of the main purposes of research design is to explain variance, we would want to have control over more of the variance (be able to explain more of it)
- This can be done by placing restrictions on the research conditions, such as including control variables.
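One way to make "how much of the variance is explained" concrete is to compute the proportion of total variance in the dependent variable that is accounted for by group membership (often called eta squared). The sketch below uses made-up BDI scores and a hypothetical helper named `eta_squared`; neither comes from the course materials.

```python
from statistics import mean


def eta_squared(groups):
    """Proportion of total DV variance accounted for by group membership."""
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = mean(all_scores)
    # Total sum of squares: spread of every score around the grand mean.
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_between / ss_total


# Hypothetical BDI scores after treatment (lower = less depressed).
bdi_scores = {
    "new_method": [12, 18, 9, 22, 15, 11],
    "standard_method": [16, 21, 14, 25, 19, 13],
}

explained = eta_squared(bdi_scores)
print(f"Variance explained by method: {explained:.1%}")
```

With these invented scores, method accounts for only about 14% of the variance; the rest is unexplained differences among clients, which is the situation that motivates adding controls to the design.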
- Ways to control variance
- Randomization
- Building conditions or factors into the design as independent variables (e.g., degree of depression could be categorized as high or moderate, with random assignment from within those groups [differences within groups would be randomly distributed])
- Holding conditions or factors constant (e.g., time of day, therapist): takes out the influence of those variables but may restrict generalizability
- Statistical adjustment: removes the effect of the control variable (e.g., pretest scores controlled)
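The first two ways of controlling variance listed above (randomization, and building a factor into the design) can be combined as random assignment within blocks. The following is a minimal sketch under assumed names (`blocked_random_assignment`, the client IDs, and the "high"/"moderate" labels are all hypothetical), not a prescribed procedure.

```python
import random


def blocked_random_assignment(participants, conditions, seed=0):
    """Randomly assign participants to conditions within each block.

    participants: list of (participant_id, block_label) tuples.
    Returns a dict mapping participant_id -> condition.
    """
    rng = random.Random(seed)
    blocks = {}
    for participant_id, block in participants:
        blocks.setdefault(block, []).append(participant_id)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        # Deal the shuffled members out in rotation so each condition
        # gets a near-equal share of every block.
        for index, participant_id in enumerate(members):
            assignment[participant_id] = conditions[index % len(conditions)]
    return assignment


# Hypothetical clients, pre-classified by degree of depression.
clients = [(f"client_{i}", "high" if i < 6 else "moderate") for i in range(12)]
groups = blocked_random_assignment(clients, ["treatment", "control"])

for client_id, level in clients:
    print(client_id, level, groups[client_id])
```

Because each depression level is split evenly between treatment and control, degree of depression cannot be confounded with treatment, while chance still decides which individual lands in which group.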
- Characteristics of a good research design (Wiersma)
- Freedom from bias: the data and the stats computed from the data do not vary in some systematic way. Any differences can then be attributed to the independent variables. (e.g., if testing two counseling treatments and one group was severely depressed while the other group was slightly depressed, bias would exist and we could not compare the two treatments)
- Usually handled by random selection or assignment
- Freedom from confounding: bias can be introduced by the confounding of variables. 2 or more variables are confounded when their effects can't be separated (e.g., two treatment methods with two different therapists: does treatment or therapist have an effect?)
- Statistical precision for testing hypotheses: the data must be adequate to accurately test the hypothesis. This is achieved through proper design and good measurement.
- MAXMINCON or how to increase validity
- Maximize the variance of the variables of interest used in the research design (maximize the effects of the independent variable on the dependent variable)
- make levels of the IVs as different as possible (e.g., if doing treatment and you do all treatment groups, similarities may exist between treatments)
- Minimize the error variance (measurement error, differences between subjects)
- Error refers to any event, characteristic, or situation that is unsystematic and causes measurements to fluctuate randomly
- Standardization of treatment and measurement procedures helps to reduce error.
- Pilot research also helps smooth out problems with research that can cause error
- Control the variance of extraneous variables (control extraneous variables)
- Extraneous variables cause systematic error or bias. While error variance is random (affects subjects or measurements in no particular order), error due to extraneous variables causes measurements to be pulled in a similar direction.
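The difference between random error and the systematic error caused by an extraneous variable can be seen in a small simulation. This is only a sketch: the true score of 20, the noise level, and the constant +4 shift are arbitrary assumptions chosen to make the contrast visible.

```python
import random
from statistics import mean

rng = random.Random(42)
true_scores = [20.0] * 2000  # hypothetical true score for every measurement

# Random (unsystematic) error: fluctuates around zero, so it adds noise
# but does not move the average.
with_random_error = [score + rng.gauss(0, 3) for score in true_scores]

# Systematic error from an extraneous variable: pulls every score the same
# way (an assumed constant +4 shift) on top of the same kind of random noise.
with_systematic_error = [score + 4 + rng.gauss(0, 3) for score in true_scores]

print(f"random error only:     mean = {mean(with_random_error):.2f}")
print(f"plus systematic error: mean = {mean(with_systematic_error):.2f}")
```

Random error makes individual measurements less precise but averages out; systematic error biases the group mean, which is why extraneous variables have to be controlled rather than merely averaged over.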
- Problems with MAXMINCON in counseling research
- sometimes unethical
- sometimes produces artificial results; results don't translate to the real world
- How to choose the best design
- Find out what is already known about the research topic: what can you add?
- Find out how other people have made designs to explore the research question and see if you can add to the research by doing it differently
- Consider the resources available to you
- Consider the threats to validity
- Find the best match of 1-4
Experimental Designs
- Criteria for well-designed experimental study (Wiersma)
- Adequate experimental control: there must be enough constraints on the conditions of the experiment so that the researcher can interpret the results (e.g., if you are testing two types of treatments with adolescents, can you control or account for other variables that may confound the findings [treatment provider, setting] and isolate the effects of the experimental variable [treatment method]?)
- Lack of artificiality: especially important in counseling research. Can the findings be extended to the "real world"?
- Basis for comparison: must be able to make some comparison to determine whether there is an experimental effect (control, comparison, or external group), e.g., treatment method with no comparison: is the treatment better, worse, or the same as no treatment?
- Adequate information: data must be adequate for testing the hypotheses of the experiment
- Uncontaminated data: data should be well measured and represent the experimental effects
- No confounding variables: there should be no other variables having an effect on the dependent variable. If there are, these variables should be separated or controlled through the experimental design or statistical manipulation.
- Representativeness: results should be able to be generalized beyond the sample (except in action research). Usually done with random selection or assignment.
- Parsimony: all other things being equal, a simpler design is preferred.
LESSON 3
- Validity
- Validity has been described as the degree to which inferences reflect the actual state of affairs (Heppner, Kivlighan, and Wampold, 1992)
- an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment (Messick, 1988)
- Important to both of these definitions is the concept that validity is not a measure of a test or design, rather it is a measure of the usefulness of the inferences one gathers from a measurement device or experiment.
Design Validity
- Campbell and Stanley (1963) conceptualized two types of validity in relation to research design: Internal & External validity
- Internal validity is the basic minimum without which any experiment is uninterpretable
- Internal validity is the degree to which the researcher has ensured that the conclusions stated follow from the variables studied and no other research factors (Wilkenson & McNeil)
- External validity asks the question of generalizability. To what populations, settings, treatment variables, and measurement variables can this effect be generalized.
Relationship between internal and external validity

Gelso's Research Classification

- Experimental studies have high internal validity because they allow variables to be directly manipulated; causation may be inferred
- Descriptive studies have low internal validity because they do not manipulate the IVs; no causation
- Field studies have high external validity because they use real people in real situations; causation can't be inferred
- Lab experiments have low external validity because the conditions do not generalize outside of the lab
Notes cont. ---
- All research designs will have problems with validity.
- We must make tradeoffs between internal and external validity
- The more closely we control the study, the better the internal validity but usually at the expense of external validity
- Cook and Campbell (1979) extended the work of Campbell and Stanley by defining four types of validity:
- Statistical conclusion validity: has the researcher made the correct conclusions about the relationships between variables in the study?
- Internal validity: how well can conclusions be drawn from the data?
- Construct validity of putative causes and effects: how well do the variables in the study represent the constructs being analyzed?
- External validity: how well can the results of the experiment be generalized to groups, settings, and contexts beyond the experimental sample?
- Statistical conclusion validity
- Statistical tests are based on statistical hypotheses (give example on board of null [no relationship] and alternative [difference])
- Statistical tests are based on hypotheses about populations, not samples
- When we use samples to make generalizations about the population we can make mistakes: the sample may be different from the population in some way
- Statistical tests never prove relationships exist or do not exist; rather, we make statements about how likely it is that relationships do exist in the population
- e.g., "65% of Americans believe Clinton should remain in office, + or - 3%" means we can be 95% confident that between 62% and 68% of Americans believe Clinton should remain in office
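The "+ or - 3%" in a poll like this is a margin of error, which can be approximated from the sample proportion and the sample size. The sketch below uses the usual normal approximation; the sample size of 1000 is an assumption (the notes do not give one), and `margin_of_error` is a hypothetical helper name.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


p_hat = 0.65  # sample proportion from the example
n = 1000      # assumed sample size; not stated in the notes
moe = margin_of_error(p_hat, n)
lower, upper = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} +/- {moe:.1%} -> ({lower:.1%}, {upper:.1%})")
```

A sample of roughly this size gives a margin close to 3 percentage points; because the margin shrinks with the square root of n, halving it would take about four times as many respondents.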
- Threats to statistical validity
- Power: power is the probability of correctly deciding that there is a relationship if one truly exists (e.g., probability of deciding that counseling method has an effect on depression if it really does)
State of Nature
Decision           | Null is true         | Null is false
Reject null        | Type I error (alpha) | Correct decision
Do not reject null | Correct decision     | Type II error (beta)
power = 1 - beta
- Low power is usually due to small n (if there are not enough subjects in our study we may conclude after testing that method has no effect, when in reality it does)
- Violated assumptions: statistical tests have assumptions. If the assumptions are not met, we may come to inaccurate conclusions.
- Fishing and error rate: finding significant results due to chance when many tests are run
- Unreliability of measures
- Unreliability of treatment implementation: for example, if therapists in our example use different treatment modes, the true relationship between method and depression may be obscured
- Random irrelevancies in the experimental setting: anything that may lead to differences in how participants respond (excluding treatment), e.g., in our methods example, if several members of the experiment also go to a support group for depression outside of the experiment.
- Random heterogeneity of respondents: differences in subjects that may lead to differences in their responses (e.g., some of our participants may have many resources and may be more likely to respond to treatment).
- homogeneous groups give us better control over variability but less ability to generalize results - use design and statistics to control heterogeneous groups.
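The link between sample size, power, and the two error types listed above can be checked by simulation. The sketch below repeatedly draws two groups and counts how often the null is rejected. It makes several simplifying assumptions that are not from the notes: normally distributed scores, a known common SD, a two-sided z-test at alpha = .05 (a t-test would normally be used), and a true effect of half a standard deviation.

```python
import math
import random
from statistics import mean


def estimated_power(n_per_group, true_effect, sd=1.0, n_sims=2000, seed=1):
    """Estimate the rejection rate of a two-sided z-test (alpha = .05).

    If true_effect is nonzero this estimates power (1 - beta);
    if true_effect is zero it estimates the Type I error rate (alpha).
    """
    rng = random.Random(seed)
    standard_error = sd * math.sqrt(2 / n_per_group)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, sd) for _ in range(n_per_group)]
        z = (mean(treated) - mean(control)) / standard_error
        if abs(z) > 1.96:
            rejections += 1
    return rejections / n_sims


small_n = estimated_power(n_per_group=10, true_effect=0.5)
large_n = estimated_power(n_per_group=100, true_effect=0.5)
type_i = estimated_power(n_per_group=10, true_effect=0.0)

print(f"power with n = 10 per group:  {small_n:.2f}")
print(f"power with n = 100 per group: {large_n:.2f}")
print(f"Type I error rate (no effect): {type_i:.2f}")
```

Under these assumptions the same real effect is usually missed with 10 subjects per group and usually detected with 100, while the false-positive rate stays near alpha regardless of n.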
- Internal validity
- History: events that occurred in addition to the treatment (e.g., TV show on depression aired during the experimental period)
- Maturation: things that normally occur over time (growing older, hungrier, tired)
- Testing: taking a test more than once (pre and post) - does the initial testing influence the outcome - act as a treatment? Most problematic if testing is done on a single group.
- Instrumentation: changes over the course of the experiment in testing instruments or observers (e.g., if checking behavior of children over time, observer may become more observant or less attentive) - also applies to different processes or procedures (two therapists doing different groups)
- Statistical regression: measurements tend to go toward the mean over time (very low scorers will tend to score higher and high scorers will tend to score lower because other factors [anxiety, depression] may have influenced scores in the first place)
- random assignment used to make the effects of regression the same in both groups
- Selection: particularly important when we do not have random assignment to groups
- Mortality: subjects dropping out of the study
- Interactions with selection: even in cases of random assignment to groups, interactions can occur. For example, suppose the group getting the treatment (Group 1) is from Liverpool and the other group is from Oswego. Group 1 did not have an opportunity to see the TV show on depression while the Oswego group did. Selection X History
- Ambiguity about the direction of causal inferences: especially important in non-experimental designs; correlation does not mean causation.
- Diffusion or imitation of treatments: effects from treatment groups spread to non-treatment groups (e.g., our treatment group talks to the non-treatment group about what they are doing).
- Compensatory equalization: those not getting treatments get some type of treatment elsewhere
- Compensatory Rivalry by subjects receiving less-desirable treatments: desire to outperform the treatment group
- Resentful demoralization of subjects receiving less-desirable treatments: opposite of compensatory rivalry, e.g., in our study the non-treatment group may become more depressed because they are not getting treatment
- Construct validity
- Inadequate preoperational explication of constructs: constructs not well defined or operationalized (e.g., play therapy)
- Mono-Operation bias: using only one measure of some construct for the DVs and IVs. May not capture the whole essence of the constructs being defined.
- Mono-method bias: using the same method of measurement for multiple DVs or IVs, e.g., using two self-report measures of depression rather than a self-report and a family member evaluation
- Hypothesis guessing within the experimental conditions: when people guess what the experiment is about and comply or rebel. Give example of Joe's depression study.
- Evaluation Apprehension: people may respond in different ways because of being evaluated (may want to look good)
- Experimenter Expectancies (the experimenter responds differentially in treatment or non-treatment groups)
- Confounding Constructs & Levels of Constructs: occurs when only segments of a scale are used, e.g., if we want to examine the relationship between severity of depression and effectiveness of a treatment, but only select people on the high end of depression severity, we may not see the true relationship
- Interaction of different Treatments: when one treatment follows another, you can't determine whether observed changes were due to one of the treatments or the interaction of the treatments.
- Interaction of Testing and Treatment: treatment and pretest combine to form an effect, e.g., pretest sensitizes people.
- Restricted Generalizability across Constructs: we can only make generalizations about the specific constructs used in the experiment. Often we fail to look at more important constructs.
- External validity
- Interaction of Selection & Treatment: To what extent can results be generalized to different types of people (gender, intelligence, race, ...)? Generalizations can only be made to groups that were selected for the study. Better to use subjects from different groups, e.g., Holland's initial vocational development research.
- Interaction of Setting & Treatment: To what extent can results be generalized to different settings? Generalizations can only be made to those settings that were used for the study. Better to test at different settings.
- Interaction of History & Treatment: To what extent can results be generalized to different time periods? We can enhance validity across time by repeating research at different times.
- Messick argues convincingly that all types of validation may be considered construct validation. He defined construct validity as "based on an integration of any evidence that bears on the interpretation or meaning of the test scores"
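The statistical regression threat described in the list above can be demonstrated with a short simulation in which nobody receives any treatment at all. This is a sketch under assumed values: each observed score is modeled as a stable true score plus independent noise on each testing occasion, and the means and standard deviations are arbitrary.

```python
import random
from statistics import mean

rng = random.Random(7)
n_people = 5000

# Each observed score = stable true score + occasion-specific noise.
true_scores = [rng.gauss(50, 10) for _ in range(n_people)]
pretest = [t + rng.gauss(0, 10) for t in true_scores]
posttest = [t + rng.gauss(0, 10) for t in true_scores]  # no treatment given

# Select the highest-scoring 10% on the pretest (e.g., the "most depressed").
cutoff = sorted(pretest)[int(0.9 * n_people)]
selected = [i for i in range(n_people) if pretest[i] >= cutoff]

pre_mean = mean(pretest[i] for i in selected)
post_mean = mean(posttest[i] for i in selected)

print(f"selected group pretest mean:  {pre_mean:.1f}")
print(f"selected group posttest mean: {post_mean:.1f}")
```

The extreme group's posttest mean falls back toward the overall mean of 50 even though nothing was done to them, so a single-group study that selects extreme scorers can mistake regression for a treatment effect. A randomly assigned control group would show the same drop, which is what lets the two be separated.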
Validity Practice
Mary has designed and run an experiment to determine the effects of counselor well-being on client outcome. She hypothesized that counselors who were emotionally healthy would have a more positive effect on their clients (i.e., clients would do better in therapy) than counselors who were emotionally unhealthy.
Mary decided to measure counselor emotional state with a measure of trait anxiety, the Counselor-Reported Anxiety Profile. Client outcomes would be measured with a 20-item questionnaire designed to measure the clients' self-reported benefits of therapy. The client self-report measured client satisfaction with their counselor, client perception of change in therapy, and client perception of their counselor's involvement in therapy.
Mary recruited subjects for her study from a women's center where she worked part-time. Therapists for the study were recruited from the counseling master's program at the local university. Four counselors volunteered for the experiment and were accepted. The counselors were tested with the Counselor-Reported Anxiety Profile. All were above average on the scale, indicating that all were experiencing higher than normal levels of anxiety. Each counselor was randomly assigned to a counseling client and therapy was conducted for 6 weeks (one 50-minute session per week). At the end of the 6th session, each client was asked to fill out the questionnaire.
Mary matched the client scores with the counselors' test results and did a correlational analysis. She did not find a statistically significant correlation and concluded that counselor emotional health was not related to therapy outcome.
- Threats to statistical validity
- Power: power is the probability of correctly deciding that there is a relationship if one truly exists (e.g., probability of deciding that counseling method has an effect on
- Unreliability of measures
- Unreliability of treatment implementation: for example, if the therapists in our example use different treatment modes, the true relationship between method and depression may be obscured
- Random irrelevancies in the experimental setting: anything that may lead to differences in how experimentees respond (excluding treatment), e.g., in our methods example, if several members of the experiment also go to a support group for depression outside of the experiment.
- Internal validity
- History: events that occurred in addition to the treatment (e.g., a TV show on depression aired during the experimental period)
- Maturation: things that normally occur over time (growing older, hungrier, tired)
- Selection: particularly important when we do not have random assignment to groups
- Ambiguity about the direction of causal inferences: especially important in non-experimental designs; correlation does not mean causation.
- Diffusion or imitation of treatments: effects from treatment groups spread to non-treatment groups (e.g., our treatment group talks to the non-treatment group about what they are doing).
- Inadequate preoperational explication of constructs: constructs are not well defined or operationalized (e.g., play therapy)
- Mono-Operation bias: using only one measure of some construct with the DVs and IVs. May not capture the whole essence of the constructs being defined.
- Evaluation Apprehension: people may respond in different ways because they are being evaluated (they may want to look good)
- Confounding Constructs & Levels of Constructs: occurs when only segments of a scale are used, e.g., if we want to examine the relationship between severity of depression and effectiveness of a treatment, but only select people on the high end of depression severity, we may not see the true relationship
- Interaction of Different Treatments: occurs when one treatment follows another. You can't determine whether observed changes were due to one of the treatments or to the interaction of the treatments.
- Restricted Generalizability across Constructs: we can only make generalizations about the specific constructs used in the experiment. Often we fail to look at more important constructs.
- Interaction of Selection & Treatment: To what extent can results be generalized to different types of people (gender, intelligence, race, ...)? Generalizations can only be made to groups that were selected for the study. Better to use subjects from different groups, e.g., Holland's initial vocational development research.
- Interaction of Setting & Treatment: To what extent can results be generalized to different settings? Generalizations can only be made to those settings that were used for the study. Better to test in different settings.
- Interaction of History & Treatment: To what extent can results be generalized to different time periods? We can enhance validity across time by repeating research at different times.
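The power threat listed above is easy to underestimate in a study like Mary's, where the correlation rests on only four counselor-client pairs. The following is a minimal simulation sketch, not part of the original exercise: the true correlation of .50, the comparison sample of 30, and the function names are all assumptions chosen for illustration, and the critical t values are copied from a standard two-tailed .05 table.

```python
import math
import random

# Two-tailed critical t values at alpha = .05 for df = n - 2
# (taken from a standard t table; only the two cases used here).
T_CRIT = {2: 4.303, 28: 2.048}

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def estimated_power(n, rho, reps=2000, seed=1):
    """Share of simulated samples of size n whose r is significant
    when the true (assumed) population correlation is rho."""
    rng = random.Random(seed)
    df = n - 2
    t_crit = T_CRIT[df]
    r_crit = t_crit / math.sqrt(t_crit ** 2 + df)  # smallest significant |r|
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
        if abs(pearson_r(xs, ys)) >= r_crit:
            hits += 1
    return hits / reps

power_n4 = estimated_power(4, 0.5)    # a sample the size of Mary's
power_n30 = estimated_power(30, 0.5)  # hypothetical larger sample
```

Under these assumptions the four-pair study detects a true relationship far less often than the larger one, so a non-significant result with N = 4 says little about whether a relationship exists.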
Reliability
- in order to learn something from a set of scores or measures, the scores must vary (e.g., if you want to see the effect of computer guidance on students' responsiveness to counseling, the measure of student responsiveness must differ among students in the group)
- Whenever scores vary across conditions (e.g., computer group vs. non-computer group) some of the variance is hoped to be due to the condition
- however, some of the variance is always due to error (imprecise measurement or unexplained factors that influence scores)
- in the language of classical test theory, observed scores are made up of the "true" score plus error:
X = T + e
where X = an observed score
T = a true score free from any error
e = error
- error may be due to systematic or random factors
- systematic: confounds
- random: non-systematic errors that can occur by chance or through poorly designed measures
- non-systematic means that, while the scores are not precise measurements, the errors do not cause scores to be systematically higher or lower than the collective true scores
- The reliability of a set of scores is the degree to which the scores are due to systematic rather than chance factors, although it does not tell us what the systematic variance is due to
- Reliability also is defined as the consistency or reproducibility of a set of scores: how well can a test produce the same scores on different occasions?
- going back to the CTT model we can also describe the model as:
the observed variability of a set of scores is equal to the true variability plus error variance (random error)
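s_o^2 = s_t^2 + s_e^2
r_{xx} = s_t^2 / s_o^2
r_{xx} = 1 - s_e^2 / s_o^2
where s_o^2 is the observed score variance, s_t^2 is the true score variance, s_e^2 is the error variance, and r_{xx} is an estimate of the reliability of a set of scores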
correlations range from - 1.0 to 1.0
reliability estimates are correlations, but only range from 0.0 to 1.0
reliability estimates are tied to the test-takers: reliability refers to the consistency of a set of scores, so it is an attribute of a set of scores and not of the test itself. You cannot generalize reliability to groups or settings other than those used to generate the reliability coefficient
- there are methods of estimating reliability that are not bound to the test-takers but these methods are not frequently used (IRT methods)
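The classical test theory model above can be illustrated with a small simulation. This is only a sketch under stated assumptions: true scores and errors are drawn as independent normal values, the mean of 50 and the standard deviations of 10 and 5 are arbitrary, and the function names are hypothetical. In practice true scores are never observed, which is why reliability has to be estimated by the methods listed next.

```python
import random
from statistics import pvariance

def simulate_scores(n, true_sd, error_sd, seed=0):
    """Build observed scores as X = T + e with independent T and e."""
    rng = random.Random(seed)
    true = [rng.gauss(50, true_sd) for _ in range(n)]
    error = [rng.gauss(0, error_sd) for _ in range(n)]
    observed = [t + e for t, e in zip(true, error)]
    return true, error, observed

def reliability(true, observed):
    """Ratio of true score variance to observed score variance."""
    return pvariance(true) / pvariance(observed)

true, error, observed = simulate_scores(5000, true_sd=10, error_sd=5)
r_xx = reliability(true, observed)
# In the population this ratio is 10**2 / (10**2 + 5**2) = 0.80;
# the simulated value should land close to that.
```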
- Ways of estimating reliability
- internal consistency
- stability
- alternate forms
- interrater reliability
- Internal consistency reliability: measures of the homogeneity of items. People who measure high on some construct will tend to answer items in the same direction (if items are coded in the same direction); people who measure low will answer in the opposite direction
- e.g., you have a scale to measure depression with items coded 1-5 (strongly disagree to strongly agree), where high numbers are indicative of depression. Depressed people will have mostly high-numbered responses and non-depressed people will have mostly low-numbered responses
- these methods of estimating reliability require only one test administration to a group of people
- common methods:
- Cronbach's alpha
- Kuder Richardson 20 (KR20)
- Kuder Richardson 21 (KR21)
- Split-halves: correlate two halves of a test; there should be a positive correlation
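As a rough illustration of an internal consistency estimate, the sketch below computes Cronbach's alpha with the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The response data are invented for the example (five respondents, four items coded 1-5), and the function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one row per respondent, one column per item."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # regroup responses by item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical responses to a 4-item scale (1 = strongly disagree, 5 = strongly agree).
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
```

Because respondents in this made-up data set answer all four items in the same direction, alpha comes out high; items that behave inconsistently would pull it down.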
- Stability measures the correlation between tests given at different time points (test-retest)
- used only if you expect the construct being measured to remain stable over time (state vs trait anxiety)
- if this measure is used, you need to also report the time period
- Alternate forms (Parallel forms): using two similar tests
- requires that the two tests measure the same construct
- do not see this in the literature very much
- Interrater reliability: required when a measure is scored subjectively rather than objectively
- e.g., if the Bender-Gestalt is used to diagnose depression, have several experts make the diagnosis and correlate the results; high correlations indicate good reliability
- requires that raters use standardized protocols and often means extensive training
- Reliability is essential in establishing validity, but reliability does not show validity. Reliability can tell you how precisely you are measuring something, but not what you are measuring
**** do bullseye example *****
- Because reliability is tied to the group the test was normed on, groups outside of the norm may not be adequately measured by the test
- e.g., the 16PF measures personality characteristics of normal people; people who are experiencing psychosis will not be accurately measured by the test (ceiling effect)
- the MMPI measures personality in non-normal subjects; it does not do well in measuring personality characteristics of normal populations (floor effect)
- Factors that affect reliability
- test length: longer tests will be more reliable, but tests that are too long may cause test-takers to become fatigued or irritated, so a good balance needs to be found
- homogeneity: As with any correlation, heterogeneity among a group increases variance and the resulting correlation. In reliability estimates, as group homogeneity increases, true score variance decreases and the resulting proportion of true score to observed score variance decreases (Crocker & Algina, 1986).
- item difficulty: related to homogeneity; if items are too difficult or too easy there will not be variance among the test-takers' scores (e.g., everyone gets 100%)
- Number of alternatives: the fewer the number of alternate answers the better the reliability (e.g., t-f tests will tend to be more reliable than
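The effect of test length on reliability can be quantified with the Spearman-Brown prophecy formula, which these notes do not name but which is the standard way to project reliability when a test is lengthened with comparable items. The sketch below assumes the added items are parallel to the existing ones; the function name and the example values are illustrative only.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is made length_factor times
    as long with comparable (parallel) items."""
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)

doubled = spearman_brown(0.60, 2)    # doubling a test with reliability .60
halved = spearman_brown(0.60, 0.5)   # cutting the same test in half
```

Doubling a test with reliability .60 projects to .75, while halving it projects to a lower value, which matches the claim that longer tests tend to be more reliable. The formula says nothing about fatigue, so the balance mentioned above still has to be judged separately.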
Writing Research and Using APA
Sections of the Research Report
Title
Abstract
Introduction
Method
Subjects
Measures (or variables or instruments)
Materials
Design (or design and analysis)
Procedure
Results
Discussion (or conclusions)
References
Tables
Figures
- Titles
- A title should summarize the research
- should attempt to list the dependent and independent variables and the outcome
- be succinct
- Abstracts
- Should accurately and succinctly summarize the content of each of the sections of the report
- journal articles: 100 to 150 words
- OSU thesis: less than 200 words
- Introduction
- the book combines the introduction with the literature review.
- the intro includes:
- intro to the problem:
- development of the framework for the study:
- statement of the research hypothesis
- For this class we will follow the outline provided by your syllabus because it is in a format more often used in theses
The Research Problem
Introduction
Problem statement
Literature Review
Definitions
Specific research question
Significance of Research
Methods
Experimental        Survey/Ex Post Facto    Qualitative
Hypothesis          Hypothesis              Hypothesis
Subjects/Sampling   Sampling                Focusing
Treatment           Instrumentation         Sampling
Data Gathering      Data Gathering          Instrumentation
Data Analysis       Schedule                Iterations
Schedule                                    Schedule
Pilot Studies
Human Subjects Concerns
Limitations
The Research Problem
- Introduction:
- should be brief
- set the stage for the research
- generally prepare the reader for your research
- for proposal should be a couple of paragraphs
- Problem statement:
- show the problem importance
- show the problem in the perspective of the larger field in which it is embedded
- state the problem generality (i.e., state the generalizability of your research: how the findings can be used and applied)
- Limit the problem: you must learn to focus; you cannot solve all the world's problems in one study
- Literature Review
- review relevant and recent literature related to the topic
- lead the reader to the question you are going to ask. The question should be obvious and inescapable by the end of the lit review.
- no research starts from scratch a good lit review shows your grasp of what has been done and how you will advance the knowledge in the field
- in reviewing the lit, point out the flaws in the research and show how you can avoid these flaws
- if there is a theoretical base for your research, discuss it
- consider information from fields outside of your own
- make sure the review is current
- be selective in the articles you use; use current and pertinent literature
- use primary sources when needed
- the overall lit review should be like a funnel > start broad and narrow the topic to focus on your questions that will be addressed in the research
- Definitions
- operationalize your terms and explain terms
- Specific research question
- Significance of Research
- How will your research add to the knowledge base?
- How does it exceed the past research?
Methods
- describe how the research hypothesis will be (were) tested
- your student handbook gives an outline of how this section is to be developed for a thesis
- Subjects
- Instrumentation
- Procedure
- limitations
- I would add some things in here to make the methods section more complete
Experimental        Survey/Ex Post Facto    Qualitative
Hypothesis          Hypothesis              Hypothesis
Subjects/Sampling   Sampling                Focusing
Instrumentation **  Instrumentation         Sampling
Treatment           Data Gathering          Instrumentation
Data Gathering      Schedule                Iterations
Data Analysis                               Schedule
Schedule
** NOTE: I ADDED THIS
Pilot Studies
Human Subjects Concerns
Limitations
- subjects:
- describe the subjects
- give the N
- describe how you sampled (selected) the subjects
- will discuss how to sample later
- instrumentation
- describe the instruments and methods you will use to measure the constructs under consideration
- describe why you selected these instruments and methods
- address the validity and reliability of the instruments
- include a description of any materials used in the research
- treatment
- describe the treatment sufficiently so that others may replicate your work
Boll, L. (1973). Effects of filial therapy on maternal perceptions of their mentally retarded children's social behavior (Doctoral dissertation, University of Oklahoma, 1973). Dissertation Abstracts International, 33(12-A), 6661.
Fall, M., Balvanz, J., Nelson, L., & Johnson, L. (1994). The relationship of a play therapy intervention to self-efficacy and classroom learning behaviors. Paper presented at the North Central Association for Counselor Education and Supervision, Milwaukee, WI.
Foley, J. M. Training future teachers as play therapists: An investigation of therapeutic outcome and orientation toward pupils. East Lansing, MI: National Center for Research on Teacher Learning. (ERIC Document Reproduction Service No. ED 067 794)
Wortman, P. (1994). Judging research quality. In H. Cooper & L. Hedges (Eds.), The handbook of research synthesis (pp. 97-109). New York: Russell Sage Foundation.
Webb, N. B. (1991). Play therapy with children in crisis. New York: Guilford Press.
Schaefer, C., & O'Connor, K. (1983). Handbook of play therapy. New York: Wiley.
Marans, S., Mayes, L., & Colonna, A. (1993). Psychoanalytic views of children's play. In A. Solnit, D. Cohen, & P. Neubauer (Eds.), The many meanings of play (pp. 9-28). New Haven, CT: Yale University Press.
USING APA STYLE --
THE MOST COMMON MISTAKES
1. Inventing your own rules for format and reference lists.
APA is a very precise style. Check the manual for the rules which are
comprehensive. Look at the samples provided. Ask questions if you run across
anything which does not fit APA rules.
2. Using incorrect margins.
APA uses 1 inch margins on all sides. The bottom margin may be adjusted if
needed, for example to avoid putting a heading on the last line of the page or
to avoid putting the last few words of a paragraph at the top of the next page.
Page headers go inside the top margin.
3. Using an author's first name or using gender specific pronouns.
APA requires nonsexist language and references at all times. Proof your work
carefully. There is no reason to ever identify the gender of a researcher or
author.
4. Formatting the title page incorrectly.
Look at the samples and the manual to be sure. The page header and the running
head are two different things.
5. Not formatting the reference page correctly. Follow the manual closely.
The reference list is in alphabetical order; check the manual if there is a
question. The reference list is not a bibliography; there should be a
correspondence between the reference list and the references in the text.
6. Using quotations.
Avoid direct quotes unless absolutely necessary. Direct quotes are rare in APA.
Summarize and paraphrase instead.
7. Using incorrect spacing.
Double space the text. All text is flush left. Do not hyphenate words. Check
spacing between headings and text.
8. Incorrect capitalization in a book or article title.
Caps are used only for the first word, proper nouns, and the first word after a
colon.
9. Putting too much information in the reference list.
Use an issue number only when warranted. Give month, season, or exact date only
when warranted. Follow the manual.
10. Using first person.
Never refer to yourself (I, we, me, etc.). You may have to resort to passive
voice. Check the manual for hints on how to strengthen your writing style.
11. Using the wrong verb tenses.
Double check verb tenses. In general, when referring to research conducted in
the past, use past tense ("Smith concluded.."). When referring to the present
state of knowledge or theory, use present tense ("These studies suggest...").
When talking about your proposed methods, use future tense ("The data will be
collected..."). Look at the samples.
give our example of article critique
Ethical Issues
Review sources of ethics: ACA, APA, NASP, ASCA
5 fundamental ethical principles
- nonmaleficence: "do no harm" -- in planning research you must be aware of harm or potential harm that may come to subjects
- may be the most basic and important principle
- if you must make the decision of helping others vs. potentially harming some, you are primarily obligated to do no harm (e.g., want to determine if you have developed a good therapy for use with severely depressed. To best test your method, you would need a control group -- however, denying treatment to a group would be potentially harming and you would be obligated to not use a no-treatment control group)
- beneficence: "do good for others"
- to do good, we must be competent in what we practice (clinically or research)
- to do good also demands that we do research to improve our knowledge of effective practices
- to do the best for our clients, we should be aware of current research and be able to evaluate good and bad research
- Autonomy
- individuals have the right to choose freely and in an informed manner whether to participate in research "informed consent"
- Justice
- Justice implies fairness: we cannot discriminate unjustly because of sex, race, ...; however we can treat differently if such treatment is warranted by differences
- e.g., we cannot deny treatment or provide different levels of treatment because of a client's racial background; however, we may provide different treatments or methods of treatment if such treatments are more likely to be effective.
- these determinations are made by people with biases and prejudices -- must use caution in providing different types of treatments but still use best practices if different for different groups
- justice also applies to credit for work done (articles, presentations, ...)
- Fidelity: faithfulness
- do not engage in deception (research has often relied on deception, this is a loaded principle for research) see g2b
- if you promise to provide results -- do so
- you are obliged to stick by agreements of research with clients
Issues in research ethics
As researchers we are expected to accurately report knowledge gained from research and to prevent the misuse of such knowledge
- The researcher has the responsibility of producing valid results.
- This requires knowledge of research principles
- The researcher is ultimately responsible to see that subjects receive ethical treatment and that reported results are reliable and valid.
** discuss Lynne's dissertation. does the publisher have the ethical right to refuse to publish based on fear of how that information will be determined? **
- reporting accurate results is often challenging. Who wants to hear insignificant results? There is a tendency to report partial results or to stretch findings based on the researcher's bias.
- If you as the researcher do not follow procedures accurately and consistently you may produce invalid results
- You are obligated to provide limitations and problems with your design and analysis
- You have the responsibility to store your data for an amount of time so that others can check it out (see g3e)
- you cannot make up data or results without reporting that
Publication credit
- Contributors to research should get publication credit (see and discuss section G4)
- you must recognize and reference the works of others in your research. Direct and indirect plagiarism is prohibited
Subjects:
- Federal law has mandated that behavioral research done on humans must be reviewed by Institutional Review Boards (IRB)
- at OSU, research on humans must be reviewed and approved by the Human Subjects Committee
*** discuss forms and procedures **
- It is the researcher's responsibility to provide information that increases the knowledge in the field while preserving the dignity and safety of the subjects
- must identify potentials of harm ( e.g., is filling out a test going to create anxiety or depression?)
- How do you assess and deal with the potentials for harm?
- estimate the cost/benefit of the study -- do the benefits outweigh the costs; are the potential benefits greater than the potential harms?; who benefits, the client or the society
- hard for the researcher to assess this on own
- minimize the risks -- are there other ways of designing research that would reduce potential harm?
- assess risks through pilot studies (with colleagues) or role-playing
- select subjects that are less likely to be harmed (e.g., don't select severely depressed clients for participation in a study involving criticism from others)
- ALWAYS: consult with others
Informed consent:
- researchers are obligated to get informed consent from the participants
- informed consent requires that the subjects understand their obligations and the risks associated with the study before research begins
- client must have the capacity to give consent
- children must have their parents' consent, and children must give assent (agreement)
- clients must not be pressured to participate and may terminate participation at their discretion.
- DOCUMENT the client's consent
** read section G.2. ***
Deception
- although counselors and psychologists are to "avoid deception" there is not a clear guideline on when to avoid.
- some research requires deception (e.g., story telling to kids to see if their memories changed)
- if deception is to be used you must:
1. determine the potentials for harm
2. determine if other designs could be used
3. CONSULT with others
Confidentiality
- confidentiality vs. anonymity
- must not give information in results that could break confidentiality
- must inform client on who will be privy to confidential information
- the usual limits to confidentiality
Treatment issues
- control groups, placebos, waiting list groups
- often not needed since there is some agreement on the benefits of therapy in general
- must be sure if clients have treatment delayed that the potential for harm is minimal
- sometimes use natural wait-list groups
Experimental Designs
- An experimental design is the structure by which variables are positioned or arranged in the experiment
- Experimental design requires random assignment to treatment groups
Types of true experimental designs
Posttest only control group design
Pretest posttest control group design
Solomon four group design
Posttest only control group design
RG1 X O1
RG2 -- O2
in a more general sense where there are K treatments:
RG1 X1 O1
RG2 X2 O2
. . .
. . .
. . .
RGk Xk Ok
RGk+1 -- Ok+1
       Randomly Assigned                          Posttest (BDI)
RG1    Group 1: 15 Ss getting RET (X1)            O1
RG2    Group 2: 15 Ss getting Behavior (X2)       O2
RG3    Group 3: 15 Ss getting No Tx (--)          O3
- Positive aspects of this design
- controls threats to internal validity
- does not require pretest
- can detect differences between groups but not the amount of change
- threats to external validity
- interaction of selection and X: if treatment(s) are different, is it because of the particular sample or do results generalize?
- reactivity: are there treatment effects because someone is in an experiment?
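Random assignment for the three-group posttest-only example above can be sketched as follows. This is an illustration only: the subject IDs, the condition labels, and the function name are assumptions, and a real study would document its own randomization procedure.

```python
import random

def randomly_assign(subject_ids, conditions, seed=None):
    """Shuffle the subjects and split them into equal-sized groups,
    one group per condition."""
    if len(subject_ids) % len(conditions) != 0:
        raise ValueError("subjects cannot be split evenly across conditions")
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(conditions)
    return {
        condition: sorted(shuffled[i * size:(i + 1) * size])
        for i, condition in enumerate(conditions)
    }

# 45 hypothetical subjects, 15 per group (RET, Behavior, No Tx).
groups = randomly_assign(range(1, 46), ["RET", "Behavior", "No Tx"], seed=42)
```

Every subject has the same chance of landing in any group, which is what lets the design control selection as a threat to internal validity.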
Pretest posttest control group design
RG1 O1 X1 O2
RG2 O3 X2 O4
. . . .
. . . .
. . . .
RGk O2k-1 Xk O2k
RGk+1 O2k+1 -- O2(k+1)
- example: if we are comparing RET therapy and Behavioral Therapy with depressed subjects, randomly assign subjects to 3 groups (RET, BT, and a control group), pretest all groups, give the treatments, and measure outcomes
Advantages
- controls variance on the DV and gives a more powerful test. If we measure only on the posttest (posttest only control group design), the variance of the outcome measure is composed of:
- error
- differences between treatment groups
if we measure subjects at pretest we can define 3 sources of variance:
- error
- differences between treatment groups
- differences between subjects (reduces the error term)
- in analysis we divide the different sources of variance by the error variance to determine the effect
- if using a pretest we can account for any differences between subjects at the start of the experiment
*** do example on board of variance calculations **
- if subjects drop out of the experiment we can see if people who dropped out were different in each group (e.g., did severely depressed subjects drop out of the treatment group but remain in the control group)
- pretests may be used to more accurately describe subjects in the study
- allows you to determine the amount of change from the treatments (and compare this to the amount of change in the control group)
- controls threats to internal validity
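One way to see how a pretest reduces the error term is to compare the within-group variance of raw posttest scores with the within-group variance of gain scores (posttest minus pretest). The sketch below uses invented depression scores for four subjects per group. Gain scores are only one way of using a pretest (covariance adjustment is another), so this is an illustration of the idea and not the analysis the notes prescribe.

```python
from statistics import mean, variance

# Hypothetical pretest/posttest depression scores (lower = less depressed).
treatment_pre, treatment_post = [30, 25, 20, 35], [20, 16, 11, 24]
control_pre, control_post = [31, 24, 21, 34], [30, 24, 20, 33]

def gains(pre, post):
    """Change score for each subject."""
    return [b - a for a, b in zip(pre, post)]

def pooled_within_variance(*groups):
    """Average within-group variance (groups are equal-sized here)."""
    return mean(variance(g) for g in groups)

# Posttest only: differences between subjects stay in the error term.
error_posttest_only = pooled_within_variance(treatment_post, control_post)

# With a pretest: each subject's starting level is removed first.
error_with_pretest = pooled_within_variance(
    gains(treatment_pre, treatment_post),
    gains(control_pre, control_post),
)
```

In this made-up data the subjects differ widely in starting level, so removing that difference shrinks the error variance substantially while the treatment effect stays the same, which is the source of the added power.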
Weaknesses
- pretests may sensitize subjects to treatment e.g., giving a depression inventory may make a treatment more effective because it causes the subjects to think about their disorder in a different way than if they had not been pretested. The treatment interacts with the pretest to make the treatment more effective than in the "real world" (threat to external validity)
- other threats to external validity
- interaction of selection and X: if treatment(s) are different, is it because of the particular sample or do results generalize?
- reactivity: are there treatment effects because someone is in an experiment?
Solomon four group design
- combines a pretest-posttest with a posttest only control group design
RG1 O1 X O2
RG2 O3 -- O4
RG3 -- X O5
RG4 -- -- O6
- this design allows the researcher to check if pretesting affects posttest scores or if pretesting interacts with the treatment
Advantages
- controls threats to internal validity
- the researcher can detect whether pretesting interacted with treatment (compare O2 to O5), which is an external validity threat
- since there are two experiments in one it provides a natural replication thus enhancing external validity
Weakness
- cost and time and number of subjects -- practicality
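The comparisons the Solomon design makes possible can be laid out as simple differences between posttest means. The sketch below is descriptive only: the means are invented, the function and key names are hypothetical, and in practice these contrasts would be tested statistically instead of read off by eye.

```python
def solomon_contrasts(o2, o4, o5, o6):
    """Posttest means: O2 (pretest + X), O4 (pretest only),
    O5 (X only), O6 (neither)."""
    effect_pretested = o2 - o4      # treatment effect among pretested groups
    effect_unpretested = o5 - o6    # treatment effect without a pretest
    return {
        "treatment_effect_pretested": effect_pretested,
        "treatment_effect_unpretested": effect_unpretested,
        # Non-zero when the pretest changes how well the treatment works.
        "pretest_by_treatment_interaction": effect_pretested - effect_unpretested,
        # Effect of pretesting alone, judged from the untreated groups.
        "pretest_effect_untreated": o4 - o6,
    }

# Hypothetical posttest depression means.
contrasts = solomon_contrasts(o2=12, o4=20, o5=15, o6=20)
```

With these numbers the treatment lowers scores by 8 points when subjects were pretested but by only 5 when they were not, the pattern that suggests pretest sensitization.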
Factorial Designs
- used when we want to use more than one independent variable
- basic construction all levels of each independent variable are taken in combination with the levels of the other IVs
- need at least 2 IVs
- the minimum factorial design is a 2X2 design (2 IVs and 2 Levels of each IV)
- the number of variables = the number of digits
- the number of levels of each variable is shown by each digit
- a 2X3X5 factorial design has 3 IVs IV 1 has 2 levels, IV 2 has 3 levels, and IV 3 has 5 levels
- at least one of the variables is the experimental variable and the others may be status variables
- example:
A counselor is interested in the effectiveness of two treatments with chronically anxious subjects. The treatments are progressive relaxation and systematic desensitization. He grouped the subjects into 3 groups: moderate, high, and extreme anxiety. It is possible that the 3 subject groups may respond differentially to treatment (anxiety level and treatment may interact). A 2X3 factorial design was used.
                      PR    SD
          moderate    20    20
Anxiety   high        20    20
          extreme     20    20
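The rule that every level of each IV is crossed with every level of the others can be sketched by enumerating the cells. The level labels follow the example above; the function name is hypothetical.

```python
from itertools import product

def factorial_cells(*factors):
    """Every combination of one level from each independent variable."""
    return list(product(*factors))

treatments = ["PR", "SD"]
anxiety_levels = ["moderate", "high", "extreme"]

cells = factorial_cells(treatments, anxiety_levels)   # the 2X3 design
total_subjects = len(cells) * 20                      # 20 Ss per cell

# A 2X3X5 design crosses three IVs with 2, 3, and 5 levels.
cells_2x3x5 = factorial_cells(range(2), range(3), range(5))
```

The 2X3 design has 6 cells (120 subjects at 20 per cell) and the 2X3X5 design has 30, which shows how quickly the required sample grows as IVs are added.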
Results of posttest anxiety scores may show no interaction (parallel lines) or interaction (non-parallel lines)
- Interaction: an effect on the DV such that the effect of one IV changes over the levels of another IV
- higher-order interactions (among more than 3 IVs) are difficult, if not impossible, to explain
- provides the economy of a single design rather than separate designs for each IV and allows researcher to investigate interactions
- reduces error variance and gives more power to test
- adds complexity: must look at main effects and interactions
- if added variables are unrelated to DV may decrease power of the test (however if they are unrelated, you can remove them from the model)
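The parallel versus non-parallel lines idea can be checked numerically: if the difference between treatments is the same at every anxiety level, the lines are parallel and there is no interaction in the cell means. The sketch below uses invented posttest means and hypothetical function names; it describes the pattern in the means and is not a significance test.

```python
def simple_effects(cell_means, levels):
    """PR minus SD posttest mean at each anxiety level."""
    return {lvl: cell_means[("PR", lvl)] - cell_means[("SD", lvl)] for lvl in levels}

def shows_interaction(cell_means, levels, tolerance=1e-9):
    """True when the treatment difference changes across levels."""
    effects = list(simple_effects(cell_means, levels).values())
    return max(effects) - min(effects) > tolerance

levels = ["moderate", "high", "extreme"]

# Hypothetical means: SD is 2 points better at every level (parallel lines).
parallel = {("PR", "moderate"): 12, ("SD", "moderate"): 10,
            ("PR", "high"): 17, ("SD", "high"): 15,
            ("PR", "extreme"): 22, ("SD", "extreme"): 20}

# Hypothetical means: the advantage of SD grows with anxiety (non-parallel).
crossing = {("PR", "moderate"): 10, ("SD", "moderate"): 10,
            ("PR", "high"): 17, ("SD", "high"): 14,
            ("PR", "extreme"): 25, ("SD", "extreme"): 18}
```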
Dependent samples designs:
- subjects can be matched on some variable to reduce the effects of the variable
- e.g., SES may have an effect on treatment, but we are not interested in the effects of SES on treatment. We can match subjects with similar SES levels, give one a treatment and the other no treatment, and then test for differences between the groups
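A matched-pairs assignment of the kind described above might be sketched like this. The SES scores, the subject IDs, and the choice to pair adjacent subjects after sorting are all assumptions made for illustration; other matching rules are possible.

```python
import random

def matched_pairs(subjects, seed=None):
    """subjects: list of (subject_id, ses_score) tuples.
    Sort by SES, pair neighbours, then randomly decide which member
    of each pair gets the treatment. With an odd number of subjects
    the last (highest SES) one is left unpaired."""
    rng = random.Random(seed)
    ordered = sorted(subjects, key=lambda s: s[1])
    pairs = []
    for i in range(0, len(ordered) - 1, 2):
        first, second = ordered[i], ordered[i + 1]
        if rng.random() < 0.5:
            first, second = second, first
        pairs.append({"treatment": first[0], "control": second[0]})
    return pairs

# Hypothetical subjects with SES scores.
sample = [("S1", 42), ("S2", 18), ("S3", 40), ("S4", 20), ("S5", 65), ("S6", 63)]
pairs = matched_pairs(sample, seed=3)
```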
Repeated measures designs (referred to as within measure designs):
- within subject designs measure variation within a unit (e.g., person)
- subjects can be their own controls
- used when you measure the subjects more than once on a dependent variable
- a simple example is when each subject is administered all of the (K) treatments.
S1 X1 O X2 O . . . XK O
S2 X1 O X2 O . . . XK O
. . . .
. . . .
. . . .
Sn X1 O X2 O . . . XK O
where you have K experimental treatments and n subjects
- we may also use a RM design to measure groups repeatedly at different times and/or in different situations
- example: Two treatment groups (SD & PR) split by sex. Each subject receives a treatment and is exposed to four progressively anxiety-provoking situations. Their anxiety level is measured during each situation (M1 - M4)
sex    | treatment | M1 | M2 | M3 | M4
female | SD        | 10 Ss randomly assigned, measured 4X
       | PR        | 10 Ss randomly assigned, measured 4X
male   | SD        | 10 Ss randomly assigned, measured 4X
       | PR        | 10 Ss randomly assigned, measured 4X
- may or may not include additional factors; in this case we have a 2X2X4 factorial with repeated measures on the experimental variable with four levels
- this is a between and within factors design (explain)
- advantages:
- can make the test more powerful (another source of variance is included in the model)
- lots of data (more complicated)
Counterbalanced Designs:
- we often want to balance the order of experimental treatments rather than have them administered in the same order
- if the order of treatments has some effect, this effect can be balanced
- one way to do this (Heppner refers to as crossover design) is to divide the group and reverse the order of treatments between groups
- example
RG1 O1 X1 O2 X2 O3
RG2 O4 X2 O5 X1 O6
- in our example: X1 would be PR and X2 would be SD. The effects of maturation or history could be removed
- if we did not balance the design (O1 X1 O2 X2 O3 for every subject) and found the difference between O2 and O3 to be greater than the difference between O1 and O2, we would not know if the differences were because of treatment X2 or because of history or maturation
- a counterbalanced design is a special case of a repeated measures design in which the order of administering experimental treatments is varied according to some plan
- counterbalanced designs are defined by Latin Squares.
- e.g.
1 2 3      1 2 3 4
3 1 2      4 1 2 3
2 3 1      3 4 1 2
           2 3 4 1
- each of the numbers would represent the subscript in a treatment
- Latin square designs have
- equal number of rows and columns
- each number is found only once in each row and once in each column
- example: counterbalanced design with a 3X3 Latin square design
- IVs: sex, treatment (SD, PR, Xanax)
- each subject is taught a method, given a stimulus, and measured
- 6 males and 6 females
| Sex | Subject | Time | | |
| | | T1 | T2 | T3 |
| MALE | S1 | X1 | X2 | X3 |
| MALE | S2 | X3 | X1 | X2 |
| MALE | S3 | X2 | X3 | X1 |
| MALE | S4 | X1 | X2 | X3 |
| MALE | S5 | X3 | X1 | X2 |
| MALE | S6 | X2 | X3 | X1 |
| FEMALE | S7 | X1 | X2 | X3 |
| FEMALE | S8 | X3 | X1 | X2 |
| FEMALE | S9 | X2 | X3 | X1 |
| FEMALE | S10 | X1 | X2 | X3 |
| FEMALE | S11 | X3 | X1 | X2 |
| FEMALE | S12 | X2 | X3 | X1 |
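The Latin square properties and the subject-by-time layout above can be sketched in code. This is an illustration only: `latin_square`, `is_latin_square`, and the rule that cycles subjects through the rows are hypothetical helpers, and a cyclic square is just one of many possible Latin squares.

```python
def latin_square(k):
    """Cyclic k x k Latin square: each row is the previous row shifted one place right."""
    base = list(range(1, k + 1))
    return [base[k - i:] + base[:k - i] for i in range(k)]

def is_latin_square(square):
    """Equal number of rows and columns; each number once per row and once per column."""
    k = len(square)
    symbols = set(range(1, k + 1))
    rows_ok = all(len(row) == k and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok

square = latin_square(3)

# Cycle 12 subjects through the three rows to get each subject's treatment order.
subjects = [f"S{i}" for i in range(1, 13)]
orders = {s: [f"X{t}" for t in square[i % 3]] for i, s in enumerate(subjects)}

for s in subjects:
    print(s, *orders[s])
```

With 6 males (S1 - S6) and 6 females (S7 - S12), each treatment order is used by two subjects of each sex.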
- strengths of counterbalanced designs:
- more power
- high levels of experimental control (increased internal validity)
- can use fewer subjects
- weaknesses of counterbalanced designs:
- very complex, especially as the number of IVs increases
- time consuming
- ceiling and floor effects: since within-subject designs measure people several times, their scores may be maximized or minimized prior to the completion of the experiment
Designs Extended in Time:
- Often we want to check the effect of time on a treatment. The treatment may take a long time to kick in or it may fade over time.
- this is done by taking observations that extend past the last treatment
- e.g., in the posttest-only control group design: do the effects of play therapy extend beyond 3 mos.? 6 mos.?
RG1 X1 O1 O2 O3
RG2 O4 O5 O6
Interpretation Practice
RG1 O1 X1 O2
RG2 O3 X2 O4
RG3 O5 X3 O6
RG4 O7 -- O8
- Results: O1
O2, O3
O4, O5
O6, O2 = O4, but O4
O6 and O1 = O3 = O5 = O7 = O8
- O1 = O3 = O4 = O5 = O6 = O7 = O8, but O1

O2
RG1 X1 O1 O2
RG2 X2 O3 O4
RG3 O5 O6
- O1 = O3, but O3 and O1
O5, and O2 = O4 = O6
- O1
O3, and O1 and O3
O5, and O2
O4, and O2 and O4
O6, but O1 = O2, and O3 = O4, and O5 = O6
RG1 O1 X O2 O3
RG2 O4 X O5
RG3 X O6 O7
RG4 X O8
RG5 O9 O10 O11
RG6 O3
- O3
O5, and O3 and O5 
O11
Quasi-Experimental Designs
- Quasi-experimental research involves the use of intact groups of subjects in an experiment, rather than assigning subjects at random to experimental treatments.
- intact groups: classrooms, mental health centers, schools, therapy groups, ...
- problems of validity:
- lack of random assignment possibly introduces problems with the validity of an experiment (internal and external)
- internal: differential selection of subjects
- external: selection bias
- when considering problems of validity of quasi-experimental research, limitations should be clearly identified, the equivalence of the groups should be discussed, and possible representativeness and generalizability should be argued on a logical basis.
- nonequivalent: means nonequivalent in a random sense. It does not mean that it will be impossible to make a case for the similarity of groups on relevant variables or characteristics.
- 1 group posttest only design
X1 O1
lacks validity. Why? (do not know if treatment had an effect)
- posttest only nonequivalent design
G1 X1 O1
G2 X2 O2
G3 X3 O3
- * may or may not include control group*
- Heppner calls this uninterpretable. I would not go that far, but this design has extensive validity problems. Why? (very difficult to attribute differences to treatment)
- one group pretest-posttest design
O1 X O2
- what are the extensive validity threats? (maturation, time, ...)
better designs
- pretest posttest, nonequivalent control group design
G1 O1 X1 O2
G2 O3 X2 O4
G3 O5 O6
- Why better? Pretesting can determine if groups are similar and/or can control for differences between groups.
- you can enhance the validity of this design by adding another pretest to determine if groups are maturing at different rates
G1 O1 O2 X1 O3
G2 O4 O5 X2 O6
G3 O7 O8 O9
You can also enhance by giving treatments that are expected to cause changes in different ways
G1 O1 X+ O2
G2 O3 X- O4
- e.g., if (+) attn. makes ADHD children less disruptive, give (+) attn. to one group and (-) attn. to the other group, then see if changes occurred in the predicted direction
- often unethical
- enhances validity by making selection X maturation effect improbable (unlikely that two groups would mature in different directions)
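A minimal sketch of how the pretest in the nonequivalent control group design can be used. Pretest means show whether the intact groups start out similar, and gain scores (posttest minus pretest) adjust for whatever initial difference exists. The scores are made up, and gain-score comparison is only one of several possible analyses.

```python
import statistics

# Hypothetical scores for two intact groups (e.g., two classrooms).
g1 = {"pre": [12, 14, 15, 13, 16], "post": [18, 19, 21, 17, 20]}  # O1, O2 (received X1)
g2 = {"pre": [11, 13, 14, 12, 15], "post": [12, 14, 16, 13, 15]}  # comparison group

def mean_gain(group):
    """Average of each subject's posttest minus pretest."""
    return statistics.mean(post - pre for pre, post in zip(group["pre"], group["post"]))

pretest_difference = statistics.mean(g1["pre"]) - statistics.mean(g2["pre"])
gain_difference = mean_gain(g1) - mean_gain(g2)

print("pretest difference:", pretest_difference)
print("difference in mean gains:", gain_difference)
```

A small pretest difference supports the argument that the groups are similar. The difference in gains is the comparison of interest, but without random assignment it can still reflect selection X maturation.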
Cohort treatments
- given to groups that naturally follow each other (school grades, continuous parenting groups in a regional mental health center, ...)
- argument that groups are similar
- e.g., doing an annual parenting class: posttest group 1, then posttest group 2 after implementing a new parenting method
O1 __ __ __
X O2
- you can also pretest the groups to assure they are similar
O1 O2 __ __ __
O3 X O4
Time series:
- Time series designs involve repeated measurement of one or more intact groups, with an experimental treatment inserted between two of the measurements of at least one group.
- single group time series design (simple interrupted time series design)
G O1 O2 O3 O4 X O5 O6
possible patterns
** draw examples (wiersma 143) **
- can also include a control group
G1 O1 O2 O3 O4 X O5 O6
G2 O7 O8 O9 O10 O11 O12
- enhances internal validity by establishing a comparison group
- can also insert the treatment at random or predictable points throughout the time series, which adds further internal validity
G O1 O2 X O3 O4 X O5 O6 X O7 O8
- useful where there are periodic fluctuations
- useful when there are naturally occurring situations for multiple testing
- complex analysis
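One simple way to look at a single-group interrupted time series (O1 O2 O3 O4 X O5 O6) is to fit a straight line to the pre-treatment observations, project it forward, and see how far the post-treatment observations depart from the projection. This is a sketch with made-up data. A real time series analysis would also have to handle autocorrelation and periodic fluctuations, which this ignores.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

observations = [10, 11, 12, 13, 20, 21]  # hypothetical O1..O6
n_pre = 4                                # treatment X falls between O4 and O5
times = list(range(1, len(observations) + 1))

slope, intercept = fit_line(times[:n_pre], observations[:n_pre])
projected = [intercept + slope * t for t in times[n_pre:]]
departures = [obs - proj for obs, proj in zip(observations[n_pre:], projected)]

print("pre-treatment trend per observation:", slope)
print("departure from projected trend after X:", departures)
```

Projecting the baseline trend, rather than just comparing O4 with O5, helps separate a treatment effect from a trend that was already under way.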
Survey & Ex Post Facto Designs
- Survey research is one of the most widely used research methods in the social sciences
- It is done in a variety of forms from "status" surveys to determine the status quo to ex post facto studies that focus on the relationship of sociological and psychological variables as they occur in natural settings
- Ex post facto means "from a thing done afterwards" -- it is research done after the fact -- research takes place after groups or conditions have been formed
Survey
- The purpose of survey research is to describe, explain, or explore phenomena (Babbie, 1979)
- Surveys are frequently seen today in the form of political polls -- (ask students to give other examples of surveys)
- Counselors may use surveys to describe or explore attitudes, opinions, frequency of occurrences, ... within a specified population
- It is descriptive in nature and is generally used to explain the relationship between variables and some phenomena
- data can be collected through interviews, phone interviews, internet, mailed forms, questionnaires.
- Surveys are basically of 2 types: longitudinal and cross-sectional
- these types are distinguished by (a) point at which data collection takes place and (b) the nature of the sample
- surveys are done with samples -- when the entire population is used we have a census
Longitudinal design: involves data collection over time (i.e., data collection at two or more time periods) and at specified points in time (can be short periods or long periods)
- trend study: a longitudinal study in which a general population is studied over time.
- random samples are taken at different time points -- different samples, but all representative of the population (e.g., political polls that determine what party is likely to win an election -- done several times leading up to the election)
- cohort studies are also longitudinal and similar to trend studies
- however they are used to measure variables over time of a specific population
- e.g., if you want to measure NY counselor attitudes towards an ethical issue over 10 years -- randomly sample population every year and survey on attitudes -- however with each sampling, only use the counselors present at the time of the original sampling
- trend and cohort studies measure change in a group over time, but because random samples are selected at each time point, any individual changes are not measured -- one does not know who is causing the changes
- Panel studies collect information on a sample of individuals at different times
- the sample is called the panel -- which should be randomly selected at the onset of the study
- panel studies can be used to establish an ordering of patterns that is helpful in establishing cause and effect
- for example, a panel of mental health consumers is questioned on community mental health services -- if people in treatment a long time report good services, are they reporting that because they got helped, or was their positive attitude a determinant of their success? If we can measure the same people over time we can see how perceptions change and establish an order, which helps in determining cause and effect
- a major disadvantage of panel designs is attrition
- it is also very time consuming for participants and researcher
Cross-sectional designs: involve collection of data at one point in time from a random sample representing some given population at that time or from more than one sample representing two or more populations (Wiersma)
- cross-sectional designs can't measure individual change because subjects are measured only once
- c-s designs can look at the difference between defined groups between two populations -- parallel-samples design
- parallel-samples designs usually occur in c-s surveys, but can be done in longitudinal studies
- e.g., (p-s) sample three groups of mental health clients -- those that go to counselors, psychologists, and social workers -- give them the same questionnaire and compare results.
| Design | Population | Sampling |
| Longitudinal | | |
| Trend | General | Random samples at each data-collection time |
| Cohort | Specific | Random samples at each data-collection time |
| Panel | General or Specific | Same random sample used throughout |
| Cross-sectional | General or Specific & may include subpopulations* | Random samples from all populations at one timepoint |
* if 2 or more subpopulations studied simultaneously -- parallel-samples design
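The sampling differences in the table can be sketched with a small simulation. Everything here is hypothetical: the counselor IDs, the population sizes, the turnover between waves, and the function names.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

wave1_population = [f"C{i:03d}" for i in range(200)]
# By wave 2, 20 original counselors have left and 20 new ones have entered.
wave2_population = wave1_population[20:] + [f"N{i:03d}" for i in range(20)]

def trend_study(n):
    """New random sample from whoever is in the general population at each wave."""
    return [rng.sample(wave1_population, n), rng.sample(wave2_population, n)]

def cohort_study(n):
    """New random sample at each wave, but only from members of the original population."""
    original = set(wave1_population)
    still_present = [p for p in wave2_population if p in original]
    return [rng.sample(wave1_population, n), rng.sample(still_present, n)]

def panel_study(n):
    """One random sample drawn at the onset and measured again at every wave."""
    panel = rng.sample(wave1_population, n)
    return [panel, list(panel)]

trend, cohort, panel = trend_study(25), cohort_study(25), panel_study(25)
print("panel measured twice on the same people:", panel[0] == panel[1])
print("cohort wave 2 drawn only from original members:",
      all(p.startswith("C") for p in cohort[1]))
```

Only the panel design follows the same individuals, which is why it is the only one of the three that can show who changed.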
Methodology of Survey Research
- first step is to define research problem and begin developing survey design
- variables to be included in the research design must be operationally defined, and the investigator should have information about the relationships between the variables -- this information is needed for constructing the survey items
- next step -- develop sampling plan (unless you will survey entire population)
- define population
- units identified for sample selection
- the sample must be selected in such a way as to make valid inferences to the population and any subpopulations
- sampling procedures can be very complex and require much effort and resources
- preparation for data collection
- if doing interviews or questionnaires -- must construct instruments
- if using tests or inventories, it is likely there are instruments available -- may have to train observers or testers
- identify the types of data you will collect and how you will analyze it
- initial drafts of questionnaires should be tried out (pilot or trial run) on a small number of people. See problems with questionnaire.
- Collect data
- stick to sampling plan
- if using interviews -- periodically have two interviewers do the same person -- get measure of agreement (interrater reliability)
- also have one interviewer redo interview (by tape) and check internal consistency (intrarater reliability)
- analyze
- may be quantitative (statistical) or qualitative (descriptive)
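The agreement check between two interviewers can be quantified in several ways. The sketch below computes simple percent agreement and Cohen's kappa, which corrects for chance agreement. Kappa is my choice of index here, since the notes only ask for a "measure of agreement", and the codings are made up.

```python
from collections import Counter

def percent_agreement(rater_1, rater_2):
    """Proportion of cases the two raters coded identically."""
    return sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)

def cohens_kappa(rater_1, rater_2):
    """Agreement corrected for chance: (p_observed - p_expected) / (1 - p_expected).
    Undefined (division by zero) if both raters use one single category throughout."""
    n = len(rater_1)
    p_observed = percent_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = set(rater_1) | set(rater_2)
    p_expected = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical codings of the same 10 interview responses by two interviewers.
interviewer_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
interviewer_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

agreement = percent_agreement(interviewer_a, interviewer_b)
kappa = cohens_kappa(interviewer_a, interviewer_b)
print(agreement, round(kappa, 3))
```

The same functions can be applied to intrarater reliability by passing one interviewer's original codes and his or her recoding of the taped interview.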
Steps in Conducting a Survey
| | Definition of the research problem |
| 1. Planning | Operational definition of variables |
| | Literature review |
| | Development of survey design |
| | Definition of population |
| 2. Development & Application | Identification of subpopulations |
| of Sampling Plan | Detailed sampling procedures |
| | Select the sample |
| | Develop items or select instrument |
| 3. Construction of Interview | Development of anticipated analysis procedures |
| Schedule or Questionnaire | Pilot run |
| | Revisions of items |
| | Training of interviewers, observers, or testers |
| 4. Data Collection | Conduct interviews, administer questionnaires or tests |
| | Follow-ups |
| | Initial tabulation and coding |
| 5. Translation of Data | Coding |
| | Technical preparations for analysis |
| 6. Analysis | Separate analysis of individual items or groups of items |
| | Synthesis, results interpreted |
| 7. Reporting Conclusion | |
- possible problems with surveys
- survey may be poorly designed
- nonresponse bias
- failure to provide follow-up
- failure to provide synthesis of entire analysis
Questionnaires
- much effort is required to develop good items and to get people to respond
- item construction -- general guidelines
- Have items relate directly to research problem, question, or hypothesis
- Items should be clear and unambiguous. Avoid jargon or vague terminology
- Include only one concept per item (use Micheles survey for example)
- Avoid leading questions
- Avoid questions loaded with social or professional desirability
- Avoid (when possible) questions that demand delicate or personal information
- Request only information that the respondent can provide. Items should fit the informational background of the respondents
- Make reading level appropriate
- Shorter items better than long items - simple better than complex
- When requesting quantitative information, ask for specific number rather than an average (e.g., "how many times did you make yourself vomit in the last 2 weeks" rather than "on average, how many times per month do you make yourself vomit")
- Response options should be mutually exclusive and exhaustive
- Avoid negative items and never use double negative items (e.g., "which of the following symptoms do you not have")
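The guideline that response options be mutually exclusive and exhaustive can be checked mechanically for numeric ranges. This is a sketch with a hypothetical helper, `check_options`. It assumes whole-number answers, options listed in ascending order, and an open-ended top category written as `(low, None)`.

```python
def check_options(options, lowest=0):
    """Return a list of problems found in a set of numeric response options."""
    problems = []
    next_value = lowest  # smallest value not yet covered by any option
    open_ended = False
    for low, high in options:
        if low < next_value:
            problems.append(f"overlap: {low} is covered twice")
        elif low > next_value:
            problems.append(f"gap: {next_value} to {low - 1} not covered")
        if high is None:  # open-ended top category such as "12+"
            open_ended = True
            break
        next_value = max(next_value, high + 1)
    if not open_ended:
        problems.append("not exhaustive: no open-ended top category")
    return problems

# Hypothetical "years in the field" item.
overlapping = [(0, 5), (5, 10), (10, None)]            # 0-5, 5-10, 10+
clean = [(0, 2), (3, 5), (6, 8), (9, 11), (12, None)]  # 0-2, 3-5, 6-8, 9-11, 12+

print(check_options(overlapping))
print(check_options(clean))
```

A respondent with exactly 5 years of experience could check two boxes in the first version, which is the kind of problem a pilot run tends to expose.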
Item format
- types of items for questionnaires (discuss advantages and disadvantages [time, likelihood of response, interpretation, freedom of response])
a. forced choice or selected-response
b. open-ended
- pilot run
- allows you to uncover deficiencies with items
- does not need to be random sample
- can be done on group that is familiar with survey topic
- suggestions can be made
- items that are unclear can be revised
- items that provide little information can be revised (e.g., having people respond to the number of years in the field [0-5, 6-10, 11+] -- if most people were in the 0-5 group, you could revise the responses to [0-2, 3-5, 6-8, 9-11, 12+])
- cover letter
- should be straightforward, explaining the purpose and the potential value
- assure reader that group results are to be used and people will not be singled out
- confidentiality assured (may be anonymous - but remember it is difficult to follow-up with anonymous respondents)
- use professional letterhead
- if possible, have someone of importance sign letter
- deadline for responses
- questionnaire format
- should be attractive and easy to read
- don't make it too long
- place questions in an order that will keep respondents' attention
- give clear and concise instructions
- give examples if instructions are complex
- the cover letter should set the stage for responding - initially begin survey with questions directly related to research problem
- put demographics near end
- put open-ended questions at the end of survey
- give respondent an opportunity to write open-ended responses at end even if you do not plan to use that information
- number pages and items
- give name and address of person to respond to and provide a self-addressed stamped envelope
- increasing response rate
- generally consider 70% a minimum with professional groups
- less is tolerated with general populations
- people are more likely to respond if the time and effort is low in comparison to perceived rewards -- to increase perceived rewards, the researcher can offer these rewards (Dillman, 1978)
a. being regarded positively by others
b. expressing appreciation
c. being consulted on an issue of importance to the respondent
- monetary rewards do work but they are costly
- Hopkins and Gullickson (1989) found monetary rewards equally effective for professional and non-professional populations and that enclosed gratuities were more effective than rewards after completion of survey
- attractiveness and professional look will increase response
- contacting respondents prior to mailing of questionnaire
- realistic time cues (e.g., 30 minutes to fill out)
- follow-ups necessary
- timed to arrive at the respondents' addresses a few days after the deadline for return specified in the cover letter
- should be planned for in advance and more than one may be necessary
- letter should be pleasant but firm
- Jackson and Schuyler (1984) found businesslike follow ups more effective than "cute" reminders
- if possible, include new questionnaire in follow-up
- determining sources of non-response
- important to know sources because it is important to identify possible bias in collected responses
- look at demographic data - may help you determine characteristics of those who did not respond
- if non-respondents are identified as certain groups, attitudes associated with those groups may be missing - leading to biased results
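A minimal sketch of using known characteristics of the sample to locate sources of non-response. The groups and counts are made up, and applying the 70% figure to each subgroup is my assumption, since the notes give it only as an overall guideline for professional groups.

```python
# Hypothetical counts of questionnaires mailed and returned, by profession.
mailed = {"counselors": 50, "psychologists": 40, "social workers": 30}
returned = {"counselors": 42, "psychologists": 30, "social workers": 12}

MINIMUM_RATE = 0.70  # assumed cutoff, applied here per group for illustration

overall_rate = sum(returned.values()) / sum(mailed.values())
group_rates = {group: returned[group] / mailed[group] for group in mailed}
underrepresented = [g for g, rate in group_rates.items() if rate < MINIMUM_RATE]

print(f"overall response rate: {overall_rate:.0%}")
for group, rate in group_rates.items():
    print(f"{group}: {rate:.0%}")
print("possible source of non-response bias:", underrepresented)
```

In this example the overall rate looks acceptable, yet one group is largely missing, so attitudes associated with that group would be underrepresented in the results.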
interview surveys
- more time consuming
- questions again may be open ended or select-response
- questions should be the same for all respondents
- interviewers must be trained and procedures standardized
- Heppner et al. use the term passive design to describe a type of ex-post facto research (measure subjects on some characteristics and correlate) -- this comes under the definition of survey research given here
- Classification research -- use cluster or factor analysis to find underlying constructs -- useful in exploring or confirming theories or for finding latent constructs (characteristics we cannot easily observe)
N=1
- Single subject designs are often difficult to interpret because of the many sources of bias
- However, ssds are often more practical and can be designed to give us valuable information
- as counselors, you will often have the opportunity to use ssds because of the number of clients seen individually
- subjects are not usually selected on a random basis, so traditional behavioral models of ssds are considered quasi-experimental
- History in psychology:
- early scientific work in the human sciences was done with single subjects (e.g., Freud, Skinner)
- many of the early (early 20th century) psychological studies based on case studies were very flawed - conclusions were based on wholly uncontrolled studies of single persons (e.g., Freud's theories based on case studies -- Little Hans ...)
- as the use of statistical knowledge spread to the psychology field in the mid 20th century -- the sophistication of ssds increased
- most current research is based on group comparison designs
- however the group comparison model often fails to recognize individual differences or unique subjects
- Heppner classifies ssds into two groups
- nonbehavioral designs (case study and intensive ssd)
- behavioral designs
- we will begin by looking at traditionally behaviorally oriented designs (intra-subject designs)
- intra-subject designs examine the relationship between 2 or more variables within 1 or a few subjects
- ssd commonly involve repeated measurements and use the single-variable rule -- changing only one variable (the treatment) at a time
- this means that only 1 variable, the treatment is changed during the period in which the experimental treatment is applied
- during the traditional or baseline treatment and the experimental treatment, all other conditions (e.g., length of time or number of measurements) are kept the same
- baseline: the period of time in which the traditional treatment or normal condition is in effect -- must be stable in order to make inferences about subsequent treatments
- target behaviors: the dependent variable of the design - the behavior that is being measured to determine if change occurs
- generally, measurements are made over a period of time and different phases are compared to each other (i.e., baseline - treatment or treatment 1 - treatment 2)
Designs
- the letters A & B are used to represent conditions; A indicates the baseline condition and B indicates the experimental treatment condition -- because there are no groups used, there is no group notation
- AB design:
- subject is observed under baseline condition until the dv stabilizes
- experimental treatment is introduced and subject observed the same number of times
- interpretation based on the assumption that the observations would not have changed without the introduction of the treatment
- many threats to validity: history and maturation are main concerns to internal validity
- draw example on board (general case and example [out of seat behaviors vs relaxation treatment])
OOB
A B A
| Baseline (A) | Treatment (B) |
| O1   O2 | Ok+1   O2k |
| TA | TB |
Ta = Tb
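A minimal sketch of reading an AB design: check that the two phases have the same number of observations, that the baseline is stable, and how much the level of the dv changes between phases. The counts are made up, and the stability tolerance is an arbitrary assumption, not a rule from these notes.

```python
import statistics

# Hypothetical daily counts of out-of-seat behaviors.
baseline = [7, 8, 7, 8, 7, 8]    # phase A
treatment = [6, 5, 4, 4, 3, 3]   # phase B (relaxation treatment)

same_number_of_observations = len(baseline) == len(treatment)

# Treat the baseline as stable if its scores stay within a narrow band.
baseline_range = max(baseline) - min(baseline)
baseline_stable = baseline_range <= 2  # assumed tolerance

level_change = statistics.mean(treatment) - statistics.mean(baseline)

print("equal phase lengths:", same_number_of_observations)
print("baseline stable:", baseline_stable)
print("change in mean level from A to B:", round(level_change, 2))
```

A drop in level after a stable baseline is consistent with a treatment effect, but in a plain AB design history and maturation remain rival explanations.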
- ABA
- the ABA design is an extension of the AB design
- it adds another baseline condition (also called reversal or withdrawal design)
- duration and number of observations kept the same for all 3 conditions
- enhances the internal validity because the pattern of results is extended
- give example on the board (disruptive behaviors vs. relaxation)
| Baseline (A) | Treatment (B) | Baseline (A) |
| O1   O2 | Ok+1   O2k | O2k+1   O3k |
| TA | TB | TA |
ABAB
- treatment is administered - withdrawn - and readministered
- strengthens causal argument if treatment appears to cause an effect that is reduced when it is withdrawn and enhanced when it is readministered
- if trend does not return to baseline levels - then it is highly likely that other factors were influencing change (time or maturation)
- however, the effects of some treatments should not be expected to return to baseline (e.g., grief counseling -- if you treat the client and stop treatment, we would not expect the level of grief to return to baseline if the treatment was effective)
- Heppner states that we can have no confidence in results if they do not return to baseline -- however, while many measures are not likely to return to baseline, we can also look at trends (draw examples)
- ** give example of relaxation introduced twice **
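The ABAB logic can be sketched by comparing phase means and checking whether each phase change moves the behavior in the predicted direction. The data are made up, and comparing means is only a rough stand-in for plotting the data and inspecting trends.

```python
import statistics

# Hypothetical daily counts of disruptive behaviors.
phases = [
    ("A1", [9, 10, 9, 10]),  # baseline
    ("B1", [5, 4, 4, 3]),    # relaxation introduced
    ("A2", [8, 9, 9, 8]),    # relaxation withdrawn
    ("B2", [4, 3, 3, 2]),    # relaxation reintroduced
]
means = {label: statistics.mean(scores) for label, scores in phases}

# The causal argument is strongest when the behavior drops with treatment,
# rises again on withdrawal, and drops again on readministration.
reversal_pattern = (
    means["B1"] < means["A1"]
    and means["A2"] > means["B1"]
    and means["B2"] < means["A2"]
)

print(means)
print("pattern consistent with a treatment effect:", reversal_pattern)
```

For a treatment whose effects should persist (the grief counseling example), A2 would not be expected to rise, so a failed check here would not by itself count against the treatment.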
Randomized AB designs
- Treatment phases are randomly placed in a series of baseline phases
- AAABAABAAAABABABAAB
- helps detect if there is a "crossover" effect - if treatments affect subsequent baselines or if baselines affect subsequent treatments
- Heppner gives example of how to analyze with factorial ANOVA -- you do not need to know this
- Typical statistical analysis is to plot the data and observe trends
- especially when trends are not extremely evident -- the likelihood of the analyzer introducing bias is great
- statistical methods include repeated measures or time series analysis
Multiple-baseline designs
- modification of the ssds in which more than one subject, situation, or behavior is measured (or a combination of these)
- helps to establish causality (i.e., treatment is causing the changes)
- multiple baseline across behaviors (2 or more dvs)
- collect data on dvs and establish stable baseline
- apply treatment to 1 of the dvs: should have an effect on targeted dv but not the other(s)
- after treatment phase (T1 = T2) another treatment phase begins targeted at next dv
- e.g. positive reinforcement has an effect on problem behaviors
- dvs= out of seat, talking at inappropriate times
- 2 week baseline (measurement of frequency of out of seat and talking behaviors)
- 2 weeks (+) reinforcement for in seat behaviors (ignore talking)
- 2 weeks (+) reinforcement for not talking
- may designate a dv as control (never apply treatment to that dv)
Multiple-baseline across subjects
- two or more subjects treated separately -- important that subjects are independent so that treatment of one does not affect treatment of other
- e.g., relaxation exercise
- dv = misbehaviors in the class
- 2 subjects
- 2 week baseline
- 2 week treatment to S1
- 2 week treatment to S2
Multiple-baseline across situations
- same as across behaviors except that situations are changed
- e.g., relaxation exercise has an effect on behavior
- dvs= problem behavior in school, problem behavior at home
- 2 week baseline (measurements at home and school)
- 2 weeks relaxation sessions in school (not home)
- 2 weeks relaxation sessions at home
- may designate a dv as control (never apply treatment to that dv)
- MB designs can easily become very complicated
- the above designs were using the AB logic although more complicated designs could be used (e.g., ABAB)
- MB designs tend to improve internal validity **why? better case for causality**
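The staggered structure shared by the multiple-baseline examples can be sketched as a week-by-week schedule. `multiple_baseline_schedule` is a hypothetical helper, and it assumes that once a treatment starts for a target it stays in place, which the examples above do not state either way.

```python
def multiple_baseline_schedule(targets, phase_weeks=2):
    """Map each target (behavior, subject, or situation) to a weekly list of
    phases: 'A' = baseline, 'B' = treatment, with staggered treatment onsets."""
    total_weeks = phase_weeks * (len(targets) + 1)
    schedule = {}
    for i, target in enumerate(targets):
        onset = phase_weeks * (i + 1)  # weeks of baseline before treatment begins
        schedule[target] = ["A"] * onset + ["B"] * (total_weeks - onset)
    return schedule

schedule = multiple_baseline_schedule(["out of seat", "talking"])
for target, weeks in schedule.items():
    print(f"{target:12s}", " ".join(weeks))
```

The case for causality rests on each dv changing only when its own treatment begins. In weeks 3 and 4 of this schedule, out-of-seat behavior should change while talking, which is still in baseline, should not.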
- all ssds have the benefit of being practical ways of testing new ideas and studying unique cases
- disadvantages include: extensive threats to internal validity, external validity not supported by design (must be argued logically)
Case studies
- frequently done in counseling and psychological research
- descriptive in nature
- typically, a selected case is examined and the course of a treatment (especially outcomes) is described
- case studies can become more quasi experimental depending on the amount of systematic, objective, and repeated measurements of the subject
- case studies may be helpful in
- developing a preliminary hypothesis for further explanation
- capitalizing on natural contexts
- providing in depth descriptions
- focusing on uniqueness of individuals
- disadvantages
- lacks any generalizability
- researcher bias
- example "Dibs"
Qualitative designs
- may include case studies
- historical research is typically qualitative
Ethnographic research
** note on syllabus, if ethnographic research, the first step of the qualitative proposal would be a problem statement **
- roots in anthropology -- over last couple of decades, qualitative research has become increasingly popular in psychology, counseling, and educational research
- ethnography is defined in the dictionary as "a branch of anthropology dealing with the scientific description of individual cultures"
- ethnography is on the qualitative end of the qual-quant continuum -- but this does not mean that quantitative analysis must be excluded from ethnographic research
- ethnographic research involves field research and requires contextualization -- or the interpretation of results in the context of the data collection -- because of this it lacks generalizability to people "outside" of the context.
- while the other types of research that we have looked at thus far require some kind of hypothesis prior to conducting the research -- ethnography assumes that hypotheses will emerge as the data collection occurs
- data collection is inductive vs. deductive
- the researcher attempts to set aside any preconceived notions that might bias the data interpretation
- hypotheses may emerge and be refined or replaced as data collection progresses
- ethnographic research often does not start from a strong theoretical base and is not concerned with theory testing -- theory development is based on the data -- from this comes the name grounded theory
- the researcher does not stand apart from the research - rather he or she may become intimately involved with the subjects
- data collection generally involves ethnographic observation, oral histories (monologue), qualitative interviews (open ended and interactive), archival research, or critical incident reports (subject reports of significant events) (Hoshmand, 1989)
- the steps involved in qual research do not follow strict guidelines such as in quant research. the steps are usually integrated -- several steps may be done at the same time
- Validity
- Is research understandable and reasonable?
- Do participants benefit?
- Triangulation
- "importance of findings"
- typical steps in qualitative research - 2 methods (different than book)
a) identification of the phenomenon to be studied
- a statement of the phenomenon to be studied (e.g., peer interactions in racially mixed classrooms of a rural high school)
- statement provides a starting point and usually implies a foreshadowed problem
- foreshadowed problems may include things such as:
- interaction among students across races
- interaction among students across sexes
- role of faculty in students' social interactions
- these problems give the researcher a starting point but should not be considered restrictive - they can change as study progresses
b) identification of subjects
- usually subjects that are convenient -- any generalizability would be argued on a logical basis
- conditions and settings must be considered -- how many and which students will be observed? how long will they be observed? how often will they be observed?
- usually done on the basis of convenience, time limitations, ...
c) hypothesis generation
- is a continuing activity throughout an ethnographic study
- may be refined or changed based on data
d) data collection
- observation: the observer may be identified as a researcher or may be in a disguised role
- Wolcott (1988) distinguishes between 3 participant-observer styles
- active participant: observer assumes the role of a participant
- privileged observer: observer does not assume the role of a participant but has access to the relevant activity for the study
- limited observers: used when opportunities for observation are restricted and other data collection techniques take precedence
- all data collection need not be with the participant observer role
- other sources of information including interviews, tapes, and surveys may be used
- sources of info should be primary sources
- whatever the role - observers try to be unobtrusive and not interfere with the normal activities (Star Trek's Prime Directive)
- understanding and interpreting events requires the observer to attempt to experience the thoughts, feelings, and actions of the individuals under study
- triangulation: part of data collection that cuts across 2 or more techniques or sources -- a qualitative cross-validation
- basically, it is a comparison of info to determine whether or not there is corroboration
- do different sources of info or different data collection procedures converge?
- triangulation involving multiple data sources: counselors, faculty, students
- triangulation involving multiple data collection procedures: student observations, interview counselors, look at student records
Analysis
- synthesizing information from all data sources
- statistical procedures not usually used
- heavily descriptive
Drawing conclusions
- tentative hypotheses, theories, and explanations are generated during the field-work, but researchers should not draw final conclusions prematurely
- lead to a successive approximation procedure of coming to conclusions
Grounded theory
- systematically observing a phenomenon as a way of generating a theory
- five steps
1. data collection
- subjects chosen to exemplify the phenomenon
- may pick a comparison group to extend generalizability
2. categorization
- process of categorizing data
- start broad and narrow down to themes
- saturation is when new data conforms to existing themes
3. memoing
- record the process to aid in constructing categories and building theory
4. parsimony
- consolidate categories with ultimate goal of 1 "core" category
5. writing the theory
- should be believable
- comprehensive
- theory tied to data
- should be applicable
Analogue Research
- Naturalistic vs. experimental approaches to research
- Counseling analogue is an experimental simulation of some aspect of the counseling process involving some manipulation of the counselor, client, or process -- miniature therapy or simplification strategy
- Experimental control: high; generalizability: low
- Analogues fall on a continuum from high to low resemblance to the counseling situation
- Advantages:
- Control over extraneous variables
- Allows the researcher to identify, quantify and manipulate specific levels of an investigated variable (e.g., counselor disclosure)
- Disadvantages:
- generalizability: high internal validity often leads to artificiality
Process Research
- Process Research attempts to characterize what changes occur during counseling
- Process research can involve intensive single-subject, within subjects, and between subjects designs
- Process research attempts to:
- Describe changes in the client, counselor, group, family, or interaction over time.
- Specify changes in the behavior or actions of those listed above over time.
- link one or more of these process variables to outcomes.
- What to measure
- content of session: topic or subject
- what speaking is done: by whom? which words are used?
- how speaking is done: non-verbals
- counselor's intentions: why did you do that?
- reactions: what happened when the counselor/client said or did that?
- quality: how helpful was it? was it a good session?
- relationship
- The book has an excellent reference section on different instruments that can be used for process research.
Outcome Research
- Outcome research looks at the efficacy of counseling by comparing a treatment group to a control group or to different treatment types
- Types of comparison (control) groups:
- no-treatment control group (does not consider the placebo effect)
- placebo control groups (often, placebo groups do not provide the "expectancies" of the treatment group)
- alternate therapy control groups (meta-analysis has found that the results of studies using alternate therapies are often tied to the researchers' expectations)
- Measuring change: statistical vs. practical significance
Defining Variables and Collecting Data:
- One of the main focuses of research is to establish a causal relationship between the IVs and DVs
- the selection and designing of the IVs is a crucial task in establishing these relationships
- Heppner outlines 4 concerns regarding operationalizing the IVs
- conditions or levels of the IV: need to determine the distinct levels or categories of the variable
- adequately reflecting the constructs designated as the cause in the research question -- is the IV adequately defined and does it accurately represent what you want it to represent?
- e.g., if you are doing a study comparing directive to non-directive therapy, do the therapy protocols accurately represent these modes
- limiting the differences between conditions -- conditions should differ only on the dimension of interest; if more differences are present, the variable is confounded
- e.g., if you are doing a study comparing directive to non-directive therapy and use 2 counselors (1 for each condition) -- therapy is confounded with therapist
- sometimes it is very difficult to get rid of these confounds so you may have to
a. make a logical argument that the confound is unlikely to have an effect
b. limit the generalizability of the study (e.g., in our counseling example we may assign new counselors to a condition and provide them with equal training, but our results would only generalize to new counselors)
- you want to make sure that the different conditions are similar in all ways (e.g., time, setting,...) except for the difference that you are investigating
- establishing the salience of differences in conditions -- differences between conditions of the IV must be noticeable, but not too noticeable
- the differences between conditions must be great enough that any real effect can be detected. e.g., suppose you design a study to investigate how stress in a counseling situation affects a client's motivation to change, with two conditions: normal stress and increased stress. To increase the stress in one group, you tell them in the middle of a session that "you need to work harder if you want to get better". Is this stress condition different enough from the normal stress condition to detect differences if they really exist?
- however, if the difference is too great, you risk having the subjects guess the research hypothesis. For example, suppose you are doing a study to rate client comfort with a counselor based on attractiveness. If you send in one counselor who is attractive and well dressed and rate comfort, then send in another counselor who has dirty clothes, unkempt hair, and a dirty, greasy face and who slouches, the subjects may guess the purpose and not respond naturally
- you can check the salience of your IVs with manipulation checks
- have the subjects tell you if they noticed differences (although you might want to use some subjects for this purpose alone)
- use independent raters (experts or naïve persons)
- independent raters can also be used to check whether other confounding factors are being measured, e.g., the book example of pictures of Hispanic and Anglo counselors: raters could determine whether the counselors are equally attractive so that attractiveness does not confound ethnicity
- when using people to administer treatment, make sure adequate protocols are in place to ensure treatments are the same
Interpreting Results (relating to IVs)
- If you have performed manipulation checks and found that the IV did not discriminate well, any statistically significant result would be difficult to explain
- there is always the possibility of confounding variables; you just try to account for them and minimize their effect
- statistically non-significant results may be due to no actual differences between the treatment groups, inadequate statistical power, insensitive instruments, poorly done statistical tests, careless procedures, bias, and many other things
- Status variables: variables that can't be treated as true IVs (because they can't be manipulated) but are still used to see effects on the DV, e.g., sex, personality, race
- Can't be used to show causation, just association
Dependent Variables
- Choosing and designing the DVs is critical
- if the DV does not accurately measure the construct you have defined as the effect, the study lacks both internal and external validity
- the first step in defining the DV is to have an adequately defined, concise research question and hypothesis
- the question should clearly state what is going to be measured as the effect(s)
Multiple Dependent Variables
- it is often desirable to use multiple DVs to measure a construct because tests usually do not represent the "whole construct" being measured
- also, experiments will often have an effect on more than one construct
- using multiple DVs does require more complex statistical analysis
Reactivity
- You do not want the test to cause some kind of reaction in the test-taker; this can influence the results of the test in a systematic way
- e.g., sending a rater into a classroom: children will behave differently. You can overcome this by desensitizing the children to the rater before taking measurements
Methods of Data Collection
- Heppner et al. categorized data collection into 7 independent groups
- self-reports: the subject reports on his or her behavior or thoughts
- advantages: easy to administer; simple for the test-taker to use; can access areas that would otherwise be difficult to measure; can have test-taker respond to hypothetical situations that may be impractical or unethical to test
- disadvantages: measurements are vulnerable to distortion by the test-taker (bias, e.g., may respond to look good, please the administrator, look bad, ...); the test-taker may not respond accurately to questions because they are unaware (e.g., a subject may not report being depressed because they attribute their signs of depression to something else, like lack of sleep)
- most often used measure in the field
- ratings of other persons and events: someone rates characteristics of a person or an event
- ratings have many of the same benefits as self reports
- ratings also have the benefit of being based on some kind of expertise
- primary disadvantage is systematic bias: raters are reporting their perception of a person or an event (e.g., a rater rating counseling sessions may tend to see them all as poor while another may see them all as above average)
- systematic bias is also a problem when raters know what the test is about
- training, practice, multiple raters, and statistical techniques are used to reduce or account for the systematic biases of raters
- behavioral observations: trained raters record behaviors
- much more objective than self reports
- physiological indexes: measuring biological responses to infer psychological states
- can be more objective than self reports
- is not a direct measure of a construct
- expensive and machine dependent
- not used much in counseling research
- interviews
- structured interviews vs. unstructured interviews (general themes)
- can get more depth to information
- in addition to self-reports, the interviewer can form impressions
- costly and time consuming
- unstructured interviews are difficult to standardize when doing quantitative research
- projective techniques: inferring personality traits from ambiguous stimuli
- associated with psychodynamic approaches
- always a problem of bias: inferring personality characteristics from preconceived notions rather than measurement
- unobtrusive measures: observing subjects without their awareness that they are being measured
- nonreactive
- often difficult to interpret
- sometimes unethical (getting info on a subject without their permission)
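Going back to the systematic rater bias noted under ratings of other persons and events (one rater sees every session as poor, another sees every session as above average): one simple statistical adjustment is to center each rater's scores on that rater's own mean. The sketch below is an illustration only, not a procedure taken from Heppner et al.; the rater names and scores are made up, and centering removes only a constant leniency or severity offset, not other kinds of bias.

```python
from statistics import mean

def center_by_rater(ratings):
    """Subtract each rater's own mean from that rater's scores.

    ratings: dict mapping rater name -> list of scores, one per session,
    with sessions in the same order for every rater.
    """
    centered = {}
    for rater, scores in ratings.items():
        rater_mean = mean(scores)
        centered[rater] = [score - rater_mean for score in scores]
    return centered

# Hypothetical data: rater_a is harsh and rater_b is lenient,
# but both order the four sessions the same way.
ratings = {
    "rater_a": [1, 2, 3, 2],
    "rater_b": [4, 5, 6, 5],
}
centered = center_by_rater(ratings)
for rater, scores in centered.items():
    print(rater, scores)
```

After centering, the two raters' scores line up, which suggests they disagreed about the overall level but not about which sessions were better or worse.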
Sampling
- meaningful research requires that subjects used for the experiment be representative of the population about which you are trying to draw conclusions
- the generalizability of a study is based to a large extent on the sample you have selected
- Sampling theory is concerned with how we select samples that represent our population of interest
- defining the population of interest is often difficult; populations are often changing and not fixed
- e.g., if you are doing an experiment on depression treatment; what group do you want to make inferences about? All depressed people who are likely to attend counseling? The number of depressed people is always changing and the reasons for depression are also changing.
- Sample: a subset of the population
- Bias occurs when a sample systematically differs from the population
- e.g., if you are doing research on the effects of studying time on grades in college and select only seniors for the sample
- Bias can arise when not all members of a population have an equal chance of being selected
- Random selection guards against bias in a sample
- a random sample is a probability sample in that every population member has a nonzero probability of selection. In a simple random sample, this probability is the same for all population members
- e.g., a population has 100 members: if everyone has an equal chance of being selected, the probability that a given person is selected on a single draw is 1/100 or 0.01.
- can use random number tables, draw numbers out of a hat, or use random number generators
- It is usually impossible to have access to an entire population, so this type of sampling is impractical and rarely used
- instead, we:
- define a target population
- create a subject pool
- select subjects
- establish validity in the absence of random sampling
- determine the # of subjects needed
- define a target population
- define the population you want to generalize to
- decide what characteristics you want to be represented
- create a sample pool
- from the target population, determine who you have access to; often we use a convenience sample
- select subjects
- random selection is the best but often is not practical
- try to select subjects that can best represent the target population
- establishing validity in the absence of random sampling
- the case for external validity must be made by logical argument
- randomly assign subjects to treatment groups; this is essential for internal validity
- determining the number of subjects to use
- relies on statistical power; will discuss later, but read this in the book
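The random assignment step mentioned above (randomly assigning subjects to treatment groups, even when the sample itself is a convenience sample) can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical pool of 20 subject IDs and the directive vs. non-directive conditions used as an example earlier in these notes; the function name and seed are made up for the sketch.

```python
import random

def randomly_assign(subjects, conditions, seed=None):
    """Shuffle the subjects, then deal them out to conditions in turn."""
    rng = random.Random(seed)        # seeded only so the example is repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

# Hypothetical convenience sample of 20 subjects.
subject_pool = [f"S{i:02d}" for i in range(1, 21)]
groups = randomly_assign(subject_pool, ["directive", "non-directive"], seed=42)
for condition, members in groups.items():
    print(condition, sorted(members))
```

Dealing out the shuffled list in turn keeps group sizes equal (or within one of each other), while chance alone decides who lands in which condition.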
Using factorial designs to increase external validity
- by using factorial designs, we can account for differences among people, settings, and designs
- including status variables makes the analysis more sensitive to differences in treatments
**** draw example of factorial design (2x2): make a box representation and plot an interaction that shows how an interaction can be significant when a main effect (tx) is not ****
***** use example: frequency of misbehaviors in classroom
          | treatment A | treatment B
males     |     16      |      2
females   |      2      |     16
treatment means (Xbar) = 9, 9: no treatment main effect, but a treatment x gender interaction
          | treatment A | treatment B
males     |     16      |      5
females   |     15      |      4
treatment effect and (small) gender effect; no interaction
- select status variables based on theory: have an understanding of what may be related to the outcome; this comes from the literature and your understanding of the research problem
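The two 2x2 classroom-misbehavior examples above can be checked with a little arithmetic. The sketch below treats the cell values as cell means with equal cell sizes (an assumption; the notes do not give cell sizes) and computes the marginal means, the difference between them for each factor, and the interaction contrast. It is descriptive only: whether an effect is statistically significant would need an ANOVA on the raw scores.

```python
def summarize_2x2(cells):
    """Marginal means and interaction contrast for a gender x treatment table.

    cells: {"males": {"A": x, "B": x}, "females": {"A": x, "B": x}}
    Equal cell sizes are assumed, so marginal means are simple averages.
    """
    m_a, m_b = cells["males"]["A"], cells["males"]["B"]
    f_a, f_b = cells["females"]["A"], cells["females"]["B"]
    treatment_means = ((m_a + f_a) / 2, (m_b + f_b) / 2)
    gender_means = ((m_a + m_b) / 2, (f_a + f_b) / 2)
    return {
        "treatment_means": treatment_means,
        "gender_means": gender_means,
        "treatment_effect": treatment_means[0] - treatment_means[1],
        "gender_effect": gender_means[0] - gender_means[1],
        # difference of differences: (A - B for males) - (A - B for females)
        "interaction": (m_a - m_b) - (f_a - f_b),
    }

first = summarize_2x2({"males": {"A": 16, "B": 2}, "females": {"A": 2, "B": 16}})
second = summarize_2x2({"males": {"A": 16, "B": 5}, "females": {"A": 15, "B": 4}})
print(first)
print(second)
```

In the first table the treatment means are equal (9 and 9), yet the interaction contrast is large: the treatments work in opposite directions for males and females. In the second table the treatment difference is the same for both genders, so the interaction contrast is zero.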
Specific sampling designs
- we have already discussed simple random sampling (p=1/N)
- the sampling fraction is the ratio of sample size to population size: if you select a sample of 10 from a population of 100, the sampling fraction is 10/100 = 1/10 or 0.10
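As a small illustration of simple random sampling with a random number generator, and of the sampling fraction, here is a sketch using Python's standard library. The population of 100 numbered members and the seed are assumptions chosen to match the example in these notes.

```python
import random
from fractions import Fraction

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)        # seeded only so the example is repeatable
    return rng.sample(list(population), n)

def sampling_fraction(n, N):
    """Ratio of sample size n to population size N."""
    return Fraction(n, N)

population = list(range(1, 101))     # hypothetical population of 100 members
sample = simple_random_sample(population, 10, seed=1)
fraction = sampling_fraction(len(sample), len(population))
print(sorted(sample))
print(fraction, float(fraction))
```

Using Fraction keeps the result as an exact ratio (1/10) alongside its decimal form.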
Stratified Random Sampling
- if a population is not homogeneous, but is made up of many subpopulations, we can use stratified sampling to ensure that we get subjects from each subpopulation
- equal allocation: if there are five strata, the sample drawn from each stratum would be 1/5 of the total sample
- proportional allocation: if strata sizes differ, we can hold the proportions constant across strata:
n/N = n_1/N_1 = n_2/N_2 = ... = n_k/N_k
where n_1 + n_2 + ... + n_k = n and N_1 + N_2 + ... + N_k = N, keeping the proportions the same
- stratified sampling guards against wild samples and avoids overlapping among subpopulations
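Equal and proportional allocation can be worked through numerically. The sketch below assumes a hypothetical population of 1000 students in four strata; the stratum names and sizes are made up. When the proportional shares are not whole numbers, the leftover subjects go to the strata with the largest fractional remainders, which is one reasonable rounding rule among several.

```python
from fractions import Fraction

def proportional_allocation(strata_sizes, n):
    """Per-stratum sample sizes n_h with n_h / N_h as close to n / N as possible."""
    N = sum(strata_sizes.values())
    exact = {name: Fraction(n * size, N) for name, size in strata_sizes.items()}
    # Start from the whole-number part of each exact share.
    allocation = {name: share.numerator // share.denominator
                  for name, share in exact.items()}
    leftover = n - sum(allocation.values())
    # Hand leftover subjects to the strata with the largest remainders.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

def equal_allocation(strata_sizes, n):
    """Same sample size from every stratum (n is assumed divisible by k)."""
    per_stratum = n // len(strata_sizes)
    return {name: per_stratum for name in strata_sizes}

strata = {"freshmen": 400, "sophomores": 300, "juniors": 200, "seniors": 100}
proportional = proportional_allocation(strata, 50)
equal = equal_allocation(strata, 40)
print(proportional)
print(equal)
```

With n = 50 and N = 1000, every stratum is sampled at the same rate of 1/20, so n_h/N_h = n/N holds exactly in this example.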
- cluster sampling: random selection of clusters from the larger population of clusters
- e.g., when you have natural clusters (city blocks), you randomly select clusters and everyone in each selected cluster is used
- Sampling through an intermediate unit
- like cluster sampling except that after the clusters are selected subjects from within the clusters are randomly selected
- 2 stage sampling
- the probability that someone is selected is then the product of 2 probabilities: the probability that a cluster is selected and the probability of being selected within a cluster
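The two-stage selection probability can be illustrated with made-up numbers: 20 city blocks of 40 residents each, 5 blocks selected at random, then 10 residents selected at random within each chosen block. The sketch assumes equal-sized clusters and simple random sampling at both stages; the block and resident labels are hypothetical.

```python
import random
from fractions import Fraction

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: randomly pick subjects in each."""
    rng = random.Random(seed)        # seeded only so the example is repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

def selection_probability(total_clusters, n_clusters, cluster_size, n_per_cluster):
    """P(selected) = P(cluster selected) * P(selected within the cluster)."""
    return Fraction(n_clusters, total_clusters) * Fraction(n_per_cluster, cluster_size)

# Hypothetical population: 20 blocks with 40 residents each.
clusters = {
    f"block_{i:02d}": [f"block_{i:02d}_resident_{j:02d}" for j in range(40)]
    for i in range(20)
}
sample = two_stage_sample(clusters, n_clusters=5, n_per_cluster=10, seed=7)
probability = selection_probability(20, 5, 40, 10)
print(sorted(sample))
print(probability, float(probability))
```

Here the probability is (5/20) x (10/40) = 1/16 for every resident, because all blocks are the same size.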
Bias
Sources of bias
- investigator
- experimenter
- subject
1. experimenter & investigator
- Is it reasonable to assume that the investigator or experimenter can be totally objective? (free from bias)
- What should the investigator/experimenter do to guard against bias?
- experimenter attributes
- avoid using a different single experimenter for each level of treatment (the experimenter is then confounded with the treatment)
- if using more than one experimenter, check for any differences between experimenters
- describe your experimenters so that the study can be used as part of an overall database for future research and the experiment can be reproduced
- point out limitations
- experimenter/investigator expectancies
- try to keep experimenters blind to conditions (healing touch example and keeping surveyors blind to purpose of survey)
- assess how well experimenters have guessed the hypothesis of the study
- experimental procedures
- keep procedures consistent and well structured
- standardize procedures as best as possible (e.g., using counselor videos shown to subjects to ensure same viewing experience)
- carefully train experimenters
- monitor experimenters throughout study
2. subject bias
- demand characteristics: cues in the research protocol that may make subjects respond in a certain way
- e.g., "the effects of alcohol abuse disorders on shame and guilt" -- title may make people respond in a certain way to alcohol abuse survey
- self-presentation: subjects may respond in some way to avoid being seen in a negative way
- motivation level: subjects may lack motivation to participate and only participate half-heartedly
- intellectual skills: subjects may lack adequate intellectual skills, such as reading ability
- psychological defenses: guarded or untrue responses based on real or perceived threat
- keep subjects blind to purpose if possible
- assure subjects that there are no right or wrong answers
- ensure confidentiality
- increase motivation with a reward (e.g., $, study results, an explanation of how the study will benefit the subject)
- use spot check items (e.g., inserting a question that tells subject to mark "strongly agree")
- evaluate reading level of subjects and tests
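The spot check idea above can be applied mechanically once the surveys are scored. The sketch below is a minimal illustration with made-up respondent IDs and item names: one item ("check1") instructs the subject to mark "strongly agree", coded here as 5 on a 1-5 scale (the coding is an assumption). Respondents who miss it are flagged for review, not automatically dropped.

```python
def failed_spot_checks(responses, spot_checks):
    """Return IDs of respondents who missed at least one spot check item.

    responses: dict of respondent ID -> dict of item name -> answer
    spot_checks: dict of item name -> the answer the item instructed
    """
    failed = []
    for respondent_id, answers in responses.items():
        if any(answers.get(item) != expected
               for item, expected in spot_checks.items()):
            failed.append(respondent_id)
    return failed

# Hypothetical survey data on a 1-5 scale.
responses = {
    "R1": {"q1": 4, "q2": 2, "check1": 5},
    "R2": {"q1": 3, "q2": 3, "check1": 3},
    "R3": {"q1": 5, "q2": 1, "check1": 5},
}
spot_checks = {"check1": 5}
flagged = failed_spot_checks(responses, spot_checks)
print(flagged)
```

A respondent who skipped the spot check item entirely is also flagged, since a missing answer does not match the instructed one.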