Changing Criterion Design: How to Demonstrate Experimental Control with Unstable Data
For candidates working through Domain D (Experimental Design) in the 6th Edition Test Content Outline, the Changing Criterion Design is one of the most powerful yet most frequently misunderstood single-subject methodologies. Unlike reversal designs, which require withdrawing treatment to show control, or multiple baseline designs, which require staggered introduction across behaviors, settings, or participants, the changing criterion design demonstrates experimental control through stepwise performance shifts within a continuous intervention phase. Mastering this design helps you avoid high-error exam traps in which item writers present ethically sensitive or irreversible target behaviors that call for precise, stepwise validation.
The core tenet of the changing criterion design is deceptively simple but requires rigorous application: Behavior tracks criterion. Specifically, the participant’s performance must systematically change to match each new, pre-specified performance criterion introduced during the intervention phase. This creates a functional relationship between the independent variable (the criterion shift) and the dependent variable (behavioral output) without ever removing the intervention. Understanding this directional tracking is essential for building elite clinical discrimination skills and designing interventions that promote meaningful behavioral change while maintaining ethical integrity.
The Directional Framework
Unlike abstract theoretical definitions that exist only in textbooks, the changing criterion design has distinct operational signatures in clinical data. The critical validity indicator is not merely improvement, but precise correspondence between behavior and criterion levels across multiple phases.
Key Discrimination Features:
- Criterion-Behavior Correspondence: In valid changing criterion designs, behavioral data points cluster tightly around each new criterion line. Systematic deviation indicates either poor criterion selection or lack of experimental control.
- Bidirectional Phase Shifts: The gold standard for demonstrating control involves reverting to a previously established criterion level. If behavior returns to the prior performance level when the criterion reverts, maturation and history become implausible explanations for the change.
- Variable Phase Lengths: Using inconsistent numbers of sessions per criterion phase prevents participants from predicting shifts based on temporal patterns rather than actual criterion changes.
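The three features above can be checked mechanically once a design is written down as a list of criteria, phase lengths, and session values. The following is a minimal Python sketch, not a standard analysis tool: the function names, the tolerance band, and the example numbers are all hypothetical and chosen only for illustration.

```python
from typing import List


def shift_directions(criteria: List[float]) -> List[int]:
    """Return +1, -1, or 0 for each change between consecutive criteria."""
    directions = []
    for previous, current in zip(criteria, criteria[1:]):
        if current > previous:
            directions.append(1)
        elif current < previous:
            directions.append(-1)
        else:
            directions.append(0)
    return directions


def has_bidirectional_shift(criteria: List[float]) -> bool:
    """True when the criterion sequence moves both up and down at least once."""
    directions = shift_directions(criteria)
    return 1 in directions and -1 in directions


def phase_lengths_vary(phase_lengths: List[int]) -> bool:
    """True when not every phase has the same number of sessions."""
    return len(set(phase_lengths)) > 1


def within_tolerance(data: List[float], criterion: float, tolerance: float) -> float:
    """Proportion of sessions falling within +/- tolerance of the criterion."""
    hits = sum(1 for value in data if abs(value - criterion) <= tolerance)
    return hits / len(data)


# Hypothetical design: criteria rise, revert once, then rise again.
criteria = [10, 12, 14, 12, 16]
phase_lengths = [3, 5, 4, 2, 4]

print(has_bidirectional_shift(criteria))        # True
print(phase_lengths_vary(phase_lengths))        # True
print(within_tolerance([12, 13, 11, 15], 12, 1))  # 0.75
```

How wide the tolerance band should be is a clinical judgment, not something the code can decide; the value of 1 here is only a placeholder.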
Real-Life Applied Examples
Understanding this design requires moving beyond textbook graphs into complex human environments. Here are three distinct clinical scenarios demonstrating valid changing criterion applications:
Academic Skill Acquisition
- Reading Fluency Building (Educational Setting): A 3rd-grade student reads at 45 words correct per minute (WCPM) during baseline. The teacher implements a reading fluency intervention with an initial criterion of 50 WCPM. After three stable sessions at 50+, the criterion increases to 55 WCPM. Following stability at 55, the criterion shifts to 60 WCPM. Crucially, after two sessions at 60, the teacher reverts the criterion back to 55 WCPM for two sessions before increasing it again to 65 WCPM. The student’s WCPM tracks each shift precisely: dropping to 55 when the criterion reverts, then climbing to 65 when it increases. This bidirectional shift provides strong evidence that the intervention, rather than seasonal maturation or test familiarity, is driving the gains. Note: Without the revert phase, critics could argue the student would have improved anyway due to natural development over the 8-week period.
Health Behavior Modification
- Daily Step Count Increase (Community Health): An adult client with obesity averages 3,000 steps/day during baseline. A wearable device plus coaching intervention begins with a criterion of 4,000 steps/day. After five stable days, the criterion increases to 5,000 steps/day. Following stability, the criterion jumps to 7,000 steps/day, but the client consistently achieves only 5,800 steps. Recognizing poor correspondence, the analyst lowers the criterion to 5,500 steps/day for four days. The client’s step count immediately aligns with 5,500. Only after stability does the criterion increase again, to 6,500 steps/day. This responsive adjustment demonstrates both experimental control and clinical responsiveness. Rigidly maintaining an unattainable 7,000-step criterion could have produced misleading failure data and damaged therapeutic rapport.
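The size of the mismatch in this example is simple arithmetic: 5,800 observed against a 7,000-step criterion is a shortfall of 1,200 steps, or roughly 17% of the criterion. A short Python sketch of that calculation follows. It uses only the two phase-level figures quoted in the example; the helper name `relative_deviation` is hypothetical.

```python
def relative_deviation(observed: float, criterion: float) -> float:
    """Signed deviation of observed performance from the criterion,
    expressed as a fraction of the criterion."""
    return (observed - criterion) / criterion


# (label, criterion, observed) taken from the step-count example above.
phases = [
    ("7,000-step criterion", 7000, 5800),
    ("5,500-step criterion", 5500, 5500),
]

for label, criterion, observed in phases:
    deviation = relative_deviation(observed, criterion)
    print(f"{label}: observed {observed}, deviation {deviation:+.1%}")
# 7,000-step criterion: observed 5800, deviation -17.1%
# 5,500-step criterion: observed 5500, deviation +0.0%
```

What counts as an acceptable deviation is not fixed by the design itself; the point of the sketch is only that a persistent double-digit shortfall is easy to spot once it is expressed relative to the criterion.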
Vocational Performance Shaping
- Assembly Line Production Rate (Employment Skills): A vocational trainee assembles 12 units/hour during baseline. Job coaching begins with a criterion of 15 units/hour. After stability, the criterion increases to 18 units/hour. However, production drops to 14 units/hour, below even the previous 15-unit criterion. The supervisor recognizes the jump was too aggressive and lowers the criterion to 14 units/hour (still above the 12-unit baseline) for three sessions. Production stabilizes at 14. The criterion then increases incrementally to 16, then 18 units/hour, with behavior tracking each shift. This example highlights a critical exam trap: a changing criterion design can fail to demonstrate control when criteria are poorly calibrated. Valid designs require criteria that are both challenging and achievable.
Clinical Implications & Exam Traps
Misapplying the changing criterion design leads to flawed validity claims. If you fail to include bidirectional shifts, you cannot rule out confounding variables. Conversely, setting criteria that are too ambitious produces apparent “failure” that actually reflects poor design, not ineffective intervention.
Critical Exam Distinctions:
- Correspondence ≠ Improvement: Behavior improving steadily does NOT prove experimental control in this design. Behavior must track specific criterion levels, including decreases when criteria decrease. Steady linear improvement is equally consistent with maturation or practice effects, so it does not by itself show criterion control.
- Revert Phases Are Non-Negotiable: On the BCBA exam, any changing criterion scenario lacking at least one bidirectional shift should be flagged as having weak internal validity. Reverts are the primary mechanism for ruling out history/maturation threats.
- Criterion Magnitude Matters: Criteria must represent meaningful but achievable shifts. Shifts that are too small are hard to distinguish from ordinary session-to-session variability; shifts that are too large may never be met, so behavior fails to track the criterion. Both mask true experimental control.
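The first distinction in this list, correspondence versus improvement, can be made concrete by comparing two learners under the same criterion sequence. The Python sketch below reuses the criterion sequence from the reading fluency example (50, 55, 60, 55, 65), but every session value is hypothetical and invented for illustration, as is the choice of mean absolute deviation as the summary measure.

```python
from statistics import mean


def mean_abs_deviation(sessions, criterion):
    """Average absolute distance between session values and the criterion."""
    return mean(abs(value - criterion) for value in sessions)


def deviations(phases, criteria):
    """Mean absolute deviation for each phase, paired with its criterion."""
    return [mean_abs_deviation(s, c) for s, c in zip(phases, criteria)]


# Criterion sequence with one revert (60 back to 55).
criteria = [50, 55, 60, 55, 65]

# Hypothetical learner whose performance follows each criterion.
tracking = [[50, 51, 50], [55, 56, 55], [60, 61], [55, 54], [65, 66, 65]]

# Hypothetical learner who simply keeps improving, ignoring the revert.
steady = [[50, 52, 53], [55, 56, 58], [60, 61], [62, 63], [65, 66, 67]]

tracking_dev = deviations(tracking, criteria)
steady_dev = deviations(steady, criteria)

# The revert phase (index 3) is where the two patterns separate.
print("revert phase, tracking learner:", tracking_dev[3])  # 0.5
print("revert phase, steady learner:", steady_dev[3])      # 7.5
```

In the phases where the criterion only rises, the two learners look similar; only the revert phase exposes the steady improver, which is why a design without one cannot separate the two patterns.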
To deepen your understanding of how these validity mechanisms interact with measurement systems, consider how criterion shifts interact with the overestimation and underestimation artifacts of discontinuous measurement procedures. If criteria change mid-interval during partial interval recording, the data may misrepresent true correspondence. Furthermore, when designing assessments, analysts must ensure they are not inadvertently creating surrogate conditioned motivating operations (CMO-S) through accidental environmental correlations that distort natural criterion-tracking behavior.
Mastering this design helps you avoid common item-writer traps. Remember: Changing Criterion = Behavior Tracks Specific Levels + Bidirectional Shifts Prove Control. By applying this logical framework, you can accurately evaluate experimental validity in complex clinical scenarios. For further practice, review our deep dive on behavioral momentum versus the high-probability request sequence to see how motivational states interact with response persistence across varied conditions.
🧠 Day 14 Interactive Challenge Block
Question 1: A researcher uses a changing criterion design to increase a child’s homework completion time. Baseline averages 10 minutes. Intervention criteria progress: 15 min → 20 min → 25 min → 30 min. The child’s completion time increases steadily from 10 to 30 minutes across all phases, perfectly matching each criterion. However, no criterion was ever decreased. What is the PRIMARY threat to internal validity in this design?
A) Carryover effects from previous criteria
B) Maturation or history confounds cannot be ruled out
C) Insufficient number of data points per phase
D) Lack of baseline stability
Question 2: During a changing criterion intervention to reduce vocal stereotypy, the criterion shifts from 20 occurrences/session to 15 occurrences/session. The participant’s behavior decreases to 14 occurrences/session. The next criterion increases to 18 occurrences/session, and behavior rises to 17 occurrences/session. Which validity indicator is demonstrated?
A) Poor criterion-behavior correspondence
B) Bidirectional experimental control
C) Ceiling effect masking true performance
D) Response generalization across settings
Question 3: Why is variable phase length critical in changing criterion designs?
A) It prevents participants from predicting criterion shifts based on temporal patterns
B) It ensures equal statistical power across all criterion levels
C) It allows for more data collection during difficult criterion phases
D) It satisfies IRB requirements for minimum session duration