Research foundation

Growing Standard is built on 80+ peer-reviewed studies spanning assessment design, literacy, mathematics pedagogy, motivation, and cognitive science. Every major feature traces to published research. Empirical calibration and external validation are in progress pending pilot data — what follows is the research informing our design, not evidence about product outcomes.

Adaptive assessment

Adaptive Placement Flow (figure): over Rounds 1-3, each five-item round routes the student by its score:
  • 4/5 correct → Escalate UP
  • 2-3/5 correct → Place HERE
  • 0-1/5 correct → Test DOWN
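A minimal sketch of the routing rule in the figure, in Python; the function name is ours, and we assume a perfect 5/5 escalates the same way as 4/5:

```python
def route_round(correct_out_of_5: int) -> str:
    """Route one five-item placement round by raw score (illustrative).

    Assumption: 5/5 escalates the same way as 4/5.
    """
    if correct_out_of_5 >= 4:
        return "ESCALATE_UP"  # try a harder stratum next round
    if correct_out_of_5 >= 2:
        return "PLACE_HERE"   # the current stratum is the placement
    return "TEST_DOWN"        # drop to an easier stratum

print(route_round(4))  # ESCALATE_UP
print(route_round(2))  # PLACE_HERE
```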
The research finding
Per-domain adaptive testing places students faster and more accurately than fixed-form tests, especially at the tails of the distribution. Content blueprinting ensures balanced coverage across substandards.
How we apply it
Math assessment routes 4 domains independently through graduated difficulty phases (4 strata from easy to hard), with substandard round-robin ensuring every content area is sampled. SEM-based early stopping ends testing once measurement confidence is high enough. Confidence-weighted compositing means domains measured more precisely count more in the final placement. Rapid-guess filtering excludes suspiciously fast correct answers (under 2 seconds). Anchor items link test forms across windows for future IRT calibration. The design aligns with Babcock and Weiss's (2012) CAT stopping rules, Haberman's (2008) Laplace-smoothed scoring, and Kingsbury and Zara's (1989) content balancing.
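To make the stopping and compositing rules concrete, here is a minimal Python sketch. The proportion-based SEM, the 0.10 cutoff, and the inverse-variance weighting are our assumptions; the product's IRT-based calibration would differ.

```python
import math

def sem(p_hat: float, n: int) -> float:
    """Standard error of a proportion-based score; a stand-in for the
    IRT-based SEM, used only for illustration."""
    return math.sqrt(max(p_hat * (1 - p_hat), 1e-6) / n)

def should_stop(p_hat: float, n: int, cutoff: float = 0.10) -> bool:
    """SEM-based early stopping in the spirit of Babcock & Weiss (2012):
    end a domain's testing once measurement error falls below the cutoff
    (the 0.10 value is assumed, not the product's)."""
    return sem(p_hat, n) <= cutoff

def composite(domains: dict[str, tuple[float, int]]) -> float:
    """Confidence-weighted compositing: weight each domain score by its
    inverse error variance (1/SEM^2), so precisely measured domains
    count more. The exact weighting scheme is our assumption."""
    weights = {d: 1.0 / sem(p, n) ** 2 for d, (p, n) in domains.items()}
    total = sum(weights.values())
    return sum(w * domains[d][0] for d, w in weights.items()) / total

scores = {"number_sense": (0.8, 15), "geometry": (0.6, 8),
          "algebra": (0.7, 12), "measurement": (0.5, 10)}
print(should_stop(0.8, 15))        # False: SEM ~0.103 is above 0.10
print(round(composite(scores), 3))
```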
Citations
  • Babcock, B. & Weiss, D. J. (2012). Termination criteria in computerized adaptive tests. Journal of Computerized Adaptive Testing.
  • Haberman, S. J. (2008). Clustering for IRT-scored tests. ETS Research Report.
  • Weiss, D. J. (2004). Computerized adaptive testing for effective and efficient measurement. Measurement and Evaluation in Counseling and Development.
  • Kingsbury, G. G. & Zara, A. R. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education.
  • Thompson, N. A. & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research & Evaluation.

Science of Reading

The research finding
Explicit instruction across the National Reading Panel's Big Five (phonemic awareness, phonics, fluency, vocabulary, comprehension) produces the strongest reading outcomes.
How we apply it
Reading assessment tests 16 skill areas across the Big Five. Per-skill proficiency is reported separately, so teachers can target interventions rather than relying on a single composite score. Passages are calibrated to F&P A-Z, Lexile, DRA, and Grade Equivalent.
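A sketch of what per-skill reporting could look like in Python; the skill names and their Big Five mapping are hypothetical, since the full 16-skill taxonomy is not reproduced here.

```python
from collections import defaultdict

# Hypothetical skill-to-strand mapping (illustrative names only).
BIG_FIVE_STRAND = {
    "phoneme_blending": "phonemic_awareness",
    "decoding_cvc": "phonics",
    "rate_accuracy": "fluency",
    "context_clues": "vocabulary",
    "main_idea": "comprehension",
}

def per_skill_report(responses: list[tuple[str, bool]]) -> dict:
    """Report proficiency per skill, grouped by Big Five strand,
    instead of collapsing everything into one composite score."""
    tally = defaultdict(lambda: [0, 0])  # skill -> [correct, attempted]
    for skill, correct in responses:
        tally[skill][0] += int(correct)
        tally[skill][1] += 1
    return {skill: {"strand": BIG_FIVE_STRAND.get(skill, "unknown"),
                    "proficiency": round(c / n, 2)}
            for skill, (c, n) in tally.items()}

print(per_skill_report([("main_idea", True), ("main_idea", False),
                        ("decoding_cvc", True)]))
```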
Citations
  • National Reading Panel. (2000). Teaching Children to Read. National Institute of Child Health and Human Development.
  • Ehri, L. C. (2005). Learning to read words: Theory, findings, and issues. Scientific Studies of Reading.
  • Scarborough, H. S. (2001). Connecting early language and literacy to later reading disabilities. Handbook of Early Literacy Research.

Concrete–Representational–Abstract (CRA)

CRA Progression (figure): Concrete (ten frames showing 3/4) → Representational (fraction bars modeling 3/4 + 1/4) → Abstract (symbolic notation).
The research finding
Math concepts are learned most durably when students move from physical/visual models to representations to symbolic abstraction.
How we apply it
K-8 tiers use concrete manipulatives (ten frames, base-10 blocks, fraction bars, coins, tape diagrams) before introducing symbolic forms. A dedicated Tools shelf gives students 20+ interactive manipulatives — including algebra tiles, pattern blocks, geoboards, protractors, and array builders — available for free exploration alongside guided practice. Tier labels indicate the CRA stage so teachers can sequence instruction.
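To show how a tier label might carry the CRA stage alongside the offered manipulatives, here is a hypothetical item record; the field names and enum are our assumptions, not the product's schema.

```python
from dataclasses import dataclass
from enum import Enum

class CRAStage(Enum):
    CONCRETE = "concrete"                  # physical/visual models
    REPRESENTATIONAL = "representational"  # bars, diagrams, drawings
    ABSTRACT = "abstract"                  # symbolic notation only

@dataclass
class PracticeItem:
    prompt: str
    stage: CRAStage           # the tier label teachers sequence by
    manipulatives: list[str]  # tools offered from the Tools shelf

item = PracticeItem(prompt="3/4 + 1/4",
                    stage=CRAStage.REPRESENTATIONAL,
                    manipulatives=["fraction_bars"])
print(item.stage.value)  # representational
```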
Citations
  • Bruner, J. S. (1966). Toward a Theory of Instruction. Harvard University Press.
  • Witzel, B. S., Mercer, C. D., & Miller, M. D. (2003). Teaching algebra to students with learning difficulties: An investigation of an explicit instruction model. Learning Disabilities Research & Practice.

Spaced repetition

The research finding
Expanding review intervals produce two to three times the retention of massed practice, with the largest gains in long-term recall.
How we apply it
Our review queue derives SM-2 ease factors from each student's score history. Intervals start at 1 day, expand to 6 days, then scale by ease factor — capped at 60 days. A 15% trajectory bonus extends intervals when recent scores improve.
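A minimal sketch of that schedule, assuming a standard SM-2-style update; how the ease factor is derived from score history is omitted.

```python
def next_interval(review_count: int, prev_interval_days: float,
                  ease: float, improving: bool) -> float:
    """Interval schedule as described above: 1 day, then 6 days, then
    the previous interval scaled by the ease factor, capped at 60 days,
    with a 15% bonus when recent scores are improving."""
    if review_count == 0:
        interval = 1.0
    elif review_count == 1:
        interval = 6.0
    else:
        interval = prev_interval_days * ease
    if improving:
        interval *= 1.15  # trajectory bonus
    return min(interval, 60.0)

# Third review, previous interval 6 days, ease 2.5, scores improving:
print(next_interval(2, 6.0, 2.5, True))  # 17.25 days
```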
Citations
  • Ebbinghaus, H. (1885). Memory: A Contribution to Experimental Psychology.
  • Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks. Psychological Bulletin.
  • Wozniak, P. A. (1990). Optimization of learning. SuperMemo.

Growth mindset & try-first learning

Feedback Language (figure): process-focused feedback (“Your strategy paid off: you broke the problem into smaller steps.”) versus trait-focused feedback (“You’re so smart!”). Try-first hints: the first wrong answer lights up the hint bulb, and the student chooses when to use it.
The research finding
Students who believe ability is malleable and receive process-focused feedback outperform matched peers on transfer tasks. A try-first approach, where wrong answers unlock hints instead of triggering immediate penalties, builds persistence.
How we apply it
Feedback language is process-focused ('Your strategy paid off') rather than trait-focused ('You're smart'). A first wrong answer unlocks a hint bulb instead of triggering a penalty; students choose when to use hints, building self-regulation. Spaced review resurfaces past mistakes for retrieval practice.
Citations
  • Dweck, C. S. (2006). Mindset: The New Psychology of Success. Random House.
  • Yeager, D. S. et al. (2019). A national experiment reveals where a growth mindset improves achievement. Nature.

Passage-based reading assessment

The research finding
Reading comprehension is best measured with passage-based testlets that adapt at the passage level, not individual items. Passages must meet research-calibrated length and complexity standards, with text-dependent questions spanning multiple Depth of Knowledge levels.
How we apply it
Assessment passages are calibrated to Smarter Balanced word-count ranges by grade (G3: 300-450 words, G6: 650-850, G9-12: 800-1200). Each passage has 12-16 authored questions; the system selects 6-8 per sitting, stratified by skill type, giving two equivalent test forms per passage and doubling the effective item pool. Passages are tagged by within-grade difficulty (below/at/above) for adaptive routing: easier passages in early phases, harder in escalation phases. Questions are tagged by DOK level (20% recall, 55% inference, 25% analysis) and by reading skill. Anchor items link test forms across testing windows for longitudinal calibration. Every question must pass the text-dependence test: if a student can answer it without the passage, the question is rewritten.
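A sketch of skill-stratified question selection in Python; the question shape and round-robin strategy are our assumptions, and the DOK balancing and two-form split described above are omitted for brevity.

```python
import random
from collections import defaultdict

def select_form(questions: list[dict], per_sitting: int = 7,
                seed: int = 0) -> list[dict]:
    """Draw `per_sitting` questions round-robin across skill types so
    each sitting balances the blueprint. Each question is a dict with
    'id' and 'skill' keys (an assumed shape)."""
    rng = random.Random(seed)
    by_skill = defaultdict(list)
    for q in questions:
        by_skill[q["skill"]].append(q)
    for pool in by_skill.values():
        rng.shuffle(pool)
    form: list[dict] = []
    skills = list(by_skill)
    while len(form) < per_sitting and any(by_skill[s] for s in skills):
        for s in skills:  # one question per skill, round-robin
            if by_skill[s] and len(form) < per_sitting:
                form.append(by_skill[s].pop())
    return form

qs = [{"id": i, "skill": s} for i, s in enumerate(
      4 * ["main_idea", "inference", "vocabulary"])]
print([q["id"] for q in select_form(qs, seed=1)])
```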
Citations
  • Smarter Balanced Assessment Consortium. (2024). ELA/Literacy Stimulus Specifications.
  • Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet Response Theory and Its Applications. Cambridge University Press.
  • Hess, K. K. (2008). Depth of Knowledge Framework for Reading. National Center for Assessment.
  • Fisher, D. & Frey, N. (2012). Text-Dependent Questions. ASCD.

Item design & diagnostic distractors

The research finding
Multiple-choice items with diagnostic distractors — wrong answers that target specific misconceptions — yield richer information than items with merely plausible options. Three well-designed options are psychometrically equivalent to four or five.
How we apply it
Every wrong answer targets a documented error pattern: wrong-paragraph confusion, partial comprehension, background-knowledge substitution, or overgeneralization. Items follow Haladyna, Downing, and Rodriguez's (2002) validated taxonomy of 31 item-writing guidelines, including stem clarity, option homogeneity, and randomized correct-answer position.
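One way to represent a diagnostic distractor and the randomized correct-answer position, as an illustrative sketch; the class and function names are ours, not the production schema.

```python
import random
from dataclasses import dataclass

@dataclass
class Distractor:
    text: str
    misconception: str  # the documented error pattern it diagnoses

def render_options(correct: str, distractors: list[Distractor],
                   rng: random.Random) -> list[tuple[str, bool]]:
    """Build a three-option item (per Rodriguez, 2005) with the correct
    answer at a randomized position (per Haladyna et al., 2002)."""
    options = [(correct, True)] + [(d.text, False) for d in distractors[:2]]
    rng.shuffle(options)
    return options

options = render_options(
    "The author argues for later start times.",
    [Distractor("Students enjoy mornings.", "wrong-paragraph confusion"),
     Distractor("Sleep matters for teens.", "partial comprehension")],
    random.Random(42))
print(options)
```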
Citations
  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines. Applied Measurement in Education.
  • Gierl, M. J., Bulut, O., Guo, Q., & Zhang, X. (2017). Developing, analyzing, and using distractors for multiple-choice tests. Review of Educational Research.
  • Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items. Educational Measurement: Issues and Practice.

Formative feedback & productive failure

The research finding
Students who attempt problems before receiving instruction outperform those taught first — but only when scaffolded feedback follows the struggle. Assessment must remain unscaffolded to preserve measurement validity.
How we apply it
Practice uses a try-first hint system: a first wrong answer triggers a penalty-free retry, a second unlocks progressive hints, and the final step shows the full explanation. Assessment passages provide no hints, no vocabulary definitions, and no corrective feedback, measuring what students know independently. This dual approach follows Shute (2008) and Kapur (2016).
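A minimal sketch of this dual policy; the function shape and the sentinel return values are our assumptions.

```python
def feedback_for_wrong_answer(attempt: int, hints: list[str],
                              explanation: str,
                              mode: str = "practice"):
    """Try-first flow: in practice, attempt 1 gets a penalty-free retry,
    later attempts unlock progressive hints, and the final step shows
    the full explanation. In assessment mode, no scaffolding is given,
    preserving measurement validity."""
    if mode == "assessment":
        return None          # no hints, no definitions, no feedback
    if attempt == 1:
        return "RETRY"       # no penalty
    hint_index = attempt - 2
    if hint_index < len(hints):
        return hints[hint_index]  # progressive hints
    return explanation       # final step

hints = ["Re-read paragraph 2.", "Focus on the author's last claim."]
print(feedback_for_wrong_answer(2, hints, "Full explanation..."))
```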
Citations
  • Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research.
  • Kapur, M. (2016). Examining productive failure, productive success, and constructive failure. Cognition and Instruction.
  • Roediger, H. L. & Karpicke, J. D. (2006). Test-enhanced learning. Psychological Science.

Test-taking behavior & rush detection

The research finding
Students who answer too quickly produce a distinct, detectable response pattern; flagging it lets teachers re-administer affected items and improves score validity. Reading items require longer time thresholds than math items because of passage processing time.
How we apply it
Math items use grade-adjusted thresholds (K-2: 4s, 3-5: 3.5s, 6-8: 3s, 9-12: 2.5s). Reading items use a passage-aware formula: a base threshold plus passage word count divided by a grade-appropriate reading speed. Students flagged on more than 30% of responses overall, or more than 40% within any skill, get a rushed indicator in the teacher dashboard.
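Those thresholds translate directly into code; in this sketch the math thresholds come from the text, while the 3-second reading base and the words-per-minute table are assumed values.

```python
def math_threshold(grade: int) -> float:
    """Grade-adjusted minimum response time for math items (seconds)."""
    if grade <= 2:
        return 4.0
    if grade <= 5:
        return 3.5
    if grade <= 8:
        return 3.0
    return 2.5

def reading_threshold(grade: int, passage_words: int) -> float:
    """Base threshold plus passage reading time at a grade-appropriate
    speed. The base and speeds are assumptions, not published values."""
    words_per_minute = {3: 120.0, 6: 180.0, 9: 230.0}.get(grade, 200.0)
    return 3.0 + passage_words / (words_per_minute / 60.0)

def rushed_flags(rapid: int, total: int,
                 per_skill: dict[str, tuple[int, int]]):
    """Flag >30% rapid responses overall or >40% within any skill."""
    overall = rapid / total > 0.30
    skills = [s for s, (r, n) in per_skill.items() if r / n > 0.40]
    return overall, skills

print(reading_threshold(6, 700))  # 3 s base + ~233 s of reading time
print(rushed_flags(8, 20, {"inference": (3, 6)}))  # (True, ['inference'])
```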
Citations
  • Wise, S. L. & Kong, X. (2005). Response time effort: A new measure of examinee motivation. Applied Measurement in Education.

Engagement & extrinsic-to-intrinsic motivation

The research finding
External rewards that feed into student identity (collections, customization) build voluntary practice habits that persist after rewards are removed.
How we apply it
Cosmetic items, companion customization, and pet adoption are the primary reward loop. Stars earned through practice are the only currency. No pay-to-win advantages exist.
Citations
  • Deci, E. L. & Ryan, R. M. (2000). The 'what' and 'why' of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry.
  • Ryan, R. M., Rigby, C. S., & Przybylski, A. (2006). The motivational pull of video games: A self-determination theory approach. Motivation and Emotion.
Full evidence document. The complete 81-citation Evidence-Based Learning document is available to schools and districts on request. Email partnerships@growingstandard.com.