About the Strength of Evidence

We evaluate the strength of the evidence behind the analytical results presented in each study. Our evaluation criteria are based primarily on the Maryland Scientific Methods Scale (SMS). The explanations below follow the What Works Centre for Local Growth (https://whatworksgrowth.org/resources/the-scientific-maryland-scale/).

In addition to the SMS levels, we classify analyses that do not use rigorous causal inference methods based on experimental or quasi-experimental approaches as "Level 0."

Level 1

Either (a) a cross-sectional comparison of intervention and non-intervention groups, or (b) a before-and-after comparison of the intervention group. Control variables are not used to adjust for differences between the intervention and non-intervention groups or between periods.
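A minimal sketch of comparison (a), assuming the outcomes are held in two plain lists: it computes the raw difference in mean outcomes between the groups. The function name and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def naive_difference(treated, untreated):
    """Raw difference in mean outcomes between the two groups."""
    return mean(treated) - mean(untreated)


# Hypothetical outcome values, invented for illustration.
treated_outcomes = [12.0, 15.0, 11.0, 14.0]
untreated_outcomes = [10.0, 9.0, 11.0, 10.0]

print(naive_difference(treated_outcomes, untreated_outcomes))
```

A figure of this kind says nothing by itself about why the groups differ, which is why designs of this type sit at the bottom of the scale.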

Level 2

Either (a) a cross-sectional comparison of intervention and non-intervention groups, or (b) a before-and-after comparison of the intervention group without a non-intervention comparison group, in both cases with adequate control variables. In (a), control variables or matching methods are used to account for differences between the groups. In (b), control variables are used to account for changes in macro-level factors between the two periods.
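To illustrate what matching on a control variable can look like, the sketch below compares the groups within strata of a single categorical covariate and averages the within-stratum differences, weighted by stratum size. This is one simple form of exact matching, not the only way to apply control variables; the function name, the covariate, and the data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def stratified_difference(records):
    """Size-weighted average of within-stratum differences in mean outcome.

    records: iterable of (stratum, treated_flag, outcome) tuples.
    Strata that lack either group are skipped.
    """
    by_stratum = defaultdict(lambda: {True: [], False: []})
    for stratum, treated, outcome in records:
        by_stratum[stratum][bool(treated)].append(outcome)

    weighted_sum = 0.0
    total = 0
    for groups in by_stratum.values():
        if not groups[True] or not groups[False]:
            continue  # no comparison possible in this stratum
        n = len(groups[True]) + len(groups[False])
        weighted_sum += n * (mean(groups[True]) - mean(groups[False]))
        total += n
    if total == 0:
        raise ValueError("no stratum contains both groups")
    return weighted_sum / total


# Hypothetical data: (region, received intervention, outcome).
records = [
    ("urban", True, 20.0), ("urban", True, 22.0),
    ("urban", False, 18.0), ("urban", False, 18.0),
    ("rural", True, 11.0),
    ("rural", False, 10.0), ("rural", False, 10.0), ("rural", False, 10.0),
]
print(stratified_difference(records))
```

Comparing like with like within each stratum removes differences explained by the chosen covariate, but not differences in characteristics that were never measured.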

Level 3

Compares the intervention group's outcomes after the intervention with its outcomes before the intervention, and also with the outcomes of a non-intervention group (e.g., difference-in-differences or regression discontinuity). When a method that compares the periods before and after the intervention is used, results are presented separately for the intervention and non-intervention groups. Important baseline characteristics are measured and controlled for, for example through propensity score matching, though fundamental differences between the groups may still remain.
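The difference-in-differences idea can be sketched in a few lines: take the change in the intervention group and subtract the change in the non-intervention group over the same period. This is a minimal illustration using group means only, assuming both groups would have followed the same trend without the intervention; the function name and the numbers are hypothetical.

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """(Change in the intervention group) minus (change in the non-intervention group)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values, invented for illustration.
estimate = difference_in_differences(
    treated_pre=[10.0, 12.0],
    treated_post=[16.0, 18.0],
    control_pre=[9.0, 11.0],
    control_post=[11.0, 13.0],
)
print(estimate)
```

Reporting the two changes separately before subtracting them makes it easy to see how much of the estimate comes from each group.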

Level 4

Quasi-random variation in who receives the intervention is exploited, so that differences in outcomes between the intervention and non-intervention groups can credibly be attributed to the presence or absence of the intervention. This typically relies on an instrumental variable or on a discontinuity in how or when the intervention is implemented, and the suitability of that design should be demonstrated. The influence of other measured variables should be isolated as far as possible.
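One common way to exploit as-if-random variation of this kind is an instrumental variable. As a rough sketch, assuming a binary instrument that shifts who receives the intervention but affects outcomes only through the intervention, the Wald estimator divides the gap in outcomes by the gap in uptake. The choice of this estimator, the function name, and the data are assumptions made for illustration.

```python
from statistics import mean


def wald_estimate(records):
    """Wald (instrumental-variable) estimate with a binary instrument.

    records: iterable of (instrument, treated, outcome) tuples,
    where instrument and treated are 0 or 1.
    """
    records = list(records)
    with_z = [(d, y) for z, d, y in records if z == 1]
    without_z = [(d, y) for z, d, y in records if z == 0]
    if not with_z or not without_z:
        raise ValueError("both instrument values must be present")

    outcome_gap = mean(y for _, y in with_z) - mean(y for _, y in without_z)
    uptake_gap = mean(d for d, _ in with_z) - mean(d for d, _ in without_z)
    if uptake_gap == 0:
        raise ValueError("instrument does not change uptake")
    return outcome_gap / uptake_gap


# Hypothetical data: (instrument, received intervention, outcome).
records = [
    (1, 1, 14.0), (1, 1, 14.0), (1, 1, 14.0), (1, 0, 10.0),
    (0, 0, 10.0), (0, 0, 10.0), (0, 0, 10.0), (0, 1, 14.0),
]
print(wald_estimate(records))
```

The estimate is only as credible as the argument that the instrument is unrelated to outcomes except through the intervention, which is why the design itself has to be defended.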

Level 5

Experimental designs with explicit random allocation to intervention and non-intervention groups, with Randomized Controlled Trials (RCTs) as the definitive example. The allocation ratio between the intervention and non-intervention groups should be approximately 50:50. The comparability of the two groups should be checked using baseline variables that can capture relevant differences between them. Control variables may be used to adjust for remaining differences, with statistical adjustment or sampling-based post-stratification considered when necessary. Contamination, meaning exposure of the non-intervention group to the intervention, should also be examined and kept as small as possible.
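As a hedged illustration of random allocation and a baseline check, the sketch below shuffles a list of units, splits it roughly 50:50, and computes a standardized mean difference for one baseline variable. The function names, the fixed seed, and the baseline values are hypothetical; real trials usually follow a pre-registered allocation procedure.

```python
import random
from statistics import mean, stdev


def randomize(unit_ids, seed=0):
    """Shuffle units and split them into two groups of (near-)equal size."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation.

    Requires at least two values per group and some variation in the data.
    """
    pooled_sd = ((stdev(a) ** 2 + stdev(b) ** 2) / 2) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd


# Hypothetical baseline variable (e.g., age) for 20 units.
baseline_age = {unit: 30.0 + (unit * 7) % 25 for unit in range(20)}

intervention, non_intervention = randomize(baseline_age)
smd = standardized_mean_difference(
    [baseline_age[u] for u in intervention],
    [baseline_age[u] for u in non_intervention],
)
print(len(intervention), len(non_intervention), smd)
```

A standardized difference close to zero suggests the groups are balanced on that variable; larger values are a signal to consider statistical adjustment or post-stratification.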

Level 0

Analyses that rely on mathematical models combining empirical data with statistics, rather than on experimental or quasi-experimental approaches.