class: left, bottom, my-title, title-slide .title[ # Bayes factors, p values, and the replication crisis ] .author[ ### Richard D. Morey ] .date[ ### 22 September 2022 ] --- <!--script src="https://hypothes.is/embed.js" async></script--> <style type="text/css"> .faded{ opacity: .3; } .red{ color: red; } .paperimg{ width: 90%; border-top: 1px solid black; border-left: 1px solid black; border-right: 1px solid black; } </style> ## Roadmap * Demonstrate what Bayes factors are * Show Bayes factors can show strong evidence for false (null) hypotheses * Show Bayes factors show strong evidence for the null, even for suspicious ("too good") null results ### Data setup * Two groups of size N * Normal populations * Equal standard deviations * Effect size: `\(\delta\)` (diff. in means, in std. dev. units) * Evidence: Student's `\(t\)` statistic --- ## Bayes factors in use .pull-left[ <img src='img/Rouder_etal_2009.png' class='paperimg'/><br/> Paper + software: over 4500 citations ] .pull-right[ <img src='img/bayesfactor.png' style='width:40%;float:left;'/> <img src='img/spss.png' style='width:40%;float:right;'/> <div style='height: 50px;'></div> <img src='img/jasp.svg' style='width:40%;float:left;'/> <img src='img/jamovi.svg' style='width:40%;float:right;'/> ] .footnote[ * Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t-tests for accepting and rejecting the null hypothesis. *Psychonomic Bulletin and Review, 16, 225–237*. * Morey, R. D., Rouder, J. N., Jamil, T., Urbanek, S., Forner, K., & Ly, A. (2022). BayesFactor: Computation of Bayes Factors for Common Designs (0.9.12-4.4).
https://CRAN.R-project.org/package=BayesFactor ] --- count: false ## A start: Likelihood ratios <img src="index_files/figure-html/tdist1_user_01_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## A start: Likelihood ratios <img src="index_files/figure-html/tdist1_user_02_output-1.svg" width="80%" style="display: block; margin: auto;" /> <style> .panel1-tdist1-user { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-tdist1-user { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-tdist1-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false ## Effect of choosing different alternatives <img src="index_files/figure-html/lr1_user_01_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Effect of choosing different alternatives <img src="index_files/figure-html/lr1_user_02_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Effect of choosing different alternatives <img src="index_files/figure-html/lr1_user_03_output-1.svg" width="80%" style="display: block; margin: auto;" /> <style> .panel1-lr1-user { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-lr1-user { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-lr1-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ## Likelihood ratios .pull-left[ ### Frequentist * Use likelihood ratios as *test statistics* * Justified through Neyman-Pearson lemma (1933) and generalizations * Interpretation is through *error rates* of tests ] .pull-right[ ### Bayesian * Use likelihood ratios as measure of evidence * Justified through Bayes' theorem * Interpretation is direct ] --- ## Likelihood ratio to Bayes factor Probability of data is taken 
as an *average of the data model `\(p\)` over the prior `\(\pi\)`*: `$$p_{\cal M}(\boldsymbol y) = \int_\Theta p_{\cal M}(\boldsymbol y\mid \boldsymbol\theta)\,\pi_{\cal M}(\boldsymbol\theta)\, d\boldsymbol\theta$$` `$$BF_{10} = \frac{p_{{\cal M}_1}(\boldsymbol y)}{p_{{\cal M}_0}(\boldsymbol y)}$$` Justified through Bayes' theorem: `$$\frac{p({\cal M}_1\mid\boldsymbol y)}{p({\cal M}_0\mid\boldsymbol y)} = \frac{p_{{\cal M}_1}(\boldsymbol y)}{p_{{\cal M}_0}(\boldsymbol y)}\times\frac{p({\cal M}_1)}{p({\cal M}_0)}$$` --- ## Why are Bayes factors advocated? * Direct interpretation as statistical evidence: convincingness * Ability to show evidence for point nulls: evidence for regularity * Applicability beyond nested models * Model selection consistency in the large-sample limit * Perception that *p* values "overstate" evidence * Insensitivity to stopping rules Bayes factors have been suggested as a replacement for *p* values. --- ## Two main properties of Bayes factors .pull-left[ ### Marginal Bayes factors compute the probability of the data *by averaging over the prior*. ] .pull-right[ ### Comparative Bayes factors compare *two models* at a time. ] <br/> .pull-left[ **Criticism**: Averaging over a prior can lead to high error rates for some elements of a hypothesis. Lack of *severity* (Mayo, 2018): a high probability of claiming evidence for a false hypothesis. ] .pull-right[ **Criticism**: Both models may be obviously terrible. High probability under one hypothesis may be suspicious (too good), but BFs offer no account of this. ] --- ### "...but what's a good Bayes factor?"
Common interpretation of the magnitude of Bayes factors, for the numerator model ℳ<sub>1</sub>:

<table>
  <thead>
    <tr><th>BF range</th><th>Interpretation</th></tr>
  </thead>
  <tbody>
    <tr><td>0 – 1/3.16</td><td>(Supports ℳ<sub>0</sub>)</td></tr>
    <tr><td>1/3.16 – 1</td><td>Barely worth mentioning (for ℳ<sub>0</sub>)</td></tr>
    <tr><td>1 – 3.16</td><td>Barely worth mentioning</td></tr>
    <tr><td>3.16 – 10</td><td>Substantial<sup>1</sup></td></tr>
    <tr><td>10 – 31.62</td><td>Strong</td></tr>
    <tr><td>31.62 – 100</td><td>Very strong</td></tr>
    <tr><td>100 – ∞</td><td>Decisive</td></tr>
  </tbody>
</table>

<sup>1</sup> A Bayes factor of about 3 is often claimed to be calibrated to <i>p = 0.05</i>. See Benjamin et al. (2018), "Redefine statistical significance."
.footnote[ Jeffreys, H. (1961). *Theory of probability (3rd edition).* Oxford University Press. ] --- count: false ## Data predictions: p(y|δ) <img src="index_files/figure-html/datapred1_user_01_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Data predictions: p(y|δ) <img src="index_files/figure-html/datapred1_user_02_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Data predictions: p(y|δ) <img src="index_files/figure-html/datapred1_user_03_output-1.svg" width="80%" style="display: block; margin: auto;" /> <style> .panel1-datapred1-user { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-datapred1-user { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-datapred1-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false ## Prior distributions: 𝜋(δ) <img src="index_files/figure-html/prior1_user_01_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Prior distributions: 𝜋(δ) <img src="index_files/figure-html/prior1_user_02_output-1.svg" width="80%" style="display: block; margin: auto;" /> <style> .panel1-prior1-user { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-prior1-user { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-prior1-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ## Weighted predictions: p(y|δ)𝜋(δ) <img src="index_files/figure-html/unnamed-chunk-13-1.svg" width="80%" style="display: block; margin: auto;" /> --- ## One alternative prior predictive <img src="index_files/figure-html/unnamed-chunk-14-1.svg" width="80%" style="display: block; margin: auto;" /> --- ## How N affects the predictions <img 
src="index_files/figure-html/unnamed-chunk-15-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Some possible priors <img src="index_files/figure-html/allpriors_user_01_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Some possible priors <img src="index_files/figure-html/allpriors_user_02_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Some possible priors <img src="index_files/figure-html/allpriors_user_03_output-1.svg" width="80%" style="display: block; margin: auto;" /> <style> .panel1-allpriors-user { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-allpriors-user { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-allpriors-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false ## Null and alternative prior predictives, 𝜋(y) <img src="index_files/figure-html/priorpred1_user_01_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Null and alternative prior predictives, 𝜋(y) <img src="index_files/figure-html/priorpred1_user_02_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Null and alternative prior predictives, 𝜋(y) <img src="index_files/figure-html/priorpred1_user_03_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Null and alternative prior predictives, 𝜋(y) <img src="index_files/figure-html/priorpred1_user_04_output-1.svg" width="80%" style="display: block; margin: auto;" /> <style> .panel1-priorpred1-user { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-priorpred1-user { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-priorpred1-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 
1%; font-size: 80% } </style> --- ## New data setup * Four groups of equal size `\(N\)` * Effect size: `\(\omega^2\)`, "proportion of non-error variance" * Evidence: `\(F\)` statistic (one-way ANOVA) * Otherwise, the same as previous setup <center> <img src='img/Rouder_etal_2012.png' class='paperimg' style='width:40%;'/> </center> .footnote[ Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. *Journal of Mathematical Psychology, 56, 356–374.* ] --- ## Effect size ω² <img src="index_files/figure-html/unnamed-chunk-16-1.svg" width="100%" style="display: block; margin: auto;" /> --- count: false ## Evidence for the null <img src="index_files/figure-html/evfromt_user_01_output-1.svg" width="100%" style="display: block; margin: auto;" /> --- count: false ## Evidence for the null <img src="index_files/figure-html/evfromt_user_02_output-1.svg" width="100%" style="display: block; margin: auto;" /> --- count: false ## Evidence for the null <img src="index_files/figure-html/evfromt_user_03_output-1.svg" width="100%" style="display: block; margin: auto;" /> --- count: false ## Evidence for the null <img src="index_files/figure-html/evfromt_user_04_output-1.svg" width="100%" style="display: block; margin: auto;" /> <style> .panel1-evfromt-user { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-evfromt-user { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-evfromt-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false ## Error rates 'accepting' the null <img src="index_files/figure-html/errrates1_user_01_output-1.svg" width="100%" style="display: block; margin: auto;" /> --- count: false ## Error rates 'accepting' the null <img src="index_files/figure-html/errrates1_user_02_output-1.svg" width="100%" style="display: block; margin: auto;" /> --- count: false ## Error 
rates 'accepting' the null <img src="index_files/figure-html/errrates1_user_03_output-1.svg" width="100%" style="display: block; margin: auto;" /> --- count: false ## Error rates 'accepting' the null <img src="index_files/figure-html/errrates1_user_04_output-1.svg" width="100%" style="display: block; margin: auto;" /> <style> .panel1-errrates1-user { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-errrates1-user { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-errrates1-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ## The result of marginalization * Probability of data is *averaged over* large effect sizes * Small observed effect sizes "look" null .pull-left[ ## Frequentist Protect from these high error rates by maximizing, not averaging. `$$p = \max_{\theta\in\Theta_0} Pr(\text{Extreme test statistic};\theta)$$` *Guarantees* error rates are controlled for *every* parameter value, not just on average. ] .pull-right[ ## Bayesian * Defend need for priors * Defend conservatism in general (this kind of error is "not so bad") * Appeal to parsimony ] --- ## "Too good to be true" nulls? When results are *too* null, people start to get suspicious. > "If it were just a question of having hit the bull's eye with a single shot we might conclude...that Mendel was simply lucky, but when a whole succession of shots comes close to the bull's eye we are entitled to invoke skill or some other factor." (Edwards, 1986, p. 303) <br/> > "Our result suggests a level of linearity that is extremely unlikely to have arisen from standard sampling under the null hypothesis of linearity. Any actual deviation from perfect linearity would even lower these probabilities." (Fraud complaint against J. Förster, 2012, p. 23) --- ### Why be suspicious? 
.pull-left[ ### Frequentist It is very unlikely to see test statistics so close to the null, particularly over and over (significance testing logic). ] .pull-right[ ### Likelihood/Bayes > "[A]ny *pattern* which we recognize in some data and which is unexplained on the current hypothesis is a signal that we should seek an alternative hypothesis, because an alternative which accounts for it is almost bound to have a higher likelihood." (Edwards, 1986, p. 303). But why were we suspicious in the first place...? ] --- count: false ## Evidence from small F statistics <img src="index_files/figure-html/smallt1_user_01_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Evidence from small F statistics <img src="index_files/figure-html/smallt1_user_02_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Evidence from small F statistics <img src="index_files/figure-html/smallt1_user_03_output-1.svg" width="80%" style="display: block; margin: auto;" /> --- count: false ## Evidence from small F statistics <img src="index_files/figure-html/smallt1_user_04_output-1.svg" width="80%" style="display: block; margin: auto;" /> <style> .panel1-smallt1-user { color: black; width: 99%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-smallt1-user { color: black; width: NA%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-smallt1-user { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ## Conclusions Bayesian *marginalization* and *comparative evidence* yield radically different results from significance testing logic. * Severity is lost: we can claim "strong" evidence `\((BF>10)\)` for the null when it is false * Flips Benjamin et al (2018)'s argument on its head: *Bayes factors* can make it easy to find misleadingly strong evidence. 
* Suspicious results are considered strong evidence for the null (in spite of their unexpectedness under the null) * Losing an intuitive forensic check is *bad* in a replication crisis. If we **replace** *p* values with Bayes factors, we lose important frequentist checks on Bayes factors!