P-values are a common component and outcome measure in nearly every published observational study or randomized clinical trial. However, junior faculty, fellows, and residents often have little or no training in statistics and must rely on interpretations of results offered by the authors or by secondary sources. This education gap extends to a much larger audience, including many physicians, researchers, journalists, and policy makers. That is a dangerous state of affairs. Statistical analysis of data often amounts to calculating a p-value and reporting the result as statistically significant or not, without much further thought. But p-values vary widely on replication, and their definition says nothing directly about reproducibility. Findings from clinical studies are not valid if they cannot be reproduced. Although other methodological issues bear on reproducibility, such as the statistical power to reproduce an effect, the p-value is arguably at the root of the problem given its wide variability from study to study. Many common misinterpretations and misuses of the p-value persist in practice. It is essential to raise awareness of this critical issue by providing a deeper educational understanding of the p-value and its proper role in the interpretation of study results. Recognizing this need, the American Statistical Association (ASA) recently published its first-ever policy statement on the proper use and interpretation of p-values for scientists and researchers. The statement addresses the misguided practice of interpreting study results based solely on the p-value, given that the p-value is often not reproduced in subsequent, similar studies. To further educate and illustrate this issue, we investigated the irreproducibility of the p-value using simulation software and results reported from a published randomized controlled trial. We show that the probability of attaining another statistically significant p-value varies widely on replication.
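The replication behavior described above can be illustrated with a short Monte Carlo sketch (not the authors' actual simulation software; the effect size, group size, and replication count below are illustrative assumptions). Repeating an identical two-group trial many times shows how widely the p-value "dances" from replication to replication:

```python
# Hedged sketch: simulate many identical replications of a two-group trial
# and collect the p-value from each one. Parameters are illustrative only.
import numpy as np
from scipy import stats

def replicate_pvalues(effect=0.5, n=32, reps=1000, seed=0):
    """Run `reps` identical trials (true standardized effect `effect`,
    `n` subjects per group) and return the two-sample t-test p-values."""
    rng = np.random.default_rng(seed)
    pvals = []
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(effect, 1.0, n)
        pvals.append(stats.ttest_ind(treated, control).pvalue)
    return np.array(pvals)

pvals = replicate_pvalues()
# With d = 0.5 and n = 32 per group, power is only about 0.5, so roughly
# half of the identical replications fail to reach p < .05 even though the
# effect is real: the p-value itself is highly variable.
print(f"significant replications: {np.mean(pvals < 0.05):.1%}")
print(f"p-value spread: {pvals.min():.4f} to {pvals.max():.4f}")
```

The spread between the smallest and largest p-value across identical replications is typically several orders of magnitude, which is the point of the "dance of the p-values" demonstration.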
We also show that power alone determines the distribution of p-values, and that this distribution varies with sample size and effect size. Tracking the percentage of replication means that fell within the original confidence interval (CI) of each replicated experiment revealed that the original 95% CI included only 85.4% of future replication means. In conclusion, p-values interpreted by themselves, devoid of context, can be misleading and can lead to biased inferences from clinical studies.
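The CI-capture result can likewise be checked with a small simulation (a minimal sketch, not the published analysis; sample size and replication counts are assumptions). For each "original" experiment we form its 95% CI for the mean, then ask what fraction of means from fresh replications of the same experiment land inside that interval:

```python
# Hedged sketch: average capture of replication means by an original 95% CI.
# All parameters are illustrative assumptions.
import numpy as np
from scipy import stats

def ci_capture_rate(n=40, n_orig=500, n_rep=100, seed=1):
    """For each of `n_orig` original experiments (n observations each),
    compute the 95% t-based CI for the mean, then the fraction of `n_rep`
    replication means that fall inside it; return the overall average."""
    rng = np.random.default_rng(seed)
    tcrit = stats.t.ppf(0.975, df=n - 1)
    captures = []
    for _ in range(n_orig):
        orig = rng.normal(0.0, 1.0, n)
        half = tcrit * orig.std(ddof=1) / np.sqrt(n)
        lo, hi = orig.mean() - half, orig.mean() + half
        rep_means = rng.normal(0.0, 1.0, (n_rep, n)).mean(axis=1)
        captures.append(np.mean((rep_means >= lo) & (rep_means <= hi)))
    return float(np.mean(captures))

rate = ci_capture_rate()
print(f"average capture of replication means by the original 95% CI: {rate:.1%}")
```

The average capture rate comes out well below 95%, in the low-to-mid 80% range, consistent with the 85.4% figure reported above: a 95% CI is calibrated to cover the true parameter, not the mean of the next replication.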
- Dance of the p-values
- Exploratory software for confidence intervals (ESCI)
- Interpreting p-values
- Irreproducibility of p-values
- Misunderstanding and misconceptions of p-values
- Null hypothesis significance testing (NHST)