Innovative Methods for Psychology: Reproducible, Open, Valid, and Efficient

Periodic Reporting for period 2 - IMPROVE (Innovative Methods for Psychology: Reproducible, Open, Valid, and Efficient)

Reporting period: 2018-12-01 to 2020-05-31

With numerous failures to replicate, common misreporting of results, widespread failure to publish non-significant results or to share data, and considerable potential for bias due to the flexibility of data analyses and researchers’ tendency to exploit that flexibility, psychological science is said to experience a crisis of confidence. These issues lead to the dissemination of false positive results and inflate effect size estimates in meta-analyses. This in turn leads to poor theory building, an inefficient scientific system, a waste of resources, lower trust in psychological science, and psychology’s outcomes being less useful for society.

The goal of this ERC project is to improve psychological science by offering novel solutions to five vexing challenges: (1) to counter misreporting of results by using our new tool statcheck in several studies on reviewers’ tendency to demand perfection and by applying it to actual peer review; (2) to counter the biasing effects of common explorations of data (p-hacking) by professing and studying pre-registration and by developing promising new approaches, called blind analysis and cross-validation using differential privacy, that simultaneously allow for exploration and confirmation with the same data; (3) to counter the common problem of selective outcome reporting in psychological experiments by developing powerful latent variable methods that render it fruitless not to report all outcome variables in a study; (4) to counter the problem of publication bias by studying and correcting misinterpretations of non-significance; and (5) to develop and refine meta-analytic methods that allow for the correction of biases that currently inflate estimates of effects and obscure moderation. The innovative tools developed in this project have the potential to improve the way psychologists (and other scientists) analyse data, disseminate findings, and draw inferences.
We have made progress in further developing statcheck to detect inconsistencies in the reporting of statistical results. Much of our effort was focused on the reporting of results of structural equation modelling (SEM). SEM is widely used across many different fields and allows for several re-computations that we have now managed to program and pilot on a few dozen papers. We are expanding a hand-checked database of SEM papers into a wider test set and will put the data and code in the public domain once they are thoroughly debugged and verified. We are also far along in designing and pre-registering the surveys in Work Package A and are working with publishers to run a trial to see whether the use of statcheck helps avoid the appearance of misreported results in scientific publications.
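The core idea behind this kind of consistency check can be illustrated with a minimal sketch. statcheck itself is an R package; the Python function below is an illustrative re-implementation of the basic logic, not the tool's actual code: recompute the p-value from the reported test statistic and degrees of freedom, then flag mismatches, treating a mismatch that flips the significance decision as a "gross" inconsistency. The function name and tolerance are our own choices for the example.

```python
from scipy import stats

def check_t_result(t, df, reported_p, alpha=0.05, tol=0.001):
    """Recompute the two-tailed p-value for a reported t-test result
    and compare it with the reported p (tol absorbs rounding in reports)."""
    computed_p = 2 * stats.t.sf(abs(t), df)
    inconsistent = abs(computed_p - reported_p) > tol
    # A "gross" inconsistency flips the significance decision at alpha.
    gross = inconsistent and ((computed_p < alpha) != (reported_p < alpha))
    return computed_p, inconsistent, gross

# t(24) = 2.80 with reported p = .010 is consistent:
p1, inc1, gross1 = check_t_result(2.80, 24, 0.010)
# t(24) = 1.50 with reported p = .04 is grossly inconsistent
# (the recomputed p is well above .05):
p2, inc2, gross2 = check_t_result(1.50, 24, 0.04)
```

The same recomputation strategy extends to F, chi-square, and correlation results, and to the SEM fit statistics mentioned above, by swapping in the appropriate reference distribution.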

For the reanalysis project (WP B) we created a database of psychology datasets and rigorously checked whether they include sensitive data and/or information that could be used to re-identify participants. We consider this relevant to a wider audience and will publish a separate paper on these important issues; to comply with the GDPR, we will not use some of these data in the reanalysis project. We also checked data licenses and the availability of analysis code to ensure timely completion of this major work package. As indicated elsewhere, the reanalysis project is running behind schedule for a host of reasons, but we expect to complete this important project before the end of the ERC project.

For creating novel analysis techniques as part of Work Package C, we built a flexible, efficient, and extensible simulator that allows us to build various multiverses based on different statistical models, tests, and pre-/post-processing procedures, as well as the presence of a diverse set of questionable research practices (or “p-hacking”). Currently, we are rigorously testing our tools against our own, already established, simulation studies. In the next step, while expanding the capabilities of our tool, we will begin designing relevant experiments and simulations to test the potential of differential privacy as a mechanism for generating private pseudo-samples from the original sample.
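To make the differential-privacy idea concrete: a standard building block is the Laplace mechanism, which releases a statistic plus noise calibrated to the statistic's sensitivity, so that no single participant's data can be recovered from the output. The sketch below is a textbook illustration under assumed bounds on the data, not the project's actual mechanism; the function name and parameters are ours.

```python
import numpy as np

def private_mean(x, lower, upper, epsilon, rng):
    """Release an epsilon-differentially-private mean of bounded data
    via the Laplace mechanism; the mean's sensitivity is (upper-lower)/n."""
    x = np.clip(np.asarray(x), lower, upper)   # enforce the assumed bounds
    sensitivity = (upper - lower) / len(x)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return x.mean() + noise

rng = np.random.default_rng(0)
sample = rng.normal(5.0, 1.0, size=1000)       # simulated raw data
est = private_mean(sample, lower=0.0, upper=10.0, epsilon=1.0, rng=rng)
```

With a large sample the added noise is small relative to sampling error, which is what makes exploration on privatized outputs (or pseudo-samples built from them) attractive: analysts can probe the data without consuming the confirmatory value of the original sample.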

As part of Work Package D, we conducted extensive simulations to assess the statistical properties of our methods in small samples (which are common in experimental settings). The first results of these simulations have been published (Maassen & Wicherts, 2019), and more work is being done on the power and bias of this promising approach to analysing experimental data. We also conducted a large meta-scientific study of the reproducibility of effect size computations that sheds important light on the challenges in pinpointing the main outcomes of studies in meta-analyses and meta-science. This work is currently under review after revisions. The PI also worked on building novel psychometric models to analyse interactions (Lodder et al., 2019 MBR), which appear to work very well in the context of health psychology.
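One reason effect size computations are hard to reproduce is that seemingly equivalent formulas differ in detail. As a minimal illustration (not the project's code), the standardized mean difference for two independent groups and its small-sample correction can be computed as follows:

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

def hedges_g(d, n1, n2):
    """Hedges' small-sample bias correction, J ≈ 1 - 3/(4*df - 1)."""
    df = n1 + n2 - 2
    return (1 - 3 / (4 * df - 1)) * d

d = cohens_d(10.0, 2.0, 20, 9.0, 2.0, 20)   # pooled SD = 2, so d = 0.5
g = hedges_g(d, 20, 20)                      # slightly shrunk toward 0
```

Choices such as pooled versus control-group SD, or whether the correction is applied, already yield different numbers from the same summary statistics, which is exactly the kind of ambiguity a reproducibility study of effect size computations has to disentangle.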

Work Package E concerns statistical inferences, power, and pre-registration. We have completed three studies, which are all currently under review or in revision. In the first, we found that many surveyed psychologists count the number of significant and non-significant outcomes when drawing inferences about whether a research hypothesis is true, even though much better statistical ways exist to draw inferences from such supposedly mixed results. We presented those findings at different conferences and submitted a revised version for publication. We also checked the specificity of pre-registrations and power analyses in pre-registered studies and found that power analyses are often poorly conducted and that pre-registrations need improvement through good checklists and improved guidelines. We are furthermore setting up a large collaborative study that checks whether researchers have actually followed their pre-registrations. Combined, these studies shed important light on contemporary practice and offer evidence-based guidance on how to improve scientific practice related to inferences, power, and pre-registration. A more general paper discussed these issues in a wider context (Wicherts, 2017).
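Why counting significant outcomes is a poor inference strategy can be seen with elementary probability. Assuming independent, equally powered tests of a true effect, the number of significant results is binomial, so "mixed" results are expected even when the hypothesis is true. The short sketch below (our own illustration, not a study analysis) makes this concrete:

```python
from math import comb

def prob_k_significant(n, k, power):
    """Binomial probability of exactly k significant results out of n
    independent, equally powered tests of a true effect."""
    return comb(n, k) * power**k * (1 - power)**(n - k)

# With 80% power per study, three significant results out of four
# ("mixed" evidence) is exactly as likely as four out of four:
probs = [prob_k_significant(4, k, 0.8) for k in range(5)]
```

So a researcher who discounts a hypothesis because one of four well-powered studies "failed" is misreading an outcome that the true-effect model itself predicts almost half the time.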

In the last Work Package F, we have created an exciting new meta-analytic model that deals with selective outcome reporting. Simulations show very promising statistical properties. We are currently applying the new method to actual data (from intelligence research) and are finalizing the paper for wider dissemination. We also applied p-uniform and other publication bias techniques to a large meta-meta-analysis (van Aert et al., 2019) and to a meta-analysis of priming studies (Lodder et al., 2019 JEPG). Finally, the postdoc worked on expanding p-uniform to enable analyses of heterogeneous effect sizes (which are common in many meta-analytic applications). The method to correct for selective outcome reporting is particularly promising, as it offers an entirely new approach to dealing with one of the most well-known and most serious sources of bias in how researchers analyse and report research results (in the context of significance testing this is often denoted “p-hacking”). The general model has the potential to also correct for other types of p-hacking that have been identified in the literature. The new method could be applied to many earlier meta-analyses to create unbiased estimates of effects that are relevant for both theory and practice.
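The bias that p-uniform and related methods correct is easy to demonstrate by simulation. The sketch below (an illustration of the problem, not of p-uniform itself) simulates many small two-group studies of a modest true effect and compares the average effect size across all studies with the average across only the "significant" studies that selective publication would retain:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d, n = 0.2, 30            # modest true effect, 30 participants per group

def one_study():
    """Simulate one two-group study; return its observed d and p-value."""
    g1 = rng.normal(true_d, 1.0, n)
    g2 = rng.normal(0.0, 1.0, n)
    t, p = stats.ttest_ind(g1, g2)
    d = (g1.mean() - g2.mean()) / np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2)
    return d, p

results = [one_study() for _ in range(2000)]
all_d = np.mean([d for d, p in results])                  # ≈ true_d
published_d = np.mean([d for d, p in results if p < 0.05])  # inflated
```

Because only studies whose observed effect happened to be large reach significance at this sample size, the "published" average is roughly two to three times the true effect, which is precisely the inflation that bias-correcting meta-analytic estimators aim to undo.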
Statcheck: this novel tool allows researchers, publishers, and readers to automatically retrieve statistical results and check them for consistency. The tool can be used not only to correct errors but also to retrieve statistical information from thousands of articles in the literature for meta-research. In IMPROVE we further extend statcheck to read and check more types of results, including those from structural equation models (SEMs), which are widely used in many different fields.

Pre-registration: This methodological tool is increasingly being used to diminish potential biases in the analyses of data and reporting of statistical results. In IMPROVE, we consider how well pre-registrations currently achieve their potential and how they can be improved and adopted more widely.

Improved analyses of experimental data: In IMPROVE we develop novel, robust, and widely applicable ways to analyse experimental data. These methods avoid biases caused by researcher degrees of freedom (“p-hacking”) and lead to much better inferences and more replicable and trustworthy results.

Meta-analytic tools: In IMPROVE we develop novel meta-analytic tools that enable the correction of biases caused by the common failure to publish non-significant results and the common opportunistic use of analytic manoeuvrability in researchers’ quest for significance. These meta-analytic tools help correct potent biases and offer more correct, more informative, and more useful summaries of existing studies in many different fields.