Since September 2019, we — Jivesh Ramduny and Mélanie Garcia — have been enrolled as PhD students in the Imaging Mind Architecture Lab at Trinity College Institute of Neuroscience (TCIN), Trinity College Dublin. Our research focuses on using conventional and state-of-the-art neuroimaging techniques (e.g. functional connectome fingerprinting, dynamic functional connectivity, machine/deep learning) to facilitate the search for potential biomarkers in psychiatric and neurodevelopmental conditions. From the beginning, we have focused on making our own research reproducible and openly accessible to the wider scientific community. This article introduces the creation of the ReproducibiliTea Dublin Journal Club and the one-year discussion we had about reproducible science at TCIN.
Back in January 2020, Jivesh attended the Advanced Methods for Reproducible Science workshop in Windsor, UK, organised by the UK Reproducibility Network (UKRN). Inspired by prominent advocates for open and reproducible science, such as Marcus Munafò, Dorothy Bishop and Chris Chambers, and following in the footsteps of two of the co-founders of the ReproducibiliTea JC, Sophia Crüwell and Sam Parsons, we launched Ireland’s first ReproducibiliTea JC, ReproTeaDublin, at Trinity College Dublin. ReproTeaDublin is led by Jivesh Ramduny and Mélanie Garcia, and supported by our PhD mentor, Clare Kelly. The purpose of ReproTeaDublin is to bring together early career researchers (ECRs), undergraduates and postgraduates, and faculty members from any discipline to discuss (i) the reproducibility crisis in research; (ii) barriers impacting replicability and reproducibility; (iii) how to make our own work open and accessible; and (iv) how to make open science the norm in research.
Why do we have a reproducibility crisis in research?
We started by discussing the growing concern about the lack of reproducible findings in psychological research and beyond. Some even claim that the majority of published findings in the scientific literature are false. At the very least, it is thought that the effects reported in journals are significantly inflated and that the true underlying effects may be much smaller, or may even be in the opposite direction. Failure to reproduce published results in high-impact journals has been attributed to questionable research practices (QRPs). For example, a study which surveyed more than 5,000 psychologists found that almost 1 in 10 research psychologists admitted to having falsified data. The majority of researchers admitted that they engaged in QRPs such as selective reporting, HARKing (hypothesising after the results are known) and data dredging, among others.
As researchers such as Munafò, Bishop, and Chambers have outlined, hypothesis-led or confirmatory studies suffer from a high degree of irreproducibility because of a wide range of factors beyond QRPs, including: (i) failure to control biases — no randomised control, no blinding; (ii) low statistical power — studies too small to reliably detect true effects; (iii) poor quality control — presence of outliers/noisy observations in measurements; (iv) p-hacking — running many statistical tests and reporting only the significant ones (p < 0.05), or collecting more data to push “near significant” results over the boundary; (v) HARKing — presenting hypotheses formulated after seeing the results as if they had been specified in advance; and (vi) publication bias — publishing only positive findings, i.e. findings that achieve statistical significance (p < 0.05), and discarding null outcomes (known as the “file drawer problem”).
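The inflation caused by p-hacking is easy to demonstrate: run enough tests on pure noise and some will come out “significant” by chance alone. Below is a minimal, illustrative Python simulation of our own (not from any of the papers discussed); the crude two-group comparison and the ~2-standard-error threshold are simplifying assumptions standing in for a proper t-test.

```python
# Illustrative sketch: testing pure noise many times and keeping only the
# "significant" results guarantees false positives at roughly the alpha rate.
import random
import statistics

random.seed(0)  # fixed seed so the simulation is itself reproducible

def noise_test(n=20):
    """Compare two groups drawn from the SAME null distribution; returns
    True if their means differ by more than ~2 standard errors (roughly
    the conventional p < .05 threshold)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    se = ((statistics.variance(a) + statistics.variance(b)) / n) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) > 2 * se

n_tests = 1000
false_hits = sum(noise_test() for _ in range(n_tests))
print(f"{false_hits} of {n_tests} tests on pure noise came out 'significant'")
```

Roughly 5% of the tests come out "significant" even though no effect exists anywhere, which is why reporting only the hits from many unplanned tests misleads readers.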
In our second ReproTeaDublin meeting, we discussed The Nine Circles of Scientific Hell paper by Neuroskeptic (2012), which used Dante Alighieri’s Inferno as a metaphor to describe the fate of scientists who sin against best research practices. As we have seen above, some researchers adopt QRPs rather than following best practices as norms in their own work. Others tend to oversell the importance of their work by claiming that it is a “novel” approach or that “we are the first to study…” when that is not the case. It is also common for researchers to report their results without disclosing their post-hoc origin (e.g. how many tests were conducted, how the data were transformed to ensure a normal distribution); this may escalate the false discovery rate in the literature. We also saw that p-hacking is unethical, as is removing “outliers” from the dataset simply because it is convenient for obtaining results well above chance. Of note, in many cases data curation is acceptable if some data points have been corrupted by noise (e.g. motion, respiration, cardiac processes, sleep). Plagiarising other people’s work is strictly considered fraud in science and we should always avoid it! Meanwhile, filing null results away in a drawer because positive findings stand a better chance of publication in a high-impact journal is simply a waste of resources. Finally, it is not acceptable to publish only part of the data a researcher has been working on just because the remaining data undermines the statistical significance of the results. We are not even going to talk about inventing data… but the penalty is to go straight to the deepest depths of hell.
Is there a problem within our research system?
After considering the wide range of factors that contribute to irreproducible research, we changed gear to understand why researchers might engage in those practices in the first place. We focused on the Fast Lane to Slow Science paper by Uta Frith (2020), which posits that researchers have to adapt to the “publish or perish” culture to survive in academia — hence, the fast science culture. The “publish or perish” culture has been known to be a driving force pushing academics to apply for funding, promotions and tenured positions, and to build their own research labs. In other words, our research system seems to tell us that only the fittest will survive in academia.
Many researchers are not willing to publicly acknowledge that this fast science culture comes at a dear cost. Many ECRs are pushed to their limits to perform a series of analyses, prepare conference abstracts, write grants, teach and so forth. This creates a high level of anxiety for many, as they cannot cope with their ever-growing daily to-do lists. The fast science culture also makes us less careful in our work, and we tend to be less up to date with the latest published research in the field. Hence, if researchers are already struggling for time, where would they find the time to make their research open source?
Can we go back to doing slow science?
This is perhaps one of the most difficult questions to answer. In order to revert to doing “slow” science, one could think about one’s research through a long-term lens, aiming for a lifelong reputation, more secure positions (e.g. tenure) and long-term research goals. Reverting to “slow” science could also urge researchers to focus on assessing quality rather than quantity. Changing the fast science culture would also require allowing for more teamwork and collaboration, instead of having a select few (e.g. senior academics) write and publish multiple papers. A less popular opinion might be to restrict the yearly output of researchers (e.g. the number of papers published) across journals.
How can we encourage reproducible practices?
Through metascience studies, we have been able to conclude that a major reproducibility crisis has been affecting the scientific community for decades. Following this observation, we wondered what tools could foster reproducibility in research, and who should participate in this transformation. The A manifesto for reproducible science paper by Munafò et al. (2017) proposes concrete solutions that could be implemented at each step of the research process: methods, reporting and dissemination, reproducibility, evaluation and incentives. In addition to researchers, these measures would involve many key stakeholders: (i) journals/publishers, (ii) funders, (iii) institutions, and (iv) regulators.
Being open to novelty is an essential quality for a scientist. However, the hope of finding something might mislead us towards apophenia (i.e. the tendency to see patterns in random data), confirmation bias (i.e. the tendency to focus on evidence that is in line with our expectations or favoured explanation) and hindsight bias (i.e. the tendency to see an event as having been predictable only after it has occurred). Munafò et al. explain that various practices can protect us from these cognitive biases.
Blinding consists of deliberately hiding or masking part of the data during the experiment or the analysis. It helps experimenters remain neutral throughout the process. By defining the analysis protocol in advance, pre-registration of the study — see more details in the next section — is also an efficient blinding method, because the data and results are unknown at that stage.
In addition, every researcher should keep up to date with the evolution of the analytical algorithms and software they use in order to build a consistent analysis protocol, as part of a continuing professional education and development ethic. Sometimes, distortions of reality might arise from investors who, willingly or otherwise, prioritise their own interests; researchers’ conflicts of interest and personal beliefs might lead to this issue too. The intervention of an independent third party in the implementation of the project could be a solution to preserve the fairness of the research. For example, a committee like the Independent Statistical Standing Committee (ISSC), provided by the CHDC foundation, might help ensure the sound development and interpretation of a statistical analysis pipeline. Likewise, collaboration and team science would improve the objectivity of studies. Furthermore, if all the members share their computational resources and datasets, it would significantly increase the statistical power and robustness of outcomes, and hence the replicability of the study. The Collaborative Replications and Education Project (CREP) offers training, support and professional growth for students and instructors who carry out replication projects in psychology. The Pipeline Project describes the Pre-Publication Independent Replication (PPIR) approach, which helps make projects more reproducible. The Many Labs projects also help to perform multi-site research by coordinating data collection.
Reporting and dissemination
Communicating research clearly and transparently is fundamental to making it reproducible.
Munafò et al. (2017) outline two issues that are addressed by pre-registering studies: publication bias — the fact that many more studies are carried out than published, due to the devaluation of negative results — and analytical flexibility — particularly outcome switching and p-hacking. Munafò et al. describe the strongest form of pre-registration as involving both registering the study (with a commitment to make the results public) and closely pre-specifying the study design, primary outcome and analysis plan in advance of conducting the study or knowing the outcomes of the research. This helps differentiate the exploratory stage, which generates hypotheses, from the confirmatory stage, which tests them. Pre-registration has been common in clinical medicine for several years due to the requirements of journals and regulatory bodies, such as the Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA) in the European Union. Great support for extending this practice to other fields is provided by the Open Science Framework and AsPredicted — offering services to pre-register studies; the Preregistration Challenge — offering education and incentives to conduct pre-registered research; and journals, which are adopting the Registered Reports (RRs — see more details below) format to encourage pre-registration and add results-blind peer review.
RRs are a way to document hypotheses and analysis plans before proceeding with data collection. Building an RR requires defining the hypotheses, processing pipelines and statistical analyses prior to conducting the actual study. The benefit of submitting an RR is that a researcher is guaranteed publication once the RR has been approved at Stage 2 peer review. RRs usually undergo two review procedures: (i) Stage 1 peer review, after designing the study and describing the hypotheses and methods (e.g. data preparation, statistical tests); and (ii) Stage 2 peer review, after writing the report including the findings. More information on RRs can be found on the COS platform. Also, more than 250 journals encourage researchers to submit RRs, including Cortex, Nature Human Behaviour, NeuroImage and Royal Society Open Science, with the complete list found here.
However, submitting an RR may be difficult for some researchers, as we discussed through The Preregistration Revolution paper by Nosek et al. (2018). The challenges of submitting an RR come in various forms: (i) violations of assumptions may only be discovered during the analysis itself; (ii) pre-existing data makes blinding impossible; (iii) large, multivariate and longitudinal datasets are difficult to accommodate, as it is impossible to pre-register every possible analysis or hypothesis test; (iv) too many experiments; (v) high-risk research; (vi) few a priori assumptions, as researchers may conduct an exploratory study; and (vii) narrative selection.
Low reusability also results from a lack of clarity and transparency about what has been done. Some studies (see www.COMPare-trials.org, http://www.tessexperiments.org/) showed that few papers were actually consistent with their pre-registered reports in describing all the experimental conditions and all the expected outcomes. Munafò et al. also highlight that there is a strong tendency not to communicate negative results, thus increasing the positive bias of research papers. Consequently, following rigorous guidelines while reporting all observed effects exhaustively is essential to build reproducible work and publications. To help in better planning and reporting experiments, the Transparency and Openness Promotion (TOP) guidelines provide principles and standards as a basis for journals and funders. Other projects support the planning and dissemination of research in specific fields, e.g. Consolidated Standards of Reporting Trials (CONSORT), the EQUATOR Network, Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA), PRISMA-P, and the Preregistration Challenge, the pre-registration recipe for social-behavioural research.
Munafò et al. (2017) assert that the accumulation of knowledge in science is strongly conditional on the reliability of research. A well-written paper is the first step towards giving a project visibility. Munafò et al. further explain that being highly transparent — by sharing data, experimental protocols, analysis methods and peer reviews — scales up the reliability and reproducibility of the project, as well as its global impact.
Open Science has become easier for researchers to implement thanks to the TOP guidelines and, more globally, the Open Science Framework. Sharing data and methods has become simple since the advent of platforms like GitHub, Bitbucket and Amazon S3. Currently, there are open databases specific to various fields, like OpenNeuro for neuroimaging and behavioural sciences studies (see the Listing of Open Access Databases — LOADB). Additionally, Open Science has been increasingly encouraged by funders and publishers, who sometimes require open data — like the journal Science and the Springer Nature family of journals — or a specific disclosure on methods — like the journal Psychological Science. The Wellcome Trust, with the launch of Wellcome Open Research, also offers its researchers free, open-access publication with transparent post-publication peer review.
Awarding badges to researchers, as suggested by the Center for Open Science and adopted by journals such as Psychological Science, has had a great positive effect on data sharing. Besides, thanks to the Peer Reviewers’ Openness Initiative, reviewers can pledge to withhold comprehensive review from any manuscript that does not share its data, unless a valid reason is given. Munafò et al. underline that many funding agencies, such as the Research Councils in the United Kingdom and the National Institutes of Health (NIH) and National Science Foundation (NSF) in the United States, have increased pressure on researchers to release data. To our knowledge, in Ireland the Irish Research Council (IRC) requires a data management plan from awardees detailing how the data will be stored and shared. This procedure has been prompting researchers to adopt Open Science practices.
There have been great efforts from researchers to build national and international collaborations in the hope of improving the reproducibility and replicability of measures. This has encouraged researchers to share and pool data from multiple sites, which in turn has led to openly available datasets. For example, such efforts have been much appreciated in neuroimaging with the establishment of the following datasets: 1000 Functional Connectomes Project; ADHD200; Autism Brain Imaging Data Exchange (ABIDE); Human Connectome Project (HCP); Consortium for Reliability and Reproducibility (CoRR); UK Biobank; Healthy Brain Network (HBN); Adolescent Brain Cognitive Development (ABCD); IMAGEN; Enhancing Neuro Imaging Genetics through Meta Analysis (ENIGMA); and Philadelphia Neurodevelopmental Cohort (PNC). The ability to aggregate independent datasets has made it possible to increase the sample sizes of neuroimaging studies by recruiting significantly more participants, thereby increasing the statistical power to detect effects of interest and differences between disease states and control conditions.
Conducting reproducible analyses comes down to using open source software, toolboxes and pipelines. For example, many researchers are taught to conduct statistical analyses in SPSS and MATLAB. However, both are proprietary, which hinders reproducibility. Hence, attention should be drawn towards Python and R, which offer flexible, open source platforms to perform complex data analysis and data visualisation irrespective of the research field. There has been a huge push to use Jupyter Notebooks or R Notebooks to allow researchers to describe their analyses and code, with the added benefit that a notebook can be exported to PDF or HTML format. Similarly, open source pipelines are becoming increasingly popular. For example, the C-PAC and fMRIPrep pipelines allow researchers to preprocess functional neuroimaging datasets in an open source fashion using built-in state-of-the-art techniques. Both pipelines address several issues which are often subject to a researcher’s decisions, such as parameter selection and the order of preprocessing steps. This is particularly important: in a recently published paper, 70 teams analysed the same neuroimaging dataset and reported varying findings due to differences in their methodologies — which severely undermines reproducibility!
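To make the idea concrete, here is a minimal sketch of our own (not an official template from any of the tools above) of what a reproducible analysis script in Python might look like: a fixed random seed, provenance recorded alongside the results, and the whole pipeline kept in one executable file rather than in undocumented manual steps.

```python
# Minimal, illustrative sketch of a reproducible analysis script.
import platform
import random
import statistics

SEED = 42  # fixing the seed makes any randomised step repeatable
random.seed(SEED)

# Record provenance alongside the results so others can rerun the analysis
# under the same conditions.
provenance = {
    "python_version": platform.python_version(),
    "seed": SEED,
}

# Toy "analysis": a bootstrap-style resample of a small dataset.
data = [2.1, 2.5, 1.9, 2.8, 2.3]
resample = [random.choice(data) for _ in data]
result = {
    "mean": statistics.mean(data),
    "resampled_mean": statistics.mean(resample),
}

print(provenance)
print(result)
```

Because the seed and environment are recorded, anyone rerunning this script obtains exactly the same resample and the same numbers, which is the property notebooks and open pipelines are designed to preserve at scale.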
Munafò et al. (2017) highlight how the democratisation of the Internet has made researchers masters of disseminating their own work, thanks to preprint services like arXiv for some physical sciences, bioRxiv and PeerJ for the life sciences, engrXiv for engineering, PsyArXiv for psychology, and SocArXiv and the Social Science Research Network (SSRN) for the social sciences. However, the evaluation process is essential in science. Science is rooted in good-quality, replicable research, which is why traditional journals have played a crucial role, acting as a filter through the reviewing phase.
New open evaluation processes have emerged recently on platforms like preprint services with public comments on manuscripts, or public platforms for commenting on published works like PubPeer. Results-blind review — taking only the context and methodology into account in the review — has also been endorsed by several journals. These open pre- and post-publication reviews can accelerate the revision of a manuscript or an approach because they are much faster and less formal than the traditional journal review process. Moreover, open reviewers might build a reputation based on the quality of their reviews, whereas journal reviewers are typically anonymous.
All three methods — open comments on preprint manuscripts, traditional journal reviews, and open comments on published works — look complementary, and combining them might be a catalyst for improving the quality and reproducibility of papers.
The main way to evaluate a researcher, a team or an institution is by considering their publications. If the publications are “good”, their reputation grows, facilitating procedures such as employment or funding applications. Munafò et al. insist on the fact that different types of work do not have equal publication potential.
Researchers are motivated to publish positive, novel and clean results, often at the cost of accuracy and reproducibility. Filing null findings away in a drawer and moving on from the study is referred to as the “file drawer problem”, in which studies that fail to reject the null hypothesis do not become part of the literature. This issue dramatically jeopardises the credibility and progress of science. As Munafò et al. say: “Funders, publishers, societies, institutions, editors, reviewers and authors all contribute to the cultural norms that create and sustain dysfunctional incentives. Changing the incentives is therefore a problem that requires a coordinated effort by all stakeholders to alter reward structures.”
Examples of successful efforts towards promoting transparency and reproducibility include rewarding open-data papers with badges, results-blind reviewing of Registered Reports, and the TOP guidelines. We have also seen that funders spur the adoption of open practices through data management plan requirements. In addition, funders like the Netherlands Organisation for Scientific Research (NWO) and the US National Science Foundation’s Directorate of Social, Behavioral and Economic Sciences (SBE) have been offering funding opportunities for replication studies. As for institutions, they are turning their policies and education towards Open Science, and their infrastructures towards promoting data sharing.
Likewise, open-science practices are becoming part of hiring and performance evaluation (e.g. https://www.nicebread.de/open-science-hiring-practices/). Moreover, a researcher may report their null findings as a preprint (e.g. bioRxiv, PsyArXiv, medRxiv) to gather feedback from the research community, in addition to having their effort recognised by the wider scientific audience. There has also been a commendable effort to create journals such as the Journal of Negative Results in BioMedicine with the specific aim of helping researchers publish null findings. In recent times, more journals, such as PLoS One, Nature Human Behaviour and BMC Psychology, have started to welcome publications irrespective of the results (i.e. significant or non-significant).
The efforts to promote transparent, reproducible and rigorous science have also met different forms of resistance over the years. We discussed the Is the Replicability Crisis Overblown? Three Arguments Examined paper by Pashler & Harris (2012), which examines three arguments made by prominent academics claiming that the replicability crisis has been overstated, and provides counter-evidence supporting the notion that the crisis is far from overblown.
The first argument claims that while there will be some non-zero rate of false positives, scientists keep this probability low by setting a relatively conservative alpha level (i.e. 5%). As an example, let us assume that there is a 10% chance that the effect of interest actually exists. If the study has a power of 80%, then the proportion of true positives can be calculated as 10% × 80% = 8%. The proportion of false positives (Type 1 errors) can be calculated as 90% × 5% = 4.5%. The proportion of published positive findings that are erroneous is the proportion of false positives divided by the sum of the proportions of false and true positives. Hence, for α = 0.05, the proportion of positive results that are false is 4.5% / (4.5% + 8%) = 36%. If the power of the study drops from 80% to 35%, then the proportion of erroneous positive results increases to 56% for the same alpha level.
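The arithmetic above generalises to any prior probability, power and alpha level. A short Python sketch of our own (the function name is ours, purely illustrative) reproduces the article's two scenarios:

```python
def false_positive_share(prior, power, alpha=0.05):
    """Proportion of positive findings that are false, given the prior
    probability that the effect exists, the study's power, and alpha."""
    true_pos = prior * power           # effect exists and is detected
    false_pos = (1 - prior) * alpha    # no effect, yet the test is significant
    return false_pos / (false_pos + true_pos)

# The article's two scenarios: 80% power vs. 35% power, prior 10%, alpha 5%.
print(false_positive_share(0.10, 0.80))  # ≈ 0.36 (36%)
print(false_positive_share(0.10, 0.35))  # ≈ 0.56 (56%)
```

Playing with the parameters makes the argument tangible: lowering the prior or the power quickly pushes the false-positive share above one half, even at the conventional alpha of 0.05.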
The other problem is that the prior probability of effects is usually low in many research fields, including epidemiology and genome-wide association studies, as researchers test a reasonably large number of effects and, given a positive result, proceed to publish it and devise a narrative interpretation. In psychology, studies can be partly exploratory even when presented in a confirmatory template. In addition to testing many hypotheses with a low likelihood of effects, there is also a tendency to carry out flexible analytical approaches, which in turn allows the true α to rise well above the nominal level.
The second argument claims that although researchers in many areas of psychology only rarely carry out direct replication attempts, they frequently attempt (and publish) conceptual replications, which are often more effective than direct replications for assessing the reality and importance of findings because they test not only the validity but also the generality of a finding. It is true that direct replications are rarely conducted compared to conceptual replications. However, conceptual replication attempts are more prone to publication bias and have a tendency for any result to be described as “interesting”.
When researchers conduct a direct replication study and fail to replicate an effect, the ideal scenario would entail publishing the null outcomes in certain journals or making them public via a website or preprint. However, publishing non-replication studies is quite rare. But if a researcher undertakes a conceptual replication study and succeeds, then the community will encourage the study to be published. In other words, a conceptual replication success is more likely to be publishable than a direct replication success, which would explain why researchers may move away from conducting direct replications and focus their efforts on conceptual replication studies. Thus, having a scientific culture and an incentive scheme that encourage and reward conceptual instead of direct replications would undeniably strengthen publication bias, in addition to allowing us to believe in a phenomenon that does not exist, a situation known as “pathological science”.
The last argument claims that science is self-correcting, albeit slowly: although some erroneous results may get published, eventually these will be discarded, and current discussions of a replication crisis therefore reflect an unreasonable impatience. This bold statement suggests that if one waits long enough, erroneous findings in the literature might eventually be corrected. However, the evidence shows that the median time between the original and the replication studies in psychology is about 4 years, with only 10% of replication studies occurring more than 10 years after the original. This suggests that researchers target more recent studies for their replication attempts rather than slowly correcting the older literature. This does not necessarily mean that researchers abandon a research theme when results do not replicate; researchers may find their interests change with time, or the research questions may have been answered. One should be mindful that the presence of meta-analyses supports the notion that researchers stay in touch with their research themes even after embarking upon newer studies.
Florian Markowetz from the University of Cambridge has also shared the types of responses he has received over the years when trying to promote reproducible practices in research. The responses below are also discussed in his Five selfish reasons to work reproducibly paper:
“It’s only the result that matters!”
“I’d rather do real science than tidy up my data.”
“Mind your own business! I document my data the way I want!”
“Excel works just fine. I don’t need any fancy R or Python or whatever.”
“Reproducibility sounds alright, but my code and data are spread over so many hard drives and directories that it would just be too much work to collect them all in one place.”
“We can always sort out the code and data after submission.”
“My field is very competitive and I can’t risk wasting time.”
In conclusion, it seems that erroneous results are entering the literature at an alarming rate and, given current scientific practices, it is almost certain that a proportion of false findings will remain uncorrected for an indefinite period of time. This has grave consequences, as erroneous findings can be propagated through textbooks and review articles. Given the rarity of replication studies regardless of discipline, correcting these errors will require researchers to reform current practices and the reward structure. The combination of all these efforts could then make traits like rigour, transparency and reproducibility common in research practice, and restore the prestige of science.
This first year of discussions has heightened awareness of reproducible science at TCIN. Next year, we plan to reach a larger pool of researchers at Trinity College Dublin by organising seminar talks and meetings with other labs. We also plan to focus on reproducible neuroscience research by joining forces with the Neuroscience Society to schedule webinars and panel discussions with faculty staff. Further, we are working towards the practical side of reproducible science by providing demo sessions to our community. The demo sessions would cover a broad array of topics such as pre-registered reports, reproducible code using Python and R, and data sharing via GitHub, in addition to leveraging openly available datasets and pipelines to preprocess and analyse data.
Reproducibility go Bragh!
Jivesh Ramduny & Mélanie Garcia