Software for error detection goes well beyond scrutiny. This vignette presents broadly similar packages and apps, with no claim to completeness.
Please contact me if you know about relevant software that isn’t listed here (email: jung-lukas@gmx.net).
The tools are loosely grouped by how widely they have been used for error detection so far.
The first group covers techniques that come up often in discussions of error checking.
For good reason, statcheck by Sacha Epskamp and Michèle Nuijten is the best-known error detection software. It recomputes p-values from the reported test statistics, such as t or F, and checks whether they are consistent with the reported p-values. Even better, it operates on PDF files automatically, enabling users to scan massive numbers of published articles. Steve Haroz built a simple web app for statcheck.
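To illustrate the principle (not statcheck's actual interface), here is a minimal base-R sketch with hypothetical reported values: recompute the p-value from the reported statistic and degrees of freedom, then compare it with the reported p-value.

```r
# Hypothetical reported result: t(28) = 2.20, p = .036
reported <- list(t = 2.20, df = 28, p = 0.036)

# Recompute the two-sided p-value from the reported statistic and df...
recomputed_p <- 2 * pt(abs(reported$t), df = reported$df, lower.tail = FALSE)

# ...and compare it with the reported p-value (allowing for rounding)
round(recomputed_p, 3)                   # 0.036
abs(recomputed_p - reported$p) < 0.0005  # TRUE: no inconsistency flagged
```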
James Heathers’ SPRITE algorithm reconstructs possible distributions of raw data from summary statistics. For R users, it was implemented in rsprite2 by Lukas Wallrich, building on code by Nick Brown. Jordan Anaya developed a Python-based SPRITE app.
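A rough sketch of a typical rsprite2 call; the set_parameters() and find_possible_distributions() interface shown here is assumed from the package's documentation and may differ in detail:

```r
library(rsprite2)

# Describe the reported summary statistics and the scale bounds...
pars <- set_parameters(mean = 2.2, sd = 1.3, n_obs = 20,
                       min_val = 1, max_val = 5)

# ...then ask SPRITE for a few raw-data samples consistent with them
find_possible_distributions(pars, n_distributions = 5, seed = 1234)
```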
The second group covers recent and ongoing developments in forensic metascience methods.
The unsum package implements CLOSURE, a technique by Nathanael Larigaldie that generalizes SPRITE to find all possible samples, not just a few. It is extremely fast because the core algorithm is written in Rust, which makes CLOSURE suitable even for moderately large samples.
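A rough sketch of how unsum might be called; the closure_generate() interface shown here, including the convention of passing reported values as strings to preserve their precision, is assumed from the package's documentation and may differ in detail:

```r
library(unsum)

# Reconstruct all raw-data samples consistent with these summary statistics
results <- closure_generate(mean = "3.5", sd = "1.7", n = 70,
                            scale_min = 1, scale_max = 5)
results
```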
Ian Hussey’s raft of checks for summary statistics:
TIDES, an R package and a Shiny app for a consistency test of summary statistics of data with a known minimum and maximum (e.g., Likert scales, or where the empirical min and max are known or can be well estimated). TIDES also assesses the magnitude of variability given the scale bounds (see the sketch after this list).
The ANCHOR app to check for consistency between the whole sample and its subgroups.
The PORT app to test correlation tables for consistency.
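The sketch below is not TIDES itself; it only illustrates the kind of bound such a test can exploit: for data confined to a known minimum and maximum, a given mean puts a hard ceiling on the sample SD.

```r
# For n values bounded by scale_min and scale_max, a given mean caps the
# sample SD: the spread is greatest when all values sit at the two extremes.
max_possible_sd <- function(mean, n, scale_min, scale_max) {
  sqrt((scale_max - mean) * (mean - scale_min) * n / (n - 1))
}

# Hypothetical reported values on a 1-5 Likert scale:
max_possible_sd(mean = 4.7, n = 25, scale_min = 1, scale_max = 5)  # ~1.08
# A reported SD of, say, 1.5 would therefore be impossible.
```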
The ellipse of insignificance app by David Robert Grimes tests the robustness of trials with dichotomous outcomes: roughly, how many participants would have to be reclassified before the result loses statistical significance.
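The sketch below is not Grimes's method, only a rough base-R illustration of the underlying question, using a hypothetical 2x2 trial table:

```r
# Hypothetical trial: 30/100 events under treatment vs. 15/100 under control
tab <- matrix(c(30, 70,
                15, 85),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("treatment", "control"),
                              c("event", "no event")))

flips <- 0
while (fisher.test(tab)$p.value < 0.05) {
  # Reclassify one treatment-group participant from "event" to "no event"
  tab["treatment", "event"]    <- tab["treatment", "event"] - 1
  tab["treatment", "no event"] <- tab["treatment", "no event"] + 1
  flips <- flips + 1
}
flips  # number of reclassifications needed to break significance
```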
ScrutiPy by Nicolas Roman Posner provides a Python interface to some of scrutiny’s functionality. It also features CLOSURE and methods to recalculate confusion matrices. Like unsum, it relies on Rust implementations, so it runs very fast.
The following software projects are less well known and have not been widely employed, at least not in forensic metascience. They are listed here because they might have some potential for error checking, although their forensic utility has not been thoroughly examined.
The R package validate by Mark P.J. van der Loo provides numerous tools for data checking.
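For example, rules can be declared with validator() and checked against a data frame with confront(); the specific rules below are made up for illustration:

```r
library(validate)

# Declare rules that every record should satisfy...
rules <- validator(
  speed >= 0,
  dist >= 0,
  dist / speed <= 10  # a made-up plausibility bound
)

# ...then confront a data frame with them and summarize the results
out <- confront(cars, rules)
summary(out)
```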
The delta-F test, a.k.a. the “Förster test”, checks whether group means are suspiciously close to a perfectly linear trend. It is implemented in Dale J. Barr’s R package forsterUVA.
Several R packages leverage the Benford distribution, which describes the leading digits of many naturally occurring numbers, to assess whether reported numbers are, in fact, natural (a base-R sketch of the idea follows the list). These packages include:
benford.analysis by Carlos Cinelli contains various sophisticated tools for inspecting data using the Benford distribution.
jfa by Koen Derks offers a full statistical auditing suite (including Benford analysis).
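Independently of these packages' own interfaces, the basic idea can be sketched in base R: compare observed leading digits with the proportions that Benford's law predicts.

```r
# Leading digit of each number (for values != 0)
first_digit <- function(x) {
  as.integer(substr(formatC(abs(x), format = "e"), 1, 1))
}

observed <- table(factor(first_digit(datasets::rivers), levels = 1:9))
expected <- log10(1 + 1 / (1:9))  # Benford proportions for digits 1-9

# Test the observed leading-digit frequencies against Benford's law
chisq.test(observed, p = expected)
```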
XLTest is a tool for auditing Excel files (but it is not free).
The Rust crate SeaCanal analyzes numeric sequences, uncovering patterns of operations that might have generated them.
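As a toy analogue of that idea (not SeaCanal's actual behavior), one might check whether consecutive values are linked by a single recurring operation:

```r
# Guess a single recurring operation linking consecutive values, if any
guess_operation <- function(x) {
  if (length(unique(round(diff(x), 10))) == 1) {
    paste("add", diff(x)[1], "at each step")
  } else if (all(x != 0) && length(unique(round(x[-1] / x[-length(x)], 10))) == 1) {
    paste("multiply by", x[2] / x[1], "at each step")
  } else {
    "no single recurring operation found"
  }
}

guess_operation(c(3, 6, 12, 24))  # "multiply by 2 at each step"
guess_operation(c(5, 8, 11, 14))  # "add 3 at each step"
guess_operation(c(2, 7, 1, 9))    # "no single recurring operation found"
```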
The Pruitt investigations gave rise to R software for analyzing sequences:
The package twopointzerothree (by an anonymous developer) checks data for sequences of perfectly correlated numbers. These numbers are either duplicates of each other or duplicates offset by some constant amount; hence the name (see the sketch after this list).
Similarly, the sequenceSniffer app by Anne Rutten detects repetitions in sequences.
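The sketch below is not these tools' own interface, just a compact illustration of the pattern they hunt for: two runs of numbers that are identical, or identical up to an additive constant.

```r
# TRUE if y is a duplicate of x, possibly offset by a constant
perfectly_offset <- function(x, y, tol = 1e-8) {
  d <- y - x
  all(abs(d - d[1]) < tol)
}

x <- c(1.10, 2.47, 3.05, 4.92)
perfectly_offset(x, x)         # TRUE: exact duplicate
perfectly_offset(x, x + 0.03)  # TRUE: duplicate offset by a constant
perfectly_offset(x, rev(x))    # FALSE
```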