Top 10 Data Auditor Statistical Software Requests
By Andy O’Connor, ERA Sciences

The life sciences industry relies on statistical software from companies like SAS and Minitab for manufacturing process optimization, supply chain optimization, and regulatory submissions. These tools enable process developers to conduct complex modeling, generate reproducible reports, and comply with regulatory requirements.
Statistical analysis provides a backbone for establishing critical process parameters and helps bring variability to heel. For regulatory submissions, it offers evidence for process validation and reproducibility. However, consideration of statistical software as a GxP computerized system is often overlooked. This neglect can have significant implications given the recent increasing focus on data integrity.
Flexible statistical programming tools, in which models are custom coded, are becoming more popular than software that provides models out of the box. With these tools, a programming platform generally must be installed and qualified before a statistical model can be developed and validated for its intended use with GxP data. For example, GAMP 5 indicates that statistical programming tools can be classified as category 1 infrastructure software, but that classification excludes business applications, such as statistical models, developed using these packages.
In addition, many statistical software programming tools are moving to the cloud and leveraging more complicated statistical models enabled by artificial intelligence and machine learning.
To assist you in ensuring data integrity and reliability for the statistical analysis of GxP data, we will guide you through the top 10 data integrity requests an auditor might make during an audit of a statistical software system.
Top 10 Data Integrity Requests For Statistical Software
1. GxP computerized systems inventory list
An inventory list will immediately indicate whether the statistical programming software is within GxP scope. It typically records common details such as the GAMP category, the business and system owners, the date the system was released for use, and the risk category. Specific details might include whether the programming platform is independent of the business application/statistical model.
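As a rough illustration of the details listed above, the sketch below models one inventory row and flags blank fields. The `InventoryEntry` class, its field names, and the `inventory_gaps` helper are assumptions made for this example. They are not taken from GAMP 5 or from any particular inventory tool, and a real inventory may hold more or different fields.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class InventoryEntry:
    """One row of a hypothetical GxP computerized systems inventory."""
    system_name: str
    gxp_relevant: bool
    gamp_category: Optional[int]      # e.g. 1 for infrastructure software
    business_owner: str
    system_owner: str
    release_date: Optional[date]      # date released for use, if any
    risk_category: str                # illustrative scale: low/medium/high
    platform_independent_of_model: bool


def inventory_gaps(entry: InventoryEntry) -> List[str]:
    """List the fields that are blank for a GxP-relevant entry."""
    if not entry.gxp_relevant:
        return []  # out-of-scope systems are not checked in this sketch
    gaps = []
    if entry.gamp_category is None:
        gaps.append("gamp_category")
    if not entry.business_owner.strip():
        gaps.append("business_owner")
    if not entry.system_owner.strip():
        gaps.append("system_owner")
    if entry.release_date is None:
        gaps.append("release_date")
    if not entry.risk_category.strip():
        gaps.append("risk_category")
    return gaps


# Illustrative entry; the system name, owners, and date are made up.
platform = InventoryEntry(
    system_name="Statistical programming platform",
    gxp_relevant=True,
    gamp_category=1,
    business_owner="Process Development",
    system_owner="",
    release_date=date(2024, 1, 15),
    risk_category="medium",
    platform_independent_of_model=True,
)
print(inventory_gaps(platform))  # prints ['system_owner']
```

A check like this could be run over the whole inventory before an audit to find entries with missing ownership or categorization.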
2. Governance SOPs for data, validation, and risk
Standard operating procedures (SOPs) can be reviewed for the following:
- Control and management of records, demonstrating ALCOA+
- GxP data review processes
- GxP categorization for computerized systems and how this is documented
- Overall risk management of GxP computerized systems
- Validation policy
3. SOPs for statistical model generation and validation
Key areas of interest include:
- data governance practices from collection to controlled storage and maintenance
- procedures for managing and reviewing statistical models
- validation processes for the statistical accuracy of model calculations.
For AI/ML models, additional considerations include the training data and the prevention of issues like overfitting or bias.
4. Validation report for statistical models
A validation report details the model's intended use, testing outcomes, and any referenced risk assessments. Where the software provides statistical models out of the box as standard, the software and the model may be covered by a single validation report. There may be separate validation reports if the model is custom programmed by the business on statistical software.
5. Requirements traceability matrix (RTM)
The RTM demonstrates how user requirements for statistical models were captured and tested, linking them to validation steps and ensuring traceability.
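One way to picture the traceability check is as a lookup from each user requirement to at least one passing test. The sketch below assumes the RTM is held as simple rows with `requirement`, `test`, and `result` keys. Those key names, the `untraced_requirements` function, and the IDs are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List


def untraced_requirements(
    requirement_ids: Iterable[str], rtm_rows: Iterable[Dict[str, str]]
) -> List[str]:
    """Return requirement IDs that have no passing test linked to them."""
    covered = {
        row["requirement"] for row in rtm_rows if row["result"] == "pass"
    }
    return sorted(set(requirement_ids) - covered)


# Illustrative IDs only; real URS and test script numbering will differ.
requirements = ["URS-001", "URS-002", "URS-003"]
rtm = [
    {"requirement": "URS-001", "test": "OQ-010", "result": "pass"},
    {"requirement": "URS-002", "test": "OQ-011", "result": "fail"},
]
print(untraced_requirements(requirements, rtm))  # prints ['URS-002', 'URS-003']
```

Here URS-002 is reported because its only linked test failed, and URS-003 because no test references it at all.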
6. Evidence of model and code review
Where statistical models are built on statistical programming software, code review processes help determine whether:
- the statistical basis is sound
- code is version-controlled and maintained
- unit tests are performed as required (particularly for independently developed/open-source components).
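To make the unit testing point concrete, the sketch below tests a custom-coded calculation against hand-calculated values. The process capability index (Cpk) is used purely as an example of a statistic a business might code itself. It is not named in this article, and the `cpk` function and test names are assumptions for illustration.

```python
import math
import statistics
import unittest
from typing import Sequence


def cpk(values: Sequence[float], lsl: float, usl: float) -> float:
    """Process capability index: min(USL - mean, mean - LSL) / (3 * s)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return min(usl - mean, mean - lsl) / (3 * s)


class CpkTests(unittest.TestCase):
    def test_hand_calculated_value(self):
        # mean = 10, s = 1, so Cpk = min(6, 6) / 3 = 2
        self.assertTrue(math.isclose(cpk([9.0, 10.0, 11.0], 4.0, 16.0), 2.0))

    def test_off_centre_process_uses_nearer_limit(self):
        # mean = 10, s = 1, nearer limit is LSL = 7, so Cpk = 3 / 3 = 1
        self.assertTrue(math.isclose(cpk([9.0, 10.0, 11.0], 7.0, 16.0), 1.0))

    def test_single_value_is_rejected(self):
        # A sample standard deviation needs at least two data points.
        with self.assertRaises(statistics.StatisticsError):
            cpk([10.0], 4.0, 16.0)
```

Saved as a module, these tests can be run with `python -m unittest`, and the run output can be kept alongside the code review record.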
7. SOP(s) for software administration
Software administration practices and controls include software release management, vendor management, patient data security and the protection of human subjects, user access controls, backup/restore procedures, business continuity of data, and periodic reviews.
8. Evidence of backup and restore capabilities
Evidence of successful backup and restore tests is crucial to show that the data is not vulnerable to loss.
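One simple form of restore evidence is a checksum comparison between an original file and its restored copy. The sketch below shows that idea using SHA-256 from the Python standard library. The function names and file names are hypothetical, and a real restore test would normally follow the site's own backup SOP and cover far more than a single file.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


# Throwaway files standing in for a dataset and its restored copy.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "batch_data.csv"
    restored = Path(workdir) / "batch_data_restored.csv"
    original.write_bytes(b"batch,assay\nA1,99.2\n")
    restored.write_bytes(original.read_bytes())
    print(restore_matches(original, restored))  # prints True
```

Recording the two digests, the date, and the person who ran the check gives an auditor something specific to review.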
9. Quality management system (QMS) review
The QMS review covers quality records related to statistical software, including deviations, corrective actions, and change management.
10. Last completed periodic review
This looks at evidence of regular evaluations of your statistical software, confirming it remains compliant and fit for purpose. Periodic reviews will inform whether risk categorization is still correct or if additional controls now need to be applied.
Conclusion
The No. 1 FDA finding is a lack of procedures, so, when preparing for a data integrity audit, make sure you have the requisite procedures in place to keep your data controlled.
Preparing for a statistical software audit can be daunting, but a systematic approach ensures compliance while supporting the reliability of your drug development processes.
A version of this article first appeared on ERA Sciences' blog. It is republished here with permission.
About The Author:
 Andy O’Connor is a director at ERA Sciences with over 16 years of experience in risk management and technology within the life sciences industry. He focuses on quality and IT governance for companies transitioning from clinical to commercial manufacturing. He holds an honors degree in science from University College Dublin. As a software developer, Andy has contributed to numerous enterprise application projects and hosts client training events and workshops on risk and data integrity.