Ensuring Trust and Efficacy

5 April 2024

Ensuring Trust and Efficacy: The Critical Role of Validation in AI Models Used to Increase Our Understanding of Human Physiology

In an era where technology increasingly intersects with our health, the integration of artificial intelligence (AI) holds immense promise for revolutionising our understanding of human physiology. However, one crucial aspect cannot be overlooked: validation. Ensuring the accuracy, reliability, and safety of AI models is paramount to their successful integration into the way we describe, diagnose or discover things about human physiology. 

Data Validation

Traditional approaches in machine learning and AI have often focused more on models than on data, but the classic principle of “garbage in, garbage out” shows that validation must start with the data.

High-quality, well-labelled datasets are crucial for validating model predictions against known benchmarks. Yet, the challenge intensifies in the context of a generalised foundation model encompassing broad aspects of human physiology.  Here, both labelled and unlabelled data are invaluable. Unlabelled data can help uncover patterns and relationships across different physiological modalities. 

In both cases, diverse and representative data that spans a range of health conditions and demographics (e.g., age groups, genders, and ethnic backgrounds) is crucial to ensure a model’s applicability across the spectrum of human conditions and populations. By systematically reporting on the representativeness of the training data, Prevayl can better gauge the broader applicability and reliability of our model.
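One way such a representativeness report could be produced is to compare each subgroup's share of the dataset with its share of a reference population. The sketch below is illustrative only: the function name, the `age_group` attribute, the reference shares, and the tolerance are all assumptions, not a description of Prevayl's tooling.

```python
from collections import Counter

def representativeness_report(records, attribute, reference_shares, tolerance=0.05):
    """Compare subgroup shares in a dataset with reference population shares.

    Returns a dict mapping each subgroup to
    (dataset_share, reference_share, flagged), where `flagged` is True when
    the two shares differ by more than `tolerance`.
    """
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    report = {}
    for group, reference_share in reference_shares.items():
        share = counts.get(group, 0) / total if total else 0.0
        flagged = abs(share - reference_share) > tolerance
        report[group] = (share, reference_share, flagged)
    return report

# Hypothetical toy dataset: 10 records with an assumed "age_group" field.
records = (
    [{"age_group": "18-39"}] * 6
    + [{"age_group": "40-64"}] * 3
    + [{"age_group": "65+"}] * 1
)
# Assumed reference population shares, for illustration only.
reference = {"18-39": 0.4, "40-64": 0.4, "65+": 0.2}

report = representativeness_report(records, "age_group", reference, tolerance=0.15)
for group, (share, ref, flagged) in report.items():
    print(f"{group}: dataset={share:.2f} reference={ref:.2f} flagged={flagged}")
```

In this toy data only the 18-39 group deviates from its reference share by more than the chosen tolerance, so it is the only group flagged for review.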

Model Validation and Evaluation

Beyond the training phase, continuous validation of the model’s output is essential to maintain its reliability and effectiveness. Healthcare AI models often operate in dynamic environments, where real-time data streams and patient variables can significantly impact their performance. Regular validation checks, comparing model predictions against ground truth data, help identify any deviations or discrepancies, allowing for timely adjustments and improvements. 
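A regular validation check of this kind might be sketched as a rolling error monitor: each prediction is compared with its ground-truth value, and the model is flagged for review when the recent average error exceeds an agreed threshold. The class name, window size, threshold, and heart-rate figures below are hypothetical choices made for illustration.

```python
from collections import deque

class RollingValidator:
    """Track mean absolute error (MAE) over the most recent predictions."""

    def __init__(self, window=50, max_mae=5.0):
        # deque with maxlen drops the oldest error once the window is full.
        self.errors = deque(maxlen=window)
        self.max_mae = max_mae

    def update(self, predicted, observed):
        """Record one prediction/ground-truth pair and return the rolling MAE."""
        self.errors.append(abs(predicted - observed))
        return self.mae()

    def mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_review(self):
        """True when the rolling error exceeds the agreed threshold."""
        return self.mae() > self.max_mae

# Hypothetical heart-rate predictions (bpm) against reference measurements.
monitor = RollingValidator(window=3, max_mae=4.0)
for predicted, observed in [(60, 61), (72, 70), (80, 79)]:
    monitor.update(predicted, observed)
stable = monitor.needs_review()

for predicted, observed in [(90, 80), (95, 85), (100, 91)]:
    monitor.update(predicted, observed)
drifted = monitor.needs_review()

print(stable, drifted)
```

A fixed window keeps the check sensitive to recent behaviour, so a model that was accurate last month but has since drifted is still caught.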

Moreover, explainable model output is critical if clinicians are to understand the rationale behind AI-generated recommendations or decisions, which in turn promotes trust and facilitates clinical adoption. Two kinds of tooling help here. SHAP (SHapley Additive exPlanations) values attribute an individual prediction to the input features that drove it, which supports explainability. RAGAS is an evaluation framework that scores the quality of generated output. RAGAS is traditionally used to evaluate retrieval-augmented generation (RAG) architectures, but Prevayl intends to use its retrieval metrics, such as context precision and context recall, to evaluate our transformer-based model’s ability to take multimodal inputs and generate an accurate output.
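To make the two retrieval metrics concrete, here is a simplified sketch. It assumes the relevance and support judgements are supplied by hand, whereas RAGAS obtains them from an LLM judge; the functions are illustrative and are not the RAGAS library API. Context precision here rewards relevant items that appear early in the ranking, and context recall is the fraction of reference claims that the supplied context supports.

```python
def context_precision(relevance):
    """Rank-aware precision over the retrieved items.

    `relevance` is a list of 0/1 flags, one per retrieved item in rank order.
    The score averages precision@k over the ranks that hold a relevant item.
    """
    hits = 0
    weighted = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted += hits / k  # precision@k at this relevant rank
    return weighted / hits if hits else 0.0

def context_recall(reference_claims, supported_claims):
    """Fraction of reference claims that the retrieved context supports."""
    if not reference_claims:
        return 0.0
    supported = sum(1 for claim in reference_claims if claim in supported_claims)
    return supported / len(reference_claims)

# Hypothetical judgements: items ranked 1 and 3 are relevant, item 2 is not.
precision = context_precision([1, 0, 1])

# Hypothetical reference claims, two of which are backed by the context.
claims = ["resting HR elevated", "HRV reduced", "sleep fragmented"]
recall = context_recall(claims, {"resting HR elevated", "HRV reduced"})

print(round(precision, 3), round(recall, 3))
```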

The output will then pass through an LLM to provide further context to the prediction. This final generated output can be evaluated by measuring its faithfulness (how well the claims in the generated answer are supported by the context it was given) and answer relevancy (how relevant the generated answer is to the question). These metrics should reflect how well the model will perform in practice, but only if the test bench on which they are computed is of high quality.
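Both generation metrics reduce to simple ratios once the underlying judgements are available. The sketch below assumes those judgements are already in hand: a true/false verdict per claim in the answer, and embedding vectors for the original question and for questions inferred back from the answer. In RAGAS an LLM and an embedding model produce these inputs; the vectors and verdicts here are made up for illustration, and the functions are not the RAGAS API.

```python
import math

def faithfulness(claim_supported):
    """Share of the answer's claims that the supplied context supports."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def answer_relevancy(question_vec, inferred_question_vecs):
    """Mean similarity between the question and questions inferred from the answer."""
    if not inferred_question_vecs:
        return 0.0
    sims = [cosine(question_vec, vec) for vec in inferred_question_vecs]
    return sum(sims) / len(sims)

# Hypothetical verdicts: three of four claims are supported by the context.
faith = faithfulness([True, True, False, True])

# Hypothetical 2-D embeddings: one inferred question matches, one is unrelated.
relevancy = answer_relevancy([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

print(faith, relevancy)
```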

Setting The Industry Standard

Prevayl’s intention is to develop a HumanHealthTestBench that can serve as an industry standard for evaluating AI-based models focussed on human health. The test bench will consist of data that can be passed through each fundamental part of an artificial medical intelligence in isolation (as in ablation studies) and through the whole pipeline.
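The idea of scoring each component in isolation as well as end to end can be sketched as follows. Everything here is hypothetical: the `run_bench` function, the two toy stages, and the test cases are illustrative stand-ins, not the design of HumanHealthTestBench.

```python
def run_bench(stages, stage_cases, pipeline_cases):
    """Score each stage in isolation, then the whole pipeline end to end.

    stages: ordered list of (name, function) pairs.
    stage_cases: dict mapping a stage name to (input, expected) pairs.
    pipeline_cases: (input, expected) pairs for the composed pipeline.
    Returns a dict mapping each stage name, plus "pipeline", to its pass rate
    (None when no cases were supplied).
    """
    def pass_rate(fn, cases):
        if not cases:
            return None
        return sum(1 for x, expected in cases if fn(x) == expected) / len(cases)

    def pipeline(x):
        # Feed the output of each stage into the next one.
        for _, fn in stages:
            x = fn(x)
        return x

    results = {name: pass_rate(fn, stage_cases.get(name, [])) for name, fn in stages}
    results["pipeline"] = pass_rate(pipeline, pipeline_cases)
    return results

# Hypothetical two-stage pipeline: average heart-rate samples, then classify.
def mean_rate(samples):
    return sum(samples) / len(samples)

def classify(rate):
    return "elevated" if rate > 100 else "normal"

results = run_bench(
    stages=[("mean_rate", mean_rate), ("classify", classify)],
    stage_cases={
        "mean_rate": [([60, 80], 70.0), ([100, 110], 105.0)],
        # The boundary case (100.0) is only exercised at stage level.
        "classify": [(70.0, "normal"), (105.0, "elevated"), (100.0, "elevated")],
    },
    pipeline_cases=[([60, 80], "normal"), ([100, 110], "elevated")],
)
print(results)
```

In this toy run the pipeline passes all of its end-to-end cases while the `classify` stage fails its boundary case, which illustrates why component-level checks are worth keeping alongside whole-pipeline checks.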

Independent Validation

In addition to internal validation measures, seeking independent validation from third-party experts is crucial to enhance the credibility and trustworthiness of any AI model. Independent validation by domain specialists and regulatory bodies provides valuable insights into the model’s utility, safety, and ethical implications. Collaborating with external stakeholders not only validates the model’s efficacy but also fosters transparency and accountability in its deployment. Furthermore, incorporating explainable AI techniques enables stakeholders to interpret and scrutinise the model’s output, which builds confidence in its reliability and aids decision-making.
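As an illustration of what an explainable AI technique gives an external reviewer, the sketch below computes exact Shapley values, the quantity that SHAP approximates, for a single prediction of a toy model. It enumerates every feature ordering by brute force, which is only feasible for a handful of features; it is not the SHAP library's API, and the risk score, feature names, and baseline values are invented for the example.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction by enumerating feature orderings.

    Features not yet added to the coalition keep their baseline value.
    The cost grows factorially with the number of features.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous  # marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

# Hypothetical linear risk score, chosen so the result is easy to check by hand.
def risk_score(x):
    return 0.5 * x["resting_hr"] + 2.0 * x["age"]

instance = {"resting_hr": 80, "age": 50}
baseline = {"resting_hr": 60, "age": 40}
phi = shapley_values(risk_score, instance, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that lets a reviewer account for the whole of a model's output feature by feature.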

As our understanding of human physiology rapidly evolves, Prevayl believes that validation is a cornerstone of the trust, efficacy, and ethical integrity of AI-driven solutions. From validating training data to scrutinising model output, seeking external validation and championing explainability, every step contributes to building robust AI models that can transform our understanding of human physiology. Our intention is straightforward: to lead the way in validation.