By Hannelore Aerts, Data Quality Programme Manager

The amount of health data being generated across care organisations is growing exponentially. These data are collected not only by healthcare professionals, but increasingly also by devices and patients themselves.

Health data are generally collected with the primary goal of providing high-quality patient care. However, they are also often re-used to further improve patient outcomes and to foster innovation, for example to enable personalised medicine, to feed artificial intelligence algorithms or decision support systems, or to drive clinical research. In parallel, the same data are used to monitor clinical pathways continuously and optimise the efficiency of care organisation.

To allow valid and reliable decision-making based on health data, it is essential that the data are of high quality. In this blog article, we describe what we consider data quality to be, where to set the threshold for data to be considered of high quality, and how we have formalised this into an assessment framework.

What do we mean by data quality?

A large body of international literature proposes a variety of dimensions and methodologies to describe and measure the complex, multidimensional nature of data quality [1]–[6]. Across studies, there is little agreement on the exact definition and meaning of each data quality dimension. Despite differences in terminology, however, many of the proposed dimensions and solutions address conceptually similar data quality features.

To reach consensus on the data quality dimensions that matter most within the health ecosystem, i~HD has brought together an international, multidisciplinary team of domain experts in electronic health records, health data quality, assessment methodologies and certification programmes.

Following a review of the existing literature, our data quality task force identified nine frameworks for quality assessment of health data [3], [5]–[12]. From these frameworks, nine data quality dimensions were selected through iterative discussions held during a series of workshops with clinical care, clinical research and ICT leads from 70 European hospitals. These dimensions were judged the most important for assessing whether health data are fit to be used for patient care, for organisational learning and for research.

Dimension | What could go wrong?
--- | ---
Completeness | If data is incomplete, your findings and insights will be too. Even a single missing item in a medication or allergy list could lead a clinical decision support system to suggest a potentially dangerous treatment.
Consistency | When data is not entered in the correct format, standard queries cannot interpret the information. As a result, these data items are discarded by automated procedures, leaving potentially important knowledge gaps.
Correctness | Clinical decisions based on incorrect information will be equally misguided. At times, the error might be obvious to human interpreters (e.g., height and weight values inverted) or to clinicians (e.g., consistently normal kidney function tests for a patient on dialysis). Nevertheless, standard analytic procedures and support systems might fail to detect such inaccuracies and consequently provide erroneous advice.
Uniqueness | Duplicated data can bias the results of population analyses. In addition, partially duplicated files can create confusion about which data to use for clinical decision-making, wasting valuable time.
Stability | Data collected across different sites or over time is often aggregated during analyses without much attention to possible differences between sources or to trends over time. However, results will be distorted when multi-site or longitudinal variability is not properly taken into account.
Timeliness | Data is often not recorded or updated in real time. When information is not entered into the right system in time, it can get lost or lose its relevance. For example, if a medication list is updated from a pharmacy subsystem only long after a patient’s discharge, it may still be out of date when the patient is rapidly re-admitted.
Contextualisation | Without annotation of the acquisition context, data might not be interpretable. For example, the blood glucose thresholds that indicate gestational diabetes differ between a random measurement and a fasting measurement. In the absence of this contextual information, test results can be ambiguous, requiring additional or repeated assessments.
Trustworthiness | Multiple data quality issues, manifested as misguided clinical decisions or biased research findings, damage an organisation’s reputation and diminish opportunities for future collaborations and project involvement.
Representativeness | To draw valid research conclusions, the study sample should be representative of the target population; otherwise, the findings may not generalise to that population.
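Several of these dimensions can be turned into simple measurements. The Python sketch below shows one possible way to score completeness, uniqueness and one correctness check on a toy set of records. The `Record` fields, the function names and the BMI range of 10 to 80 are hypothetical choices for illustration, not part of any i~HD tooling. The BMI check illustrates the correctness example from the table: inverted height and weight values can each pass a broad single-field range check, yet the resulting BMI is clearly implausible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical, simplified patient record; the field names are illustrative only.
    patient_id: str
    height_cm: Optional[float]
    weight_kg: Optional[float]
    allergies: Optional[str]  # None means "not recorded"


def completeness(records: List[Record], field: str) -> float:
    """Completeness: share of records in which `field` is recorded (not None)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if getattr(r, field) is not None)
    return filled / len(records)


def uniqueness(records: List[Record]) -> float:
    """Uniqueness: distinct patient IDs divided by the number of records (1.0 = no duplicates)."""
    if not records:
        return 1.0
    return len({r.patient_id for r in records}) / len(records)


def implausible_bmi(records: List[Record], low: float = 10.0, high: float = 80.0) -> List[str]:
    """Correctness: IDs of records whose BMI lies outside an assumed plausible range.

    Inverted values (72 cm and 175 kg instead of 175 cm and 72 kg) may each pass a
    broad single-field range check, but the resulting BMI (about 338) does not.
    """
    flagged = []
    for r in records:
        if r.height_cm is None or r.weight_kg is None or r.height_cm <= 0:
            continue  # cannot be checked here; this is a completeness issue instead
        bmi = r.weight_kg / (r.height_cm / 100) ** 2
        if not low <= bmi <= high:
            flagged.append(r.patient_id)
    return flagged


# Hypothetical toy data set.
records = [
    Record("A", 175.0, 72.0, "penicillin"),
    Record("B", 72.0, 175.0, None),   # height and weight entered inverted
    Record("B", 172.0, 75.0, None),   # partial duplicate of patient B
    Record("C", 160.0, 55.0, "no known allergies"),
]

print(completeness(records, "allergies"))  # 0.5
print(uniqueness(records))                 # 0.75
print(implausible_bmi(records))            # ['B']
```

Each function returns a score or a list of flagged records rather than a verdict, so what counts as "good enough" can be decided separately for each use of the data.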

How good is good enough?

A second important question when defining data quality is which level of measured quality can be considered high, good or perhaps simply good enough. As the ubiquitous phrase “fitness for purpose”, often used alongside data quality, implies, this critical threshold depends on what the data will be used for. Although fitness for purpose has been discussed in the data quality literature [13], [14], little work has been done so far to formalise what it means.

Should data quality be higher for clinical decision-making than for research?
To care properly for an individual patient, it is crucial to know whether a course of treatment or an operative procedure has led to the desired functional improvement. If that information is missing or incorrect, downstream care decisions could be suboptimal, especially when they rely on clinical decision support systems. In contrast, if you wanted to compare clinical outcomes across a set of treatment options, your query would probably remain accurate enough even if a small percentage of the data in the database were missing or incorrect.

On the other hand, changes in coding practice or system updates can leave traces in patients’ data without seriously affecting clinical care decisions. Aggregating such data for research purposes without accounting for these changes over time, however, can severely bias research findings.
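A change in coding practice often shows up as an abrupt jump in how frequently a code is used. The Python sketch below computes, per year, the share of records carrying one diagnosis code and flags years where that share changes by more than a tolerance. The data, the choice of ICD-10 codes (E11 for type 2 diabetes, E11.9 for its subcode, I10 for hypertension) and the 0.2 tolerance are all hypothetical. In this toy extract, a pooled count of E11 over 2017–2019 would suggest that type 2 diabetes suddenly became rare, when only the coding practice changed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def yearly_share(records: Iterable[Tuple[int, str]], code: str) -> Dict[int, float]:
    """Per year, the share of diagnosis records coded exactly as `code`."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for year, diagnosis_code in records:
        totals[year] += 1
        if diagnosis_code == code:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def flag_shifts(shares: Dict[int, float], max_jump: float = 0.2) -> List[int]:
    """Years in which the share changes by more than `max_jump` versus the previous year."""
    years = sorted(shares)
    return [cur for prev, cur in zip(years, years[1:])
            if abs(shares[cur] - shares[prev]) > max_jump]


# Hypothetical extract: from 2019 on, type 2 diabetes is coded with the more
# specific code E11.9 instead of the category E11.
data = (
    [(2017, "E11")] * 8 + [(2017, "I10")] * 2
    + [(2018, "E11")] * 7 + [(2018, "I10")] * 3
    + [(2019, "E11.9")] * 7 + [(2019, "E11")] * 1 + [(2019, "I10")] * 2
)

shares = yearly_share(data, "E11")
print(shares)               # {2017: 0.8, 2018: 0.7, 2019: 0.1}
print(flag_shifts(shares))  # [2019]
```

A flagged year is only a prompt for investigation: the jump could reflect a genuine clinical trend as well as a coding or system change.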

Our data quality assessment framework

Together with our clients, we embark on a data quality journey to make sure data are fit for their intended purpose, whether that is to enhance the quality or efficiency of clinical care or to participate in a clinical trial. Because no universal rules can be formalised to define high-quality data, we customise our data quality services for each individual project. Specifically, we first hold a scoping meeting with the client to discuss which data they want assessed and what purpose those data should serve. Depending on their needs and questions, we then agree on the most relevant data quality dimensions and on corresponding quality thresholds that characterise good-quality data for the project at hand.
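The outcome of such a scoping meeting, a set of dimensions with a threshold for each, can be thought of as a purpose profile against which measured scores are compared. The Python sketch below is a hypothetical illustration of that idea, not i~HD’s actual assessment method: the profile names, the dimensions chosen and every threshold value are invented for the example. In line with the discussion above, the clinical profile is stricter on completeness and correctness, while the research profile is stricter on stability, so the same data set can be fit for one purpose and not for the other.

```python
from typing import Dict, Tuple

# Hypothetical purpose profiles: minimum acceptable score (0 to 1) per dimension.
# The numbers are illustrative only, not recommended values.
PROFILES: Dict[str, Dict[str, float]] = {
    "clinical_decision_support": {"completeness": 0.99, "correctness": 0.99, "stability": 0.80},
    "outcomes_research": {"completeness": 0.95, "correctness": 0.95, "stability": 0.95},
}


def assess(scores: Dict[str, float], purpose: str) -> Dict[str, Tuple[float, float]]:
    """Return {dimension: (measured, required)} for every dimension below its threshold.

    An empty result means the data meet the profile. A dimension that was not
    measured is treated as 0.0, so it is reported rather than silently passed.
    """
    thresholds = PROFILES[purpose]
    return {
        dim: (scores.get(dim, 0.0), required)
        for dim, required in thresholds.items()
        if scores.get(dim, 0.0) < required
    }


measured = {"completeness": 0.97, "correctness": 0.995, "stability": 0.85}
print(assess(measured, "clinical_decision_support"))  # {'completeness': (0.97, 0.99)}
print(assess(measured, "outcomes_research"))          # {'stability': (0.85, 0.95)}
```

Keeping the thresholds in data rather than in code makes it straightforward to agree on a new profile per project without changing the assessment logic.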

More information?

Check our data quality page or get in touch with our Data Quality Programme Manager.


[1] C. Batini, C. Cappiello, C. Francalanci, and A. Maurino, “Methodologies for data quality assessment and improvement,” ACM Comput. Surv., vol. 41, no. 3, 2009, doi: 10.1145/1541880.1541883. 

[2] S. G. Johnson, S. Speedie, G. E. Simon, V. Kumar, and B. L. Westra, “A Data Quality Ontology for the Secondary Use of EHR Data,” 2015. 

[3] M. G. Kahn, M. A. Raebel, J. M. Glanz, K. Riedlinger, and J. F. Steiner, “A Pragmatic Framework for Single-site and Multisite Data Quality Assessment in Electronic Health Record-based Clinical Research,” Med. Care, vol. 50, no. 0, pp. 1–7, 2012.

[4] S.-T. Liaw et al., “Towards an ontology for data quality in integrated chronic disease management: A realist review of the literature,” Int. J. Med. Inform., vol. 82, no. 1, pp. 10–24, 2013, doi: 10.1016/j.ijmedinf.2012.10.001. 

[5] N. G. Weiskopf and C. Weng, “Methods and dimensions of electronic health record data quality assessment: Enabling reuse for clinical research,” J. Am. Med. Informatics Assoc., vol. 20, pp. 144–151, 2013, doi: 10.1136/amiajnl-2011-000681. 

[6] C. Sáez, J. Martínez-Miranda, M. Robles, and J. M. García-Gómez, “Organizing data quality assessment of shifting biomedical data,” Stud. Health Technol. Inform., vol. 180, pp. 721–725, 2012, doi: 10.3233/978-1-61499-101-4-721. 

[7] T. Botsis, G. Hartvigsen, F. Chen, and C. Weng, “Secondary Use of EHR: Data Quality Issues and Informatics Opportunities,” AMIA Jt. Summits Transl. Sci. Proc., vol. 2010, pp. 1–5, 2010.

[8] M. N. Zozus et al., “Assessing Data Quality for Healthcare Systems Data Used in Clinical Research (Version 1.0),” NIH Collaboratory, 2014.

[9] S. Davoudi et al., “Data Quality Management Model (2015 Update) – Retired,” J. AHIMA, vol. 86, no. 10, pp. 62–65, 2015.

[10] M. G. Kahn et al., “A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data,” eGEMs (Generating Evid. Methods to Improv. patient outcomes), vol. 4, no. 1, p. 18, 2016, doi: 10.13063/2327-9214.1244. 

[11] F. Bray and D. M. Parkin, “Evaluation of data quality in the cancer registry: Principles and methods. Part I: Comparability, validity and timeliness,” Eur. J. Cancer, vol. 45, no. 5, pp. 747–755, 2009, doi: 10.1016/j.ejca.2008.11.032. 

[12] M. Sariyar, A. Borg, O. Heidinger, and K. Pommerening, “A practical framework for data management processes and their evaluation in population-based medical registries,” Informatics Heal. Soc. Care, vol. 38, no. 2, pp. 104–119, 2013, doi: 10.3109/17538157.2012.735731. 

[13] S.-T. Liaw, J. Taggart, S. Dennis, and A. Yeo, “Data quality and fitness for purpose of routinely collected data – a general practice case study from an electronic practice-based research network (ePBRN),” AMIA Annu. Symp. Proc., vol. 2011, pp. 785–794, 2011.

[14] M. W. Reynolds, A. Bourke, and N. A. Dreyer, “Considerations when evaluating real-world data quality in the context of fitness for purpose,” Pharmacoepidemiol. Drug Saf., pp. 1–3, Apr. 2020, doi: 10.1002/pds.5010.