
Evidence Based Practice

This tutorial was developed by staff at Duke University Medical Center Library and the Health Sciences Library at the University of North Carolina at Chapel Hill. It is used with permission by Saint Louis University Medical Center Library.

Appraising diagnostic test studies

Evaluating the Validity of a Diagnostic Test Study


 Are the results valid?

 1. Did participating patients present a diagnostic dilemma?

The group of patients in which the test was conducted should include patients with a high, medium and low probability of having the target disease. A test is clinically useful when it can identify the disease not only in obvious cases but also in cases where the diagnosis is less obvious or might otherwise be confused with another condition. The patients in the study should resemble those expected in clinical practice.

2. Did investigators compare the test to an appropriate, independent reference standard?

The reference (or gold) standard refers to the commonly accepted proof that the target disorder is present or not present. The reference standard might be an autopsy or biopsy. The reference standard provides objective criteria (e.g., laboratory test not requiring subjective interpretation) or a current clinical standard (e.g., a venogram for deep venous thrombosis) for diagnosis. Sometimes there may not be a widely accepted reference standard. The authors will then need to clearly justify their selection of the reference test.

3. Were those interpreting the test and reference standard blind to the other results?

To avoid potential bias, those interpreting the study test should not know the results of the reference standard, and those interpreting the reference standard should not know the results of the study test.

4. Did the investigators apply the same reference standard to all patients regardless of the results of the test under investigation?

Researchers should conduct both tests (the study test and the reference standard) on all patients in the study regardless of the results of the test in question. Researchers should not be tempted to forgo either test based on the results of only one of the tests. Nor should the researchers apply a different reference standard to patients with a negative result in the study test.

Key issues for Diagnostic Studies:

  • diagnostic uncertainty
  • blind comparison to gold standard
  • each patient gets both tests


What are the results?



                        Reference Standard    Reference Standard
                        Disease Positive      Disease Negative

Study Test Positive     True Positive         False Positive
Study Test Negative     False Negative        True Negative


Sensitivity = true positive / all disease positives 

Sensitivity measures the proportion of patients with the disease who test positive for the disease in this study. It is the probability that a person with the disease will have a positive test result.

Specificity = true negative / all disease negatives 

Specificity measures the proportion of patients without the disease who test negative for the disease in this study. It is the probability that a person without the disease will have a negative test result.
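
As a minimal sketch of the two definitions above, the following Python computes sensitivity and specificity from the four cells of the 2x2 table. The counts are hypothetical and chosen only for illustration; they do not come from any study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positives divided by all disease-positive patients."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """True negatives divided by all disease-negative patients."""
    return true_neg / (true_neg + false_pos)


# Hypothetical 2x2 table (assumed counts, for illustration only)
tp, fp = 90, 30    # study test positive: with disease, without disease
fn, tn = 10, 170   # study test negative: with disease, without disease

print(f"Sensitivity: {sensitivity(tp, fn):.2f}")  # 90 / 100 = 0.90
print(f"Specificity: {specificity(tn, fp):.2f}")  # 170 / 200 = 0.85
```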

Sensitivity and specificity are characteristics of the test but do not provide enough information for the clinician to act on the test results.  Likelihood ratios can be used to help adapt the results of a study to specific patients. They help determine the probability of disease in a patient.

Likelihood ratios (LR):

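LR+ = probability of a positive test in patients with disease / probability of a positive test in patients without disease = sensitivity / (1 - specificity)

LR- = probability of a negative test in patients with disease / probability of a negative test in patients without disease = (1 - sensitivity) / specificity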

Likelihood ratios indicate the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without that disorder.
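
The likelihood ratios can be computed directly from the 2x2 table. The sketch below uses the fact that the probability of a positive test in patients with disease is the sensitivity, and the probability of a negative test in patients without disease is the specificity. The function name and the counts are hypothetical, and the sketch does not handle tables with empty cells (which would cause a division by zero).

```python
def likelihood_ratios(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (LR+, LR-) from the four cells of a 2x2 table."""
    pos_given_disease = tp / (tp + fn)      # sensitivity
    pos_given_no_disease = fp / (fp + tn)   # 1 - specificity
    neg_given_disease = fn / (tp + fn)      # 1 - sensitivity
    neg_given_no_disease = tn / (fp + tn)   # specificity

    lr_pos = pos_given_disease / pos_given_no_disease
    lr_neg = neg_given_disease / neg_given_no_disease
    return lr_pos, lr_neg


# Hypothetical counts, for illustration only
lr_pos, lr_neg = likelihood_ratios(tp=90, fp=30, fn=10, tn=170)
print(f"LR+ = {lr_pos:.2f}")  # 0.90 / 0.15 = 6.00
print(f"LR- = {lr_neg:.2f}")  # 0.10 / 0.85 = 0.12
```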

The likelihood ratio of a positive test result (LR+) indicates how much the odds of having the disease increase after a positive test result.

The likelihood ratio of a negative test result (LR-) indicates how much the odds of having the disease decrease after a negative test result.
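
Because likelihood ratios act on odds rather than probabilities, applying one to a patient takes three steps: convert the pretest probability to odds, multiply by the LR, and convert back to a probability. A minimal sketch follows; the pretest probability of 30% and the LR values are assumptions chosen for illustration.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability using an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


pretest = 0.30  # assumed pretest probability for a hypothetical patient

# Positive result with a hypothetical LR+ of 6
print(f"{posttest_probability(pretest, 6.0):.2f}")   # 0.72

# Negative result with a hypothetical LR- of 0.12
print(f"{posttest_probability(pretest, 0.12):.2f}")  # 0.05
```

An LR of 1.0 leaves the odds, and therefore the probability, unchanged.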


How much do LRs change disease likelihood?

LRs greater than 10 or less than 0.1 cause large changes
LRs 5 – 10 or 0.1 – 0.2 cause moderate changes
LRs 2 – 5 or 0.2 – 0.5 cause small changes
LRs between 1 and 2 or between 0.5 and 1 cause tiny changes
An LR of 1.0 causes no change at all
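
The rule of thumb above can be written as a small lookup. This is only a sketch: the function name is hypothetical, and because adjacent bands share endpoints (for example, 5 appears in both the moderate and the small band), the code assumes a shared endpoint belongs to the band with the larger change.

```python
def describe_lr_effect(lr: float) -> str:
    """Map a likelihood ratio to the size of change it causes in disease likelihood."""
    if lr < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    if lr == 1.0:
        return "none"
    if lr > 10 or lr < 0.1:
        return "large"
    # Assumption: shared endpoints (5, 2, 0.2, 0.5) go to the larger-change band
    if lr >= 5 or lr <= 0.2:
        return "moderate"
    if lr >= 2 or lr <= 0.5:
        return "small"
    return "tiny"


for lr in (20, 6, 3, 1.5, 1.0, 0.7, 0.3, 0.12, 0.05):
    print(lr, describe_lr_effect(lr))
```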


How can I apply the results to patient care?


Will the reproducibility of the test result and its interpretation be satisfactory in your clinical setting?
Does the test yield the same result when reapplied to stable participants?

Do different observers agree about the test results?

Are the study results applicable to the patients in your practice?
Does the test perform differently (different LRs) for different severities of the disease?
Does the test perform differently for populations with different mixes of competing conditions?

Will the test results change your management strategy?
What are the test and treatment thresholds for the health condition to be detected?

Are the test LRs high or low enough to shift posttest probability across a test or treatment threshold?
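
One way to reason about the threshold question is to check whether the test result moves the patient from one management zone to another. The sketch below assumes a test threshold of 10% and a treatment threshold of 60%; these values, the pretest probability, and the function names are hypothetical, and real thresholds depend on the condition and on the risks of testing and treatment.

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """Pretest probability -> odds, multiply by the LR, then back to a probability."""
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)


def management_zone(probability: float, test_threshold: float,
                    treatment_threshold: float) -> str:
    """Classify a disease probability relative to the two thresholds."""
    if probability < test_threshold:
        return "below test threshold"
    if probability > treatment_threshold:
        return "above treatment threshold"
    return "between thresholds"


def result_changes_management(pretest: float, lr: float,
                              test_threshold: float,
                              treatment_threshold: float) -> bool:
    """True if the test result shifts the probability across a threshold."""
    before = management_zone(pretest, test_threshold, treatment_threshold)
    after = management_zone(posttest_probability(pretest, lr),
                            test_threshold, treatment_threshold)
    return before != after


# Hypothetical patient: pretest probability 30%, thresholds 10% and 60%
print(result_changes_management(0.30, 6.0, 0.10, 0.60))  # True: 0.72 is above 0.60
print(result_changes_management(0.30, 1.5, 0.10, 0.60))  # False: about 0.39, still between
```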

Will patients be better off as a result of the test?
Will patient care differ for different test results?
Will the anticipated changes in care do more good than harm?

Based on: Guyatt G, Rennie D, Meade MO, Cook DJ. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 2nd ed. 2008.