The interpretation of a screening examination ultimately falls into one of two categories: normal or abnormal. These judgments, in turn, are divided among four categories according to whether breast cancer is determined, or estimated, to have been present at the time of screening. A normal interpretation may be a true-negative or a false-negative.
An abnormal (positive) interpretation may be a true-positive or a false-positive. As noted in the introduction, screening tests for breast cancer divide women into those who are likely to have the disease and those who are unlikely to have the disease. By convention, each of these initial findings is given a summary determination on the basis of the patient’s status at 1 year after the screening examination. The determinations are defined as follows: true-positive (TP) - cancer diagnosed within 1 year after biopsy recommendation based on an abnormal mammogram; true-negative (TN) - no known cancer diagnosed within 1 year of a normal mammogram; false-negative (FN) - cancer diagnosed within 1 year of a normal mammogram; false-positive (FP) - no known cancer diagnosed within 1 year of an abnormal mammogram. There are three definitions of false-positives: FP1 refers to a case recalled for additional imaging evaluation of an abnormal finding on a screening mammogram in which no cancer was found within 1 year (that is, the recalled case was not shown to be malignant within 1 year).
The additional imaging may take place at the same time as the screening examination or at a later date. FP2 refers to a case in which no known cancer was diagnosed within 1 year after an abnormal mammogram and recommendation for biopsy or surgical consultation. This definition is based on the recommendation for biopsy alone - a biopsy may or may not be done. FP3 refers to a case in which benign disease is found at biopsy within 1 year after an abnormal mammogram and biopsy. This last definition of FP is similar to the definition proposed by the American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS), described later in the section on Quality Assurance.
These summary categories are the basic elements for measuring the performance of screening programs. Because the large majority of women who undergo mammography examinations do not have breast cancer, nearly all normal interpretations are accurate (true-negatives). True-positives are measured in the near term by biopsy results. False-negatives and false-positives rest on the assumption that cancer would have been detected, or was not present, on the basis of the presence or absence of histologic confirmation of disease within 1 year.
Sensitivity is a measure of the probability of detecting breast cancer when it is truly present. It is the proportion of patients found to have breast cancer within 1 year of screening who were identified as having an abnormality at the time of screening [sensitivity = TP/(TP + FN)]. Another method of measuring sensitivity has been outlined by Day and Walter and is based on the ratio of observed (invited group) to expected (control group) rates after omitting results from the first screening round as a basis for estimating the proportion of cases that would be expected to appear as clinical cancers.
Specificity is the probability of correctly identifying a patient as normal when no cancer exists and is the proportion of all patients not found to have breast cancer within 1 year of screening who had a normal interpretation at the time of screening [specificity = TN/(TN + FP)]. Positive predictive value (PPV) of a screening test is the proportion of all abnormal screening cases that result in a diagnosis of cancer and varies according to the criteria for a false-positive interpretation [PPV1–3 = TP/(TP + FP1–3)]. PPV1 is the rate of cancer diagnosis among all women with an abnormal mammogram (FP1). PPV2 is the rate of cancer based on the subset of women with abnormal mammograms who receive a recommendation for biopsy or surgical consultation (FP2). PPV3 is the proportion of biopsies performed that result in a diagnosis of cancer (FP3). According to the Agency for Health Care Policy and Research (AHCPR), approximate targets for PPV1 are 5% to 10%, whereas for PPV2 or PPV3 the target is 25% to 40%.
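The three measures defined above can be illustrated with a short calculation. The following is a minimal sketch in Python; the function names and all counts are hypothetical, chosen only so that the arithmetic is easy to follow, and are not drawn from any screening program described in this chapter.

```python
# Minimal sketch of the performance measures defined in the text.
# All counts are hypothetical and serve only to illustrate the arithmetic.

def sensitivity(tp, fn):
    """Proportion of cancers found within 1 year that were flagged at screening."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of women without cancer who had a normal interpretation."""
    return tn / (tn + fp)

def ppv(tp, fp):
    """Proportion of abnormal cases, under a given FP definition, that are cancer."""
    return tp / (tp + fp)

# Hypothetical cohort of 10,000 screened women.
counts = {
    "tp": 45,     # cancers detected after an abnormal mammogram
    "fn": 5,      # cancers diagnosed within 1 year of a normal mammogram
    "tn": 9400,   # normal mammogram, no cancer within 1 year
    "fp1": 550,   # recalled for additional imaging, no cancer within 1 year
    "fp2": 105,   # biopsy or surgical consultation recommended, no cancer
    "fp3": 90,    # biopsy performed, benign result
}

print(f"sensitivity = {sensitivity(counts['tp'], counts['fn']):.3f}")
print(f"specificity = {specificity(counts['tn'], counts['fp1']):.3f}")
for k in ("fp1", "fp2", "fp3"):
    # PPV1, PPV2, and PPV3 differ only in which FP definition is used.
    print(f"PPV{k[-1]} = {ppv(counts['tp'], counts[k]):.3f}")
```

With these made-up counts, PPV1 is roughly 7.6% and PPV2 is 30%, both of which happen to fall within the AHCPR target ranges quoted above. Specificity is computed here with FP1, the broadest false-positive definition; a narrower definition would yield a higher value.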
Sensitivity estimates derived from the trials have shown that the sensitivity of mammography was higher in women aged 50 and older than in women aged 40 to 49. For women in their 40s, sensitivity ranged from 53% to 81%, whereas for women who were 50 and older it ranged from 73% to 88%. These sensitivity estimates are based on a variety of screening protocols and screening intervals, of which the latter is likely to be the major factor in observed age group differences. If the screening interval is wider than the estimated mean sojourn time, the false-negative rate (the complement of sensitivity) is adversely influenced due to the comparatively higher rate of interval cancers that arise within the screening interval.
When Swedish Two-County data are adjusted for the screening interval and estimated age-specific mean sojourn time, the magnitude of these differences diminishes, with sensitivity estimated to be 86% for women aged 40 to 49 and approximately 93% for women aged 50 to 59. Furthermore, service screening programs offer a glimpse into current estimates of sensitivity when screening intervals are actually tailored to the estimated sojourn time. When women are screened for breast cancer at appropriate intervals, the estimated differences in sensitivity between younger and older women diminish considerably. In a screening program in Albuquerque, New Mexico, representing more than 100,000 women, sensitivity was 85.3% for women aged 40 to 49 and 87.7% for women aged 50 and older.
In the University of California at San Francisco screening program, the sensitivity was 86.7% for women aged 40 to 49 and 93.6% for women aged 50 to 59. In each of these instances, a difference in the sensitivity of mammography in younger versus older women is still seen, but the data show that the sensitivity of the test in the two age groups is more similar than different when screening intervals reflect the underlying mean sojourn time. In each case, the large majority of cancers are detected at the time of screening.
Specificity is also an important measure of the efficacy of a screening program, and even small differences in program specificity can mean a large difference in program efficacy and costs. Data from the Centers for Disease Control and Prevention’s (CDC) National Breast and Cervical Cancer Early Detection Program, representing more than 200,000 women in hundreds of facilities across the United States, provide a good picture of comparative program specificity by age.
The rate of abnormal mammograms among women aged 40 to 49 was 5.8%, and among women aged 50 to 59 it was 5.6%. In the series from San Francisco mentioned previously, the abnormal rate among women aged 40 to 49 was 6.4%, and among women aged 50 to 59 it was 6.8%. Other data series show similar patterns. These series also show similar specificity across age groups when false-positives are defined by biopsy results, together with a higher PPV with increasing age. Because the prevalence of disease is lower in women in their 40s, the yield of cancers from these procedures is naturally lower. Thus, although the rate of false-positive results based on additional mammographic views and referral for biopsy is very similar in younger and older women, the cost-effectiveness of screening improves with increasing age.
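The dependence of PPV on prevalence when sensitivity and specificity are held fixed follows directly from the definitions and can be sketched numerically. In the example below, the function name is hypothetical, the sensitivity and specificity are rounded illustrative values, and the two prevalence figures are assumptions chosen only to contrast a lower-prevalence group with a higher-prevalence group; none are estimates taken from the programs cited above.

```python
# Sketch: PPV as a function of prevalence for fixed sensitivity and specificity.
# All input values are illustrative assumptions, not program data.

def ppv_from_prevalence(sens, spec, prev):
    """PPV = true-positive fraction / (true-positive + false-positive fraction)."""
    tp = sens * prev                # fraction of all women screened who are TP
    fp = (1 - spec) * (1 - prev)    # fraction of all women screened who are FP
    return tp / (tp + fp)

sens, spec = 0.87, 0.94             # assumed equal in both groups

younger = ppv_from_prevalence(sens, spec, prev=0.002)  # hypothetical prevalence
older = ppv_from_prevalence(sens, spec, prev=0.004)    # hypothetical prevalence

print(f"PPV at lower prevalence:  {younger:.3f}")
print(f"PPV at higher prevalence: {older:.3f}")
```

Under these assumptions, doubling the prevalence nearly doubles the PPV even though the test performs identically in both groups, which is the pattern the text attributes to the lower prevalence of disease in women in their 40s.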
Robert A. Smith and Carl J. D’Orsi
R. A. Smith: Cancer Screening, Department of Cancer Control, American Cancer Society, Atlanta, Georgia
C. J. D’Orsi: Diagnostic Radiology, University of Massachusetts Memorial Medical Center, Worcester, Massachusetts