Research on the Comparative Effectiveness of Medical Treatments
Over the past 30 years, federal spending on Medicare and Medicaid has roughly tripled as a share of gross domestic product (GDP), rising from about 1.3 percent in 1975 to about 4 percent in 2007. According to the Congressional Budget Office’s (CBO’s) projections, under current policies such spending will reach about 12 percent of GDP by 2050—but substantial uncertainty surrounds that estimate.1 If costs per enrollee continued growing over the next four decades as quickly as they have grown over the past four—about 2.5 percentage points faster than per capita GDP—then federal spending on those programs would reach about 17 percent of the economy. If, instead, costs per enrollee grew no faster than per capita GDP, those federal costs would reach about 6 percent of GDP in 2050 solely because of demographic changes (see Figure 1). As those figures indicate, the rate at which health care costs grow relative to income is the most important determinant of the country’s long-term fiscal balance; it exerts a significantly larger influence on the budget over the long term than other commonly cited factors, such as the aging of the population or the coming retirement of the baby-boom generation.2
Figure 1. Federal Spending for Medicare and Medicaid as a Percentage of Gross Domestic Product Under Different Assumptions About Excess Cost Growth
Source: Congressional Budget Office.
Note: Excess cost growth refers to the number of percentage points by which the growth of annual health care spending per beneficiary is assumed to exceed the growth of nominal gross domestic product per capita.
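The arithmetic behind those scenarios can be sketched in Python. This is a back-of-the-envelope simplification, not CBO’s long-term model: it assumes that the 2050 share under excess cost growth is roughly the demographics-only share (about 6 percent of GDP) multiplied by one plus the excess growth rate compounded over the projection period, and it assumes the clock starts in 2007. The function name `share_with_excess_growth` is hypothetical.

```python
# Back-of-the-envelope sketch (an assumption, not CBO's long-term model):
# treat the 2050 share under excess cost growth as the demographics-only
# share scaled by (1 + excess growth) compounded over the projection period.

def share_with_excess_growth(baseline_share_pct: float,
                             excess_growth_pp: float,
                             years: int) -> float:
    """Scale a demographics-only spending share (percent of GDP) by the
    compounded excess of per-enrollee cost growth over per capita GDP growth."""
    multiplier = (1 + excess_growth_pp / 100) ** years
    return baseline_share_pct * multiplier


# Inputs from the text: about 6 percent of GDP in 2050 from demographics
# alone, and about 2.5 percentage points of excess cost growth per year.
# Starting the clock in 2007 is an assumption.
projected = share_with_excess_growth(6.0, 2.5, 2050 - 2007)
print(f"{projected:.1f} percent of GDP")  # prints 17.3 percent of GDP
```

Landing near the roughly 17 percent figure is only a consistency check; the point it illustrates is that a growth-rate gap of a few percentage points, compounded over four decades, nearly triples the share.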
Rising health care costs represent a challenge not only for the federal government but also for private payers. Indeed, trends in both sectors reflect many of the same underlying forces—including the development and spread of new and more-expensive medical technologies—so controlling those federal costs over the long term will be difficult without addressing the forces that are also causing private costs for health care to rise. Total health care spending, which consumed about 8 percent of the U.S. economy in 1975, currently accounts for about 16 percent of GDP, and that share is projected to reach nearly 20 percent by 2016. About half of overall health spending in the United States is now publicly financed, and half is privately financed.
A variety of evidence suggests that opportunities exist to constrain health care costs both in the public programs and in the rest of the health system without adverse health consequences. Perhaps the most compelling evidence of those opportunities involves the substantial geographic differences in spending on health care—both among countries and within the United States—which do not translate into higher life expectancy or measured improvements in other health statistics in the higher-spending regions. For example, Medicare’s costs per beneficiary vary significantly among different regions of the country, but much of the variation cannot be explained by differences in the population, and the higher-spending regions perform no better on available measures of average health outcomes than the lower-spending regions do.
Furthermore, hard evidence is often unavailable about which treatments work best for which patients and whether the added benefits of more-effective but more-expensive services are sufficient to warrant their added costs—yet the current health system tends to adopt more-expensive treatments even in the absence of rigorous assessments of their impact. Indeed, the extent of the variation in treatments may be greatest when evidence about their relative effectiveness is lacking. Together, those findings suggest that better information about the costs, risks, and benefits of different treatment options, combined with new incentives reflecting the information, could eventually alter the way in which medicine is practiced and yield lower health care spending without having adverse effects on health. Over the long term, the potential reduction in spending below projected levels could be substantial.
Generating evidence that compares treatments is what research on "comparative effectiveness" does. This CBO paper makes the following main points about the options available for an expanded federal role in supporting and organizing such research and about the impact that research could have on spending for health care:
■
Because any private-sector entity (such as a health plan) has only a limited incentive to produce or pay for information that could benefit many entities—including its competitors—an argument can be made for a larger federal role in coordinating and funding research on comparative effectiveness. In addition, because federal health insurance programs play such a large role in financing medical care and account for such a large share of the budget, the federal government itself has an interest in generating evaluations of the effectiveness of different approaches to health care.
■
If policymakers wanted to expand federal efforts to study comparative effectiveness, the endeavor could be organized in different ways—for instance, by augmenting an existing agency, by establishing a new agency, by supporting an existing quasi-governmental organization, or by creating a new public-private partnership. In choosing an organizational arrangement and a mechanism to provide federal funds to it, trade-offs could arise between the entity’s independence from political pressure and its accountability to policymakers and other interested parties. Efforts to bolster comparative effectiveness research would be more likely to change medical practice if the organization coordinating the research were respected and trusted by doctors and other professionals in the health sector.
■
The level of funding required for a new or augmented entity would depend largely on what its additional activities involved. Synthesizing existing studies or analyzing available data on medical claims would be less expensive than conducting new head-to-head clinical trials to compare treatments but could also yield less definitive results—and therefore might have a smaller impact on medical practice. Clinical trials could be more persuasive but also more time-consuming, and there is probably a limit to how many comparative trials could be undertaken effectively at any given time. If privacy concerns could be addressed, having more health records available in electronic form would facilitate the use of such data for research.
■
Studies might need to compare not only broadly different treatment options—such as surgery versus drug therapy—but also different approaches to the same basic treatment—such as different levels of follow-up care after surgery. Studies that included an analysis of cost-effectiveness would probably have a larger impact than ones that compared only clinical effectiveness, because they would highlight cases where more-expensive treatments or approaches provided added benefits that were modest compared with their added costs (at least for some types of patients).
■
To affect medical treatment and reduce health care spending in a meaningful way, the results of comparative effectiveness analyses would have to be not only persuasive but also used in ways that changed the behavior of doctors, other health professionals, and patients. For example, the higher-value care identified by comparative effectiveness research could be promoted in the health system through financial incentives—the payments doctors receive or the cost sharing that patients face. Making substantial changes in payment policies or coverage rules under the Medicare program to reflect information on comparative effectiveness would almost certainly require legislation.
■
Making such substantial changes in the delivery of health care could prove difficult and controversial for a number of reasons. To inform new systems of incentives (designed to discourage the use of more costly treatments that provided little or no added benefit), the results of effectiveness studies would have to be sufficiently robust to minimize the risk of overlooking subgroups of patients who could benefit greatly from a treatment. Even with an expanded evidence base, some patients and providers might object to the use of such incentives, and keeping pace with new treatments and procedures would be an ongoing challenge.
■
Generating additional information about comparative effectiveness and making corresponding changes in incentives would seem likely to reduce health care spending over time—potentially to a significant degree. The precise impact, however, depends on several factors and is difficult to predict. Given the time necessary to conduct the research, to alter incentives in a manner reflecting the results, and to affect behavior through those changes, any potential for substantial cost savings from new research would probably take a decade or more to materialize. Even so, generating additional information comparing treatments would tend to reduce federal health spending somewhat in the near term—but that effect may not be large enough to offset the full costs of conducting the research over that same time period.
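The timing argument in the last point can be illustrated with a hypothetical cash-flow sketch. Every number in it (the annual research cost, the lag before savings begin, the phase-in period, and the eventual annual savings) is invented for illustration and is not a CBO estimate; amounts are undiscounted and in arbitrary units, and the function name is hypothetical.

```python
# Hypothetical illustration: every figure below is invented for the sketch
# and is not a CBO estimate. Amounts are in arbitrary units per year and
# are not discounted.

def cumulative_net_savings(years: int,
                           annual_research_cost: float,
                           lag_years: int,
                           ramp_years: int,
                           full_annual_savings: float) -> list[float]:
    """Cumulative (savings minus research cost) at the end of each year.

    Savings are zero during the lag (time to do the research, change
    incentives, and change behavior), then rise linearly to their full
    size over ramp_years.
    """
    totals = []
    running = 0.0
    for year in range(1, years + 1):
        phase_in = min(max(year - lag_years, 0) / ramp_years, 1.0)
        running += phase_in * full_annual_savings - annual_research_cost
        totals.append(running)
    return totals


path = cumulative_net_savings(years=20, annual_research_cost=1.0,
                              lag_years=5, ramp_years=5,
                              full_annual_savings=3.0)
# First year in which cumulative savings exceed cumulative research costs.
breakeven_year = next((i + 1 for i, v in enumerate(path) if v > 0), None)
print(breakeven_year)  # prints 11
```

Under these made-up assumptions, cumulative savings stay below cumulative research costs for the first ten years even though annual savings eventually reach three times annual research costs; different assumptions would move the breakeven year.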
The Current State of Comparative Effectiveness Research
In weighing options to expand and reorganize research efforts, it is useful to define what comparative effectiveness research means and to consider the arguments for an expanded federal role in conducting such research. Related issues include the reasons why the current stock of research on comparative effectiveness is limited and why treatments and procedures can gain wide use even when evidence about their relative effectiveness is lacking. Reviewing past and current research efforts—by private and public organizations in the United States and by other countries—also sheds light on several issues and challenges likely to arise in any future U.S. efforts. To the extent that past and current efforts are seen as inadequate, careful consideration of those shortcomings would inform the choice of an organizational approach and funding mechanism for new federal activities.
What Is Comparative Effectiveness?
As applied in the health care sector, an analysis of comparative effectiveness is simply a rigorous evaluation of the impact of different options that are available for treating a given medical condition for a particular set of patients. Such a study may compare similar treatments, such as competing drugs, or it may analyze very different approaches, such as surgery and drug therapy. The analysis may focus only on the relative medical benefits and risks of each option, or it may also weigh both the costs and the benefits of those options. In some cases, a given treatment may prove to be more effective clinically or more cost-effective for a broad range of patients, but frequently a key issue is determining which specific types of patients would benefit most from it. Related terms include cost–benefit analysis, technology assessment, and evidence-based medicine, although the latter two concepts do not ordinarily take costs into account.
While some information about the effectiveness of new drugs, medical devices, and procedures is usually available, rigorous comparisons of different treatment options are less common. Drugs and devices must be certified as safe and effective by the Food and Drug Administration (FDA) before they can be marketed in the United States, but with certain exceptions the regulatory process for approving those products does not evaluate them relative to alternatives.3 Furthermore, physicians commonly prescribe drugs for "off-label" uses—that is, for conditions or groups of patients for which the FDA has not certified the drugs as safe and effective. For drug manufacturers, the costs of conducting additional trials to demonstrate safety and efficacy for a broader set of patients or conditions may outweigh the benefits from the increased sales that would result; in particular, the potential gains from finding a favorable result for a different population would have to be weighed against the risk that safety and efficacy could not be demonstrated conclusively.
Medical procedures, which account for a much larger share of total spending on health care than drugs and devices do, can achieve widespread use without extensive clinical evaluation. In many cases, it may be reasonable to assume that the benefits of a treatment will be similar for related conditions or a broader group of patients. Without hard evidence, however, decisions about what treatments to recommend often depend on the individual experience and judgment of physicians. Various reasons have been cited to explain why the use of new medical technologies can spread even in the absence of proof about their effectiveness and why health costs tend to increase as a result; those reasons include fee-for-service payment of physicians (a payment method that is common in the private sector and prevalent in Medicare and that typically gives doctors a financial incentive to provide more-expensive care) as well as enthusiasm for the newest technology on the part of both doctors and patients.4 Furthermore, patients with insurance typically pay only a small share of the costs of their treatments, so their incentives to weigh the costs against the benefits are limited—a trade-off inherent in having insurance protection.
A recent example of a comparative effectiveness study indicates that careful analysis can sometimes disprove widely held assumptions about the relative merits of different treatments. The study, which involved patients who had stable coronary artery disease, compared the effects of two treatments: an angioplasty with a metal stent combined with a drug regimen versus the drug regimen alone.5 Patients were randomly assigned to receive the two treatments, and although the study found that patients treated with angioplasty and a stent had better blood flow and fewer symptoms of heart problems initially, the differences declined over time.6 More importantly, it found no differences between the two groups in survival rates or the occurrence of heart attacks over a five-year period.
Other examples of studies comparing the clinical effectiveness of different treatment options illustrate the types of findings that they can generate:
■
One recent trial found that older, relatively inexpensive drugs for treating high blood pressure (known as diuretics) were more effective in preventing cardiovascular disease in patients age 55 or older than commonly used newer drugs known as angiotensin-converting enzyme inhibitors and calcium channel blockers.7
■
Another trial compared the effects of surgery to reduce lung volume for patients suffering from emphysema—a treatment that had anecdotal support but lacked hard evidence about its effectiveness—with standard medical therapy for that disease. For many patients, lung surgery increased their risk of death slightly and did not improve their functional status, but for patients with certain types of lung problems and a limited capacity for exercise, the surgery yielded small net improvements in their quality of life (though not in their survival rates).8
■
A trial of two statin drugs, which was sponsored by the maker of one of those drugs, found that its competitor’s product was more effective both at lowering cholesterol levels and at reducing the risk of mortality—illustrating the point that comparative trials can be risky for manufacturers to conduct.9
■
Recent studies have found that magnetic resonance imaging combined with mammography is more effective than mammography alone in detecting breast cancer for women with certain genetic markers that indicate a substantially increased risk of contracting that disease; the impact of that difference on survival rates, however, could not be measured.10
The range of findings that those studies yielded highlights several characteristics of research on comparative effectiveness. First, studies can examine not only treatments for health problems but also different procedures to screen for the presence of a disease. Second, the findings may apply broadly or only to a very specific subset of patients, and studies may differ in the outcomes they measure, such as effects on mortality or on other measures of health.
Third, studies are often based on clinical trials, in which eligible patients are randomly assigned to the treatments under review—but there are several other methods available to compare treatments, each with its own strengths and weaknesses. Clinical trials can yield persuasive findings but can also be relatively costly and time-consuming to conduct. In particular, a trial designed to determine whether two treatments differ in their effectiveness may require a large number of enrollees to be followed for an extended period in order to generate results that are statistically significant. Less expensive approaches include systematic reviews of the evidence about treatment options, which are essentially meta-analyses of all available studies, and studies that use medical claims data, which can be used to follow large groups of patients who have already received different treatments. The impact of systematic reviews can be limited, however, by the fact that they simply reflect existing evidence, and studies using claims data can be subject to bias because the treatments are not randomly assigned to comparable patients.
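To see why trials comparing two active treatments can require so many enrollees, consider the standard normal-approximation formula for the number of patients per arm needed to detect a difference between two event rates p1 and p2: n = (z₁₋α/₂ + z₁₋β)² × [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)². The sketch below assumes a two-sided test at a 5 percent significance level with 80 percent power; the event rates are hypothetical, and the function name is illustrative. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def enrollees_per_arm(p_control: float, p_treatment: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed in each arm of a two-arm randomized trial
    to detect a difference between two event rates (two-sided test of two
    proportions, normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical event rates over the follow-up period.
print(enrollees_per_arm(0.10, 0.05))  # large difference: prints 432
print(enrollees_per_arm(0.10, 0.08))  # modest difference: prints 3211
```

Narrowing the difference in event rates from 5 percentage points to 2 raises the required enrollment per arm more than sevenfold, which is one reason head-to-head trials of similar treatments tend to be large and lengthy.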
The studies cited above focus on relative clinical effects, not on cost-effectiveness. For reasons discussed below, gauging cost-effectiveness as well as clinical effectiveness is sometimes controversial, and some observers believe that the two considerations belong to separate fields. But cost-effectiveness analysis appears to be well within the scope of research on comparative effectiveness—and has been applied to many of the treatments discussed above. For example, an additional analysis of lung-volume-reduction surgery, which focused on the patients likely to benefit from the surgery, found that it would be cost-effective if its benefits persisted for 10 years but might not be so if those benefits dissipated after three years.11 (That study did not follow patients for a decade and therefore had to estimate the future benefits.) Similarly, another study examined the cost-effectiveness of more-expensive screening mechanisms for breast cancer and found that it varied substantially with the age of the patient.12
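Why the lung-surgery result hinges on how long benefits last can be shown with a simple calculation of cost per quality-adjusted life year (QALY) gained. The inputs below are hypothetical and are not taken from the study cited; the sketch assumes a one-time incremental cost, a constant annual QALY gain, and a 3 percent annual discount rate (a common convention in cost-effectiveness analysis, assumed here). The function name is illustrative.

```python
def cost_per_qaly(incremental_cost: float, annual_qaly_gain: float,
                  years_of_benefit: int, discount_rate: float = 0.03) -> float:
    """Incremental cost per discounted QALY gained when a one-time treatment
    yields a constant annual QALY gain for a fixed number of years."""
    discounted_qalys = sum(annual_qaly_gain / (1 + discount_rate) ** t
                           for t in range(1, years_of_benefit + 1))
    return incremental_cost / discounted_qalys


# Hypothetical inputs: $60,000 extra up front and 0.1 QALY gained per year.
for years in (3, 10):
    print(years, round(cost_per_qaly(60_000, 0.1, years)))
```

With these made-up inputs, the cost per QALY gained is roughly $212,000 if the benefits last three years but roughly $70,000 if they last ten, so an analysis that must project benefits beyond its follow-up period can reach very different conclusions depending on that projection.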
More generally, the relative cost-effectiveness of treatment options is clear when a less expensive treatment yields comparable or superior health gains. In other cases, however, determining whether the additional medical benefits of a more expensive treatment warrant their added costs is complex. Typically, the benefits of different treatments are summarized as an increase in life expectancy or, more commonly, as an increase in quality-adjusted life years (QALYs) to account for effects on morbidity as well as mortality. That calculation reflects estimates of how much people value improving their health or avoiding various side effects, which are combined to create a single metric. By convention, cost-effectiveness analyses report results as the cost per QALY gained, so a lower dollar amount indicates a more cost-effective service. If that metric is used to determine whether specific health procedures are covered by an insurance program, choosing a cost-effectiveness threshold can be a controversial endeavor—but that need not be the manner in which such research is applied.
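The cost-per-QALY comparison, usually called an incremental cost-effectiveness ratio, can be sketched as follows. The sketch checks first for dominance (the case described above, in which a less expensive option yields comparable or superior health gains) and only then computes a ratio. The `Option` class, the option names, and the cost and QALY figures are hypothetical, and no coverage threshold is applied.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    cost: float   # expected cost per patient, in dollars
    qalys: float  # expected quality-adjusted life years per patient


def compare(new: Option, old: Option) -> str:
    """Summarize the incremental cost-effectiveness of `new` relative to `old`."""
    d_cost = new.cost - old.cost
    d_qaly = new.qalys - old.qalys
    if d_cost <= 0 and d_qaly >= 0:
        # No more costly and at least as effective: no ratio is needed.
        return f"{new.name} dominates {old.name}"
    if d_cost >= 0 and d_qaly <= 0:
        return f"{old.name} dominates {new.name}"
    ratio = d_cost / d_qaly  # positive in both remaining cases
    if d_qaly > 0:
        # More costly and more effective: the usual cost per QALY gained.
        return f"{new.name}: ${ratio:,.0f} per QALY gained"
    # Cheaper but less effective.
    return f"{new.name}: saves ${ratio:,.0f} per QALY forgone"


standard = Option("standard drug", cost=1_000, qalys=5.0)
newer = Option("newer drug", cost=21_000, qalys=5.4)
print(compare(newer, standard))  # prints newer drug: $50,000 per QALY gained
```

Whether $50,000 per QALY gained represents good value is the threshold judgment the text describes as controversial; the function reports the ratio without making that call.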
Research in the Private Sector
In the United States, most of the formal research that is done to examine the effects of drugs or medical devices is conducted by the manufacturers of those products in the course of their development; as noted, however, it is the exception rather than the rule that those studies directly compare treatments or products.13 Nevertheless, various other private organizations have also produced assessments and comparisons of some treatments. (Analyses conducted in other countries represent another source of information about treatments; see Box 1.)
Box 1. Research on Comparative Effectiveness in Other Countries
Other developed countries also face challenges financing health care costs and have taken various steps to assess the comparative effectiveness of treatments. Unlike the United States, many of those countries establish overall budgets for their national health systems and regularly use the data on comparative effectiveness that are available to help determine the treatments and procedures to be covered and, in some cases, the payment rates. Despite differences in other countries’ health insurance systems, the approaches that they have taken to organizing and funding those research and review activities could offer lessons for any increased U.S. efforts.
Perhaps the best known example of an agency that assesses comparative effectiveness is the National Institute for Health and Clinical Excellence (NICE), which was established in 1999 as part of the United Kingdom’s National Health Service (NHS). It analyzes both the clinical effectiveness and cost-effectiveness of new and existing medicines, procedures, and other technologies and provides guidance on appropriate treatments for specific diseases or types of patients. To date, NICE has published appraisals of over 100 specific technologies, guidance on the use of about 250 medical procedures, and about 60 sets of treatment guidelines—a substantial but not exhaustive list. If NICE approves a drug, device, or procedure, it must be covered by the NHS, but local health authorities make coverage decisions about treatments that NICE has not yet evaluated. With a staff of about 200 and an annual budget of about 30 million pounds (roughly $60 million), NICE does not fund new clinical trials or other forms of primary data collection. Instead, it commissions systematic reviews of existing research on clinical effectiveness and combines those findings with models of cost-effectiveness. Clinical trials are funded by the British Ministry of Health but (as in this country) data on total spending in the United Kingdom for research on comparative effectiveness are hard to come by.
Other countries such as Australia, Canada, France, and Germany have similar review processes, though the organizational and financing arrangements vary—and in several cases, the structures have recently been changed.1 For example, France established a new agency in 2004 to bring together a number of related activities, including the evaluation of drugs, devices, and procedures, publication of clinical guidelines, accreditation of providers, and dissemination of medical information. Germany established a new agency in 2000 that conducts technology assessments and a new Institute for Quality and Efficiency in 2004 that evaluates health care services. Discussions about the use of comparative effectiveness in those countries sometimes focus on their review processes for prescription drugs, but their efforts generally encompass all forms of acute medical care. (For all the attention they receive, drug costs represent less than 15 percent of health care spending in the United States—so research that focused only on medications would miss the vast majority of services and would not be able to compare drug therapy with surgical procedures or other interventions.)
Although those countries all have government-run health care systems, they have taken different approaches regarding the placement of and funding for their assessment bodies. In the United Kingdom and Australia, the agencies are part of the government’s health departments; France and Canada have established independent not-for-profit organizations; and Germany has taken a mixed approach (the Institute for Quality and Efficiency is independent, but the technology assessment agency is an arm of the health ministry). Financing arrangements vary correspondingly: Funding in the United Kingdom and Australia comes from their health departments, whereas Germany’s independent institute is funded by a levy on inpatient and outpatient health care services (which are mainly reimbursed by the country’s regional health insurance funds), and the French agency gets its funding from a combination of taxes on promotional spending by drug companies, government subsidies, and accreditation fees. Health ministries in Australia, Canada, France, and Germany also help fund clinical trials and other forms of primary research, but total spending related to comparative effectiveness in those countries is also difficult to estimate.
Given the interest that has developed in many countries, it is not surprising that several international organizations have become involved in comparative effectiveness research. The best known may be the Cochrane Collaboration—a nonprofit organization that has a network of volunteers who conduct systematic reviews of treatments. Many of its activities are organized through centers located around the world, including one in the United States. Founded in 1993, the Cochrane Collaboration maintains an accessible database that now contains more than 4,500 reviews; its limited funding comes primarily from subscription fees for its quarterly journal. Any new or expanded U.S. entity that would organize and fund research on comparative effectiveness would probably draw upon Cochrane’s findings and the results of research conducted in other countries (to the extent such research was applicable to U.S. patients).
1. For additional information, see Institute of Medicine, Learning What Works Best: The Nation’s Need for Evidence on Comparative Effectiveness in Health Care (September 2007), Appendix 2, available at www.iom.edu/ebm-effectiveness.
Several private-sector organizations exist primarily or exclusively to assess medical treatments and technologies. One prominent example is the Technology Evaluation Center that is part of the Blue Cross Blue Shield Association. Its analyses are based on systematic reviews of the available literature and therefore rely on clinical trials or other studies that have already been conducted. (In such reviews, more weight is given to studies that are judged to be of higher methodological quality.) The center produces about 20 to 25 new assessments of drugs, devices, and other technologies each year; the analyses consider clinical effectiveness but generally do not assess cost-effectiveness.
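Systematic reviews that pool results quantitatively often use inverse-variance weighting, in which more precise studies (those with smaller standard errors) count for more. The sketch below shows a fixed-effect version of that calculation on hypothetical trial results. It illustrates one common pooling method and is not a description of the Technology Evaluation Center’s procedures; giving extra weight to studies judged to be of higher methodological quality is a separate step that the sketch does not model. The function name is illustrative.

```python
import math


def fixed_effect_pool(estimates: list[float],
                      standard_errors: list[float]) -> tuple[float, float]:
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical log risk ratios from three trials of the same comparison.
log_rr = [-0.10, -0.20, 0.05]
se = [0.10, 0.15, 0.20]
pooled, pooled_se = fixed_effect_pool(log_rr, se)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled risk ratio {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

Because the pooled standard error is smaller than that of any single study, pooling can sharpen estimates, but it cannot correct for biases shared by the underlying studies, which is one reason reviews also weigh study quality.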
For-profit private-sector firms that specialize in technology assessments represent another source of analysis. Hayes, Inc., is one of the larger firms in the field. Such firms also conduct systematic reviews and evaluate medical and surgical procedures, drugs, and devices in return for a fee or on a subscription basis. Organizations that are similar but operate as nonprofit entities—sometimes affiliated with academic or medical centers—include the ECRI Institute and the Tufts-New England Medical Center’s Cost-Effectiveness Analysis Registry (which provides an extensive list of the cost-effectiveness ratios that are available from published studies).
In addition, private health plans—most commonly, larger or more integrated ones—conduct their own reviews of evidence and sometimes undertake new analyses of comparative effectiveness using claims data for their enrollees.14 Health plans may choose to publicize the results, or they may decide to keep their findings confidential and use them to shape their policies regarding coverage of and payment for the treatments in question. For example, health plans usually have an entity known as a pharmacy and therapeutics committee that considers the evidence regarding the relative effectiveness of different prescription drugs and makes recommendations about which ones should be covered (that is, included on formularies) or given preferred status. An example of a more public and collaborative effort is the HMO Research Network, a consortium of more than a dozen health maintenance organizations from different parts of the country; started in the mid-1990s, it brings together researchers to share findings and, in some cases, uses data from several plans as the basis for analysis.15
Notwithstanding those current efforts, the private sector generally will not produce as much research on comparative effectiveness as society would value. The knowledge created by such studies is costly to produce—but once it is produced, it can be disseminated at essentially no additional cost, and charging all users for access to that information is not always feasible. As a result, private insurers and other entities conducting research on comparative effectiveness often stand to capture only a portion of the resulting benefits and therefore do not invest as much in such research as they would if they took into account the benefits to all parties. In health plans that do not have exclusive provider networks, some of the benefits probably "spill over" to other health plans using the same doctors, because physicians tend to use a similar approach to care for all of their patients. Even if organizations could keep their findings confidential, so that they captured all of the benefits, some duplication of effort would probably occur. In such a situation, research constitutes a "public good," and economists have long recognized a role for government to increase the supply of such research toward the socially optimal level.
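The spillover argument can be made concrete with a toy calculation. The sketch assumes, hypothetically, that a study’s benefits are divided among payers in proportion to their market shares and that each payer funds research only if its own share of the benefits exceeds the study’s cost; the dollar amounts, plan names, and shares are invented, and the function name is illustrative.

```python
def worth_funding(study_cost: float, total_benefit: float,
                  share_captured: float) -> bool:
    """True if a funder that captures `share_captured` of the study's total
    benefit would find the study worth paying for on its own."""
    return share_captured * total_benefit > study_cost


# Hypothetical figures (millions of dollars): a study costing 10 that would
# yield 50 in benefits, spread across payers in proportion to market share.
STUDY_COST, TOTAL_BENEFIT = 10.0, 50.0
payer_shares = {"plan A": 0.15, "plan B": 0.10, "plan C": 0.05}

for payer, share in payer_shares.items():
    print(payer, worth_funding(STUDY_COST, TOTAL_BENEFIT, share))
# From society's point of view the whole benefit counts (share = 1.0).
print("society", worth_funding(STUDY_COST, TOTAL_BENEFIT, 1.0))
```

With these numbers no single plan would fund the study, because each would need to capture at least 20 percent of the benefits, yet the study is worth five times its cost to society as a whole.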
Another reason for the limited availability of information on comparative effectiveness is that public-sector health insurance programs—which collectively account for about 40 percent of all health care spending—have not sought to make extensive use of it. In particular, the Medicare program has made only limited use of comparative effectiveness data in making decisions about which treatments to cover and how much to pay for them. It stands to reason that the limited demand for such research from such a prominent payer has constrained the supply correspondingly. Conversely, increasing the amount of credible and objective research that was available could facilitate moving Medicare toward what former program administrator Mark McClellan has called a "fee-for-value" system rather than a fee-for-service one. (Options to incorporate research findings into Medicare’s coverage and payment policies, along with the issues they raise, are discussed in the final section.)
Past and Current Federal Efforts
In the United States, the federal government has a rather long but somewhat checkered history of involvement in comparative effectiveness research and related efforts. Federal efforts date at least to the late 1970s and the short-lived National Center for Health Care Technology. Established in 1978 as part of the Department of Health, Education, and Welfare, it was given a broad mandate to conduct and promote research on health care technology, and it included an advisory board appointed by the Secretary to assist in setting research priorities. The center sponsored or cosponsored major evaluations of coronary artery bypass graft surgery, dental radiology, and cesarean delivery and made about 75 recommendations to the Medicare program about coverage. The center ceased operations at the end of 1981, however, reflecting changes in priorities for the new Administration and the Congress as well as opposition from some provider and industry groups.16
In that same period, the Office of Technology Assessment (OTA) was created as an advisory agency to the Congress, covering a broad set of issues, including health care. Given the agency’s focus on evaluating technologies, much of its work would now be called research on comparative effectiveness; over the years, it studied a variety of health care topics, including the costs and benefits of screening tests for several diseases. OTA also produced an extensive review and analysis of the issues involved in and options for improving evidence about the clinical effectiveness and cost-effectiveness of medical treatments.17 For a variety of reasons, however—having little to do with its health care studies specifically but instead reflecting broader questions about the agency’s role—OTA was eliminated in 1995.
More recently, the Agency for Health Care Research and Quality (AHRQ) has been the most prominent federal agency supporting various types of research on the comparative effectiveness of medical treatments. Established in 1989 as the Agency for Health Care Policy and Research, AHRQ is an arm of the Department of Health and Human Services (HHS).18 It currently has a staff of about 300 and an annual budget of over $300 million, which primarily funds research grants to and contracts with universities and other research organizations covering a wide range of topics in health services.
AHRQ has undertaken a number of initiatives related to comparative effectiveness. One such step—initially taken in collaboration with the American Medical Association and America’s Health Insurance Plans, a coalition of insurance companies—has been the creation of a national clearinghouse for treatment guidelines, which are designed to summarize the available medical evidence on the appropriate treatments for various conditions. AHRQ has also endorsed about a dozen evidence-based practice centers around the country. Generally affiliated with a university, those centers analyze and synthesize existing evidence about treatments and technologies. Although many studies sponsored by AHRQ have examined only the relative clinical benefits of different treatments, some have also analyzed their cost-effectiveness. Research on comparative effectiveness has accounted for only a modest portion of AHRQ’s budget, though.
Support for AHRQ, as for other agencies that examine the effectiveness of medical treatments or evaluate medical technologies, has varied over time. In the mid-1990s, controversies arose after an agency-sponsored research team concluded that there was insufficient evidence to support certain spinal surgeries and, on the basis of that work, the agency issued practice guidelines for the treatment of back pain.19 Strong opposition from back surgeons, along with broader questions about the value of the research that the agency had funded and other factors, led to proposals to eliminate the agency. Ultimately, the agency was retained, but its funding for fiscal year 1996 was reduced from prior levels (see Table 1). Since then, its overall budget has generally been maintained, at least in nominal terms, or increased. Again in 2002, however, the House of Representatives voted to cut off all funding for AHRQ, though in the end the agency received a small increase in its fiscal year 2003 appropriation.
Requested, Proposed, and Actual Funding for the Agency for Health Care Research and Quality
(Millions of dollars, by fiscal year)

Year   Agency’s Request   House Proposal   Senate Proposal   Appropriation
1991   109                88               138               115
1992   122                115              127               120
1993   125                118              130               128
1994   158                148              158               154
1995   171                154              166               162
1996   194                66               127               125
1997   144                125              144               143
1998   149                149              143               147
1999   171                171              171               171
2000   206                175              211               204
2001   250                224              270               270
2002   306                306              291               299
2003   250                0                314               304
2004   279                304              304               304
2005   304                304              319               319
2006   319                319              324               319

Source: Congressional Budget Office based on data from the Department of Health and Human Services, Agency for Health Care Research and Quality.
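The swings in the agency's funding described in the text can be quantified directly from the table's Appropriation column. The short Python sketch below only transcribes that column and computes fractional changes between years; it adds no data beyond the table, and the helper name `pct_change` is purely illustrative.

```python
# AHRQ appropriations by fiscal year, in millions of dollars,
# transcribed from the "Appropriation" column of the table.
APPROPRIATION = {
    1991: 115, 1992: 120, 1993: 128, 1994: 154, 1995: 162,
    1996: 125, 1997: 143, 1998: 147, 1999: 171, 2000: 204,
    2001: 270, 2002: 299, 2003: 304, 2004: 304, 2005: 319,
    2006: 319,
}


def pct_change(start_year: int, end_year: int) -> float:
    """Fractional change in the appropriation between two fiscal years."""
    start = APPROPRIATION[start_year]
    end = APPROPRIATION[end_year]
    return (end - start) / start


cut_1996 = pct_change(1995, 1996)     # the mid-1990s reduction
change_2003 = pct_change(2002, 2003)  # the year after the House voted to zero out funding
print(f"FY1995 -> FY1996: {cut_1996:+.1%}")
print(f"FY2002 -> FY2003: {change_2003:+.1%}")
```

Run as written, this prints a cut of about 22.8 percent for fiscal year 1996 and an increase of about 1.7 percent for fiscal year 2003, the "small increase" the text mentions.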
Most recently, section 1013 of the Medicare Modernization Act of 2003 authorized AHRQ to spend up to $50 million in 2004 and additional amounts in future years to conduct and support research with a focus on "outcomes, comparative clinical effectiveness, and appropriateness of health care items and services (including prescription drugs)" for Medicare and Medicaid enrollees. The actual funding appropriated for that initiative has been $15 million per year. Using that funding, AHRQ has established an "Effective Health Care" program consisting of three main functions: reviewing and synthesizing existing evidence (using its evidence-based practice centers); generating new information using a set of approved research centers (such as the HMO Research Network) that have access to data from medical claims and electronic medical records; and publishing findings in formats that are geared to the differing needs of clinicians, patients, and policymakers.
Other federal agencies also engage in various activities related to comparative effectiveness research; those efforts receive less attention than AHRQ’s activities but are probably larger in dollar terms. The Department of Veterans Affairs (VA) has a very substantial research program that reviews evidence from the medical records of its patients, focusing particularly on the clinical effectiveness of treatments. The department also sponsors evidence reviews through a technology assessment program and helps fund clinical trials, including the study comparing stents to drug therapy mentioned above. Indeed, over the past 30 years, some of the most influential clinical trials have been supported by and conducted in the VA health system, including the first major trials that demonstrated the value of bypass surgery over medical therapy for some forms of coronary artery disease as well as head-to-head studies of drugs that treat prostate enlargement. Another such agency is the National Institutes of Health (NIH), part of HHS, which is the leading federal sponsor of medical research, primarily in the form of clinical trials. Although comparative effectiveness is not a focus of that research, over the years NIH has sponsored a number of trials that compare treatments directly.
The Centers for Medicare and Medicaid Services (CMS) has helped to sponsor a limited amount of research on comparative effectiveness (for example, it covered the medical costs of the study of lung-volume-reduction surgery). When making decisions about what services are covered, however, CMS generally considers only whether devices and procedures are clinically effective. It has sponsored some studies comparing the effectiveness of different treatments but has done so largely to determine whether to establish separate payment rates for similar treatments. For example, CMS is currently cosponsoring a trial with NIH that may eventually compare the effects of daily dialysis for kidney patients with the conventional treatment of dialysis three times per week.20 If daily dialysis proves more effective for certain patients, CMS could modify its payment policy to cover the additional costs of more frequent treatment for those patients.
Estimating the total amount that is spent in the United States each year on research that compares the effectiveness of medical treatments is difficult. According to one recent analysis, the federal government spent about $1.5 billion in 2005 on all health services research, a broader category that includes some of the work on comparative effectiveness but also encompasses many other types of studies.21 For example, that total included AHRQ’s entire budget of roughly $300 million, whereas the funding devoted to the agency’s Effective Health Care program has been $15 million per year. At the same time, that aggregate figure may not include all federal funding for comparative trials or other efforts that are outside the traditional scope of health services research.
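To see how loosely the $1.5 billion total tracks comparative effectiveness work, the figures cited in the paragraph can be expressed as shares of one another. This minimal sketch uses only those three numbers (in millions of dollars); the variable names are illustrative, and the AHRQ budget is the text's rounded "roughly $300 million."

```python
# Figures cited in the text, in millions of dollars.
ALL_FEDERAL_HSR_2005 = 1_500  # all federal health services research, 2005
AHRQ_BUDGET = 300             # AHRQ's entire budget (rounded)
EFFECTIVE_HEALTH_CARE = 15    # annual funding for AHRQ's Effective Health Care program


def share(part: float, whole: float) -> float:
    """Return `part` as a fraction of `whole`."""
    return part / whole


ahrq_share = share(AHRQ_BUDGET, ALL_FEDERAL_HSR_2005)
ehc_share_of_ahrq = share(EFFECTIVE_HEALTH_CARE, AHRQ_BUDGET)
ehc_share_of_total = share(EFFECTIVE_HEALTH_CARE, ALL_FEDERAL_HSR_2005)
print(f"AHRQ budget / all federal HSR: {ahrq_share:.0%}")
print(f"EHC program / AHRQ budget:     {ehc_share_of_ahrq:.0%}")
print(f"EHC program / all federal HSR: {ehc_share_of_total:.0%}")
```

AHRQ as a whole is about a fifth of the total, but the program explicitly devoted to comparative effectiveness is about 5 percent of AHRQ's budget and about 1 percent of the broader category.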
Estimating private expenditures is even more challenging. Although drug and device manufacturers spend billions of dollars each year on clinical trials aimed at demonstrating the safety and efficacy of new products, the vast majority of those efforts contribute to comparisons of treatments only indirectly. Data are simply not available on how much private organizations such as health plans, medical specialty societies, and technology assessment centers spend to compare medical treatments and procedures. Nevertheless, one recent study estimated that less than $2 billion is spent annually on comparative effectiveness research in this country, and even that rough estimate is uncertain.22
The Consequences of Limited Information
Whether the cause is limited supply or limited demand, the relative scarcity of rigorous data about comparative effectiveness has several effects. First and foremost, it means that decisions about what treatments to use often depend on anecdotal evidence, conjecture, and the experience and judgment of the individual physicians involved. In many cases, that basis may be sufficient; as some observers have noted, it is not necessary to conduct a randomized trial to determine whether to use a parachute when jumping out of an airplane. But if the benefits of a treatment—or risks of not providing it—are less obvious, the lack of hard data makes determining the appropriate choice of treatment difficult. Although estimates vary, some experts believe that less than half of all medical care is based on or supported by adequate evidence about its effectiveness.23
Evidence about treatments’ effectiveness remains limited even though the number of rigorous studies has grown substantially in recent decades. To illustrate that point, one study simply counted the articles reporting results from randomized trials that were published each year in peer-reviewed medical journals.24 Between 1966 and 1995, that number increased dramatically, from about 100 to nearly 10,000, with about half of the cumulative total over that period produced between 1990 and 1995. But even if the proportion of treatments based on hard evidence has increased as a result, that share remains relatively low. Furthermore, keeping the evidence base abreast of the rapid development of new medical treatments and technologies will remain an ongoing challenge.
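The hundredfold rise over 29 years implies a steep annual growth rate. The sketch below computes the constant compound rate that would carry about 100 articles in 1966 to about 10,000 in 1995, along with the corresponding doubling time. It assumes smooth exponential growth, which the actual year-by-year series need not follow; `implied_annual_growth` is an illustrative helper, not part of the cited study.

```python
import math


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Approximate counts from the text: about 100 articles in 1966, nearly 10,000 in 1995.
rate = implied_annual_growth(100, 10_000, 1995 - 1966)
doubling_time = math.log(2) / math.log(1 + rate)
print(f"implied growth: {rate:.1%} per year; doubling every {doubling_time:.1f} years")
```

Under that assumption, the annual count grew by roughly 17 percent a year, doubling about every four and a half years.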
Another important effect of limited evidence (indeed, an indicator of that scarcity) is that the use of certain treatments and the types of care provided vary widely from one area of the country to another. For example, even after adjusting for differences in the age, sex, and race of Medicare enrollees, researchers at Dartmouth found about a fourfold variation in the share receiving a coronary artery bypass graft, and those differences were not correlated with rates of heart attacks in each region.25 At the same time, those researchers found that overall surgery rates did not vary systematically; areas with above-average rates for certain procedures had below-average rates for others. Those differences in the use of treatments reflect at least in part the local practice norms that have arisen in each area, and the apparent variation in those norms indicates that there is not sufficient evidence to determine which approach is most appropriate.
Geographic differences in the types of care provided can remain substantial even among patients who turn out to be in their last six months of life. (Examining that period is an analytic approach that can be used in an effort to control for differences in the prevalence and severity of diseases patients have, on the grounds that large groups of patients who are nearing death are likely to have comparable health problems regardless of where they live.) For example, such patients spend nearly 20 days in the hospital over those last six months, on average, in the highest-use areas, compared with an average of about six hospital days in the lowest-use areas. Similarly, the average number of visits to physicians in that period is as high as 50 in some of the highest-use regions and as low as 16 in some of the lowest-use regions.26
The observed variations in the use of services correspond to substantial differences in Medicare spending per enrollee in different parts of the country (see Figure 2). In 2003, average costs ranged from about $4,500 in the areas with the lowest spending to nearly $12,000 in the areas with the highest spending (those averages were adjusted to account for differences in the age, sex, and race of Medicare beneficiaries in the various areas). Some of those differences in spending reflect varying rates of illness as well as differences in the prices that Medicare pays for the same service, which are adjusted on the basis of local costs for labor and equipment in the health sector. But according to the Dartmouth researchers, differences in illness rates account for less than 30 percent of the variation in spending among areas, and differences in prices can explain another 10 percent—indicating that more than 60 percent of the variation is due to other factors.27 Other studies have found that a larger share of the variation in spending can be accounted for by differences in health status and demographic factors, but even so, the remaining differences are substantial in dollar terms.28
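The Dartmouth decomposition is simple arithmetic if, as the summary in the text implies, the components are treated as additive shares of the total variation. This minimal sketch computes the residual share left for "other factors" and the ratio between the highest and lowest adjusted spending levels; the helper name is illustrative, and the additivity is an assumption carried over from the text's framing.

```python
def unexplained_share(illness_share: float, price_share: float) -> float:
    """Share of the variation left after removing the illness and price components.

    Treats the components as additive shares of the total variation.
    """
    return 1.0 - illness_share - price_share


# Illness explains less than 30 percent (an upper bound) and prices about 10 percent,
# so the residual computed here is a lower bound on the share due to other factors.
residual = unexplained_share(0.30, 0.10)
print(f"other factors: more than {residual:.0%}")

# Range of age-, sex-, and race-adjusted Medicare spending per enrollee in 2003.
low, high = 4_500, 12_000
print(f"highest-spending areas spend about {high / low:.1f}x the lowest")
```

The highest-spending areas thus spent roughly 2.7 times as much per enrollee as the lowest-spending ones, and at least 60 percent of the variation remains unexplained by illness or prices.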
Medicare Spending per Capita in the United States, by Hospital Referral Region, 2003
Source: The Dartmouth Atlas of Health Care.
Note: Numbers in parentheses refer to the number of hospital referral regions with per capita spending in each interval.
Of particular relevance to the issue of comparative effectiveness, there is some evidence that the degree of geographic variation in treatment patterns is greater when less consensus exists within the medical community about the best treatment to use. For example, patients who have fractured their hip need to be hospitalized, and there is relatively little variation in admission rates for Medicare beneficiaries with that diagnosis—but for hip replacements and for knee replacements, more discretion is involved and the surgery rates vary more widely (see Figure 3). And there appears to be even more variation in the rates of back surgery—a treatment whose benefits have been the subject of substantial questions. Determining what share of any geographic variation in the use of procedures is due to differences in the treatments that doctors recommend and what share is due to differences in underlying illness rates is challenging, however, so the comparison of procedures may be sensitive to the manner in which the differences in illness rates are estimated.29
Rates of Four Orthopedic Procedures Among Medicare Enrollees, 2002 and 2003
(Standardized discharge ratio, log scale)
Source: Dartmouth Atlas Project, The Dartmouth Atlas of Health Care.
Notes: In the figure, each point represents a hospital referral region; the country was divided into about 300 such regions on the basis of where Medicare enrollees typically receive their hospital care.
The points indicate how the rate at which the procedure is performed (per 1,000 Medicare enrollees) in each referral region compares with the national average rate (which has been normalized to 1.0). Differences in procedure rates were adjusted to account for differences among regions in the age, sex, and race of enrollees and for measures of illness rates.
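One common way to produce a ratio like the one plotted is indirect standardization: divide a region's observed number of procedures by the number it would be expected to have if each demographic group among its enrollees had the national rate. The sketch below assumes that method; the Dartmouth Atlas's exact adjustment (which also accounts for illness measures) may differ, and the strata, rates, and counts are hypothetical. Rates here are per enrollee rather than per 1,000 enrollees, which does not change the ratio.

```python
import math
from typing import Dict


def standardized_ratio(
    observed_events: float,
    region_pop_by_stratum: Dict[str, int],
    national_rate_by_stratum: Dict[str, float],
) -> float:
    """Indirectly standardized ratio: observed events / expected events.

    Expected events apply the national rate in each stratum (for example, an
    age-sex-race cell) to the region's own enrollees in that stratum, so a value
    of 1.0 means the region matches the national average.
    """
    expected = sum(
        pop * national_rate_by_stratum[stratum]
        for stratum, pop in region_pop_by_stratum.items()
    )
    return observed_events / expected


# Hypothetical region with two age strata; rates are procedures per enrollee per year.
national = {"65-74": 0.002, "75+": 0.006}
region = {"65-74": 50_000, "75+": 25_000}
ratio = standardized_ratio(330, region, national)
# On a log scale, ratios of 2.0 and 0.5 sit the same distance from 1.0.
print(f"ratio: {ratio:.2f}, log: {math.log(ratio):+.2f}")
```

In this made-up case the region expects 250 procedures but performs 330, a ratio of 1.32. The log scale in the figure keeps regions at half and at double the national rate symmetric around 1.0.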
The implications of the observed variations in treatments and spending depend importantly on their relationship to health outcomes. If life expectancy and other measures were better in the areas with higher spending, that result would imply that increased spending in the low-cost areas would yield health benefits. One recent and well-designed study examined differences in hospital spending in Florida and found that areas with higher spending had lower mortality rates among Medicare patients who were treated in the emergency room for a heart attack.30 Using data on Medicare enrollees nationwide, however, another study found that higher-spending regions did not, on average, have lower mortality rates than the lower-spending regions, even after adjustments to control for differing illness rates among patients and regions.31 That study also found that higher spending did not slow the rate at which the elderly developed functional limitations (reflecting their ability to take care of themselves). Although more research is needed about the impact that differences in spending have on patients’ morbidity and quality of life, perhaps using more-extensive measures of health outcomes, those findings suggest that spending in the high-cost areas could be reduced without adverse effects on the overall health of residents in those areas.
How much could spending be reduced? Some estimates of the potential savings from reducing the variations in treatments are quite large, although questions remain about what mechanisms could achieve those savings and what the effects on health would be. The Dartmouth researchers have suggested that Medicare spending—and perhaps all health spending in the country—could be cut by about 30 percent if the more conservative practice styles used in the lowest-spending one-fifth of the country could be adopted nationwide.32 While they note the need for more research about the specific steps needed to reduce spending levels without harming health, their analysis indicates that the added spending is not contributing to better health outcomes. Other studies suggest that overall health might not suffer in the process of changing practice patterns but that patients who would benefit most from more-expensive treatments might be made worse off as a result, while patients who would do better with treatments that were less expensive would gain.33
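The kind of counterfactual behind such estimates can be sketched as follows: take the enrollee-weighted spending of the lowest-spending regions covering a given share of enrollees as a benchmark, cap every other region at that benchmark, and compare total spending before and after. This is a simplified illustration under those stated assumptions, not a reproduction of the Dartmouth researchers' method. The function name and the five regions in the example are hypothetical, and the resulting number is not an estimate of actual savings.

```python
from typing import List, Tuple


def benchmark_savings(regions: List[Tuple[int, float]], quantile: float = 0.2) -> float:
    """Fractional cut in total spending if no region spent more per enrollee than
    a low-spending benchmark.

    `regions` holds (enrollees, spending per enrollee) pairs. The benchmark is the
    enrollee-weighted average spending of the cheapest regions that together cover
    at least `quantile` of all enrollees (whole regions only, a simplification).
    Regions already at or below the benchmark keep their current spending.
    """
    if not regions or not 0 < quantile <= 1:
        raise ValueError("need at least one region and 0 < quantile <= 1")
    ordered = sorted(regions, key=lambda r: r[1])
    cutoff = quantile * sum(n for n, _ in ordered)

    # Collect the lowest-spending regions until they cover the cutoff share.
    low_group, covered = [], 0
    for n, cost in ordered:
        if covered >= cutoff:
            break
        low_group.append((n, cost))
        covered += n

    benchmark = sum(n * c for n, c in low_group) / sum(n for n, _ in low_group)
    current = sum(n * c for n, c in ordered)
    capped = sum(n * min(c, benchmark) for n, c in ordered)
    return 1 - capped / current


# Made-up illustration: five equally sized regions.
example = [(100, cost) for cost in (6_000, 7_000, 8_000, 9_000, 10_000)]
print(f"savings: {benchmark_savings(example):.1%}")
```

The sketch also makes the caveat in the text visible: it assumes every region could reach the benchmark without changing health outcomes, which is exactly the point on which more research is needed.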
Other studies of geographic variation indicate that there may be room to reduce spending without harming health in both high-use and low-use areas of the country. One older study, for example, had independent panels of doctors conduct after-the-fact reviews of the medical charts of Medicare enrollees who had had certain surgeries.34 In areas with high use of the procedures, the study found that the share of surgeries that was clinically appropriate ranged from about 35 percent to about 70 percent; the remainder were either clinically inappropriate or of equivocal value. In low-use areas, the share considered appropriate ranged from about 40 percent to about 80 percent. In other words, the share of procedures deemed appropriate was slightly higher in the low-use areas, but that share was well below 100 percent in both high-use and low-use areas.
Options for Organizing and Funding New Federal Research Efforts
The approach that is taken for organizing and funding any increased federal efforts to support research on comparative effectiveness could play an important role in determining their impact. Some approaches would seek to insulate those efforts from political pressure by setting up an organization at "arm’s length" from the government and by providing a dedicated source of financing. Many of the options that have been proposed seek to coordinate and centralize existing activities through one entity—which would tend to give any conclusions it reached more weight—but developing several competing sources of information about comparative effectiveness could also have value.
Specific options that have been put forward for organizing federal research on comparative effectiveness include the following (each of which could have many variants):35
■ Expanding the role of an existing agency that already conducts or oversees research on health services generally, and on comparative effectiveness specifically, such as AHRQ or NIH.
■ Creating or "spinning off" a new agency, either within the Department of Health and Human Services or as an independent body that is part of either the executive or the legislative branch. The Federal Trade Commission and the Medicare Payment Advisory Commission (MedPAC) are potential models for such an option.
■ Augmenting an existing quasi-governmental organization, such as the Institute of Medicine or the National Research Council. Such entities are often congressionally chartered, but they are not subject to regular governmental oversight.36 Even so, most of the Institute of Medicine’s funding comes from government agencies, which pay for specific studies that they have requested.
■ Establishing a new public–private partnership to oversee and direct research. That option could be structured in various ways, but one such approach would be to set up a federally funded research and development center (FFRDC). FFRDCs are not-for-profit organizations that can accept some private payments but that get most of their funding from a federal agency that provides oversight and monitoring.
Regardless of the type of organization, several potential mechanisms (either individually or in combination) could be used to fund research on comparative effectiveness. Federal spending could be authorized and appropriated annually, as with other discretionary programs. Alternatively, funding could be drawn from Medicare’s Hospital Insurance trust fund (which is financed primarily by payroll taxes) or specified as a percentage of mandatory federal outlays on health insurance programs.37 Instead of or in addition to using existing sources of revenues, another option would be to require direct contributions from the health sector. For example, a new tax on health insurance premiums or other payments within the health sector could be established, with the resulting revenues dedicated to research on comparative effectiveness.
Trade-offs might arise between an entity’s independence, credibility with the medical profession, and ability to reach controversial conclusions, on the one hand, and its accountability and responsiveness to policymakers and to other interested parties, on the other. For example, funding through appropriations would allow lawmakers to assess the new entity’s contributions and accomplishments and to balance spending on those efforts against other federal priorities on an annual basis. But some observers have raised concerns that relying on annual appropriations would leave a new entity vulnerable to outside pressure and thus reluctant to undertake controversial studies or to reach conclusions that might generate opposition from affected groups. Indeed, the elimination of agencies engaged in such research that were funded by annual appropriations—or in the case of AHRQ, the occasional threat of elimination or substantial cuts in funding—may suggest the need for a different arrangement.
Alternatively, housing the new activities in an organization that was separate from the federal government and establishing automatic or dedicated funding mechanisms would give a new entity greater autonomy and potentially more influence on doctors and other health professionals. To be sure, lawmakers could change any funding formula that had been established—as is done frequently in Medicare—mitigating the degree to which the entity would lack oversight. Even with automatic funding, policymakers would want to periodically review the activities they were funding either to consider changes in the levels of spending or to adjust any funding formula to keep dedicated resources in line with spending trends—which could also provide a vehicle for pressure from interest groups. Nevertheless, automatic or dedicated funding mechanisms would tend to limit the influence of political pressure to some extent. But such mechanisms also would raise questions about how the entity set its priorities and allocated resources—and how it would be held accountable for those decisions. A nongovernmental organization might be able to act more quickly than a federal agency, but that speed could come at the expense of transparency.
Under any option, an advisory board (or governing council) could be established to serve several functions: providing guidance to the entity and establishing priorities for its research projects; creating an independent process for reviewing and possibly approving the findings that resulted from that research; and serving as a channel for interested parties to participate. For example, the board could include representatives of major federal health programs, private insurers, health care providers, advocacy groups for patients, and drug and device makers—as well as members of the general public and disinterested policy experts. Alternatively or in addition to including various stakeholders, a regular process could be established for getting input from interested parties. An example of that type of structure is the U.S. Preventive Services Task Force (see Box 2).
Box 2. The U.S. Preventive Services Task Force
The U.S. Preventive Services Task Force was established in 1984 by the Department of Health and Human Services to produce recommendations about which preventive health care services should be routinely provided to individuals who do not have any symptoms of a given disease. Such services include immunizations, tests to screen for the presence of diseases, and behavioral counseling (such as programs that encourage smokers to quit).1
The size and composition of the task force have varied over time, ranging from 10 to 20 members; the members are not federal employees but have generally been practicing clinicians. The task force’s work is currently supported by the Agency for Health Care Research and Quality (AHRQ), with an annual budget of about $3 million. As a rule, the task force does not fund studies that evaluate preventive services but instead relies on existing evidence. Two research centers that AHRQ has designated generate summaries of that evidence, which are similar to, but perhaps not as rigorous as, systematic evidence reviews. Given the available time and resources, the task force has not sought to review all preventive services but instead has assigned priority to services that address significant health problems, that are likely to have new evidence available, or that have generated controversy about their use.
In developing its recommendations, the task force considers both the strength of the evidence and the magnitude of the expected benefits and risks. Risks can include adverse reactions to vaccines, false-positive test results that lead to unnecessary or even harmful follow-up care, and complications from invasive test procedures—which can have substantial aggregate effects even if their probabilities are low, because preventive services may be provided to very large numbers of people. The task force’s recommendations cover which types of asymptomatic individuals should receive the services, taking into account how the risk of contracting a condition or disease varies by age, sex, and other factors.
The task force has presented its recommendations in a periodic series of reports, the most recent of which covers about 60 specific services. Those services are now given a letter grade, as follows:
- A, for services that are strongly recommended on the basis of solid evidence that the benefits of improved outcomes outweigh the risks of harm;
- B, for services that are recommended on the basis of reasonable evidence of net benefits;
- C, for services with no recommendation because the balance of benefits and risks is too close;
- D, for services that should not be routinely provided because the evidence indicates the services are ineffective or that the risks outweigh the benefits; and
- I, for services that do not have sufficient evidence on which to base a recommendation.
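The grading scheme combines two judgments, the strength of the evidence and the size of the net benefit, and can be read as a small decision procedure. The Python sketch below is a simplified, hypothetical encoding of the grade definitions in the list above, not the task force's official procedure; the `Evidence` categories and the net-benefit labels are assumptions chosen to mirror the list's wording.

```python
from enum import Enum


class Evidence(Enum):
    SOLID = "solid"
    REASONABLE = "reasonable"
    INSUFFICIENT = "insufficient"


# Hypothetical labels for the size of benefits net of harms.
NET_BENEFITS = {"substantial", "moderate", "small", "none_or_negative"}


def grade(evidence: Evidence, net_benefit: str) -> str:
    """Map evidence strength and net benefit to a letter grade (simplified sketch)."""
    if net_benefit not in NET_BENEFITS:
        raise ValueError(f"unknown net benefit: {net_benefit!r}")
    if evidence is Evidence.INSUFFICIENT:
        return "I"  # not enough evidence to recommend for or against
    if net_benefit == "none_or_negative":
        return "D"  # ineffective, or risks outweigh benefits
    if net_benefit == "small":
        return "C"  # balance of benefits and risks too close
    if evidence is Evidence.SOLID and net_benefit == "substantial":
        return "A"  # solid evidence that benefits outweigh harms
    return "B"      # reasonable evidence of net benefit


print(grade(Evidence.SOLID, "substantial"))       # A
print(grade(Evidence.REASONABLE, "substantial"))  # B
print(grade(Evidence.SOLID, "small"))             # C
```

Checking for insufficient evidence first reflects the list's logic: a D grade requires evidence that a service is ineffective or harmful, so a service with too little evidence either way receives an I.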
Initially, when formulating recommendations, the task force did not take into account the costs of providing preventive services or their cost-effectiveness.2 According to one recent summary, however, the task force now "considers the total economic costs that result from providing a preventive service, both to individuals and to society, in making recommendations, but costs are not the first priority."3 Although some immunizations against a disease have been shown to reduce total spending on health care, many other preventive services appear to increase spending on net—either because of the costs of providing those services to large segments of the population (only some of whom will be found to have the disease) or because the overall effects on treatment costs are modest. Analyses of cost-effectiveness would shed light on how the health benefits of preventive services compared with those increases in spending.
1. For a general discussion, see Eileen Salinsky, Clinical Preventive Services: When Is the Juice Worth the Squeeze? Issue Brief No. 806 (Washington, D.C.: National Health Policy Forum, August 24, 2005).
2. See Somnath Saha and others, "The Art and Science of Incorporating Cost-Effectiveness in Evidence-Based Recommendations for Clinical Preventive Services," American Journal of Preventive Medicine, vol. 20, no. 3 (April 2001), pp. 36–43.
3. Russell P. Harris and others, "Current Methods of the U.S. Preventive Services Task Force," American Journal of Preventive Medicine, vol. 20, no. 3 (April 2001), pp. 21–35.

In designing such an oversight group, a number of issues would arise. The types of participants on any board and the manner in which members were chosen and replaced would have to be determined carefully to avoid giving one perspective undue influence. Similarly, conflict-of-interest rules governing the entity’s staff would probably be needed. Trade-offs could exist between the extent to which many views and interests were represented and the ability of the council or board to make timely decisions or to reach consensus on contentious issues. Whether an oversight group reviewed or approved the results of research projects or focused instead on which projects to initiate, and what those reviews entailed, would also affect the entity’s staffing requirements and the types of expertise that board members needed.
Another organizational issue is whether to establish a single or highly centralized entity or, instead, to design a more loosely coordinated system encompassing several distinct centers to produce independent analyses. Many of the options that have been proposed seek to centralize research activities through one entity—partly to address concerns about the lack of coordination among current U.S. efforts. An advantage of that centralized approach is that it would tend to give more weight to any conclusions reached. At the same time, that potential for having a greater impact could also lead the organization to adopt findings that were watered down to reach consensus; even if the entity did not have a formal approval process and instead simply released any results of approved projects, a single agency might be more reluctant to pursue research into more contentious questions. A decentralized approach could give individual research centers more latitude and encourage more competing perspectives to emerge. However, a more pluralistic approach could also involve some redundant efforts and, if it yielded any conflicting findings, would leave users with the task of reconciling the results.
An additional consideration—particularly if a new entity was created—would involve start-up costs and other implementation challenges. If funds were directed through an existing federal agency, some ongoing costs for additional staffing would be incurred, but the basic support infrastructure would largely exist already. By contrast, establishing a new agency or public–private partnership could require a greater effort before research could begin. At the same time, a quasi-governmental organization or public–private partnership could have more flexibility to develop and maintain its staff than a new or existing federal agency would have. Creating a new source of revenues (such as a tax on health insurance premiums) to help fund research on comparative effectiveness would also involve time and administrative costs.
The relative strengths and weaknesses of existing organizations could affect which one was best suited for new research efforts. NIH has extensive experience overseeing clinical trials but may not see research on comparative effectiveness as central to its mission of expanding the frontiers of biological and medical knowledge. AHRQ has substantial expertise in many areas of comparative effectiveness but has limited experience managing trials, and some observers have raised concerns about the impact that significantly expanded research about comparative effectiveness might have on that agency’s other research endeavors. For its part, the Institute of Medicine is widely respected but does not have an extensive organizational capacity to conduct or oversee primary research, and some observers believe its consensus-building process could make timely action difficult.
Among the options for a new entity, establishing an FFRDC has generated some interest, partly on the grounds that it would be somewhat insulated from political pressure. But most of an FFRDC’s funding would have to come from a federal agency, so it is not clear why its activities (most of which, presumably, would also be contracted out to private researchers) would be subject to less pressure than the activities of an agency receiving direct funding. The argument is sometimes made that private contributions would make private payers more likely to accept and use the results of the research. If such contributions were voluntary, however, the incentives to make them would be modest because the benefits of the research would accrue to many parties. If such contributions were instead required, then the arrangement would be essentially equivalent to having the government collect the money and appropriate the funds via a federal agency.
More generally, competing perspectives exist about how the relative roles of public and private payers in funding research on comparative effectiveness would affect perceptions about the results of that research. In some quarters, the findings of research funded by the government are seen as reflecting political pressure, perhaps to accommodate the views of interest groups or to support budgetary objectives. Those concerns could be attenuated to some degree if the agency conducting the research was not also a payer for health care, as CMS is. At the same time, other observers have raised concerns about privately sponsored research, which is also seen as advancing cost-cutting objectives (if sponsored by insurers) or as promoting the interests of drug and device manufacturers and of providers of health services.
Options for Comparing the Effectiveness of Treatments
The appropriate organizational form for any new or expanded federal entity, along with the mechanism and level of funding, may depend in large part on what activities it would carry out. For example, analyzing existing data would require a different set of skills, and would cost less, than overseeing new clinical trials that compared different treatments. In addition to setting priorities among the various methods of research, a new or expanded entity would have to define the scope of its analyses: both the types of comparisons it would commission and the questions those analyses would address. In particular, would the organization focus only on trying to determine which treatments conferred the greatest medical benefits, or would it also assess which treatments were most cost-effective? Whatever approach was taken, the manner in which the results were communicated to doctors, patients, and health insurers could play an important role in determining the impact on medical practice.
Federal efforts to assess different treatment options could be pursued in a variety of ways. Options range from synthesizing existing research (a process known as a systematic review), to conducting new studies with data that are already available, to funding new head-to-head clinical trials. Although those options are not mutually exclusive (indeed, they could all be pursued at the same time), each one presents certain challenges, and trade-offs can arise between the cost of an activity and the value of the information it provides.
Systematic Reviews of Existing Research. The approach that would probably be easiest to implement would be to review and summarize the results of existing studies in a systematic and rigorous way. For example, even though existing studies may only compare a single treatment to a placebo, the results of several studies of individual therapies could in some cases be combined to measure those treatments against one another. That effort could also critically assess the strengths and weaknesses of the available evidence and seek to reconcile conflicting findings or determine what the preponderance of the evidence indicated. Such reviews would be comparable to some of the work that AHRQ is already undertaking and to some current efforts based at universities or other public and private research centers such as ECRI and Hayes, Inc. One advantage of this approach is its relatively low expense; a single systematic review might cost a few hundred thousand dollars.
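The idea of measuring two treatments against each other when each was tested only against a placebo can be illustrated with one widely used technique, the adjusted indirect comparison (often attributed to Bucher and colleagues). The sketch below is a minimal Python version under stated assumptions: each trial reports a log odds ratio and its standard error versus the same kind of placebo, and the trial populations are similar enough for the comparison to be meaningful. The function name and all example figures are hypothetical and are not drawn from any study discussed here.

```python
import math


def adjusted_indirect_comparison(log_or_a, se_a, log_or_b, se_b, z=1.96):
    """Compare treatment A with treatment B through a shared placebo comparator.

    log_or_a, se_a: log odds ratio (A vs. placebo) and its standard error.
    log_or_b, se_b: log odds ratio (B vs. placebo) and its standard error.
    z: critical value for the confidence interval (1.96 gives roughly 95 percent).
    Returns the odds ratio of A vs. B, the lower and upper interval bounds,
    and the standard error of the indirect log odds ratio.
    """
    # The placebo term cancels out: (A - placebo) - (B - placebo) = A - B.
    log_or_ab = log_or_a - log_or_b
    # The two trials are independent, so their variances add; the indirect
    # estimate is therefore less precise than either direct estimate.
    se_ab = math.sqrt(se_a ** 2 + se_b ** 2)
    return (
        math.exp(log_or_ab),
        math.exp(log_or_ab - z * se_ab),
        math.exp(log_or_ab + z * se_ab),
        se_ab,
    )


# Hypothetical inputs: A vs. placebo OR = 0.60 (SE 0.10); B vs. placebo OR = 0.75 (SE 0.12).
odds_ratio, lower, upper, se = adjusted_indirect_comparison(
    math.log(0.60), 0.10, math.log(0.75), 0.12
)
print(f"A vs. B odds ratio {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these made-up inputs, the point estimate favors A (an odds ratio of about 0.80), but the interval runs from roughly 0.59 to 1.09 and so includes 1. That illustrates a practical limit of combining placebo-controlled studies: the indirect estimate inherits the uncertainty of both trials, and it is only as valid as the assumption that the trials enrolled comparable patients.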
Because the evidence base for comparing treatment regimens is itself limited, however, how much additional insight can be gleaned from systematic reviews of existing research is not clear. Data from clinical trials that had already been conducted would naturally be the focus of any systematic review, because trials can provide the clearest evidence about a treatment’s effects, but such studies also have limitations. Some analyses have indicated that clinical trials sponsored by interested parties, which are often the only source of such data, are more likely than independent studies to find favorable results.38
Another potential limitation is that existing information may not be sufficient to reach definitive conclusions. Studies may be difficult to compare or reconcile, either because they use different methodologies or analyze different populations of patients, or simply because they yield conflicting findings. For example, a number of independent studies have examined different screening techniques for colorectal cancer, each of which provides an estimate of the cost per QALY gained. But according to a recent review of those studies, the results varied to such an extent that reaching a definitive conclusion about which technique was most effective or most cost-effective was difficult (see Table 2).39
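Estimates of this kind are usually expressed as an incremental cost-effectiveness ratio: the additional cost of one strategy relative to a comparator, divided by the additional QALYs it yields. The minimal Python sketch below assumes that definition; the `Strategy` type, the `icer` function, and all dollar and QALY figures are hypothetical illustrations, not values from the studies summarized in Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    cost: float   # average cost per person, in dollars (hypothetical)
    qalys: float  # average quality-adjusted life years per person (hypothetical)


def icer(new: Strategy, comparator: Strategy) -> float:
    """Incremental cost per QALY gained by `new` relative to `comparator`."""
    delta_qalys = new.qalys - comparator.qalys
    if delta_qalys <= 0:
        # With no QALY gain, a cost-per-QALY ratio is not meaningful.
        raise ValueError("new strategy must yield more QALYs than the comparator")
    return (new.cost - comparator.cost) / delta_qalys


no_screening = Strategy("no screening", cost=1_000.0, qalys=15.000)
screening = Strategy("screening", cost=1_500.0, qalys=15.025)
print(f"${icer(screening, no_screening):,.0f} per QALY gained")
```

Because the QALY gain sits in the denominator, modest differences in study assumptions can move the ratio sharply: if the same extra $500 bought 0.0125 QALYs instead of 0.025, the ratio would double to $40,000 per QALY. Differences of that kind across studies could help explain how published ranges become as wide as those in Table 2.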
Cost-Effectiveness of Different Screening Methods for Colorectal Cancer
Screening Method                  Lowest Estimate    Highest Estimate
Colonoscopy
  Every 5 Years                        17,316              36,612
  Every 10 Years                       10,633              26,693
Fecal Occult Blood Testing
  Annually                              4,643              25,860
  Every 3 Years                         2,942              10,861
Sigmoidoscopy
  Annually                              1,391 a             1,391 a
  Every 3 Years                        16,318              20,727
  Every 5 Years                        14,384 b            42,310
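One way to see why the review found these results hard to reconcile is to check the ranges in the table for overlap. The Python sketch below transcribes only the rows shown in this excerpt (the table is cut off after sigmoidoscopy every 5 years), and it drops notes a and b because their meaning is not given here. It ranks the methods by their lowest and by their highest estimates and counts how many pairs of ranges overlap; the helper names are illustrative only.

```python
from itertools import combinations

# (lowest, highest) estimates transcribed from the table rows shown above.
# Notes a and b are omitted because their meaning is not given in this excerpt.
ESTIMATES = {
    "Colonoscopy every 5 years": (17_316, 36_612),
    "Colonoscopy every 10 years": (10_633, 26_693),
    "FOBT annually": (4_643, 25_860),
    "FOBT every 3 years": (2_942, 10_861),
    "Sigmoidoscopy annually": (1_391, 1_391),
    "Sigmoidoscopy every 3 years": (16_318, 20_727),
    "Sigmoidoscopy every 5 years": (14_384, 42_310),
}


def rank_by(estimates, position):
    """Order methods from lowest to highest using the lowest (0) or highest (1) estimate."""
    return sorted(estimates, key=lambda name: estimates[name][position])


def overlapping_pairs(estimates):
    """Pairs of methods whose (lowest, highest) ranges overlap."""
    return [
        (a, b)
        for a, b in combinations(estimates, 2)
        if estimates[a][0] <= estimates[b][1] and estimates[b][0] <= estimates[a][1]
    ]


print("By lowest estimate: ", rank_by(ESTIMATES, 0))
print("By highest estimate:", rank_by(ESTIMATES, 1))
pairs = overlapping_pairs(ESTIMATES)
total = len(ESTIMATES) * (len(ESTIMATES) - 1) // 2
print(f"{len(pairs)} of {total} pairs of ranges overlap")
```

On these figures, 12 of the 21 pairs of ranges overlap, and the order of the methods changes depending on which end of the range is used (sigmoidoscopy every 5 years, for example, moves from fifth to last). Only annual sigmoidoscopy, whose lowest and highest entries are identical and lie below every other range, stands clearly apart.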