Thursday, April 30, 2009

Luck that Looks Like Logic? Statins (Rosuvastatin), the Cholesterol Hypothesis, and Causal Pathways

The Cholesterol Hypothesis (CH), namely that the association between elevated cholesterol (LDL) and cardiovascular disease and events is a CAUSAL one, and thus that intervening to lower cholesterol prevents these diseases, has seduced mainstream medicine for decades. However, much if not most of the evidence for the causality of cholesterol in atherogenesis, and for its reversal by lowering cholesterol, derives from studies of "Statins" (HMG-CoA reductase inhibitors); indeed, the evidence that lowering LDL cholesterol (or raising HDL) through other pathways has salutary effects on cardiovascular outcomes is scant at best, as has been chronicled on this blog (see posts on torcetrapib and ezetimibe/Vytorin). Not myself immune to the beguiling allure of the CH, I admit that I take Niacin, in spite of normal HDL levels and scant to no trustworthy evidence that, in addition to raising HDL and lowering LDL, it will have any primary (or secondary or tertiary) preventive effects for me.

In yesterday's NEJM, Glynn et al. report an analysis of a secondary endpoint from the JUPITER trial of Rosuvastatin (http://content.nejm.org/cgi/content/abstract/360/18/1851). The primary aim of the trial was to determine whether Rosuvastatin was effective for primary prevention of cardiovascular events in people with normal cholesterol levels and elevated CRP levels. The secondary endpoint described in the article was the occurrence of venous thromboembolism (VTE) during the study period. Because I see no obvious evidence of foul play, and because this study was simply impeccably designed, conducted, and reported, I'm going to hereafter ignore the fact that it was industry sponsored, and that there is probably some motive of "off-label promotion by proxy" (http://medicalevidence.blogspot.com/2008/06/off-label-promotion-by-proxy-how-nejm.html) here...

Lo and behold: Rosuvastatin lowered venous thromboembolism rates. The difficulties posed by ascertainment of this outcome notwithstanding, this trial provides convincing evidence of a statistically significant reduction in DVT and PE event rates (which were very low, on the order of 0.2 events per 100 person-years) during the four-year period of study. And this does not make a whole lot of sense from the standpoint of the CH. There's something more going on. Like an anti-inflammatory property of Statins. Which is very interesting and noteworthy and worthwhile in its own right. But I'm more interested in what kind of light this sheds on the validity of the CH.
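For the quantitatively inclined, here is a minimal sketch of how a rate ratio and its confidence interval are computed for rare events; the event counts and person-years below are hypothetical, not the actual JUPITER data. The point is that even at very low absolute rates, a large enough trial accrues enough events for the interval to exclude 1.

```python
import math

def rate_ratio_ci(events_tx, py_tx, events_ctl, py_ctl, z=1.96):
    """Rate ratio and 95% CI via the usual log-normal approximation."""
    rr = (events_tx / py_tx) / (events_ctl / py_ctl)
    se_log_rr = math.sqrt(1 / events_tx + 1 / events_ctl)
    lower = rr * math.exp(-z * se_log_rr)
    upper = rr * math.exp(z * se_log_rr)
    return rr, lower, upper

# Hypothetical: 34 vs 60 events over ~36,000 person-years per arm
# (illustrative numbers only, chosen to give rates in the range of
# 0.1-0.2 events per 100 person-years).
print(rate_ratio_ci(34, 36_000, 60, 36_000))  # -> (~0.57, ~0.37, ~0.86)
```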

Because of my interest in the frailty of the normalization hypothesis/heuristic (the notion that you just measure something and then raise or lower it to the normal range and make things ALL better), I am obviously a reserved skeptic of the Cholesterol Hypothesis, which was bolstered by, if not altogether reared on, data from trials of statins. And these new data, combined with emerging evidence that statins may have salutary effects on lung inflammation in ARDS and COPD, among perhaps others, make me wonder: was it just pure LUCK rather than a triumph of LOGIC that the first widely tested and marketed drug for cholesterol happened to both reduce cardiovascular endpoints AND lower cholesterol, even though not necessarily as part of the same causal pathway? Is it just "true, true, and unrelated"? Is it the anti-inflammatory properties, or some other piece of the complex biochemical effects of these drugs on the body, that leads to their clinical benefits? Other examples come to mind: is blood pressure lowering just an epiphenomenon of another primary ACE-inhibitor effect on heart failure? That these effects appear superficially and intuitively related does not mean that they lie on the same causal pathway.

What if things had happened another way? What if Statins had eluded discovery for another 20-30 years? What if study of the cholesterol hypothesis had meanwhile proceeded through evaluation of Cholestyramine, Colestipol, Niacin, and other drugs, and what if it had been "disconfirmed" by the failure of these agents to reduce cardiovascular outcomes? These hypotheticals will be answerable only after more study of Statins and other drugs, as well as their mechanisms. The data presented by the Harvard group, as well as their other work with CRP, are but one leg of a long journey toward elucidation of the biological mechanisms of atherogenesis, coagulation, and downstream clinical events.

Tuesday, April 21, 2009

Judicial use of DNA "evidence" and Misuse of Statistics: The Prosecutor's Fallacy

A recent article in the NYT described the adoption by the judicial system of a technology that began as a biomedical research tool (I resist to some extent the notion that DNA technology has directly been a boon to clinical patient care); see: http://www.nytimes.com/2009/04/19/us/19DNA.html. This powerful technology, when used appropriately in appropriate circumstances, provides damning evidence of guilt because of its high specificity: the probability of a coincidental match is stated to be as low as 1x10^-9. Thus, in a case such as that of the infamous (and nefarious) OJ Simpson, in which there is strong suspicion of guilt BEFORE the DNA evidence is evaluated, a positive match, in the absence of laboratory error or misconduct (neither of which can be routinely discounted; see: http://www.nytimes.com/2001/09/26/us/police-chemist-accused-of-shoddy-work-is-fired.html), essentially proves, beyond any reasonable doubt, the genetic identity of the person to whom the sample belongs. (Yes, that does indeed mean that OJ Simpson is the perpetrator of the heinous murder of Nicole Brown Simpson, he said unapologetically.)

In the case of old OJ, he was one among perhaps 10 suspects; for the sake of argument, let's say 100. Let's assume that the LAPD had their act together (this also requires a leap of faith) and that the perpetrator is among the suspects who have been rounded up, but that we have no evidence to differentiate their respective probabilities of guilt. Thus, each of the 100 has a 1% probability of being guilty, on the basis of circumstantial evidence alone, or a relation to or relationship with the victim(s), or just being in the wrong place at the wrong time, whatever. Given that 1% probability of guilt, we can make a 2x2 table representing the probability of guilt given a positive test, which is ultimately what we want to know. I don't know the sensitivity of DNA fingerprinting, but it doesn't really matter, because the high specificity of the test drives the likelihood ratio. I will assume it's 50% for simplicity:

                    Guilty (1)      Innocent (99)
DNA match           0.5             ~0.0000001
No DNA match        0.5             ~99

Probability of guilt given a match: 0.5 / (0.5 + 0.0000001), or >99.9999%.
In this "population" of 100 suspects (by suspects, I mean persons whose probability of having committed the crime is enhanced over that of a random member of the overall population by virtue of other evidence), even if all 100 suspects have equiprobable guilt, a DNA "match" is damning indeed and all but assures the guilt of the matching suspect (with the caveats mentioned above.)

But consider a different situation, one in which there are no convincing suspects. Suppose that the law enforcement authorities compare a biological sample with a large DNA database to look for a match. Note that we do not use the term "suspect" here, because it implies that some suspicion has narrowed this population from the overall population; when a database of unsuspected persons is canvassed, no such suspicion exists. Rather, a fishing expedition ensues, and the probabilities, when computed, come out quite different. Suppose there are DNA samples from 100 million individuals in the database, and the entire database is canvassed. Now our 2x2 table looks like this:

                    Guilty (1)      Innocent (~100,000,000)
DNA match           0.5             ~0.1
No DNA match        0.5             ~99,999,999

Probability of guilt given a match: 0.5 / (0.5 + 0.1), or roughly 83% - and that assumes the perpetrator is in the database at all.
Whereas in our previous example of a population of "suspects" guilt was all but assured based on a "match", in this example of canvassing a database, guilt is dubious. But what do you suppose will happen in such an investigation? Who will suspend his judgment and conduct a fair investigation of this "matching" individual, who is now a "suspect" based only on "evidence" from this misused test? How tempting will it be for detectives to selectively gather information and see reality through the distorted lens of the "infallible" DNA testing? How can such a person hope to exonerate himself?
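To make the arithmetic behind both tables explicit, here is a minimal sketch of the Bayesian computation. The 50% sensitivity is the assumption made above, the 1x10^-9 coincidental-match probability is the figure quoted earlier, and the 1x10^-6 variant is my own illustrative allowance for real-world laboratory error.

```python
def posterior_guilt(prior, sensitivity, false_match_rate):
    """P(guilty | DNA match) by Bayes' rule."""
    p_match = prior * sensitivity + (1 - prior) * false_match_rate
    return prior * sensitivity / p_match

# Scenario 1: one of 100 genuine suspects (prior = 1%).
print(posterior_guilt(0.01, 0.5, 1e-9))   # ~0.9999998: damning

# Scenario 2: trawling 100 million profiles with no prior suspicion
# (prior = 1e-8, charitably assuming the perpetrator is in the database).
print(posterior_guilt(1e-8, 0.5, 1e-9))   # ~0.83: short of "beyond reasonable doubt"

# Scenario 3: same trawl, but with an assumed real-world false-match
# rate of 1e-6 once lab error and partial profiles are admitted.
print(posterior_guilt(1e-8, 0.5, 1e-6))   # ~0.005: almost surely innocent
```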

This is the Prosecutor's Fallacy. It bolsters arguments by the ACLU and others that the trend of snowballing DNA sample collection should be curtailed, and that limits should be placed on canvassing efforts to solve crimes.

One way to limit the impact of the Prosecutor's Fallacy and of false positive "matches" from canvassing efforts would be to force investigators to assign certain profiles to the imaginary "suspect" whom they hope to find in the database, and to canvass only the subgroup of the database that matches those characteristics. For example, if the crime occurred in Seattle, the canvassing effort could be limited to the subset of the database living in or near Seattle, since it is unlikely that a person in Baltimore committed the crime. Other characteristics that are probabilistically associated with certain crimes could be used to limit broad canvassing efforts.
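A quick way to see why restricting the canvass helps: the expected number of coincidental matches scales linearly with the number of profiles compared. A minimal sketch, using an assumed false-match rate of 1 in a million and illustrative subset sizes:

```python
def expected_false_matches(n_profiles_compared, false_match_rate):
    """Expected number of purely coincidental matches in a canvass."""
    return n_profiles_compared * false_match_rate

print(expected_false_matches(100_000_000, 1e-6))  # whole database: ~100 spurious "suspects"
print(expected_false_matches(200_000, 1e-6))      # a Seattle-area subset: ~0.2
```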

As the use of medical technology expands both inside and outside of medicine, we have a responsibility to use it wisely and rationally. The strategy of indiscriminate database screening and canvassing is reckless, unwise, and unjust, and should be duly curtailed.

Wednesday, April 8, 2009

The PSA Screening Quagmire - If Ignorance is Bliss then 'Tis Folly to be Wise?

The March 26th NEJM was a veritable treasure trove of interesting evidence, so I can't stop after praising NICE-SUGAR and railing on intensive insulin therapy. If 6,000 patients (40,000 screened) seemed like a commendable and daunting study to conduct, consider that the PLCO Project Team randomized over 76,000 US men to screening versus control (http://content.nejm.org/cgi/reprint/360/13/1310.pdf), and the ERSPC Investigators randomized over 162,000 European men in a "real-time meta-analysis" of sorts, wherein multiple simultaneous studies with similar but different enrollment requirements were conducted and combined (see: http://content.nejm.org/cgi/reprint/360/13/1320.pdf). This is, as the editorialist points out, a "Herculean effort", and that is fitting and poignant, because ongoing PSA screening efforts in current clinical practice represent a Herculean effort to reduce the morbidity and mortality of this disease. That reinforces the importance of the research question: are we wasting our time? Are we doing more harm than good?

The lay press was quick to start trumpeting the downfall of PSA screening with headlines such as "Prostate Test Found to Save Few Lives". But for all their might, both of these studies give me, a longtime critic of cancer screening efforts, a good bit of pause. (Pulmonologists may be prone to "sour grapes" as a result of the failures of screening for lung cancer.)

Before I briefly summarize the studies and point out some interesting aspects of each, allow me to indulge in a few asides. First, I direct you to this interesting article in Medical Decision Making, "Cure Me Even if it Kills Me". This wonderful study in judgment and decision making shows how difficult it is for patients to live with the knowledge that there is a cancer, however small, growing in them. They want it out. And they want it out even if they are demonstrably worse off with it cut out, or x-rayed out, or whatever. It turns out that patients place a value on "getting rid of it" that probably arises from the emotional costs of living knowing there's a cancer in you. I highly recommend that anyone interested in cancer screening or treatment read this article.

This article brings to mind an unforgettable patient from my residency whom we screened in compliance with VA mandates at the time. Sure enough, this patient with heart disease had a mildly elevated PSA, and sure enough, he had a cancer on biopsy. And we discussed treatments in concert with our Urology colleagues. While he had many options, this patient agonized and brooded and could not live with the thought of a cancer in him. He proceeded with radical prostatectomy, the most drastic of his options. And I will never forget that look of crestfallen resignation every time I saw him after that surgery, because he thereafter came to clinic in diapers, having been rendered incontinent and impotent by the operation. He was more full of self-flagellating regret than any other patient I have seen in my career. This poor man and his experience certainly jaded me at a young age and made me highly attuned to the pitfalls of PSA screening.

Against this backdrop, where cancer is the most feared diagnosis in medicine, we feel an urge toward action, to screen and prevent, even when the net benefit of cancer screening is marginal, and even when greater opportunities for improving health exist elsewhere. I need not go into the literature on [ir]rational risk appraisal, other than to say that our overly exuberant fear of cancer (relative to other concerns) almost certainly leads to unrealistic hopes for screening and prevention. Hence the great interest in and attention to these two studies.

In summary, the PLCO study showed no reduction in prostate-cancer-related mortality from DRE (digital rectal examination) and PSA screening. Absence of evidence is not evidence of absence, however, and a few points about this study deserve to be made:

~Because of high (and increasing) screening rates in the control group, this was essentially a study of the "dose" of screening. The dose in the control group was ~45% and that in the screening group was ~85%. So the question the study asked was not really "does screening work?" but rather "does doubling the dose of screening work?" Had there been a favorable trend in this study, I would have been tempted to double the effect size of the screening to infer the true effect, reasoning that if increasing screening from 40% to 80% reduces prostate cancer mortality by x%, then increasing screening from 0% to 80% would reduce it by 2x% (see the sketch after this list). Alas, this was not the case with this study, which was underpowered.

~I am very wary of studies that have cause-specific mortality as an endpoint. There's just too much room for adjudication bias, as the editorialist points out. Moreover, if you reduce prostate cancer mortality but overall mortality is unchanged, what do I, as a potential patient, care? Great, you saved me from prostate cancer, and I died at about the same time I would have anyway, but from an MI or a CVA instead? We have to be careful about whether our goals are good ones: the goal should not be to "fight cancer" but rather to "improve overall health". The latter, I admit, is a much less enticing and invigorating banner. We like to feel like we're fighting. (Admittedly, overall mortality appears not to differ in this study, but I'm at a loss as to what's really being reported in Table 4.) The DSMB for the ERSPC trial argues here that cancer-specific mortality is most appropriate for screening trials, because of dilution by other causes of mortality, and because screening for a specific cancer can only be expected to reduce mortality from that cancer. From an efficacy standpoint, I agree, but from an effectiveness standpoint, this position causes me to squint and tilt my head askance.

~It is so very interesting that this study was stopped not for futility, nor for harm, nor for efficacy, but because it was deemed necessary for the data to be released because of the [potential] impact on public health. And what has been the impact of those data? Utter confusion. That increasing screening from 40% to 80% does not improve prostate-specific mortality does not say to me that we should reduce screening to 0%. In fact, I don't know what to do, nor what to make of these data, especially in the context of the next study.
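Here is the back-of-the-envelope sketch promised above. It is a crude linear rescaling of the observed effect by the difference in screening uptake between arms, not a formal compliance (CACE/instrumental-variable) adjustment, and the observed relative risk used below is hypothetical:

```python
def contamination_adjusted_rr(observed_rr, uptake_screen_arm, uptake_control_arm):
    """
    Crude linear rescaling: attribute the observed mortality reduction to
    the difference in screening uptake between arms, then extrapolate to
    a contrast of full uptake versus none.
    """
    observed_reduction = 1.0 - observed_rr
    scale = uptake_screen_arm / (uptake_screen_arm - uptake_control_arm)
    return 1.0 - observed_reduction * scale

# Hypothetical: an observed RR of 0.95 with 85% vs 45% uptake
# extrapolates to roughly a doubled effect, per the reasoning above.
print(contamination_adjusted_rr(0.95, 0.85, 0.45))  # -> ~0.89
```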

In the ERSPC trial, investigators found a 20% reduction in prostate cancer deaths with PSA screening alone in Europe. The same caveats regarding adjudication of this outcome notwithstanding, there are some very curious aspects of this trial that merit attention:

~This trial was, as I stated above, a "real-time meta-analysis", with many slightly different studies combined for analysis. I don't know what this does to internal or external validity, because this is such an unfamiliar approach to me, but I'll be pondering it for a while, I'm sure.

~I am concerned that I don't fully understand how the interim analyses were performed in this trial, what the early stopping rules were, and whether a one-sided or two-sided alpha was used. Reference 6 states that it was one-sided, but the index article says two-sided. Someone will have to help me out with the O'Brien-Fleming alpha-spending function and let me know if 1% spending at each analysis is par for the course (see the sketch after this list).

~As noted by the editorialist, we are not told the "contamination rate" of screening in the control group. If it is high, we might use the method described above to infer the actual impact of screening.

~Look at the survival curves, which diverge and then appear to converge again at a low hazard rate. Is it any wonder that there is no impact on overall mortality?
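On the alpha-spending question raised above, here is a minimal sketch of the Lan-DeMets O'Brien-Fleming-type spending function, written for a one-sided overall alpha (two-sided designs typically apply it at alpha/2 per side, and which sidedness ERSPC used is exactly my question). Note that it spends far less than 1% at early looks and saves most of the alpha for the final analysis, so a flat 1% at each interim sounds more like a constant-boundary (Pocock- or Haybittle-Peto-flavored) rule than an O'Brien-Fleming one:

```python
from math import sqrt
from scipy.stats import norm

def obf_alpha_spent(t, alpha=0.05):
    """
    Lan-DeMets O'Brien-Fleming-type spending function: cumulative
    alpha spent at information fraction t (0 < t <= 1).
    """
    z = norm.ppf(1 - alpha / 2)
    return 2 * (1 - norm.cdf(z / sqrt(t)))

for t in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"information fraction {t:.1f}: cumulative alpha spent {obf_alpha_spent(t):.5f}")
# -> roughly 0.00001, 0.002, 0.011, 0.028, 0.050
```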


So where does this all leave us? We have a population of physicians and patients who yearn for effective screening and believe in it, so much so that it is hard to conduct an uncontaminated study of screening. We have a US study that was stopped prematurely in order to inform public health, but which is inadequate to inform it. We have a European study that shows a benefit near the a priori expected benefit, but which has a bizarre design and is missing important data that we would like to consider before accepting the results. We have no hint of a benefit on overall mortality. We have lukewarm conclusions from both groups, and we want desperately to know the associated morbidities in each group. We are spending vast amounts of resources and incurring an enormous emotional toll on men who live in fear after a positive PSA test, many of whom pay dearly ("a pound of flesh") to exorcise that fear. And we have a public over-reaction to the results of these studies, which merely increases our quandary.

If ignorance is bliss, then truly 'tis folly to be wise. Perhaps this saying applies equally to individual patients and to the investigation of PSA screening in these large-scale trials. For my own part, this is one aspect of my health that I shall leave to fate and destiny, while I focus on more directly remediable aspects of preventive health, ones where the prevention is pleasurable (running and enjoying a Mediterranean diet) rather than painful (prostatectomy).

Sunday, April 5, 2009

Another [the final?] nail in the coffin of intensive insulin therapy (Leuven Protocol) - and redoubled scrutiny of single center studies

In the March 26th edition of the NEJM, the NICE-SUGAR study investigators publish the results of yet another study of intensive insulin therapy in critically ill patients: http://content.nejm.org/cgi/content/abstract/360/13/1283.

This article is of great interest to critical care practitioners because intensive insulin therapy (the Leuven Protocol), or some diluted or half-hearted version of it, has become a de facto standard of care in ICUs across the nation and indeed worldwide, and because it is an incredibly well-designed and well-conducted study. My own interest derives also from my own [prescient] letter to the editor of the NEJM after the second Van den Berghe study (http://content.nejm.org/cgi/content/extract/354/19/2069), from the criticisms I levied against this therapy on this blog after another follow-up study recently showed negative results (http://medicalevidence.blogspot.com/2008/01/jumping-gun-with-intensive-insulin.html), and from a recent paper railing against the "normalization heuristic" (http://www.medical-hypotheses.com/article/S0306-9877(09)00033-4/abstract). The results of this study also add to the growing evidence that intensive control of hyperglycemia in other settings may not be beneficial (see the ACCORD and ADVANCE studies).

The current study was designed largely to mirror the enrollment criteria and outcome definitions of the previous studies; it had excellent follow-up, well-described and simple statistical analyses with ample power, and is well reported. Key differences between it and the original Van den Berghe study were the lack of high-calorie parenteral glucose infusions and its multicenter design. This latter characteristic may be pivotal in understanding why the initially promising Leuven Protocol results have not panned out on subsequent study.

The results of this study can be summarized simply by saying that this therapy appears to be of NO benefit and actually probably kills patients, in addition to markedly increasing the rate of severe hypoglycemia (a 6.3 percentage-point absolute increase, P<0.001). In contrast to Van den Berghe's second study in medical patients, there were no favorable trends toward reductions in ICU length of stay, time on the ventilator, or organ failures. In short, this therapy appears to be a complete flop.
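To put that hypoglycemia increase in clinical terms, a 6.3 percentage-point absolute risk increase translates into a number needed to harm of about 16. A minimal sketch (the arm-level risks are illustrative, chosen only to be consistent with the difference reported above):

```python
def number_needed_to_harm(risk_treated, risk_control):
    """NNH = 1 / absolute risk increase."""
    return 1.0 / (risk_treated - risk_control)

# Illustrative arm-level risks consistent with a ~6.3 point difference:
print(number_needed_to_harm(0.068, 0.005))  # ~16 patients treated per extra episode
```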

So why the difference? Why did this therapy, which in 2001 appeared to have such promise that it enjoyed rapid and widespread [and premature] adoption, fail to withstand the basic test of science, namely, repeatability? I think that medical history will judge two factors to be responsible. Firstly, the massive dextrose infusions in the first study markedly jeopardized the external validity of the first (positive) Van den Berghe study: it's not that intensive insulin saves you from your illness, it saves you from the harmful caloric infusions used in the surgical patients in that study.

Secondly, and this is related to the first, single-center studies also compromise external validity. In a single center, local practice patterns may be uniform and idiosyncratic, so the benefit of any therapy tested in such a center may also be idiosyncratic. Moreover, and I dare say, investigators at a single center may have more decisional latitude and control or influence over enrollment, ascertainment of outcomes, and clinical care of enrolled patients. The so-called "trial effect", whereby patients enrolled in a trial receive superior care and have superior outcomes, may be more likely in single-center studies. Such effects are of increased concern in trials where total blinding/masking of treatment assignment is not possible. (Recall that in the Van den Berghe study, an endocrinologist was consulted for insulin adjustments; in the current trial, a computerized algorithm controlled the adjustments.) Moreover still, in single-center studies, the investigators and the institution itself may have more "riding on" the outcome of the study, and collective equipoise may not exist. As an "analogy of extremes", just for illustrative purposes: if you wanted to design a trial in which you could subversively influence outcomes in a way that would not be apparent from the outside, would you design a single-center study (at your own institution, where your cronies are) or a large multicenter, multinational study? Which design would allow you more influence?

I LOVE the authors' concluding statement that "a clinical trial targeting a perceived risk factor is a test of a complex strategy that may have profound effects beyond its effect on the risk factor." This resonates beautifully with our conceptualization of the "normalization heuristic" and harkens to Ben Franklin's sage old saw that "He is the best physician who knows the worthlessness of the most medicines." I think that we now have more than ample data to assure us that intensive insulin therapy (i.e., targeting a blood sugar of 80-108 mg/dL) is a worthless medicine, and should be largely if not wholly abandoned.

Addendum 4/7/09: Note also the scrutiny of the only other "positive" study (with mortality as the primary endpoint) in critical care in the last decade, Rivers et al.; see: http://online.wsj.com/article/SB121867179036438865.html.