Performance Statistics for Doctors
How New Low Back Pain Metrics Continue Turning Doctors into Data Clerks Instead of Healers
We all want a high-quality spine surgeon.
Wouldn’t it be nice if surgeons had statistics like baseball players? Imagine pulling up a website and seeing batting averages, earned run averages, and on-base percentages for doctors. Pick the best one. Problem solved.
It is an admirable goal. In 2015, Congress mandated physician quality reporting under what became the Merit-based Incentive Payment System. Doctors report their statistics. Medicare penalizes those who score poorly.
Except medicine is not baseball. In baseball, a neutral scorekeeper tracks every hit and every out. The game has discrete outcomes. A home run is always good for the batter and always bad for the pitcher. It is a zero-sum contest with standardized rules.
Yet, in healthcare, there is no neutral scorekeeper. Doctors are largely responsible for documenting and submitting their own metrics. The “opponent” is disease, which is not standardized. Patients are not interchangeable. One patient’s success may be another patient’s disappointment. A small reduction in pain may be life-changing for one person and meaningless for another.
Despite this complexity, Medicare has spent billions of dollars over the past few decades developing and refining quality metrics. The Centers for Medicare and Medicaid Services (CMS) must decide which outcomes matter, how to measure them, and how to adjust for how sick patients are at baseline. Universities and consulting firms have profited handsomely designing and validating these formulas.
A recent example shows how messy this becomes in practice.
Medicare is rolling out a mandatory program called the Ambulatory Specialty Model for Low Back Pain. Under this model, anyone who treats back pain is graded on a series of quality metrics. If you are a spine surgeon in one of the selected regions and you see enough Medicare patients with back pain, participation is mandatory, and poor scores can dock your entire Medicare paycheck by up to 12%.
The program mandates depression screening using a standardized instrument such as the PHQ-9. It requires documentation of body mass index and a follow-up plan if the result falls outside the normal range. It forces clinics to track and report high-risk medication use. And, lastly, it requires the use of a licensed commercial outcomes platform called Focus on Therapeutic Outcomes, or FOTO.
Now imagine seeing a spine surgeon for a herniated disc in 2027. Before you even discuss your MRI, the clinic must screen you for depression with a specific nine-question form, document your BMI with an approved "follow-up plan" if it's off, flag any high-risk medications, and feed your answers into a licensed commercial database called FOTO, because Medicare grades the surgeon on all of it. Miss the boxes, and up to 12% of every Medicare dollar the surgeon earns (not just for your visit) gets cut.
All of these may be reasonable clinical considerations. A competent surgeon should care about depression, obesity, medication safety, and outcomes. But caring about them is different from turning them into payment metrics. No patient chooses a spine surgeon based on how accurately that surgeon completes a PHQ-9 depression form. No one asks how their surgeon scores on BMI documentation compliance.
This is where Goodhart’s Law applies: when a measure becomes a target, it ceases to be a good measure. As economist Charles Goodhart first observed, any statistical regularity tends to collapse once you start using it for control.
Once reimbursement is tied to a depression screening rate, the incentive shifts. The goal is no longer thoughtful assessment of mental health. The goal is documented compliance. Once outcomes are tied to a specific platform, the incentive is no longer improving recovery. The incentive is ensuring the right boxes are checked in the right software.
Risk adjustment adds another layer of distortion. Programs attempt to adjust scores based on how sick patients are. In theory, this prevents doctors who treat complex patients from being penalized. In practice, however, it creates a powerful incentive to document patients as severely ill as possible. More coded diagnoses increase measured risk. Higher measured risk can make outcomes look better relative to expectations. One study showed that documentation intensity alone can increase margins by 40%. The return on investment is better for improving coding than for improving care.
Meanwhile, the system-wide costs are real. Independent physician practices spend billions of dollars annually reporting quality measures. In an average outpatient clinic, physicians spend 2.6 hours per week on metric reporting, and nonphysician staff spend another 12.5 hours per doctor. Hospitals devote entire teams to quality reporting; the average community hospital employs multiple full-time staff solely for compliance. Larger institutions have reported devoting more than 100,000 person-hours annually to metric reporting.
Despite this investment, the evidence that these programs meaningfully improve outcomes is poor. Government Accountability Office reports have questioned whether current metrics meet strategic objectives. Less than half of endorsed measures demonstrate clear clinical validity. Physician performance scores under these programs are inconsistently correlated with actual patient outcomes. Some high-profile programs, such as the Hospital Readmissions Reduction Program, have even been associated with unintended increases in mortality for certain conditions. Even patient satisfaction scores, widely embraced as a measure of quality, have been associated in some studies with higher costs and worse mortality.
This metric obsession also accelerates consolidation. Large hospital systems can spread compliance costs across thousands of physicians. Independent practices cannot. When reporting requirements become too burdensome, small practices sell to larger systems. Patients do not see a line item labeled “quality compliance” on their bill, but they do see higher facility fees after consolidation.
Ironically, Medicare’s founding statute includes language stating that “nothing in the program authorizes federal officers to exercise supervision or control over the practice of medicine or the manner in which medical services are provided.” Yet increasingly, clinical workflows are shaped not by medical judgment alone, but by compliance with centrally designed metrics.
Doctors are not baseball players. Patients are not box scores. A home run is universally good for a hitter. In medicine, success is individualized. One patient may want maximal pain relief. Another may prioritize avoiding surgery. Another simply wants to be heard.
Reducing these human encounters to a series of standardized forms risks confusing documentation with care. Yes, quality and accountability matter, but when measurement becomes an end in itself, it can crowd out the very professionalism it seeks to improve.
As sociologist William Bruce Cameron wrote, in a line often misattributed to Einstein, "Not everything that counts can be counted, and not everything that can be counted counts."