ssivark 18 minutes ago

What I find striking is the emphasis on tracking and managing inputs (to develop a healthy causal model for management). This is in contrast with advice that's commonly passed around (ostensibly attributable to OKRs, High Output Management, etc. -- though I haven't carefully read the original sources), which puts heavy emphasis on outputs while eschewing input metrics. While that avoids the failure mode of focusing on causally irrelevant metrics, it also assumes the right causal model has already been discovered and that there's no learning process. This impedes the development of an agile organization that is constantly learning (growth mindset), instead promoting a "fixed" mindset where people/teams can either execute well or not, and the only control lever is hiring/firing/promoting those who seem to "get it". Fantastic and thought-provoking article!

shadowsun7 2 days ago

If you are interested in these ideas, you should know that this essay kicks off a series of essays that culminates, a year later, with an examination of the Amazon-style Weekly Business Review:

https://commoncog.com/becoming-data-driven-first-principles/

https://commoncog.com/the-amazon-weekly-business-review/

(It took that long because of a) an NDA, and b) it takes time to put the ideas to practice and understand them, and then teach them to other business operators!)

The ideas presented in this particular essay really trace back to W. Edwards Deming, Donald Wheeler, and Brian Joiner (who created Minitab; ‘Joiner’s Rule’, the variant of Goodhart’s Law cited in the link above, is attributed to him).

Most of these ideas were developed in manufacturing in the post-WW2 period. The Amazon-style WBR merely adapts them for the tech industry.

I hope you will enjoy these essays — and better yet, put them to practice. Multiple executives have told me the series of posts have completely changed the way they see and run their businesses.

  • pinkmuffinere a day ago

    Thanks for the note! This comment is a reminder to myself to read the series

    • js2 a day ago

      FYI, you can also upvote or favorite a comment, and then view those upvoted/favorited comments from your profile (same for submissions). Favorites are public.

      • pinkmuffinere 15 hours ago

        Ah, thank you! Didn't realize there was a favorites list; will make use of that

thayne a day ago

> Let’s demonstrate this by example. Say that you’re working in a widget factory, and management has decided you’re supposed to produce 10,000 widgets per month...

It then discusses ways that the factory might cheat to get higher numbers.

But it doesn't even mention what I suspect is the most likely outcome: they achieve the target by sacrificing something else that isn't measured, such as product quality (perhaps by shipping defective widgets that should have been discarded, working faster in a way that produces more defects, cutting out parts of the process, etc.), or worker safety, or making the workers work longer hours, etc.

ang_cire 2 days ago

This doesn't really touch on the core of the issue, which is business expectations that don't match up with reality.

Business leaders like to project success and promise growth that there is no evidence they can achieve, then put it on workers to deliver. When there's no way to achieve the outcome other than to cheat the numbers, the workers will cheat (and will have to).

At some point businesses stopped treating outperforming the previous year's quarter as over-delivering, and made it an expectation, regardless of what is actually doable.

  • uxhacker a day ago

    The article actually addresses this directly through Wheeler's distinction between "Voice of the Customer" (arbitrary targets/expectations) and "Voice of the Process" (what's actually achievable). The key insight is that focusing solely on hitting targets without understanding the underlying process capabilities leads to gaming metrics. Amazon's WBR process shows how to do this right - they focus primarily on controllable input metrics rather than output targets, and are willing to revise both metrics and targets based on what the process data reveals is actually possible. The problem isn't having targets - it's failing to reconcile those targets with process reality.

  • osigurdson 2 days ago

    I think the problem is dimensionality. Business leaders naturally work in low-dimensional space - essentially 1D: increase NPV. However, understanding how this translates to high-dimensional concrete action is what separates bad business leaders from good ones.

godelski a day ago

Goodhart's law is often misunderstood, and the author here seems to both agree and disagree. Goodhart's law is about alignment: every measure is a proxy for the thing you are actually after, and no proxy aligns perfectly.

Here's the thing: there's no fixing Goodhart's Law. You just can't measure anything directly; even measuring with a ruler is a proxy for the true length, made without infinite precision. This gets much harder as the environment changes under you and metrics' utility changes with time.

That said, much of the advice is good: making it hard to hack and giving people flexibility. It's a bit obvious that flexibility is needed if you're interpreting Goodhart's as "every measure is a proxy", "no measure is perfectly aligned", or "every measure can be hacked".

bachmeier 2 days ago

Just a side note that this usage isn't really the application Goodhart had in mind. Suppose you're running a central bank and you see a variable that can be used to predict inflation. If you're doing your job as a central banker optimally, you'll prevent inflation whenever that variable moves, and then no matter what happens to the variable, due to central bank policy, inflation is always at the target plus some random quantity and the predictive power disappears.

As "Goodhart's law" is used here, in contrast, the focus is on side effects of a policy. The goal in this situation is not to make the target useless, as it is if you're doing central bank policy correctly.

jjmarr 2 days ago

I can confirm this. We've standardized on Goodhart's law by creating a 90-day rotation requirement for KPIs. We found that managers would reuse the same performance indicators with minor variations and put them on sticky notes to make them easier to target.

  • hilux 2 days ago

    Wow. That is an extremely cool idea - new to me.

    Do you have enough KPIs that you can be sure that these targets also serve as useful metrics for the org as a whole? Do you randomize the assignment every quarter?

    As I talk through this ... have you considered keeping some "hidden KPIs"?

    • jjmarr 2 days ago

      I'm riffing on password rotation requirements and the meta-nature of trying to make Goodhart's law a target. I could've been a bit more obviously sarcastic.

      • not_knuth 2 days ago

        I mean, Poe's Law [0] and all, but I was quite surprised your comment was interpreted as anything but sarcasm.

        [0] https://en.wikipedia.org/wiki/Poe%27s_law

        • TheRealPomax a day ago

          Anyone who's worked a few jobs would read the comment and go "sure, I've worked under managers like that". It's not obvious sarcasm when the description is just something that happens.

      • keeganpoppen a day ago

        well i think it also highlights that the seeds of truly interesting ideas are often buried in or can be found in jokes/intentionally bad ideas... or, at least, nearly all my best ideas originated from joke ideas-- ones that were not even intended to be constructive, instructive, etc. ... just jokes-- trying to reflect some truth about the world in a way that cuts to the core of it.

        can't say what the deep idea in this case is per se (haha (maybe the other commenter can shed light on that part)), but i guess if you have enough KPIs to be able to rotate them you have yourself a perpetual motion machine of the same nature as the one that some genius carried down from the mountain on stone tablets that we can sustain maximum velocity ad infinitum by splitting our work into two week chunks and calling them "sprints"... why haven't marathoners thought of this? (h/t Rich Hickey, the source of that amazing joke that i butcher here)

        maybe consciousness itself is nothing more than the brain managing to optimize all of its KPIs at the same time.

        • jjmarr 16 hours ago

          The more I think about it, the more I like the idea of constantly changing our methods of assessment.

          A 90-day rotation is questionable, but regularly changing the metrics is a good way to keep people from gaming them.

  • Spivak 2 days ago

    If your managers are doing that, it's a strong signal your KPIs are a distraction and your managers are acting rationally within the system they've been placed in.

    They need something they can check easily so the team can get back to work. It's hard to find metrics that are both meaningful to the business and track with the work being asked of the team.

    • musicale a day ago

      What kind of KPIs aren't either disconnected from what people actually do or validation of Goodhart's law?

      • marcosdumay a day ago

        The ones not used to judge people.

        You can look at revenue and decide "hey, we have a problem here" and go research what's causing the problem. That's a perfectly valid use for a KPI.

        You can make some change, Toyota-process style, saying "we will improve X", make the change, and track X to see if you must revert or keep it. That is another perfectly valid use for a KPI.

        What you can't do is use them to judge people.

      • stoperaticless a day ago

        Many of them.

        It's easy to fake one metric; it's harder to consistently play games with 100 of them.

        (But then they're probably no longer KPIs, as whoever looks at the data needs to recognise that details and nuance are important.)

nrnrjrjrj 2 days ago

I want to block some time to grok the WBR and XMR charts that Cedric is passionate about (for good reason).

I might be wrong, but I feel like the WBR treats variation (looking at the measure and saying "it has changed") as a trigger for investigation rather than a conclusion.

In that case, let's say you do something silly and measure lines of code committed. Let's also say you told everyone it will factor into performance reviews, and the company is known for stack ranking.

You introduce the LOC measure. All employees watch it like a hawk. While working, they add useless blocks of code and so on.

LOC committed goes up and looks significant on the XmR chart.

Option 1: grab champagne, pay exec bonus, congratulate yourself.

Option 2: investigate

Option 2 is better, of course. But it is such a mindset shift. Option 2 lets you see whether Goodhart's law kicked in or not. It lets you actually learn.

  • kqr a day ago

    These ideas come from statistical process control, which is a perspective that acknowledges two things:

    (a) All processes have some natural variation, and for as long as outputs fall in the range of natural process variation, we are looking at the same process.

    (b) Some processes apparently exhibit outputs outside of their natural variation. When this happens, something specific has occurred, and it is worth trying to find out what.

    In the second case, there are many possible reasons for exceptional outputs:

    - Measurement error,

    - Failure of the process,

    - Two interleaved processes masquerading as one,

    - A process improvement has permanently shifted the level of the output,

    - etc.

    SPC tells us that we should not waste effort on investigating natural variation, and should not make blind assumptions about exceptional variation.

    It says outliers are the most valuable signals we have, because they tell us we are not looking only at what we thought we were, but at something else as well.
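
The natural-variation limits described above can be computed with Wheeler's XmR (individuals / moving-range) chart. A minimal sketch, with made-up weekly numbers; 2.66 is the standard XmR scaling constant:

```python
def xmr_limits(values):
    """Natural process limits: mean +/- 2.66 * average moving range."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * avg_mr, mean + 2.66 * avg_mr

def exceptional_points(values):
    """Outputs outside the natural limits: the ones worth investigating."""
    lo, hi = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lo or v > hi]

# Nine routine weeks and one exceptional one (illustrative numbers):
weekly_output = [52, 48, 55, 50, 47, 53, 49, 51, 90, 50]
print(exceptional_points(weekly_output))  # → [(8, 90)]
```

Points inside the limits are natural variation and, per SPC, not worth chasing; the week of 90 is the signal that something specific happened, good or bad.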

tqi 2 days ago

> I immediately glommed onto this list as a more useful formulation than Goodhart’s Law. Joiner’s list suggests a number of solutions:

> Make it difficult to distort the system.

> Make it difficult to distort the data, and

If companies knew how to make it difficult to distort the system/data, don't you think they would have done it already? This feels like telling a person learning a new language that they should try to sound more fluent.

  • abetusk a day ago

    The article goes into (what I consider) actionable methods. Specifically:

    * Create a finance department that's independent in both their reporting and ability to confirm metrics reported by other departments

    * Provide a periodic meeting (for executives/managers) that reviews all metrics and allows them to be altered if need be

    * Don't try to provide a small number of measurable metrics or a "north star" single metric

    The idea being that a review of 500+ metrics gives a better potential causal model. Even though 500+ metrics is a lot to review, each can be covered briefly, with most of them being "no change, move on"; this lets managers get a holistic feel for the model and identify metrics that are, or are becoming, outliers (positively or negatively).

    The independent finance department discourages the reporting of bad data, and, coupled with the WBR and its empowerment, provides the facilities to change the system.

    The three main points (make it difficult to distort the system, make it difficult to distort the data, and provide facilities for change) all need to be implemented to have an effect. Providing only the "punishment" (making it difficult to distort the system/data) without any facility for change puts on too much pressure without any relief.

  • llamaimperative a day ago

    Only if they knew how to do it, and knew that they should. I think Goodhart’s Law is useful to know about because what it’s really suggesting is that people are shockingly good, probably much better than you thought, at distorting the system.

lmm a day ago

> It can’t be that you want your org to run without numbers. And it can’t be that you eschew quantitative goals forever!

Can't it? Amazon may be an exception, but most of the time running without numbers or quantitative goals seems to work better than having them.

osigurdson 2 days ago

Goodhart's law basically states that false proxies are game-able. The solution is to stop wasting time tracking false proxies. Instead, roll up your sleeves and do something.

  • godelski a day ago

      > that **false** proxies are game-able.
    
    You say this like there are measures that aren't proxies. Tbh I can't think of a single one, even a trivial one.

    All measures are proxies and all measures are gameable. If you are uncertain, host a prize and you'll see how creative people get.

    • osigurdson 21 hours ago

      Try gaming NPV.

      • godelski 18 hours ago

        • osigurdson 18 hours ago

          Not at all. Don't conflate NPV predictions with realized NPV, where all cash flows are known.
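
For concreteness, the "NPV where all cash flows are known" case is just a discounted sum. A minimal sketch (the 10% rate and the cash flows here are made up):

```python
def npv(rate, cashflows):
    """NPV over known cash flows; cashflows[0] is the t=0 amount."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# -1000 up front, then 500 per year for three years, at a 10% discount rate
print(round(npv(0.10, [-1000, 500, 500, 500]), 2))  # → 243.43
```

The gaming argument in this subthread is about the inputs: the formula is fixed, but projected cash flows and the chosen discount rate are not.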

          • godelski 16 hours ago

            Well, for one, it's hard not to conflate the two when you expect me to know what you mean as opposed to what you say.

            Second, you don't think it's possible to hide costs, exaggerate revenue, ignore risks, and/or optimize short-term revenue at the expense of long-term? If you think something isn't hackable, you just aren't looking hard enough.

            • osigurdson 13 hours ago

              What, in your opinion, is any for profit business ultimately trying to (legally) maximize? I'd say it is NPV. That can mean taking risks, optimizing short term revenue in favor of long term (and vice versa).

              • TheCoelacanth 3 hours ago

                Generally share price. NPV has a connection to that, but it's not the only factor investors are considering when buying and selling.

  • stoperaticless a day ago

    > Goodharts law basically states that false proxies are game-able

    Exactly (btw. very nice way to put it)

    > stop wasting time on tracking false proxies

    Sometimes a proxy is much cheaper. (A medical analogy of limited depth: instead of doing surgery to see things in person, one might opt to check some ratios in the blood first.)

    • osigurdson a day ago

      >> check some ratios in the blood first

      This would not count as a false proxy, however. The problem in software is that it is very hard to construct meaningful proxy metrics. Most of the time they end up being tangential to value.

      • stoperaticless a day ago

        Tangential value is still value :)

        I agree in principle, just want to add a bit of nuance.

        Lets take famous “lines of code” metric.

        It would be counterproductive to reward it (as proxy of productivity). But it is a good metric to know.

        For the same reason it's good to know the weight of the ships you produce.

        • osigurdson a day ago

          "Tangential to value", not "tangential value".

          The value in tracking false proxies like lines of code accrues to the tracker, not the customer, the business, or anyone else. The tracker is able to extract value from the business, essentially by tricking it (and themselves) into believing that such metrics are valuable. It isn't a good use of time in my opinion, but probably a low-stress / chill occupation if that is your objective.

  • hinkley 2 days ago

    Or use the metrics to spot check your processes, and move on to other concerns before the histamine reaction starts.

    In theory you can return to the metrics later for shorter intervals.

    • alok-g 23 minutes ago

      Why not define additional metrics around what you would spot check, and then produce a consolidated figure of merit as the metric to optimize? Aren't we doing that with AI training and general optimization (including regularization)?
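
One way to read the consolidated-figure-of-merit suggestion above is a weighted score over the spot-check metrics, with a penalty term playing the role of regularization. A hypothetical sketch (the metric names, weights, and penalty are all made up, not from the thread):

```python
def figure_of_merit(metrics, weights, penalty_weight=0.1):
    """Weighted sum of sub-metrics minus a spread penalty, so that
    maxing one gamed metric while neglecting others scores worse."""
    score = sum(weights[name] * value for name, value in metrics.items())
    # penalize lopsided profiles (the "regularization" analogy)
    spread = max(metrics.values()) - min(metrics.values())
    return score - penalty_weight * spread

scores = {"quality": 0.8, "throughput": 0.9, "safety": 0.7}
weights = {"quality": 0.5, "throughput": 0.3, "safety": 0.2}
print(round(figure_of_merit(scores, weights), 2))  # → 0.79
```

Of course, a consolidated figure of merit is itself a proxy and still subject to Goodhart's law; the penalty only raises the cost of gaming any single component.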

    • osigurdson a day ago

      What are some example metrics that you might track that add value?

      • hinkley a day ago

        With any new habit in life, you fixate on it for a very brief period and then try to let it fade into the background. Then you periodically revisit it to see how you’re doing.

        From a programming standpoint, and off the top of my head, I would include TDD, code coverage, and anything that comes out of a root cause analysis.

        I tell junior devs who ask to spend a little more time on every task than they think necessary, to try to raise their game. When doing a simple task you should practice all of your best intentions and new intentions, to build the muscle memory.

        • osigurdson 21 hours ago

          >> TDD, code coverage

          I don't know how to track TDD, but for me, code coverage is an example of the same old false proxies that people used to track in the '00s.

          Before creating a metric and policing it, make sure you can rigorously defend its relationship to NPV. If you can't do this, find something else to track.

James_K a day ago

In short, describe the actions you want people to take rather than the results you think those actions should achieve. Or perhaps more fundamentally, you should know what people are doing and you won't know that when you are only looking at an opaque metric.

thenobsta 2 days ago

This doesn't feel well elucidated, but I've been thinking about Goodhart's law in other areas of life -- e.g. owning a home is cool and can enable some cool things. However, when home ownership becomes the goal, it becomes easy to disregard a lot of life-giving things in pursuit of owning a home.

This seems to pop up in a lot of areas, and I find myself asking: is X a thing I really desire, or is it a natural side effect of some other process?

  • soared 2 days ago

    For most people owning a home isn’t the goal, it’s to be able to adjust their living space how they see fit, have a stable place to raise children, remove the risk of landlords, etc

    • acdha 2 days ago

      Isn’t that exactly the point? Some people set their goal as buying the house but forget to reevaluate how well they’re doing at their true goals - like, do they really benefit from the extra rooms enough to offset the commute taking away family time, or are they doing renovations often enough to say they’re actually taking advantage of that benefit?

  • kqr a day ago

    This is a question of values. When home ownership is an implicit value of the culture one lives in, the reason to own a home is to own a home.

    Once you start looking for these things that are done for their own sake (or really to gain respect in a community) you notice how pervasive they are and how different they can be for two people next to each other.

    I recommend Gregory's Savage Money on the subject. My review here: https://entropicthoughts.com/book-review-savage-money

  • nrnrjrjrj 2 days ago

    If you are smart and think a lot, you can do well renting and investing elsewhere.

    You can also ask what is life about?

    This is hard to do because the conclusion may need to break moulds, leading to family estrangement and losing friends.

    I suspect people who end up having a TED talk in them are people who had the ability, through courage or their inherited neural makeup, to go it alone despite dissenting voices. Or they were raised to be encouraged to do so.

lamename 2 days ago

This is all well and good, but unfortunately it depends on the people pushing for the metric/system giving a shit about what the metric is supposed to improve. There are still far too many who prefer to slap 1 or 2 careless metrics on an entire team, optimize until they're promoted, then leave the company worse off.

  • ang_cire 2 days ago

    Sounds like bad management at the top, too. If leaders can't determine if middle management is showing them success in a metric that doesn't actually help the business, they're doing the same thing (paycheck till the parachute arrives).

yarg a day ago

Goodhart's law can diagnose an issue, but it prescribes no solutions.

However, it's still better to recognise a problem, so you can at least look into ways of improving the situation.

skmurphy 2 days ago

There is a very good essay in the first comment by "Roger" dated Jan-2023, reproduced below. Skip the primary essay and work from this:

"I really appreciated this piece, as designing good metrics is a problem I think about in my day job a lot. My approach to thinking about this is similar in a lot of ways, but my thought process for getting there is different enough that I wanted to throw it out there as food for thought.

One school of thought (https://www.simplilearn.com/tutorials/itil-tutorial/measurem...) I have trained in is that metrics are useful to people in 4 ways:

    1. Direct activities to achieve goals
    2. Intervene in trends that are having negative impacts
    3. Justify that a particular course of action is warranted
    4. Validate that a decision that was made was warranted
My interpretation of Goodhart’s Law has always centered more around the duration of metrics for these purposes. The chief warning is that, regardless of the metric used, sooner or later it will become useless as a decision aid. I often work with people who think about metrics as “do it right the first time, so you won’t have to ever worry about it again”. This is the wrong mentality, and Goodhart’s Law is a useful way to reach many folks with this mindset.

The implication is that the goal is not to find the “right” metrics, but to instead find the most useful metrics to support the decisions that are most critical at the moment. After all, once you pick a metric, 1 of 3 things will happen:

    1. The metric will improve until it reaches a point where you are not improving it anymore, at which point it provides no more new information.
    2. The metric doesn’t improve at all, which means you’ve picked something you aren’t capable of influencing and is therefore useless.
    3. The metric gets worse, which means there is feedback that swamps whatever you are doing to improve it.
Thus, if we are using metrics to improve decision making, we’re always going to need to replace metrics with new ones relevant to our goals. If we are going to have to do that anyway, we might as well be regularly assessing our metrics for ones that serve our purposes more effectively. Thus, a regular cadence of reviewing the metrics used, deprecating ones that are no longer useful, and introducing new metrics that are relevant to the decisions now at hand, is crucial for ongoing success.

One other important point to make is that for many people, the purpose of metrics is not to make things better. It is instead to show that they are doing a good job, and to persuade others to do what they want. Metrics that show this are useful, and those that don’t are not. In this case, of course, a metric may indeed be useful “forever” if it serves these ends. The implication is that some level of psychological safety is needed for metric use to be more aligned with supporting the mission and less aligned with making people look good."

  • turtleyacht 2 days ago

    Thank you. The next time metrics are mentioned, one can mention an expiration date. That can segue into evolving metrics, feedback control systems, and the crucial element of "psychological safety."

    A jaded interpretation of data science is that it finds evidence to support predetermined decisions, which is unfair to all. Having the capability to always generate new internal tools for Just In Time Reporting (JITR) would be nice, ideally reproducible ones.

    This encourages ad hoc and scrappy starts, which can be iterated on as formulas in source control. Instead of a gold standard of a handful of metrics, we are empowered to draw conclusions from all data, in context.

    • skmurphy 2 days ago

      I am not "Roger," but I can recognize someone who has long and practical experience with managing metrics and KPIs and their interaction with process improvement. Instead of an "expiration date" I would encourage you to define a "re-evaluation date" that allows enough time to judge the impact and efficacy of the metrics proposed and make course corrections as needed (each with its own review dates).

      One good book on the positive impact of a metric that everyone on a team or organization understands is "The Great Game of Business" by Jack Stack https://www.amazon.com/Great-Game-Business-Expanded-Updated-... I reviewed it at https://www.skmurphy.com/blog/2010/03/19/the-business-is-eve...

      Here is a quote to give you a flavor of his philosophy:

      "A business should be run like an aquarium, where everybody can see what's going on--what's going in, what's moving around, what's coming out. That's the only way to make sure people understand what you're doing, and why, and have some input into deciding where you are going. Then, when the unexpected happens, they know how to react and react quickly. "

      Jack Stack in "Great Game of Business."

      • krisstring a day ago

        While Goodhart’s Law often occurs because of a narrow focus on a metric without understanding its role in the larger system, the approach in Jack Stack’s The Great Game of Business is to make targets an educational tool, teaching employees how to interpret and impact those targets responsibly.

        GGOB, by 1. involving employees in decision-making and teaching them about metrics, 2. giving them a line of sight for how their contribution impacts the overall business, and 3. providing a stake in the outcome, creates collective accountability and success, and reduces the likelihood of metric manipulation.

        Bottom line: GGOB recognizes that business success takes everyone, at all levels, and values the input of each employee, right down to the part-time janitor. The metrics are used as tools, like the scoreboard in baseball, to guide decision making and establish what winning as a team looks like. It all comes down to education and getting everyone aligned and pulling in the same direction.

        • skmurphy 19 hours ago

          I agree it's really about effective delegation; it acknowledges the risks that Goodhart warns about and suggests how to manage, if not avoid, them.

      • shadowsun7 2 days ago

        I should note that this essay kicks off an entire series that eventually culminates in a detailed examination of the Amazon Weekly Business Review (which took some time to get to because of a) an NDA, and b) the time needed to test the ideas in practice). The Goodhart’s Law essay uses publicly available information about the WBR to explain how to defeat Goodhart’s Law (the ideas it draws from are five decades old); the WBR itself is a two-decades-old mechanism for actually accomplishing these high-falutin’ goals.

        https://commoncog.com/the-amazon-weekly-business-review/

        Over the past year, Roger and I have been talking about the difficulty of spreading these ideas. The WBR works, but as the essay shows, it is an interlocking set of processes that solves for a bunch of socio-technical problems. It is not easy to get companies to adopt such large changes.

        As a companion to the essay, here is a sequence of cases about companies putting these ideas to practice:

        https://commoncog.com/c/concepts/data-driven/

        The common thread in all these essays is that they don’t stop at high-falutin’ (or conceptual) recommendations, but actually dive into real-world application and practice. Yes, it’s nice to say “let’s have a re-evaluation date.” But what does it actually look like to get folks to do that at scale?

        Well, the WBR is one way that works in practice, at scale, and with some success in multiple companies. And we keep finding nuances in our own practice: https://x.com/ejames_c/status/1849648179337371816

        • skmurphy 2 days ago

          It looks like any other decision record where you set a date to evaluate the impact of a policy or course of action and make sure it's working out the way that you had anticipated.

          • shadowsun7 2 days ago

            And how are you going to tell that when you have a) variation (that is, every metric wiggles wildly)? And b) how are you able to tell whether it has or hasn’t impacted other parts of your business if you do not have a method for uncovering the causal model of your business (like that aquarium quote you cited earlier)?

            Reality has a lot of detail. It’s nice to quote books about goals. It’s a different thing entirely to achieve them in practice with a real business.

            • skmurphy 2 days ago

              I agree that reality is complex, but I worry you are conflating the challenges of running an Amazon-scale business with running the smaller businesses that most of the entrepreneurs on HN will need to manage. I thought Roger offered a more practical approach in about 10% of the words that you took. I am sorry if I have offended you; I was trying to save the entrepreneurs on HN time.

              As to Jack Stack's book, I think the genius of his approach is communicating simple decision rules to the folks on the front line instead of trying to establish a complex model at the executive level that can become more removed from day-to-day realities. In my experience, which involves working in a variety of roles in startups and multi-billion dollar businesses over the better part of five decades, simple rules updated based on your best judgment risk "extinction by instinct" but outperform the "analysis paralysis" that comes from trying to develop overly complex models.

              Reasonable men may differ.

              • shadowsun7 2 days ago

                This comment is for HN readers who might be interested in solutions.

                My two questions (a) and (b) were not rhetorical. Let’s get concrete.

                a) You are advising a company to “check back after a certain period”. After the certain period, they come back to you with the following graph:

                https://commoncog.com/content/images/2024/01/prospect_calls_...

                “How did we do? Did we improve?”

                How do you answer? Notice that this is a problem regardless of whether you are a big company or a small company.

                b) 3 months later, your client comes back and asks: “we are having trouble with customer support. How do we know that it’s not related to this change we made?” With your superior experience working with hundreds of startups, you are able to tell them if it is or isn’t after some investigation. Your client asks you: “how can we do that for ourselves without calling on you every time we see something weird?”

                How do you answer?

                (My answers are in the WBR essay and the essay that comes immediately before that, natch)

                It is a common excuse to wave away these ideas with “oh, these are big company solutions, not applicable to small businesses.” But a) I have applied these ideas to my own small business and doubled revenue; also b) in 1992 Donald Wheeler applied these methods to a small Japanese night club and then wrote a whole book about the results: https://www.amazon.sg/Spc-Esquire-Club-Donald-Wheeler/dp/094...

                Wheeler wanted to prove (and I wanted to verify) that ‘tools to understand how your business ACTUALLY works’ are uniformly applicable regardless of company size.

                If anyone reading this is interested in being able to answer both questions confidently, I recommend starting with my essays (there’s enough in front of the paywall to be useful) and then jumping straight to Wheeler. I recommend Understanding Variation, which began as a 1993 presentation to managers at DuPont (which means it is light on statistics).
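                For a taste of the answer to (a): the core tool Wheeler teaches is the XmR (“process behaviour”) chart, which separates routine variation from genuine signals. Here is a minimal sketch in Python; the weekly numbers are invented for illustration.

```python
# Minimal XmR ("process behaviour") chart sketch, after Wheeler.
# The weekly metric values below are invented for illustration.
values = [52, 48, 55, 50, 47, 53, 49, 51, 46, 54, 50, 71]

mean = sum(values) / len(values)

# Moving ranges: absolute differences between consecutive points.
moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
avg_mr = sum(moving_ranges) / len(moving_ranges)

# Wheeler's constant 2.66 scales the average moving range into
# three-sigma-equivalent "natural process limits".
upper = mean + 2.66 * avg_mr
lower = mean - 2.66 * avg_mr

for week, v in enumerate(values, start=1):
    signal = "  <-- signal (exceptional variation)" if not (lower <= v <= upper) else ""
    print(f"week {week:2d}: {v}{signal}")
```

                With these numbers only week 12 breaks the limits; the wiggling everywhere else is routine variation. “Did we improve?” is answered by a point (or a sustained run of points) breaking the limits, not by comparing this week to last week.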

rc_mob a day ago

it's not a "law", of course, and shouldn't be called one

deeviant a day ago

I'm still not convinced. Goodhart’s Law is rooted in human behavior: once people know what’s being measured, they’ll optimize for that, often distorting the system or the data to hit targets. The article's solution boils down to “just do it right” by refining metrics and improving systems, but that’s easier said than done. It ignores the fact that people will always game metrics if their rewards depend on them. Plus, it conflates data-driven decision-making with performance evaluation, which are very different things. The psychology behind Goodhart’s Law isn’t solved by more metric tweaking.

stoperaticless a day ago

tl;dr version:

- Use not one, but many metrics (article mentioned 600)

- Recognize that some metrics you control directly (input metrics) and others you want to move but cannot control directly (output metrics).

- Constantly refine metrics and your causal model between inputs and outputs. (Article mentions weekly 60-90min reviews)

Edit: a crucial part is that all consumers of these metrics (i.e. all of leadership) take part in these reviews.
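The input/output split in the list above can be sketched as a toy metrics deck; all metric names, numbers, and the tagging here are invented for illustration:

```python
# Toy metrics deck for a weekly review.
# All metric names, values, and the input/output tagging are invented.
metrics = [
    {"name": "prospect_calls_made", "kind": "input",  "weekly": [120, 130, 125, 140]},
    {"name": "demos_booked",        "kind": "input",  "weekly": [18, 21, 19, 24]},
    {"name": "new_revenue_usd",     "kind": "output", "weekly": [9000, 8500, 9800, 11000]},
]

# Inputs (directly controllable) are walked through before outputs
# (desired but only indirectly controllable), so the group can test
# its causal model: did moving the inputs actually move the outputs?
for kind in ("input", "output"):
    print(f"-- {kind} metrics --")
    for m in metrics:
        if m["kind"] == kind:
            latest, prev = m["weekly"][-1], m["weekly"][-2]
            print(f"{m['name']}: {latest} (previous week: {prev})")
```

The point of the ordering is the causal model: the review asks whether changes in the controllable inputs are showing up in the outputs, week after week.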

stonethrowaway 2 days ago

What does “Law” mean in this case?

  • LeonB a day ago

    “Pattern”

aunwick 2 days ago

I suspect next month's article will be pay-for-performance as measured by lines of code and production issues. Prepare for a 10x increase in the code base and zero production changes until after bonuses hit the bank.

  • aunwick 2 days ago

    SPC in manufacturing is an example of measuring the correct goals, though. I'll give you that one.

satisfice 2 days ago

"...when you’re incentivising organisational behaviour with metrics, there are really only three ways people might respond: 1) they might improve the system, 2) they might distort the system, or 3) they might distort the data."

This is wrong, and the wrongness of it undermines the whole piece, I think:

- A fourth way people respond is to oppose the choice of target and/or metric; to question its value and lobby to change it.

- A fifth way people respond is to oppose the whole idea of incentives on the basis of metrics (perhaps by citing Goodhart's Law... which is a use of Goodhart's Law).

Goodhart's Law is useful not just because it reminds us that making a metric a target may incentivize behavior that makes THAT metric a poor indicator of a good system, but also because choosing ANY metric as a target changes everyone's relationship with ALL metrics: it spells the end of inquiry and the beginning of what might be called compliance anxiety.

  • Lvl999Noob 2 days ago

    While true, I think your additions to the behaviors are rather... useless. Out of the original three, notice that one is the actual behaviour we want to happen and two are insidious side effects that we want to prevent.

    Your proposed fourth and fifth outcome behaviours, on the other hand, are neither. Most importantly, they are transient (at least ideally): either the workforce and the management come to an agreement and the metrics continue (or are discontinued), or they don't and the business stays in limbo. It is an emergency (or some word with lower impact; an incident?). There isn't a covert resistance by some teams specifically working against the metric, lowering it while also hiding themselves from notice.

    • satisfice a day ago

      My additions are important if you want to understand the value of Goodhart’s Law. They are logically necessary for me to make my point, which is that his analysis of the situation is flawed.

      I am bemused that you deride them, given that they are, in fact, how I have responded to metrics in technical projects since I first developed a metrics program for Borland, in ‘93. (I championed inquiry metrics and opposed control metrics.)

      • cutemonster 4 hours ago

        > inquiry metrics

        That sounds interesting; how did you do that? What inquiry metrics did you use, if I may ask? What did the others think, and how did it all work out?

  • mkleczek 2 days ago

    How can people (either individuals or groups) assess (or even define) achieving goals without SOME metric?