· 8 min read · Phronesis

Phronesis vs spreadsheet decision matrices — a showdown

Decision matrices in Google Sheets are the default. Here is what they do well, what they quietly hide, and when a purpose-built tool wins.

Cover art for Phronesis vs spreadsheet decision matrices — a showdown

Most teams that make structured decisions already have a tool for it: a Google Sheet. A column for each option, a row for each criterion, a weight column, a weighted-average formula, conditional formatting. It works. It has worked for decades before Phronesis existed.

So is there a real reason to use a purpose-built tool? The honest answer is: for a large class of decisions, no — the spreadsheet is fine. For a specific class of decisions, the spreadsheet has failure modes that a blind-jury tool avoids cleanly. This post walks through both.

What the spreadsheet does well

Credit where due. A weighted-scoring spreadsheet is:

  • Transparent. Every number is visible. The math is auditable. You can tell your CFO exactly how the recommendation was produced.
  • Flexible. Add a criterion, add an option, add a weight. It bends to whatever structure you need.
  • Free. Zero marginal cost per run. No API charges, no per-seat fees.
  • Portable. Anyone with Google or Excel can read it. No vendor lock-in.
  • Durable. The sheet you made in 2022 still works in 2026. No app deprecation risk.

For simple, transparent, infrequent decisions, the spreadsheet is the right tool. Don’t over-engineer it.

Where the spreadsheet quietly fails

The spreadsheet has five failure modes that are invisible until you’ve been using them for a while.

1. The scorer is not blind

The person filling in the cells knows which option is which. They are biased, even if mildly and even if they don’t intend to be. That bias shows up in small score drifts — a 7 instead of a 6.5 here, an 8 instead of a 7 there — and those drifts compound across 20–30 cells. The “weighted score” ends up looking objective while being anchored on the scorer’s priors.

Blind scoring — where the options are labeled generically for the scoring step — eliminates this. Most spreadsheet matrices don’t do it because setting up the blind is tedious. A tool can do it automatically.

2. There’s only one scorer

A single-scorer spreadsheet is a single point of view dressed in mathematical clothing. Even if the scorer is rigorous, they have one prior, one set of blind spots, and one sense of which criteria are demanding versus lenient to score.

Three scorers (or three analyst models, as in Phronesis) produce disagreement, which is data. A single scorer produces agreement with themselves, which is noise.

3. Scoring resolution drifts over time

On a 1–10 scale, the first few cells get scored carefully. By row 15, the scorer is calibrating against rows 1–14 instead of against the absolute meaning of the scale. By row 30, the top end has compressed — everything is 7s and 8s — and the distinctions that matter have been sanded down.

This is a well-documented cognitive effect (scale contraction). Fixed-reference scoring — where each model scores against an explicit rubric rather than against the running average — avoids it.

4. The weights get gamed, often unconsciously

When the same person picks the weights and does the scoring, the system allows you to subtly shift weights to make your preferred option win. This isn’t lying — it’s the way cognition works under motivated reasoning. You round up the weight on the criterion where your preferred option is strong, and down on the criterion where it’s weak.

The fix is to lock weights before scoring starts and to have scoring done by a different entity (or multiple entities) than the weight-setter. The spreadsheet doesn’t enforce this; a tool can.

5. There’s no confidence signal

A spreadsheet produces a ranking. It doesn’t tell you how confident to be in that ranking. The top score is 8.1, the second is 7.9. Is that a meaningful win? A coin flip? A function of how you rounded?

Without confidence and agreement signals, the spreadsheet treats all rankings as equivalent, whether the winner dominated or squeaked by. This is where the most expensive decision errors come from — confident action on a barely-won ranking.

Where a purpose-built tool wins

Given those failure modes, here’s the class of decisions where a tool like Phronesis produces materially better output than a spreadsheet:

  • High-stakes decisions where the cost of being subtly wrong dwarfs the cost of the tool
  • Decisions with 4+ options and 5+ criteria where scale drift and working memory overflow matter
  • Decisions that will be revisited — the audit trail of who scored what matters for post-mortems
  • Decisions made by someone with a stake — i.e. most of them — where blind scoring removes a bias vector
  • Decisions where you don’t know what weights are right — a tool can show you how the ranking changes under different weight sets, a spreadsheet shows you one configuration

And the class where the spreadsheet stays the right tool:

  • Low-stakes, frequent decisions (vendor selection for office snacks)
  • Decisions with a dominant criterion (the price is 60% of the weight)
  • Decisions where auditability is more important than accuracy (regulated industries sometimes need the sheet for compliance)
  • Decisions where you genuinely prefer one option and want a cover story — we’ve all done it, the sheet is optimized for this

That last case is worth naming out loud. A spreadsheet lets you work backward from the conclusion. A blind-jury tool makes it harder. Sometimes you’re going to pick Option A regardless and the sheet is a ritual. No tool fixes that.

A concrete example

Let’s run the same decision through both. The question: “Which of these three cloud providers should we standardize on?” Options: AWS, GCP, Azure. Criteria:

  • Primary service breadth (weight 20%)
  • AI/ML ecosystem (20%)
  • Cost for our workload profile (20%)
  • Team familiarity (15%)
  • Enterprise agreement terms (15%)
  • Exit cost (10%)

Spreadsheet workflow: A single ops lead scores each cell 1–10, multiplies by weight, sums the column. The output is “AWS: 7.6, GCP: 7.4, Azure: 6.9.” AWS wins. We move on.

Phronesis workflow: Same criteria and weights, three blind analysts. Output: “AWS 7.5, GCP 7.4, Azure 7.0, Unanimous: GCP on AI/ML; Unanimous: AWS on breadth; Split 2–1: cost, depending on workload profile. Confidence: 61%.”

The Phronesis output tells you what the spreadsheet hid: the cost dimension is where the decision actually lives, and the analysts split on it. The spreadsheet version claimed AWS won with a comfortable margin. The Phronesis version says AWS has a slight edge, and the close race is about cost, and cost evaluation depends on your actual workload — so go look at your workload before committing.

Those two outputs would lead to different next actions. The spreadsheet leads to “we’re going with AWS.” The Phronesis output leads to “let’s do a 2-week workload profiling study, then re-rank.”

The cost of the workload profiling study is maybe $10k of effort. The cost of committing to AWS and finding out in year 2 that your actual workload was 30% cheaper on GCP is $200k+. The purpose-built tool bought you a better question to ask, not a different answer.

The hybrid workflow that actually wins

In practice, the teams we see doing this best don’t choose one or the other. They use:

  1. Spreadsheet for the criteria-and-weights debate (transparent, shareable, editable in real time)
  2. Phronesis for the scoring + ranking step (blind, multi-analyst, confidence-reported)
  3. Spreadsheet again for the sensitivity analysis (“what if we reweight cost to 30%?”)
  4. Written doc for the final decision (not a spreadsheet, not a tool — prose is where commitment lives)

The tools complement. The spreadsheet is a transparent interface for the parameters humans argue about. Phronesis is the black-box-free comparison engine. Prose is the commitment artifact. Three tools, three jobs.

The cost calculus

At list pricing, Phronesis’s per-decision cost is a few cents. A spreadsheet’s per-decision cost is whatever 2–4 hours of ops time is worth. For a 15-person team, that’s ~$100–$400 of labor per decision.

If Phronesis produces a materially better output for even 10% of decisions — which our experience suggests is conservative — it’s a net positive on time-and-dollars by a wide margin. And the audit trail, the confidence signal, and the disagreement surfacing are all free extras.

The broader point

This isn’t a “spreadsheets are bad” argument. Spreadsheets are the Swiss Army knife of business analytics, and they will outlive most of the tools built on top of them. The argument is narrower: for a specific class of decision — high-stakes, multi-option, motivated-reasoning-prone — a blind-jury tool is a strict upgrade, and using one doesn’t mean abandoning the spreadsheet ecosystem.

If you already have a decision-matrix template that works for your team, keep it. Add a blind-jury pass at the scoring step for the decisions that matter most. Use the spreadsheet for the other 80% of decisions where your single-scorer approach is good enough. You’ll get most of the value of purpose-built tooling without the switching cost.

The tools are complements. Not substitutes.

#decision-matrix #spreadsheet #phronesis #product #comparison
Share Post on X LinkedIn
RUN IT YOURSELF

Score a decision with Phronesis.

Three blind AI analysts rank your options on weighted criteria. See agreement, confidence, and per-dimension scores.

Open Phronesis →
RELATED FIELD NOTES

Decision paralysis — a cognitive science guide for product teams

Why smart teams freeze on important calls, what's happening in the brain, and five interventions that unstick decision-p…

HubSpot vs Salesforce vs Pipedrive — an objective breakdown

A vendor-neutral comparison of HubSpot, Salesforce, and Pipedrive. Where each wins, where each loses, and the buyer pers…

A weighted-criteria approach to buying a laptop

Laptop reviews optimize for views, not your use case. A weighted-criteria framework produces better buys in 15 minutes t…