Hey everyone,

Welcome back for another bite to chew on.

HexClad is one of those brands that has everything going for it on paper — a great product, Gordon Ramsay as a partner, and a paid media engine that was already working before the current team even arrived.

That is a dangerous position to be in. Because when everything already works, nobody builds the systems to figure out why it works.

London Spilker is the Director of Paid Marketing at HexClad. He joined over three years ago as part of a team that jumped from agency side to brand side all at once. What they built since then is one of the more disciplined measurement and creative evaluation systems we have seen in DTC — and they had to build it specifically because Gordon Ramsay kept making everything look better than it actually was.

What stood out in our conversation is how HexClad moved from "everything is working, keep spending" to "we need to know exactly what is working and why, down to the individual asset level."

On the Menu:

  1. Building a measurement system that doesn't lie to you

  2. The creative math most brands get wrong

  3. When direct response hits a ceiling (and what to do about it)

BTW, this is just a taste of our chat with London. Listen to the full recording here.

Why is your abandoned cart email flow still stuck in 2024?

Here’s a lazy retention strategy: send every shopper the same abandoned cart email from last year and hope it works.

Most brands still do this, and it makes no sense. Your shoppers are not the same. They looked at different products, different price points, different categories. They need different reasons to come back. Yet most retention setups treat them like a list instead of individuals.

And it gets worse. Raise your hand if you built your welcome series and abandoned cart flow 12 to 18 months ago and haven’t really touched it since. 

You’re not alone. But your catalog has changed, your pricing has changed, your bestsellers have shifted, and your email flows are still recommending products from last year.

The dirty secret of retention marketing is that most brands set up their flows once and never revisit them. And we get it, it’s time-consuming. You need new creative, new copy, and someone to actually go into Klaviyo and update everything. So it just… doesn’t happen.

But stale email flows kill performance slowly. Open rates drift down. Click rates flatten. Revenue per email drops. And you don’t notice because the flow is still “on” and still generating some revenue. It’s the slow leak that nobody plugs.

The future of this is AI-generated email flows that stay current automatically. 

That’s what Instant does. Their AI pulls your live site data (new arrivals, bestsellers, current promos) and generates personalized emails in real time. No templates to update, no flows to rebuild. It just stays fresh.

More importantly, Instant never sends the same abandonment email to every shopper.

Instant looks at what each shopper actually did on your site, then sends the email that person should get. Right product. Right message. Right offer. Like you had someone manually writing the perfect follow-up for every single shopper, except it all runs automatically.

That's why brands like ThirdLove, Neuro, Kind Patches, and TRX trust Instant, and why brands using it are driving millions in incremental email revenue. We're talking 3 to 5x increases.

See what personalized abandonment flows could do for your brand. Book a demo by April 7 and get 50% off your first 60 days before this offer goes away.

Building a Measurement System That Doesn't Lie

1. The Gordon Ramsay comparison trap

When you have one of the most recognizable chefs on the planet as your creative partner, every asset he touches outperforms. That sounds like a gift. It was actually a measurement problem.

HexClad went through a phase where almost all of their weekly creative sprints included Gordon Ramsay to some extent. The result: their performance baseline was inflated by celebrity, making it nearly impossible to evaluate anything else fairly.

As London put it: "We were over-reliant on Gordon Ramsay. All of the creative we launched in different categories — whether that was a traditional micro creator we were whitelisting — obviously it's not going to look as good as Gordon Ramsay."

That is the trap. When your best asset skews the average, the average becomes useless as a benchmark.

2. 60 controls, reviewed weekly

The fix was building a control system that segments performance by category. HexClad now runs 60 individual controls that they review on a weekly basis. Each control defines a benchmark for a specific combination of funnel stage, creator type, and asset style.

A Gordon Ramsay video gets compared to its own control. A micro-creator whitelisted static gets compared to its own. A black background product shot with a specific offer has its own benchmark — but only once there is enough data behind it to be statistically significant.

The key insight: you would not compare the same creative this year versus last year and call it an apples-to-apples evaluation. Different asset types serve different purposes, and your measurement system needs to reflect that.
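Mechanically, a control system like this amounts to benchmarking each asset against the average of its own segment rather than the blended account average. Here is a minimal Python sketch of that idea; the segment keys, asset names, and ROAS figures are hypothetical, not HexClad's actual controls:

```python
from collections import defaultdict
from statistics import mean

def score_vs_control(assets):
    """Benchmark each asset against its own segment's average ROAS,
    not the account-wide blended average.

    `assets` is a list of (segment, asset_name, roas) tuples; in
    practice the segment key would combine funnel stage, creator
    type, and asset style.
    """
    by_segment = defaultdict(list)
    for segment, _, roas in assets:
        by_segment[segment].append(roas)
    # The control for each segment is simply its own average here.
    controls = {seg: mean(vals) for seg, vals in by_segment.items()}
    # Positive = beating its control, negative = trailing it.
    return {name: round(roas - controls[seg], 2)
            for seg, name, roas in assets}

assets = [
    ("celebrity/video", "gordon_grill", 4.0),
    ("celebrity/video", "gordon_steak", 3.0),
    ("micro/static",    "creator_pan",  1.6),
    ("micro/static",    "creator_wok",  1.2),
]
print(score_vs_control(assets))
```

Note what this surfaces: a micro-creator static at 1.6 ROAS beats its control even though it looks terrible next to a 4.0 celebrity video, which is exactly the comparison trap the controls exist to avoid.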

3. Net new visits as the north star metric

Beyond asset-level controls, HexClad tracks one metric more intently than almost anything else: percent new visits, tracked year over year.

The reason is telling. All Gordon Ramsay assets are falling in net new visit percentage over time. That is not because the creative is worse — it is because the audience Gordon reaches has been saturated. The same people keep seeing the same face.

That single metric became the forcing function for diversifying away from Gordon-heavy creative mixes. It also became a core decision-making factor for scaling individual assets beyond traditional ROAS windows.

What you can do: If you are evaluating creative performance against one blended average, you are hiding the truth from yourself. Build controls by asset type, creator tier, and funnel position. Start with five or six segments and expand from there as you accumulate statistically significant data.

The Creative Math Most Brands Get Wrong

1. Calculate minimum spend per asset for statistical significance

Before you can evaluate creative, you need to know how much money you need to spend per asset before the data means anything. For HexClad, that number is $2,500 per asset.

That figure is driven by their CAC, AOV, and consideration period. A brand with lower AOV and faster purchase cycles would have a lower stat sig threshold.

From there, the math works backwards. Take your weekly forecasted testing budget (typically around 30% of total spend), divide by your stat sig number, and you get the maximum number of assets you can meaningfully test per week.

If you have $25K in weekly testing budget and your stat sig threshold is $2,500, you can test 10 creatives. Not 30. Not "as many as we can produce." Ten that you can actually learn from.
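The backwards math is simple enough to sanity-check in a few lines. A sketch in Python, using the figures from the example above (your budget and threshold will differ):

```python
def max_weekly_tests(testing_budget: float, stat_sig_threshold: float) -> int:
    """How many assets you can meaningfully test per week:
    testing budget divided by the minimum spend per asset."""
    return int(testing_budget // stat_sig_threshold)

# $25K weekly testing budget, $2,500 minimum spend per asset
# before the data means anything.
print(max_weekly_tests(25_000, 2_500))  # → 10
```

The floor division is the point: a $24K budget buys you 9 real tests, not "9.6 tests". Any asset past that count is spend you cannot learn from.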

2. The hit rate graph — diagnosing creative vs. media buying

HexClad uses a scatter plot with ROAS on the Y axis and spend on the X axis to diagnose performance at a glance.

In a healthy account, dots cluster in two places: top right (high spend, high ROAS — your scaled winners) and bottom left (low spend, low ROAS — tests that did not work and were turned off correctly).

The red flags are the other two quadrants. Bottom right means you are spending heavily on assets that are not performing — a media buying problem. Top left means high-performing assets are not getting enough spend — also a media buying problem.

This single visualization separates creative evaluation from media buying evaluation. If your dots are in the wrong quadrants, the problem is not your creative team. Fix the buying first, then diagnose creative output.
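The quadrant logic is easy to automate once you pick cut lines for spend and ROAS. A hedged sketch in Python; the asset names, figures, and cut lines below are invented for illustration:

```python
def quadrant(spend: float, roas: float,
             spend_cut: float, roas_cut: float) -> str:
    """Classify an asset on the spend-vs-ROAS scatter.

    Healthy accounts cluster in 'scaled winner' and 'test turned
    off correctly'. The other two quadrants point at media buying,
    not creative.
    """
    if spend >= spend_cut and roas >= roas_cut:
        return "scaled winner"              # top right
    if spend < spend_cut and roas < roas_cut:
        return "test turned off correctly"  # bottom left
    if spend >= spend_cut:
        return "overspending on a loser"    # bottom right
    return "underfunding a winner"          # top left

# Hypothetical assets: (name, weekly spend, ROAS).
assets = [("gordon_v1", 12_000, 3.1),
          ("micro_static", 800, 2.9),
          ("bts_video", 9_000, 0.9)]
for name, spend, roas in assets:
    print(name, "→", quadrant(spend, roas, spend_cut=2_500, roas_cut=1.5))
```

Running this over a week of asset-level data gives you a count per quadrant; a rising count in either "media buying problem" quadrant is the signal to fix buying before blaming creative.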

3. Quarterly forecasting like a media mix

HexClad forecasts creative production the same way they forecast media spend — at a quarterly level, broken down into monthly and weekly cadences.

Their current target: 65% static, 35% video in the evergreen creative mix, further segmented by persona, category type, and creator tier. Without this forecasting discipline, it is easy to drift. London described the scenario: six months later you realize you launched 80% statics and 20% videos, missed your micro-creator whitelisting targets, and your diversification strategy fell apart.

What you can do: If you are not forecasting your creative mix with the same rigor you forecast spend, start now. Map your last quarter of output by format, persona, and creator type. Identify the gaps and build a quarterly production plan that fills them.
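Mapping last quarter's output against a target mix is a straightforward diff. A minimal Python sketch; the launch log is hypothetical, while the 65/35 static-to-video target comes from the conversation above:

```python
from collections import Counter

def mix_gaps(launched: list[str], targets: dict[str, float]) -> dict[str, float]:
    """Actual share minus target share per format.

    Positive = over-delivered, negative = under-delivered.
    """
    counts = Counter(launched)
    total = sum(counts.values())
    return {fmt: round(counts[fmt] / total - share, 2)
            for fmt, share in targets.items()}

# Hypothetical launch log: 40 statics and 10 videos last quarter.
launched = ["static"] * 40 + ["video"] * 10
print(mix_gaps(launched, {"static": 0.65, "video": 0.35}))
```

This is London's drift scenario in miniature: 80% statics against a 65% target shows up as a +0.15 gap, and the -0.15 on video tells you exactly what next quarter's production plan has to fill. The same function extends to persona and creator-tier targets by changing the keys.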

When Direct Response Hits a Ceiling

1. The efficiency paradox

Year one and two of HexClad's current team saw massive efficiency gains across the board — creative, CRO, retention, paid media. Growth came from low-hanging fruit that had been underserved when the team was agency-dependent.

Year three was different. As London put it: "Each year that you increase your efficiency and spend more at the same time, the next year will be even harder."

The diminishing returns curve is real. When you have already optimized creative output, landing page conversion, retention flows, and media buying, the incremental gains shrink from 20-50% improvements down to single-digit wins. At HexClad's scale, even a 1% improvement in efficiency translates to massive net contribution. But finding that 1% requires fundamentally different strategies than finding the first 20%.

2. Upper funnel bets and the Super Bowl

The answer for HexClad was bridging into upper funnel investment — and the Super Bowl ad was the biggest bet they made.

The team ran measurement studies with multiple partners and found the direct impact was between $10 million and $22.5 million. That impact materialized over the following 3, 6, and 12 months, and they are still feeling the effects over a year later.

What made the bet educated rather than reckless was measurement. They were hitting net new reach and frequency caps on traditional paid social channels. View content, reach, and video view objective campaigns had become a substantial percentage of their Meta investment. They used incrementality testing tools to create multipliers that could track the broader impact even though the investment would not translate 1:1 into same-period revenue.

London described the mindset shift: "Coming from a direct response background, really reframing our mindset as a growth team has been super impactful as we've hit this level of scale."

3. Landing page testing with extended attribution

One of HexClad's biggest early mistakes was evaluating landing page tests on one-day click attribution.

Their average time from first website visit (via Meta) to purchase is 6.5 days. That means one-day click attribution misses most of the conversion signal. They shifted their primary landing page evaluation window to seven-day click and now review multiple attribution windows — including 30-day click LTV, modeled view, and deterministic view.

The finding that changed their approach: roughly 25% of the time, extended attribution windows paint a different picture than one-day click. A landing page that "lost" on one-day click sometimes wins on seven-day click or 30-day LTV.

That is not a marginal insight. One in four tests was being misread because the evaluation window was too narrow for their consideration period.

What you can do: Calculate your average time from first paid touch to purchase. If it is longer than your attribution window, you are making landing page decisions on incomplete data. Extend your evaluation window to match your actual purchase timeline, and compare results across multiple windows before calling a winner.
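Comparing results across windows can be scripted so disagreements get flagged automatically instead of discovered by accident. A sketch in Python; the variant names and conversion values are invented to illustrate the one-in-four pattern described above:

```python
def window_disagreement(results: dict[str, dict[str, float]]) -> list[str]:
    """Return the attribution windows whose winner differs from
    the 1-day click winner.

    `results` maps window name -> {variant: conversion value}.
    """
    winner = lambda scores: max(scores, key=scores.get)
    baseline = winner(results["1d_click"])
    return [w for w, scores in results.items() if winner(scores) != baseline]

# Hypothetical landing page test: lp_b "loses" on 1-day click but
# wins once the longer consideration period is covered.
results = {
    "1d_click": {"lp_a": 1.20, "lp_b": 1.05},
    "7d_click": {"lp_a": 1.40, "lp_b": 1.65},
    "30d_ltv":  {"lp_a": 2.10, "lp_b": 2.60},
}
print(window_disagreement(results))  # → ['7d_click', '30d_ltv']
```

An empty list means every window agrees and you can call the winner; a non-empty list is exactly the 25% case, where the narrow window would have misread the test.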

Sum It Up

HexClad's real advantage is not Gordon Ramsay. It is the measurement discipline they built specifically because Gordon Ramsay made everything look good on the surface.

  • On measurement: Build asset-level controls instead of benchmarking against blended averages. 60 controls reviewed weekly is how HexClad separates signal from celebrity.

  • On creative: Calculate your stat sig threshold, forecast your creative mix quarterly, and use the ROAS-spend scatter plot to diagnose whether problems are creative or media buying.

  • On scaling: When direct response hits diminishing returns, the answer is not "optimize harder." It is educated upper funnel bets with proper measurement — and extended attribution windows that match your actual purchase cycle.

The thread connecting all of this is rigor. The brands that win at scale are the ones that build systems to tell themselves the truth, even when the surface metrics say everything is fine. The same principle applies to retention — if your email program is running broad flows on a fixed schedule, you are missing the same kind of signal HexClad refused to ignore on paid.

Let us know how we did...

How would you rate this post?


All the best,

Ron & Ash
