Email A/B testing: how to run tests that actually improve performance
What email A/B testing is and why most people do it wrong
Email A/B testing, also called split testing, is the practice of sending two versions of an email to separate groups of subscribers and measuring which version performs better against a defined metric. Version A goes to one group. Version B goes to another. The version that achieves the better result on your target metric becomes the control for future tests or is sent to the remainder of your list.
Done correctly, A/B testing is the most reliable mechanism for improving email performance over time. Done incorrectly, it produces results that look meaningful but are not, leading to changes that do not actually improve your programme and may harm it.
The most common errors are testing too small a sample, changing more than one variable at a time, and choosing a winner before the test has run long enough to reach statistical significance. Each of these errors produces a false signal. A test that shows version B outperforming version A by five percentage points on a sample of 200 subscribers is not a reliable result. Random variation in a sample that small can easily produce a difference of that size with no underlying causal relationship.
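To see why, run the numbers. The sketch below uses Python's statsmodels library with hypothetical counts matching that scenario: 100 subscribers per variant, a 20 percent open rate for version A and 25 percent for version B.

```python
# Why a five-point lift on 200 subscribers is noise: a two-proportion z-test
# on hypothetical counts (100 sends per variant, 25 opens vs 20 opens).
from statsmodels.stats.proportion import proportions_ztest

opens = [25, 20]    # opens observed for variant B and variant A
sends = [100, 100]  # subscribers who received each variant

stat, p_value = proportions_ztest(count=opens, nobs=sends)
print(f"z = {stat:.2f}, p = {p_value:.2f}")  # p comes out around 0.40
# Far above the conventional 0.05 threshold: a gap this size on a sample
# this small is entirely consistent with random variation.
```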
Another common error is testing variables that have no meaningful effect on the metric being measured. Testing button colour to improve open rate is not useful because subscribers only see the button after they have opened the email; button colour cannot affect open rate. Testing subject line length to improve conversion rate conflates two different parts of the funnel. Map the variable to the metric before running any test.
Understanding A/B testing as part of a wider optimisation system is covered in the email campaign optimisation guide, which explains how testing fits into a systematic improvement framework across the full campaign lifecycle.
What to test first: subject lines, CTAs, and send times
Start with subject lines. They are the highest-impact variable for open rate, the easiest element to create multiple variants of, and the one that does not require design changes or technical setup. Most email platforms support subject line A/B testing natively, meaning the test infrastructure is already in place.
When testing subject lines, isolate one element at a time. Test short against long. Test question format against statement format. Test with personalisation against without. Test urgency framing against benefit framing. Each of these is a separate test. Running them in sequence gives you a clear picture of what drives performance for your specific audience.
For a full breakdown of subject line formats and their typical performance characteristics, the email subject lines guide covers the structures that consistently outperform, with frameworks you can adapt directly to your testing programme.
CTAs are the right second focus. CTA testing affects click-through rate, which is the metric directly downstream from open rate. Test CTA text first, then button placement, then button colour. The text change typically produces the largest effect and requires the least design effort. Specific CTAs outperform generic ones in most contexts. "Download the free report" converts better than "Learn more" for the same offer, because it removes ambiguity about what the subscriber is clicking for.
Send time testing is valuable but produces smaller returns than subject line and CTA testing for most programmes. It is worth running once you have stable results from the higher-impact variables. Test two or three time windows across equivalent segments, run each window for several sends before drawing conclusions, and use your platform's engagement data as a guide to which windows are worth testing first.
How to set up a statistically valid email A/B test
Statistical validity requires a large enough sample to detect a real difference rather than random noise. The minimum sample size for a reliable email test depends on your baseline open rate, the size of difference you expect to detect, and your required confidence level.
As a practical guide, tests with fewer than 500 subscribers per variant are rarely conclusive for open rate differences below ten percentage points. Click rate differences typically need larger samples still, because click rates are lower and therefore more sensitive to random variation. If your list is under 1,000 subscribers, run the same test across two or three consecutive campaigns and combine the results rather than drawing a conclusion from a single send.
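As a rough illustration, a standard power calculation shows how these numbers fall out. The 20 and 25 percent rates below are placeholders, not benchmarks; substitute your own baseline and the lift you expect to detect.

```python
# How many subscribers per variant a test needs to detect a five-point
# open-rate lift (20% -> 25%) at 95% confidence with 80% power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.25, 0.20)  # Cohen's h for the two rates
n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
print(f"~{n:.0f} subscribers per variant")  # roughly 1,090 per variant
```

Note how quickly the requirement grows: halving the expected lift roughly quadruples the sample needed, which is why small lists have to pool results across several sends.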
Most platforms handle the mechanics of test setup automatically. In Mailchimp, A/B testing is available on paid plans and lets you test subject lines, sender names, content, and send times with configurable sample sizes. Klaviyo supports both A/B and multivariate testing within campaign flows, with automatic winner selection based on a defined metric after a specified time window. HubSpot includes A/B testing in its email tool with real-time reporting on variant performance.
Set your winning metric before you launch the test. Decide in advance whether you are optimising for open rate, click rate, or revenue generated, and do not change this once the test is running. Post-hoc metric switching (choosing a different metric after seeing the results) invalidates the test entirely.
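One lightweight way to enforce this discipline is to write the plan down as a fixed record before the first send. A minimal sketch, with illustrative field names:

```python
# A pre-registered test plan: the metric and stopping rule are fixed up front.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the plan cannot be edited mid-test
class TestPlan:
    variable: str         # the single element being changed
    hypothesis: str       # what you expect and why
    winning_metric: str   # "open_rate", "click_rate", or "revenue"
    min_sample_per_variant: int
    alpha: float = 0.05   # significance threshold, decided in advance

plan = TestPlan(
    variable="subject line: question vs statement",
    hypothesis="Question format lifts open rate for the weekly digest",
    winning_metric="open_rate",
    min_sample_per_variant=1100,
)
```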
Advanced testing: content, design, and segmentation
Once subject lines and CTAs are stable, testing content structure, email length, design layout, and offer type produces the next tier of improvements. These tests require more effort to set up because they involve content or design changes, but they address variables that have a direct impact on the subscriber experience and therefore on conversion rate.
Content length testing compares a shorter email, typically 100 to 200 words, against a longer version covering the same topic in more depth. The result depends heavily on your audience and campaign type. Promotional emails typically perform better short. Educational and product-announcement emails often benefit from more detail. Test within your specific context rather than applying a universal rule.
Design layout testing (single column versus multi-column, image-heavy versus text-led) is most relevant for programmes with the design resources to produce both variants. Image-to-text ratio also affects deliverability, so design tests should track spam placement rates alongside engagement metrics to catch any deliverability side effects.
Offer testing, for example a percentage discount versus a free shipping offer for the same monetary value, can produce significant differences in conversion rate. Offer testing requires careful setup to ensure both groups are genuinely comparable in terms of engagement level and purchase history, otherwise you cannot attribute performance differences to the offer rather than to audience variation.
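One way to get that comparability is a stratified split: shuffle the list, then alternate assignment within each engagement and purchase stratum so both groups are balanced on the traits that drive conversion. The sketch below assumes a pandas DataFrame with hypothetical engagement_tier and purchase_band columns.

```python
# Stratified assignment for an offer test. Splitting within each stratum
# keeps groups A and B balanced on engagement and purchase history.
import pandas as pd

def stratified_split(subscribers: pd.DataFrame, seed: int = 42):
    shuffled = subscribers.sample(frac=1, random_state=seed)  # random order
    # Position of each row within its stratum after shuffling
    position = shuffled.groupby(["engagement_tier", "purchase_band"]).cumcount()
    group_a = shuffled[position % 2 == 0]  # even positions -> variant A
    group_b = shuffled[position % 2 == 1]  # odd positions -> variant B
    return group_a, group_b
```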
Segmentation-based testing compares performance of the same content across different audience segments. This is not a traditional A/B test but produces valuable data about how different audiences respond to the same message. If the same email achieves a 35 percent open rate with your most engaged segment and 12 percent with your cold segment, that tells you something important about whether cold subscribers should receive that content at all.
Connect your email platform to Google Analytics with UTM parameters on all test variants. Email-side metrics show you what happened in the inbox. Analytics shows you what happened after the click, which is where conversion rate data lives. A test that shows version B winning on click rate but losing on conversion rate is telling you that version B attracted clicks that did not convert, which is a different problem requiring a different fix.
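A small helper makes the tagging consistent across variants. The sketch below uses Python's standard library; the campaign and variant names are examples, and utm_content is the conventional slot for distinguishing variants.

```python
# Append standard Google Analytics UTM parameters to a variant's links.
from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl

def add_utm(url: str, campaign: str, variant: str) -> str:
    """Tag a URL so each variant is distinguishable in Analytics."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))  # preserve any existing parameters
    query.update({
        "utm_source": "email",
        "utm_medium": "email",
        "utm_campaign": campaign,
        "utm_content": variant,  # identifies which variant drove the click
    })
    return urlunparse(parts._replace(query=urlencode(query)))

print(add_utm("https://example.com/report", "spring-launch", "variant-b"))
# https://example.com/report?utm_source=email&utm_medium=email&utm_campaign=spring-launch&utm_content=variant-b
```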
How to interpret and act on A/B test results
A test result is only actionable if it meets three conditions: the sample was large enough for the result to reach statistical significance, only one variable was changed, and the metric was defined before the test began. Results that do not meet these conditions should be treated as directional data, not confirmed findings.
When a test produces a statistically significant result, update your default approach to reflect the winning variant. If subject line questions consistently outperform statements in your programme, make questions your default format for that campaign type. This is how a testing programme builds knowledge: not from any single result but from the accumulation of confirmed findings applied consistently.
When a test produces no statistically significant difference, that is also a finding. It means the variable you tested does not meaningfully affect performance for your audience in that context. Record it as a null result and move to the next variable. Do not retest the same variable in the same context expecting a different result unless you have a specific reason to believe the first test was flawed.
For tracking your results and building a testing programme that compounds over time, the email click-through rate guide covers how CTR data connects to the broader picture of campaign performance, which helps you prioritise which tests are worth running next based on where your programme currently leaks engagement.
Building a continuous testing programme
A one-off test is useful. A continuous testing programme changes the trajectory of your email performance over time. The difference between programmes that stagnate and programmes that consistently improve is almost always whether they test consistently and act on what they find.
A sustainable testing cadence is one test per campaign. For a programme sending weekly to twice weekly, that produces four to eight completed tests per month, which is enough to meaningfully inform decisions about subject line format, CTA placement, content length, and send time over a quarter.
Keep a testing log. This is the most underused element of email optimisation. Record the date, the variable tested, the hypothesis, the sample size, the result, and the action you took. Without this record, you repeat tests already run, forget which variants won, and lose the institutional knowledge that gives a long-running programme its edge over newer ones.
A simple log in Notion or Airtable is sufficient. Structure it as a running table with one row per test. Review it quarterly to identify patterns: which variables produce consistent results, which produce noise, and where the next tier of testing should focus.
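If a dedicated tool feels heavy, a plain CSV file does the same job. A minimal sketch of an append-only log, with illustrative column names:

```python
# A one-row-per-test log written to CSV; the header is written once.
import csv
import os
from datetime import date

FIELDS = ["date", "variable", "hypothesis", "sample_per_variant", "result", "action"]

def log_test(path: str, **entry):
    """Append one completed test to the running log."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(entry)

log_test(
    "testing_log.csv",
    date=date.today().isoformat(),
    variable="subject line length",
    hypothesis="Under 40 characters lifts mobile opens",
    sample_per_variant=1200,
    result="No significant difference (p=0.62)",
    action="Null result recorded; length deprioritised",
)
```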
The connection between testing and broader campaign optimisation is explored in the email marketing optimisation guide, which covers how testing results feed into systematic improvements across every element of a campaign from list quality to landing page performance.
What this means for your campaign improvement process
Email A/B testing produces the best returns when it is treated as a process, not a project. A testing programme that runs consistently over six months, with clear hypotheses, adequate sample sizes, and a log of results and actions, will produce a materially better performing email programme than one that runs three well-designed tests and stops.
Start with a single variable in the part of your funnel with the biggest drop-off. If open rates are your primary problem, start with subject lines. Run three to five tests on subject line format before moving to the next variable. By the time you have confirmed findings on subject line length, question versus statement format, and personalisation effect, you have a clear picture of what your audience responds to. That picture is more valuable than any industry report because it is specific to your list.
Once subject line performance is stable, move to CTAs and content structure to improve click rate. Then move to the post-click experience to improve conversion rate. This sequential approach, fixing the top of the funnel before addressing the bottom, prevents the common mistake of spending time on landing page optimisation when the primary problem is that subscribers are not opening the email at all.
Use your email platform's built-in testing tools as your primary infrastructure. Mailchimp and Klaviyo both handle sample splitting, winner selection, and performance reporting without requiring additional tools. For tracking post-click performance, connect to Google Analytics via UTM parameters on every test variant.
The goal of a testing programme is not to find a winning template and repeat it indefinitely. Subscriber behaviour changes, inboxes change, and what worked last quarter may produce diminishing returns this quarter. The goal is to maintain a habit of testing that keeps your programme responsive to what your audience currently responds to rather than what it responded to when you last ran a test.
For the full picture of how A/B testing fits within a broader campaign improvement system, the email campaign optimisation guide covers the complete chain from list quality to post-click conversion, showing where testing delivers the highest return at each stage of the funnel.