Chapter 8: Testing and Experimenting

Coming up through the ranks at two payments giants, I took for granted the extent to which testing and experimenting were baked into everything we did. Even before imagining the possibilities of digital and mobile, payments companies worked with big data, analytics, and technology to experiment with communications, channels, products, personalization, and promotional strategies.

Testing was how we improved operations — customer selection, line of credit calculations, offer targeting, and repayment risk. We experimented to figure out ways to meet business model requirements under given conditions. We saw testing as a huge positive, the fastest and most cost-effective way to get ahead of the market, understand revenue and expense levers, anticipate and avoid risk, and create value and growth. The channels up until the mid-nineties just happened to be mail and phone.

So my eyes were opened when I left the sector and had two aha’s:

For change makers, the stakes are too high to rely on guesswork. Seat-of-the-pants may have been necessary in the past, but today’s complexity, competition, and demands to achieve scale all require more. There is too much information, too many choices and decisions, and too little time. Expectations are high, while the effects of innovation investments can be difficult to discern.

Strong testing and experimenting come from mindset, capabilities, investment, and commitment. The starter elements may already be within reach, even inside your head.

Direct-to-consumer and transaction-based business models, such as those of credit card businesses, have long valued data science capabilities. These companies have proven that testing and experimenting effectiveness depends upon:

Testing and experimenting are constant for any business creating value and growth.

More is required than statistics, data clouds, and number-crunching tools. This chapter is devoted to setting out testing and experimenting requirements. The principles fit scale efforts of all sizes, shapes, and stages.

Don’t be thrown off course by the jargon surrounding artificial intelligence, big data, machine learning, and whatever else is coming next. There is plenty of confusion. The marketplace for automation offerings is large and fragmented.

The good news is that there are lots of examples from which to borrow. Technology is accessible. Data is abundant. Storage is cheap.

Assumptions about testing and experimenting:

Testing and experimenting keep startup and grown-up businesses on pace. Even better, they point to marketplace discontinuities leading to innovation.

Surprise: Marketers have conducted A/B tests for a long time

The first known instances of direct mail trace back to 1000 BC.1 Modern-day testing tactics date back at least to the early 1960s, when agency executive Lester Wunderman is credited with coining the term “direct marketing.”2 Testing was structured against control groups. On the edges of direct mail and telemarketing programs, test cells were created allowing for tight measurement of changes to offer, pricing, communications, targeting, and channel. Results were read — maybe beginning within days of deploying, but possibly weeks or even months later — and winners adopted.

Such testing measured and valued execution tactics to decide investments in programs whose pre-digital timeframes were long and whose costs were high. The goals were to find incremental improvements, validate hypotheses, and justify new campaign strategies. Teams debated details, such as the precise level of statistical significance to apply to each test for results to be accepted as empirically sound. Precision was achievable.
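
To make the control-group discipline concrete, here is a minimal sketch in Python of that classic test-versus-control read: compare the response rates of a test cell and a control cell and check whether the difference clears a chosen significance bar. The cell sizes and responder counts are hypothetical.

```python
from math import sqrt, erf

def two_proportion_z_test(test_responders, test_size, control_responders, control_size):
    """Classic test-versus-control read: is the difference in response rates real?"""
    p_test = test_responders / test_size
    p_control = control_responders / control_size
    # Pooled response rate under the assumption of no true difference
    p_pool = (test_responders + control_responders) / (test_size + control_size)
    se = sqrt(p_pool * (1 - p_pool) * (1 / test_size + 1 / control_size))
    z = (p_test - p_control) / se
    # Two-sided p-value from the normal distribution
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_test, p_control, z, p_value

# Hypothetical mail drop: 50,000-piece test cell vs. 50,000-piece control cell
p_test, p_control, z, p_value = two_proportion_z_test(1_150, 50_000, 1_000, 50_000)
print(f"Test response rate:    {p_test:.2%}")
print(f"Control response rate: {p_control:.2%}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Adopt the winner" if p_value < 0.05 else "Keep testing")
```

The 0.05 threshold is exactly the kind of detail those teams debated; the mechanics themselves are simple once the control structure is in place.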

Measurement and optimization are quite different from experimentation pursued to shape and scale innovation. Traditional techniques still matter, especially in scale businesses with mature methods to acquire and retain customers. But the speed at which feedback can be obtained with an explosion of variables has disrupted the slow and steady structures of the past. Now it is essential to find ways to pick up bits of precision as needed in shorter, faster cycles. Machine learning is creating an entirely new paradigm — one where testing is built into ongoing self-learning campaigns.

Traditional testing depended upon market opportunities holding still. Today nothing holds still. But there are still principles and patterns, some of which have been proven through decades. So, even when there is no fixed template, rigor is achievable.

Testing and experimenting: “Just the way we work”

Startup Pypestream’s Smart Messaging Platform uses artificial intelligence and chatbots to connect businesses to their customers. Chief customer officer Donna Peeples brings a dual perspective on the value of testing and experimenting, having also served as an executive at major corporations including AIG. She says, “The difference I’ve seen is that in the corporate world testing is formalized and structured. In startup land, we don’t even call testing out as a separate activity. It is just the way we work. It’s very iterative and very fluid. There is no line of demarcation between testing, experimenting, and everything else.”3

Integrating testing and experimenting into how a team operates takes a systematic, flexible approach to extracting insights from many forms of data — and then taking action. It’s not about data for the sake of data, or tools for the sake of having the latest number-crunching capability, or data storage for the sake of volume. You may be starting with a clean sheet. You may be jury-rigging your first learning platform, or trying to get out from under the constraints of a rigid, overly complicated one. There are common themes.

Eric Sandosham, cofounder of Singapore-based Red & White Consulting Partners, agrees. He says, “Startups are naturally predisposed to testing and experimenting simply because they have so little historic data to draw upon, and no existing business to defend. These factors make them extremely agile. They can evaluate insights with no stake in the past so they are better able to detect emerging opportunities.”4

What are you trying to figure out?

Decide which of the three types of insight (hindsight, insight, or foresight) apply to your priorities:

Three critical thinking guideposts

No matter the path, consider scribbling these guideposts on sticky notes tacked up in your workspace:

  1. Define the questions leading to decisions and action, linked to strategic priorities.
  2. Know the business-model operating levers — user behavior, processes, policies, and regulation. All drive financials.
  3. Have the execution capability to apply models and act on each test result. If not, what is the point of the test?

Ten testing and experimenting pitfalls

  1. Confusing learning priorities and methodology. Mismatching hindsight, insight, and foresight goals with test approach has penalties. An A/B test is productive to determine relative impact of alternative pricing, offers, messages, or channels. If head-to-head testing is not feasible, pre- and post-testing can work. But neither is relevant if the goal is to foresee performance of an unprecedented idea, where metrics may not even be quantifiable at the outset.
  2. Failing to stay connected to start and end points. Customer insights, business model, goals, priorities, and strategy are always linchpins to value and growth. They frame testing and experimenting priorities, too.
  3. Pursuing flawed testing paths. A proposed test becomes so complicated it cannot be implemented. A team defines legitimate test questions. As design is sketched out, more permutations of segments and offers are added. The team strays from priorities.
  4. Becoming overwhelmed by data. Everything seems knowable. The most valuable data — substantially more valuable than third-party overlays — will be data about your customers’ engagement with you. Apply art and science to sort “need to know,” “nice to know,” and “really no need to know” data.
  5. Coming at the world technology first. Technology is an enabler. By itself even the slickest analytics tool can add complexity and expense that subtracts value. Match tools to the task and the talent.
  6. Producing results that are not actionable. Tests so clean that they are isolated from the realities of processes, policies, and regulations take time and effort with no practical impact.
  7. Over-devotion to a method. Agile methodology is a force for speed, focusing teams on what’s good enough. Teams benefit from standardizing cycles — implement, read, react, repeat. But solving for time to the point of inflexibility blocks experiments that only come with time and messiness.
  8. Expecting the test to tell you what to do. Testing produces insights and raises additional questions. Think critically to decide which results to act upon and which new questions are worth pursuing.
  9. Rejecting findings at odds with the status quo. A test or experiment is well designed and executed. The findings fly in the face of orthodoxies. Conflict ensues. Politics takes over. Status quo is maintained. Opportunities are missed. This scene is unfortunately familiar when the data suggest a future that causes discomfort among people satisfied with the current state, or fearful of change.
  10. Not establishing and leveraging an inventory of tests. An inventory affords the opportunity to gain speed, connect dots, and provide a knowledge base of discoveries waiting for the right time to put them to work.

Moving to action through small, manageable steps

How can you make a habit of sensing where there is useful data, analyzing, synthesizing, assessing, and then applying findings quickly? For the answer I sought input from Marcia Tal. Marcia advises executives on advancing innovative data science and technology-based solutions. For much of her career, Marcia was busy inventing, shaping, and leading Citi Decision Management, a powerhouse global function.

A rocket scientist by training, Marcia has a way of patiently breaking down multilayered problems into useful pieces. She defines problem statements to generate tests and experiments. The outputs provide solutions to complex problems by identifying and addressing the component levers. They also build momentum for change by creating fact bases that wear down even the toughest resisters and surface unexpected opportunities.

Marcia advises focusing first on internal data sources. She says, “Internal data will always be the most valuable. Other sources won’t replace proprietary data — they can complement it.”5

Not collecting or maintaining a strong customer database? Make it a priority to figure out how to address this gap.

Next, know how the business operates to construct, execute, and interpret tests. Especially important is to understand how customers make decisions that define behavior with the brand.

Marcia shares a wonderful family anecdote highlighting the impact of paying attention to data and seeing insights through to action:

For decades around the 1950s, my uncle Norman owned a store — Norman’s Everybody Store — in Tulsa, Oklahoma. He would ask everybody who came through the door the same question: “What do you need?” That was his way of gathering data. He could sell anything, and he did, from cowboy boots to hats to women’s pajamas to lingerie to men’s underwear. And he applied a very important insight to his location — always next to a bar. What data drove the insight? He knew that people went to the bar after they got paid. So he knew when they had cash, and he wanted to be second in line, after the bartender, to get some of it.

Norman understood his business-model levers. He gathered and applied data about how customer behavior drove sales. He aligned his location strategy to take advantage of a findable and measurable business driver.

At Norman’s Everybody Store, learning from data was as much a part of how things worked as it is today at Pypestream.

Sure, understanding and acting upon customer data back in Norman’s day may look primitive from the perspective of our Amazon world. But what Norman lacked in technology was more than offset by mindset and translation of insights into decisions.

Reminder: don’t let vast data and new technologies take you away from the basics of knowing the operating levers within the business model, including understanding customer behavior. No amount of cool technology or data will make up for lack of attention to these essentials.

Focus on the gaps when designing tests

“You will get lost if you don’t keep going back to the question, ‘What am I trying to do?’” says Angela Curry, former managing director, Global Analytics & Insights at Citi. “Understand the goals. Where do you need insight? Build tests from there.”6

An easy way to gauge a well-defined goal is to fill in the blank in this sentence: “We will be successful in our venture if __________ happens.”

Is success attracting a certain number of users? Getting to a particular level of engagement or a market share goal or a sales volume target? The way you complete the sentence signals the data-driven learning priorities.

Gaps occur because too many teams construct tests without first thinking about what success means. They get caught in the data weeds and don’t build momentum. What is the objective? From post-MVP startups through mature businesses, success means hitting the annual plan. A business unit’s plan numbers may be high level, but they also frame expectations. Every test design should ultimately contribute to achieving the plan.

Your plan may not be as formalized as that of a business unit inside a global, publicly traded company, but whatever plan you have developed defines a horizon line. That horizon line is the direction for test efforts.

Plan

Tackle feasibility questions by considering each business-model lever that might impact the test. Figure out which ones do, and address the consequences. Engaged stakeholders are a big help. You may know your product really well, or be closely familiar with how channels work. Odds are you don’t know all of the operating details. So include people who do. Topics for these conversations:

Design experiments yielding metrics that close gaps between performance and goals. Do you want to acquire more customers, and if so, at what cost? Is the goal to expand margins, and if so, by how much? Is your leverage in pricing, distribution, communications, positioning, or product configuration? Do you anticipate new risks, and if so, how can a test validate or disprove mitigation strategies?

What measurement precision is reasonable and necessary? In a mature space with back history, expect greater precision. But if there is less precedent, get comfortable with directional outputs in the early testing of hypotheses. Be ready to build on each new insight to refine, and then refine some more. Accept that attribution of impact across many variables may be a work-in-progress. You may not even realize what all the variables are until after several test cycles.
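
One way to put numbers on “reasonable and necessary” precision is the standard sample-size approximation for comparing two response rates: the smaller the lift you want to detect, the larger each test cell must be. A minimal sketch, with a hypothetical 2 percent baseline conversion rate:

```python
from statistics import NormalDist

def cell_size_needed(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate customers needed per cell to detect a relative lift over baseline."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g., 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # e.g., 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2) + 1

# Hypothetical: 2% baseline conversion; cell size needed to detect a given relative lift
for lift in (0.50, 0.20, 0.10, 0.05):
    print(f"Detecting a {lift:.0%} lift needs roughly {cell_size_needed(0.02, lift):,} customers per cell")
```

If cells of that size are not available, that is the signal to settle for directional reads now and sharpen precision over successive cycles.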

By laying out the range of priority test questions before diving into the details, it becomes possible to realize the advantages of an integrated plan and avoid the inefficiencies and errors caused by fragmented testing. This discipline also mitigates hindsight bias risk. When people look at results and say, “We already knew that,” the value of testing is diminished. Validation that doesn’t become over-testing — an excuse to avoid a decision — can be useful. Just be sensitive to motivations and unintended consequences.

Go for impact by striking the balance between testing scope, complexity, and speed. Prioritize the “need to knows.” Push back on “let’s add this” temptations.

A pragmatic way to validate the metrics: create a prototype of the results dashboard before launching any tests. Then ask, “What will we do with these results? Where are the answers to the questions we must answer now?” If the answers to these questions are clear, the metrics are on track. If you are spinning in circles coming up with the answers, take action to add missing pieces or edit out extraneous, low-priority data.
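
The dashboard prototype does not need to be fancy. A throwaway sketch like the one below, with hypothetical metric names and placeholder values, is enough to force the “what will we do with these results?” conversation before anything launches.

```python
# Mock results dashboard built before any test runs, using placeholder values.
# Each row pairs a metric with the decision the number is supposed to inform.
prototype_dashboard = [
    ("Response rate: test vs. control", "2.3% vs. 2.0%", "Roll out the new offer, or keep the control?"),
    ("Cost per acquired customer",      "$48 vs. $55",   "Shift budget toward the winning channel?"),
    ("90-day repeat purchase rate",     "18% vs. 17%",   "Is the offer attracting the right customers?"),
]

print(f"{'Metric':<36}{'Placeholder':<18}Decision it informs")
for metric, placeholder, decision in prototype_dashboard:
    print(f"{metric:<36}{placeholder:<18}{decision}")
```

If a row cannot be tied to a decision, it is a candidate to edit out; if a pending decision has no row, a metric is missing.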

Assemble the right mix of people for testing and experimenting success

Pre-digital era testing used internal data sets, complemented by a few external overlays. Now, data is accessible from multiple sources, in structured and unstructured formats, in real time. The multichannel customer experience, with its many dimensions, affects sales, returns, repurchase and referrals, out-of-stock product, and ratings and reviews. This complexity intensifies the benefits of a cross-functional skill set — creativity, critical thinking, analytics, technology, operations — whatever perspectives apply to the business-model levers in the daily activities of running the business.

As with any tough problem, diversity of thought and experience opens the pathways to better, faster solutions.

Even if one person is accountable for all aspects of testing, two persona types make testing and experimenting work better:

The planner: Someone who is able to anticipate whether, what, and when testing will be influenced by the operations, processes, or policies of a business. They account for these realities in test design and planning. In a complex business, it’s impossible to anticipate all connections and how they interact without engaging people who know the plumbing and the wiring. The planner identifies and integrates multiple perspectives. They connect the dots between test design and implementation. They are a strategic jack-of-all-trades.

The implementer: This person has hands-on knowledge of analytics tools and methods. Perhaps they are a mathematician or technologist. Such backgrounds provide foundational implementer skill sets. Data is vast and varied, sources expand, and tools change. So what they know today may be obsolete in a few quarters. That is why the implementer who shines is skilled in how to look for information, read and write code to set up tests, and adapt to new techniques.

The planner and the implementer are both motivated to learn. Both are collaborators: if they are not one and the same person, they need each other.

Both benefit from a sustained ability to figure things out on the fly. What is going through their heads is this: “I don’t necessarily need to know everything, but I can figure things out or tap into others for help.”

The infrastructure and tools for testing and experimenting success

Infrastructure matters. If basics are not in place, pulling off a test plan will be challenging. Assemble the components — data sources, campaign management, analytics software, how data is captured, and how it links to partner systems. Ask:

Just as in test design, go back to the problem statement, strategy, and implementation details to make automation decisions.

Say the focus is to test content. Which content is worth testing? If you are delivering content across channels, is the capability in place to pull the channel messaging together — or will silos limit what can be tested, read, and delivered? If the vision is to achieve the holy grail of one-to-one personalization, how do you create content, subject lines, offers for individuals, and execute dynamically?

Or say the customer journey spans digital and physical channels. Where along the path from creating awareness, to investigating options, to buying and then post-purchasing experiences is value created? Is customer behavior interpretable across channels, or are channel data disconnected? What is the plan to justify or overcome limitations?

Finally, getting full value from infrastructure is more than a function of picking and implementing the right tools. An example of a challenge: A company’s corporate email platform was different from the one used by the sales force. There was no automated way to assemble all of the interactions with a customer across both platforms. Focus was placed on coordinating these channels to align infrastructure with the organization’s goal of improving sales and engagement. The business model drivers, power and authority, and politics all came into play to make this happen.

Even if your own skills are cutting edge and the team is dogged, infrastructure predating current data integration and flexibility demands can derail well-conceived testing goals.

Licensing a tool to address a capability gap is a widely used approach. Software providers help set up tests and produce results for digital and mobile testing in real time. Is budget tight? Maybe the latest version of a tool is not necessary. Perhaps it would be smarter to go the do-it-yourself route for a few iterations, confirm you are on the right path, and then choose a solution. Free tools, especially in the early days of figuring out your requirements, may be all you need.
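
If the do-it-yourself route makes sense for the first few iterations, the core building block is usually a stable, repeatable way to split customers into cells. A minimal sketch, where the experiment name and cell weights are hypothetical:

```python
import hashlib

def assign_cell(user_id: str, experiment: str, cells: dict) -> str:
    """Deterministically assign a user to a test cell, weighted by the given shares."""
    # Hash the user and experiment together so the same user always lands in the
    # same cell for this experiment, no matter when or where the lookup happens.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # uniform value in [0, 1)
    cumulative = 0.0
    for cell, share in cells.items():
        cumulative += share
        if bucket < cumulative:
            return cell
    return list(cells)[-1]  # guard against floating-point rounding

# Hypothetical experiment: 80% control, 10% new subject line, 10% new offer
cells = {"control": 0.80, "subject_line_b": 0.10, "offer_b": 0.10}
for user in ("u-1001", "u-1002", "u-1003"):
    print(user, "->", assign_cell(user, "spring_reactivation_v1", cells))
```

Once a few do-it-yourself cycles confirm the path is worth the investment, the same cell definitions can move into a licensed tool.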

What is the condition of your database? For starters:

Partnering for testing and experimenting success

When to partner is a function of answers to questions including:

Use common sense to invest for impact

Each test is an investment. Don’t avoid testing to protect every last short-term dollar of revenue, but don’t harm the business for the sake of a test, either. Some tests will pay out quickly and some will lead to the next set of tests.

A senior marketing executive at a direct-to-consumer brand serving over thirty million users shared a story about testing to quantify the impact of specific channels on new accounts. The company’s marketing activities bring together search, site, mobile advertising, direct mail, and telemarketing in integrated campaigns.

One of the proposals floated was to measure the contribution of search by turning it off for three months.

CMOs are eager to understand how to attribute marketing spend to each channel. But common sense says that the practical consequences of shutting down a channel as important as search, even if such a tactic could yield a precise answer, likely outweigh the benefits.

The CMO or CFO or business unit head may resist carving out even a small testing population. Tests may not produce immediate improvements. Such pushback is shortsighted, but exists in organizations that do not yet fully embrace a testing and experimenting mindset.

Tackle reality

The operational requirements for testing and experimenting are rarely understood and appreciated until teams get into the thick of implementation and executing on findings.

Be eyes-wide-open:

Bring people along. Getting test results to take hold starts with science and lands squarely with people and their emotions. Learning from tests drives improvement. Learning is not a vehicle for spotlighting failure or mistakes.

Back in Chapter 5, the tools of storytelling were presented in the context of selling the business model to investors. Now, consider how to relate the story that emerges from test results. Do stakeholders prefer charts, words, or colorful graphs? Visualizing data and insight to support audience preferences makes it easier to build understanding and buy-in.

A digital executive and his team at a top insurance carrier led a site redesign that delivered 500 percent growth in leads generated, based on pre- and post-reads. Sounds amazing, right?

The site experience team was celebrating what they saw as a big success. But the company’s sales leaders had a different view. They could not see the impact in the bottom line (or in their commission checks) and as a result, withheld applause.

Of course, lots of factors affect the many sales funnel steps for an intermediated business — from completing a form to getting a call-back, having a meeting, assessing options, completing applications and underwriting, accepting the offer, making payment, and finally closing the sale. But for all practical purposes, a test that does not bring stakeholders along by translating results into their view of success is a failure.

And, in a sector driven by a historically stable distribution model, success in a new channel could be a threat to people whose incentives are connected to the old way of doing things.

The moral of this story: identify stakeholders and get them on board as early as possible, and always before the test is implemented and results are generated. Smart people can discount test results that cause personal discomfort. That’s why building buy-in is as important as getting the design, talent, capabilities, and output reporting details right.

Where do you start and how do you move forward?

Two pieces of advice:

  1. Think big. Start small. Act quickly. Especially start small. Be really specific about the questions you want the tests to answer. You cannot afford to boil the ocean. Some startups succeed by going after a specific learning niche. Even global companies start with focus on a very specific test objective, and then broaden.
  2. Iterate and know when to persist. When statistical knowledge, logic, judgment, and stakeholder needs all come together, the testing path will be anything but linear.

A financial institution selling a complex product initiated an experiment whose goal was to understand direct-to-consumer selling dynamics, from marketing message to application submission.

Only a small number of applications were submitted online. So the test was seen internally as a failure. But the team leading the test did not give up. They introduced econometric techniques to determine the overall business impact for the population receiving digital messaging. It turned out that among the digital message recipients, business results were measurably up. Digital communications were effective in driving incrementally more prospects to contact an advisor to seek assistance versus traditional methods. Client preference was to follow up the digital marketing message by speaking with a person, not to submit an application online.

In retrospect, this buyer journey makes a lot of sense given the nature of the product offer. Had the team not persisted to assess results more thoroughly, the value of the multichannel experience would have been missed.
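
Here is a rough sketch of that re-read with hypothetical numbers: judged only on online applications, the digital cell looks flat, but counted across all channels, including advisor-assisted sales, the lift is visible.

```python
# Hypothetical outcomes for equal-sized digital-message and control populations
population_size = 100_000
outcomes = {
    "digital_message": {"online_application": 205, "advisor_assisted_sale": 1_480},
    "control":         {"online_application": 200, "advisor_assisted_sale": 1_150},
}

for metric in ("online_application", None):
    label = "Online applications only" if metric else "All channels combined"
    rates = {}
    for cell, counts in outcomes.items():
        total = counts[metric] if metric else sum(counts.values())
        rates[cell] = total / population_size
    lift = rates["digital_message"] / rates["control"] - 1
    print(f"{label}: digital {rates['digital_message']:.2%} vs. control {rates['control']:.2%} "
          f"(lift {lift:+.0%})")
```

The arithmetic is trivial; the discipline is asking whether digital messaging created business value across the full journey, not just in the channel where the test happened to be instrumented.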

Testing and experimenting are pillars of scaling. There will be hiccups no matter how well designed the plan. As with the entirety of the business, the testing process itself will develop and refine with experimentation.

Three Cs of Testing and Experimenting

Capabilities:

Visualizing the Testing and Experimenting Capability

Connections:

Team structure: The Pod

In a conventional hierarchical structure, people’s accountability (and incentives) align to their organizational silo (and manager). An alternative structure: create a testing team whose role is to converge around a problem statement — designing, testing, solving, and implementing — and then disband.

The “Pod” is a semi-permanent, virtual structure driving speed and solutions. Even if an organization is large enough to dedicate a team, a more powerful approach is to tap into people around the organization for different perspectives, specialized knowledge, and relationships.

Dedicate the implementers full-time to solve the problem statement over a sixty-to-ninety-day period. The implementers might be from analytics, product, UX/design, and channel execution. Legal or compliance should be on the team as partners in making the tests happen, not as approvers at the end of the line.

The traditional way is to assign people from each silo to assist on tests — in addition to everything else already on their plates. The risk is that testing sits on the side of the desk. The Pod’s value — a highly focused, get-it-done team — is not realized.

The Pod:

Culture:

Cultural attributes for successful testing and experimenting include:

Chapter summary

Notes

1. The Myers & Briggs Foundation, 2014. Sourced during July and August 2017, http://www.myersbriggs.org.

2. Dax Hamman, partner, Reinvent Partners and Fresh Media, in discussion with the author, May 2017.

3. Erik Asgeirsson, CEO, CPA.com, in discussion with author, July 2017.

4. Deborah Chardt, “Marijuana Industry Projected to Create More Jobs Than Manufacturing by 2020,” forbes.com, February 22, 2017.

5. Howard Lee, founder, Spoken Communications, in discussion with author, August 2017.

6. Rochelle Gorey, cofounder, CEO, and president, SpringFour, Inc., in discussion with author, June 2017.

7. Elizabeth Rosenthal, “How the High Cost of Health Care Is Affecting Most Americans,” NYTimes.com, December 18, 2014.