
Wednesday, October 23, 2013

Unit testing is out, Vertical Slice Testing is in

We have been doing testing for a long time. Some people are practicing TDD, but I think that’s only 46 people in the world and they all follow my twitter feed.
</attention-seeking-headlines>
We all know the common issues people have with writing large test harnesses around systems: it seems to take a long time, existing code is difficult to test, subsequent changes to the code make a lot of existing tests fail or become redundant, and the test harnesses themselves can often become brittle, in no small part because of the abuse of mocking.
As a recovering TDD addict, I used to tell people, like many others do, that the issue with their TDD was that they didn’t do it right. If TDD is too hard, do more TDD! In other words, if it hurts, just bang that leg with that baseball bat a bit harder, eventually you will not hurt anymore.
I will come straight out with it: this approach to unit, integration and TDD testing has, by and large, failed. It’s over. It is time to move on. I have, and I now do VErtical Slice Testing.

A law of diminishing returns

Whatever your test coverage, there is a point where more tests don’t equate to more quality. Actually, let me be entirely honest here: I do not believe that a good test harness leads to better perceived software quality for your users.
The people you are trying to please with a good test harness are the ones using your system who are not on a high-bandwidth medium with you. Usually that would be your team. To reduce the risk of them experiencing issues, and to increase the chance of them liking your software, you write tests so that their experience is positive.
A long time ago, I realized that testing everything is not always feasible for me, even though I practice TDD all the time. A quick mental review of the teams I have worked with over the 12 years I’ve been practicing TDD tells me this is rather general.

Mocking considered evil

I’ve never been a big fan of mocking frameworks, for one simple reason: simple stubbing relies on assumptions about the behavior of the mocked part of your system that rarely match reality, for no one is going to read all the documentation of an API and decompile its code before writing a one-line stub.
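To illustrate (a minimal Python sketch; `load_timeout` and its injected reader are hypothetical, not from any real codebase): the one-line stub bakes in our guess about how the dependency behaves, and every behavior we didn’t guess at simply goes untested.

```python
import unittest.mock as mock

# Hypothetical component under test: it reads a config value through an
# injected `reader` callable and falls back to a documented default.
def load_timeout(reader):
    try:
        return int(reader("settings.cfg"))
    except FileNotFoundError:
        return 30  # the documented default

# The one-line stub encodes our assumption about the reader's contract...
happy_stub = mock.Mock(return_value="10")
assert load_timeout(happy_stub) == 10

# ...but the real reader can also raise PermissionError on a locked file,
# a behavior the stub never models, so that path stays untested until
# production finds it for us.
```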
As you progress in your implementation, you may learn about more scenarios you didn’t cover, and provided you are in that tiny little fraction of a percent of people refactoring your test harness as you progress your implementation, you may feedback some of that knowledge in previous tests you wrote and in your test doubles, while caressing your pet unicorn and enjoying a bit of sunshine on planet Equestria. Or if you’re like the rest of us, you probably won’t.

Increasing returns for user interfaces

Admitting to myself that I could not fix those problems (you can’t fix stupid!), I started trying to understand how I could bring along the TDD practices I enjoyed (fast feedback, repeatable tests, etc.) while increasing the perception of quality, leaving any theoretical coverage notions behind me. Instead of starting from the components I want to put under test, I start with what my users, hum, use.
If I provide users with an API, I will start by the API that I decide to publish, and consider everything else implementation details. I will only document and polish, a.k.a. test, what other people may use.
All the same, if my user interface is not an API, but some sort of UI, there is very little reason to care about all the scenarios a component may face that cannot be triggered from that user interface.
The process of discovering your user interface has an added advantage: the answer to most “but what if…” questions about such a top-down approach usually unveils an additional user interface you didn’t quite know you had (looking at you, JSON SPA AJAX <insert buzz word of the day> Frankenstein “web” apps).
This is already an understood concept, and is usually referred to as acceptance testing.

At warp speed, everything is relative

A common issue arises from using existing acceptance-driven tools. Automating the browser is slow, and so is automating a database or the file system. Each of those may also fail for reasons that have nothing to do with your code.
That makes your tests brittle and slow, which inexorably leads to longer feedback cycles and harder-to-run tests, and gets us straight back to my introduction and to why traditional approaches to TDD have failed.
Acceptance testing recommends, to avoid such a problem, the creation of an alternative interface that lets you remove the browser from the equation. This is no longer necessary. With tools such as zombie available for free, you can run an in-memory browser that behaves like the real thing and is incredibly fast. No more Selenium issues when running your automated test suite, no interaction with your operating system’s UI library; it’s all fast and beautiful. And if your user interface is an API, unit testing frameworks and test runners have provided those advantages for many, many years.
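As a sketch of the idea (in Python rather than zombie’s JavaScript, with a deliberately toy application): the “user interface” is exercised by calling the application function directly, with no socket, browser or operating system in the loop.

```python
# Toy WSGI-style application: the whole "web server" is one function call.
def app(environ, start_response):
    if environ["PATH_INFO"] == "/health":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"missing"]

def get(path):
    """An in-memory 'browser': drives the app directly, no network involved."""
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], body

assert get("/health") == ("200 OK", b"ok")  # fast, repeatable, no Selenium
assert get("/nope") == ("404 Not Found", b"missing")
```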
External-facing components are now in-memory, making executing our tests fast and reliable by not triggering external systems.
I now apply the same concept to internal-facing components. Instead of mocking out the inner-most elements of my system, such as the file system, the network or a database, on a per-method or per-component basis, I use libraries that give me, out of the box, an in-memory version of their system that is functionally equivalent to the real one.
It means an in-memory file system would implement the same locking primitives as the real system, the same rules around reading, writing or seeking data, and be as close as the author of the library can make it to the real thing.
In other words, a VEST-friendly library turns the unit tests of that library on its head. The component accessing external systems is developed alongside its test-double variant, and both are built to react, error out and validate calls in the same way. The test double can be shipped. I don’t write the mocks; they come as part of the package.
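A minimal sketch of such a shipped double (Python; the class and method names are hypothetical, not taken from any particular library): the in-memory variant reproduces the real API’s error contract rather than returning canned values.

```python
import errno

class InMemoryFileSystem:
    """A shippable test double that mirrors the real API's error contract:
    reading a missing path raises FileNotFoundError, just like open() does."""
    def __init__(self):
        self._files = {}

    def write_text(self, path, text):
        self._files[path] = text

    def read_text(self, path):
        if path not in self._files:
            # Same exception type and shape as the real file system.
            raise FileNotFoundError(errno.ENOENT, "No such file or directory", path)
        return self._files[path]

fs = InMemoryFileSystem()
fs.write_text("/app/settings.cfg", "timeout=30")
assert fs.read_text("/app/settings.cfg") == "timeout=30"

try:
    fs.read_text("/app/missing.cfg")
except FileNotFoundError:
    pass  # the same failure mode the real file system produces
```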
There are many advantages to such an approach. The author of the library knows intimately the contract he provided you as part of his API. The act of providing an in-memory version means this knowledge is now expressed explicitly, forcing error conditions (which are very much part of a user interface in an API) to be accounted for.
A VEST-friendly library will usually end up testing explicit contracts in its test harness, so we go one step further. A library author can ship the test harnesses that exercise the contract they expose, for all the conditions known to potentially exist. Once again, we turn test code into shipping code: if the author builds two components implementing an interface, the test harness for the explicit public interface can be shipped, as it’s probably already written.
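Sketched with Python’s unittest (all names here are hypothetical): the contract lives in a shared mixin, and each implementation, real or in-memory, subclasses it to prove it honors the same behavior.

```python
import unittest

class DictFileSystem:
    """Stand-in for a library's shipped in-memory double (hypothetical name)."""
    def __init__(self):
        self._files = {}
    def write_text(self, path, text):
        self._files[path] = text
    def read_text(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]

class FileSystemContract:
    """The shippable harness: any implementation must pass these tests."""
    def make_fs(self):
        raise NotImplementedError  # each implementation supplies its factory

    def test_round_trip(self):
        fs = self.make_fs()
        fs.write_text("k", "v")
        assert fs.read_text("k") == "v"

    def test_missing_path_raises(self):
        fs = self.make_fs()
        try:
            fs.read_text("absent")
            assert False, "expected FileNotFoundError"
        except FileNotFoundError:
            pass

# Registering an implementation against the shipped contract is one subclass away:
class DictFileSystemTests(FileSystemContract, unittest.TestCase):
    def make_fs(self):
        return DictFileSystem()
```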
I believe this process is recursive and leads to the pit of success, as any library built with VEST in mind will naturally feed the VEST approach.

Vertical Slice Testing

The VEST approach replaces all external system calls with functionally equivalent, stable and controlled ones, both at the outer-most layer (whatever exercises the user interface) and at the inner-most one (whatever touches the low-level IO APIs).
By using VEST, I can focus on delivering web pages or components centered on user interaction, I can run the whole system on every test, and do it very quickly. I can change a lot of internal code without breaking existing test scenarios. And should I wish to implement components replacing the ones in my libraries, I can do that without writing tests, because the test harness is already there, testing the contract for me.
Note: As I’m abroad and not available that much, don’t expect quick answers in the comments section but I’ll try my best. Subjects not covered but not forgotten: my team as a user, availability of VEST libraries, code samples, library author burden, brown field projects, generic crazy-talk, unicorns not being ponies, “you never worked in a team!”, “in an ideal world…”, etc.

Monday, October 21, 2013

The Power of Split Testing in eCommerce

If you own or manage an eCommerce store, one question matters above all others. You’ll do whatever it takes to find out the answer.
How can I sell more stuff?
There are a ton of different strategies you can use. One is to get more traffic to your store through SEO, paid ads, and social media marketing. Another is to reach out to your customers through email. These are both important.
But once you’ve got traffic, is that it?
Definitely not.
When people have walked through the doors of your virtual store, are they enticed to buy? Or do they run away before they’ve even had a chance to check out a single product?
The design of your website matters. In the real world, it’s called visual merchandising, and bricks-and-mortar stores spend hundreds of thousands of dollars making sure they’ve got it just so.
Likewise, your website copy matters. This is your virtual sales patter, and it’s what convinces your customers to click the buy button.
How can you optimize your store so more of your website visitors become customers?
You can flounder in the dark, guessing at different designs, structures and copy, hoping you’ve got it right this time.
Or you can split test.
Are you naturally curious? Do you want to know why things are the way they are? If you answered yes to either of these questions, then you’ll love split testing. With split testing, you’ll quickly learn why popular websites are designed, structured and written as they are.
You’ll also learn how to make your website the best it can be.
Let’s get going.

What Is Split Testing?

Split testing is a powerful weapon in the arsenal of eCommerce traders. With split testing, you can change any element of your website, and observe how that change impacts your customers’ behavior.
You can see the precise impact of the change, because some visitors will be shown the website as it was before you made the change, while others will only see it after you’ve made the change.
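Under the hood, assignment is usually a deterministic bucketing of visitors. A rough Python sketch (the function and experiment names are invented; real split-testing apps handle this assignment for you):

```python
import hashlib

def variant_for(visitor_id, experiment="buy-button-color", variants=("A", "B")):
    """Deterministic bucketing: the same visitor always lands in the same
    variant, and traffic splits roughly evenly across the variants."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same visitor sees the same version on every page load:
assert variant_for("visitor-42") == variant_for("visitor-42")
```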
With split testing, you can see whether the changes you make get your customers to:
  • Click where you want them to click.
  • Spend more time on a specific page, or on your website.
  • Sign up to your email list.
  • Buy a specific product.
  • Buy more stuff overall.

Why is Split Testing so Powerful?

Sometimes, split testing will have drastic results. For example, when the BBC first tried split testing on the World Service website, it found that a small tweak to the copy resulted in three times more people clicking a particular link. Similarly, CrazyEgg.com got 64% more conversions from their homepage by adding a short video with the same message as their homepage copy.
Those are some sweet examples, and if you hit the jackpot like that, you’ll of course be delighted you tried split testing.
But most of the time you will only see tiny improvements of a few percent in each split test, if you see any improvement at all. After all, it’s a test, and tests can show that your original website was the best one.
Yet the real power of split testing lies in multiple small improvements.
Let’s say you find 20 small improvements, each of which improve your conversion rate by 5%. That should equal a 100% improvement in sales, right (because 20 x 5 = 100)?
Wrong.
It’s actually equivalent to a 165% improvement, because of a process known as compounding.
Let’s say you start with 400 conversions a week. Adding 5% makes 420 conversions, or 20 more than you started with. The next time you add 5%, you’re starting from a base of 420, so the increase this time is 21. It might seem insignificant, but it quickly adds up.
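The arithmetic above can be checked in a few lines (Python; the 400-conversion starting point is taken from the example):

```python
# Twenty stacked 5% lifts compound multiplicatively, not additively.
base = 400                 # conversions per week before any tests
after = base * 1.05 ** 20  # each win multiplies the previous total
improvement = (after / base - 1) * 100

print(round(after))        # 1061 conversions a week
print(round(improvement))  # 165 (%), not the 100% naive addition suggests
```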

What Can You Split Test?

Split testing can be used for all kinds of marketing, from pay-per-click ads to email campaigns. In this article, we’re focused on what you can split test on your eCommerce website.
You can test anything on your eCommerce site that might change how your website visitors behave. If you can tweak it, you can test it. That includes your design and your copy.
You can test simple things such as:
  • the color of a buy button
  • product descriptions
  • product photographs
  • product page layout
  • pricing
  • the design of sign-up forms
Or you can test more complicated things, such as:
  • The overall structure of your site.
  • How you categorize products.
  • How your search bar functions.
  • Different types of check-out process.
All of these have the potential to make a difference.
When you’re split testing changes on your site, it’s worth tweaking one thing at a time. The smaller you make your experiments, the more aware you’ll be of precisely what works at bumping up your figures. When you know what works in one area of your website, you can apply it elsewhere.
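Because single-tweak lifts are small, it’s worth checking that a winner isn’t just noise before applying it elsewhere. A self-contained Python sketch of the standard two-proportion z-test (the traffic numbers below are invented for illustration):

```python
import math

def z_score(conv_a, n_a, conv_b, n_b):
    """How unlikely is this gap if A and B really convert at the same rate?
    |z| above roughly 1.96 is the conventional 95% confidence bar."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Variant B converted 460 of 8000 visitors against A's 400 of 8000:
z = z_score(400, 8000, 460, 8000)
print(z > 1.96)  # True: this lift clears the 95% bar, if only barely
```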

Split Testing Case Studies

When deciding what to split test, it’s worthwhile getting to grips with what has worked for other Internet marketers. Knowing what’s worked well for others can help you make a better guess of what could work for you.
There are a ton of case studies available online. If you really want to geek out, you’ll love the blog Which Test Won?, where you can test your marketing instincts against the results of real split tests.
Here are three tests we love:

One Word, Triple Conversions

The BBC World Service’s web developer didn’t try split testing until 2010. When he did, the results astonished him. Three times more people decided to install the BBC World Service widget when visiting the website. And the change came from adding just one word.
The text on the button to install the widget was changed from Widget to Install Widget. That’s a great example of how tiny tweaks can make a big difference. It also demonstrates the importance of writing clearly.
Full story: BBC A/B Testing.

Even Web Marketers Need Help Sometimes

Crazy Egg, a tool that lets website owners track the behavior of visitors, turned to web marketing company Conversion Rate Experts for help with split testing.
The result was a massive 363% increase in conversion rates. This is a fine example of compounding in practice, and shows that sometimes it takes lots of little tweaks to make a big difference.
The changes that helped conversions included:
  • Talking with customers, then writing the homepage copy in the language they used.
  • Talking with sales staff to find the language they used to sell the product.
  • Reframing the price, by showing the value it represented compared to other options.
  • Adding a video to the home page.
  • Making the home page 20 times longer.
  • Optimizing the checkout process.

Why It’s Best to Keep Tests Small

When MailChimp noticed it had over 300,000 failed login attempts per month, it knew it had a problem, so it set to work making it easier to log in. This included adding social login buttons, such as Log in with Facebook.
The changes helped massively, resulting in a 66% drop in failed logins.
But then the MailChimp CEO stepped in and told his team he was “extremely repulsed” by the social login options and wanted to “restore simplicity”.
The team stood their ground, but the CEO refused to listen. In the end, the social login buttons were removed. Surprisingly, the failed login rate hardly changed.
After some digging, the MailChimp team discovered it was other changes they’d made to the login process that had significantly reduced the failed login attempts.
The lesson? Keep your tweaks and split tests small, so you know what’s really making the difference.

Split Testing Tools

The easiest way to start split testing on your eCommerce store is to install a split testing app. Shopify, Bigcommerce and Magento all include split testing apps or extensions. These apps all cost money, but it’s a worthwhile investment to make your site the best it can be.
There are a range of split testing apps to choose from, including:
  • Optimizely. The big draw of Optimizely is its ease of use. Whatever platform your eCommerce store uses, you can be up and running in just a few minutes. What’s more, with Optimizely’s point-and-click redesign option, you don’t need coding experience to set up experiments. Prices start at $19 per month.
  • Visual Website Optimizer. With prices starting at $49 per month, VWO is at the expensive end of the market. In return, you get heat map reporting, and the option to run split tests on specific segments of visitors.
  • Experiment.ly. What makes Experiment.ly stand out from the crowd is its combo testing feature, so you can test every combination of colors, text and images you can think of with multivariate testing. The starter package costs $27 per month.

Split Test Your Way to Sales

Whether you’re just starting out, or you’re an established trader, split testing can take your business to the next level. By focusing on what makes sales happen, you’ll channel your energies in the right direction.
Have you ever tried your hand at split testing? What were your results?