Friday, January 06, 2012

31 Days of Testing—Day 25: Performance Testing, Part 2

Index to all posts in this series is here!

My previous post laid out some overview and planning issues around performance testing. This post covers which metrics you might want to monitor and lays out some resources I've found very useful.

What Do I Monitor?

Figuring out which metrics, measurements, and counters to monitor can be extremely daunting—there are hundreds of individual counters in Performance Monitor alone! In most cases you don't need anywhere near the entire set of metrics. A few counters will give you all the information you generally need to start your performance testing work.

Most performance testing gurus will tell you just a few items will get you started in good shape:

    • Processor utilization percentage
    • ASP.NET requests per second
    • SQL Server batch requests per second
    • Memory usage (total usage on the server, caching usage)
    • Disk IO usage
    • Network card IO

If you're doing load testing you'll likely be interested in errors per second and queued requests. Oftentimes soak or endurance testing will also look at counters associated with memory leaks and garbage collection—these help you understand how your application holds up over a long period of stress. However, those are different scenarios; the short list above is plenty to get you started.
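If you want a quick, scriptable way to watch the OS-level counters on that list, here's a minimal sketch in Python using the psutil library (my choice for illustration; it isn't part of the toolset discussed in this series). The .NET- and SQL Server-specific counters, like ASP.NET requests per second and batch requests per second, still come from Performance Monitor or typeperf on the server itself.

```python
# Minimal counter-sampling sketch using psutil (an assumed, illustrative choice).
# Covers only the OS-level counters; ASP.NET and SQL Server counters still come
# from Performance Monitor / typeperf on the Windows server.
import csv
from datetime import datetime

import psutil  # third-party: pip install psutil

SAMPLE_INTERVAL_SECONDS = 5
SAMPLE_COUNT = 12  # one minute of data at five-second intervals

with open("perf_samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "cpu_pct", "mem_used_pct",
                     "disk_read_bytes", "disk_write_bytes",
                     "net_bytes_sent", "net_bytes_recv"])
    for _ in range(SAMPLE_COUNT):
        cpu = psutil.cpu_percent(interval=SAMPLE_INTERVAL_SECONDS)  # blocks for the interval
        mem = psutil.virtual_memory().percent
        disk = psutil.disk_io_counters()  # cumulative since boot; diff rows for rates
        net = psutil.net_io_counters()    # cumulative since boot; diff rows for rates
        writer.writerow([datetime.now().isoformat(), cpu, mem,
                         disk.read_bytes, disk.write_bytes,
                         net.bytes_sent, net.bytes_recv])
```

Run it on the web and database servers during a test pass and you end up with a CSV you can chart or diff between runs.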

Where Can I Learn More?

Microsoft's “Performance Testing Guidance for Web Applications” is somewhat older, but it remains a tremendous resource for learning about performance testing. It's an exhaustive discussion of everything around planning, setting up, executing, and analyzing results from your performance testing. The guide is freely available on CodePlex.

Steve Smith of NimblePros in Kent, Ohio, has been extremely influential in my learning about performance testing. Microsoft has named Steve a Regional Director because of his technical expertise in many areas. He blogs extensively on many software topics and has great practical examples for performance testing. He also has a commercial online course through Pluralsight that's well worth checking out.

The website Performance Testing has a great number of references to performance testing information across the Web. The site lists blogs, articles, training material, and other highly helpful information.

I've recently come across two folks on Twitter from whom I've picked up a wealth of information:

  • Ben Simo, aka Quality Frog, writes and Tweets extensively about testing, but also talks specifically about performance issues regularly.
  • Scott Barber has an amazing blog with scads of information on it, plus he Tweets amazingly good reads on a regular basis.

One of the things Scott Tweeted recently was this nice series on web performance optimization. There’s some tremendously valuable information in its articles.

Go! Get Started!

Spend some time planning out your performance testing effort. Make sure you work HARD to change only one variable at a time. Don't get flooded with information; less information is often more helpful at the start.

Performance testing is a tremendous asset to your projects, and it can also be an extremely fun, interesting, and rewarding domain to work in.

Go! Get started!

Thursday, January 05, 2012

31 Days of Testing—Day 24: Getting Serious About Performance

Updated: Fixed wrong day # in title. Duh.

Index to all posts in this series is here!

In this post I’d like to cover something that too often gets ignored: performance testing. I thought I’d take some time to lay down some of my opinions and experiences around performance testing in general.

The phrase “performance testing” can mean a great many things to different people in different scenarios, so covering a few of the different types of tests may be helpful.

Performance Testing is generally an umbrella term covering a number of different, more specialized types of tests. I've also used the term to describe a very simple set of scenarios meant to provide a baseline for catching performance regressions.

Load Testing generally runs a number of concurrent users against the system to see how it performs and to find bottlenecks.

Stress Testing throws a huge number of concurrent users against your system in order to find “tipping points” – the point where your system rolls over and crashes under the traffic.

Endurance/Soak Testing checks your system’s behavior over long periods to look for things like degradation, memory leaks, etc.

Wikipedia’s Software Performance Testing page has some very readable information on the categories.
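To make "a number of concurrent users" concrete, here's a bare-bones load-generation sketch in Python using only the standard library. The URL and the user/request counts are placeholders; real load tools add ramp-up, think time, realistic scenarios, and far better reporting, but the core idea is the same.

```python
# Bare-bones load generation: N simulated "users" hitting one URL concurrently,
# then summarizing response times. Purely illustrative; URL and counts are
# placeholders.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

TARGET_URL = "http://localhost:8080/"  # placeholder
CONCURRENT_USERS = 25
REQUESTS_PER_USER = 40

def one_user(_):
    timings = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        with urlopen(TARGET_URL) as response:
            response.read()
        timings.append(time.perf_counter() - start)
    return timings

with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
    results = pool.map(one_user, range(CONCURRENT_USERS))
    all_timings = sorted(t for user_timings in results for t in user_timings)

print(f"requests:  {len(all_timings)}")
print(f"mean time: {statistics.mean(all_timings) * 1000:.1f} ms")
print(f"p95 time:  {all_timings[int(len(all_timings) * 0.95)] * 1000:.1f} ms")
```

Crank the user count up until response times fall over and you have a crude stress test; leave it running for hours and you're into soak-test territory.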

You can also look at performance testing as examining a slice of your system's behavior: you can use a specific scenario to dive down into specific areas of your system, environment, or hardware.

Load, stress, and endurance testing are all that, but turned up to 11. (A reference to Spinal Tap for those who’ve not seen the movie.)

With that in mind, I generally think of performance testing in two categories: testing to ensure the system meets specified performance requirements, and testing to ensure performance regressions haven’t crept into your system. Those two may sound the same, but they’re not.

Performance testing to meet requirements means you’ll need lots of detail around expected hardware configurations, baseline datasets, network configurations, and user load. You’ll also need to ensure you’re getting the hardware and environment to support those requirements. There’s absolutely no getting around the need for infrastructure if your customers/stakeholders are serious about specific performance metrics!

Performance testing to guard against regressions can be a bit more relaxed. I’ve had great successes running a set of baseline tests in a rather skimpy environment, then simply re-running those tests on a regular basis in the exact same environment. You’re not concerned with specific metric datapoints in this situation – you’re concerned about trends. If your test suite shows a sudden degradation in memory usage or IO contention then you know something’s changed in your codebase. This works fine as long as you keep the environment exactly the same from run to run—which is a perfect segue into my next point.

Regardless of whether you’re validating performance requirements, guarding against regressions, or flooding your system in a load test designed to make your database server weep, you absolutely must approach your testing with a logical, empirical mindset. You’ll need to spend some time considering your environment, hardware, baseline datasets, and how to configure your system itself.

Performance testing isn’t something you can slap together and figure out as you go. While you certainly can (and likely will!) adjust your approach as you move through your project, you do indeed need to sit down and get some specifics laid out around your testing effort before you begin working.

First and foremost: set expectations and goals.

Ensure everyone’s clear on why you’re undertaking the performance testing project. If you are looking to meet specific metrics for delivering your system then you’ll need to be extremely detailed and methodical in your initial coordination. Does your system have specific metrics you’re looking to meet? If so, are those metrics clearly understood – and more importantly reasonable?

Keep in mind that your customer/stakeholder may be giving you metrics you think are unreasonable, but those metrics may fit business needs of theirs that you're unaware of. You have to put in the extra effort to ensure you understand those higher-level needs.

Your customer may also be giving you vague requirements simply due to their lack of experience or understanding. “We want the page to load fast!” is an oft-heard phrase from stakeholders, but what do they really mean?

Define your environment

If those same metrics are critical to your delivery, then they will also need to be defined against a number of specific environment criteria such as exact hardware setups, network topologies, etc. That environment should be the exact same environment you recommend to your customers. If you're telling your system's users they need a database server with four eight-core CPUs, 32 GB of RAM, and a specific RAID configuration for the storage, then you should look to get that same hardware in place for your testing.

(A tangential topic: it's happened more than once that a server and environment acquired for performance testing somehow get borrowed or time-shared out to other uses. Timesharing your performance environment can be a highly effective use of expensive resources, but you'll need to ensure nothing else, absolutely nothing, is running on that server once your performance runs start – you have to have dedicated access to the server to ensure your metrics aren't being skewed by other processes.)

Agree on baseline data

Something that’s commonly overlooked is the impact of your system’s baseline dataset on your performance tests. You likely won’t get anything near an accurate assessment of a reporting or data analysis system if you’ve only got ten or thirty rows of data in your database.

Creating baseline data can be an extremely complex task if your system is sensitive to the "shape" of the data. For example, a reporting system will need its baseline data spread across different users, different content types, and different date patterns.

Often the easiest route is to find a live dataset somewhere and use that. I've had great success coordinating with users of our systems to get their datasets for our testing. You may need to scrub the dataset to clear out any potentially sensitive information such as e-mail addresses, usernames, passwords, etc.

If using a live dataset isn’t an option, you’ll need to figure out tooling to generate that dataset for you.
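As one sketch of what that tooling might look like, here's a small Python script that fills a SQLite table with orders spread across users, product categories, and a two-year date range. The schema, categories, and row counts are all made up for illustration; the points that matter are controlling the shape of the data and fixing the random seed so every environment gets the identical baseline dataset.

```python
# Illustrative baseline-data generator. The schema, categories, and counts are
# hypothetical; the goal is data with realistic "shape" that's reproducible
# from run to run.
import random
import sqlite3
from datetime import date, timedelta

random.seed(42)  # same seed -> identical baseline dataset in every environment

conn = sqlite3.connect("baseline.db")
conn.execute("""CREATE TABLE IF NOT EXISTS orders (
                    id INTEGER PRIMARY KEY,
                    user_id INTEGER,
                    category TEXT,
                    quantity INTEGER,
                    order_date TEXT)""")

CATEGORIES = ["appliances", "electronics", "toys", "books"]
USER_IDS = range(1, 501)       # 500 distinct shoppers
START_DATE = date(2010, 1, 1)  # orders spread over two years

rows = [(random.choice(USER_IDS),
         random.choice(CATEGORIES),
         random.randint(1, 10),
         (START_DATE + timedelta(days=random.randint(0, 730))).isoformat())
        for _ in range(100000)]

conn.executemany(
    "INSERT INTO orders (user_id, category, quantity, order_date) VALUES (?, ?, ?, ?)",
    rows)
conn.commit()
conn.close()
```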

Determine your usage scenarios

Talk through the scenarios you want to measure. Make sure you’re looking to measure the most critical scenarios. Your scenarios might be UI driven, or they could be API driven. Steve Smith has a terrific walkthrough of a real world scenario that gives a great example of this.

Set up your tooling

Once you've got a handle on the things I've discussed above, look to get your tooling in place. Performance testing utterly relies on an exact, repeatable process, and there's a large amount of work in getting everything set up and configured each time you do a perf run. Don't do that setup manually; look to tooling to handle it for you, for two reasons. One: automating setup cuts out any chance of human error. Two: it's really boring.

Build servers like Hudson, TeamCity, or TFS can interface with your source control and get your environment properly configured each time you need to run a perf pass. Scripting tools like PowerShell, Ruby, or even good old command files can handle tasks like setting up databases and websites for you.
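As one example of what that scripting might look like, here's a hedged sketch of a pre-run reset step written in Python that shells out to sqlcmd to restore a known database backup. The instance name, database name, and backup path are placeholders; in practice this could just as easily be PowerShell or a command file kicked off by your build server.

```python
# Hypothetical pre-run reset: restore the baseline database so every perf pass
# starts from the same known state. Instance, database, and path are placeholders.
import subprocess
import sys

SQL_INSTANCE = r".\SQLEXPRESS"                 # placeholder instance name
BACKUP_FILE = r"C:\perf\baselines\PerfDb.bak"  # placeholder backup path

restore_sql = f"RESTORE DATABASE [PerfDb] FROM DISK = N'{BACKUP_FILE}' WITH REPLACE"

# -S server, -E trusted connection, -Q run this query and exit
result = subprocess.run(["sqlcmd", "-S", SQL_INSTANCE, "-E", "-Q", restore_sql],
                        capture_output=True, text=True)

if result.returncode != 0:
    print("Database restore failed:", result.stderr, file=sys.stderr)
    sys.exit(1)

print("Baseline database restored; environment is ready for the perf run.")
```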

You'll also need to set up your tooling to handle reporting of your perf test runs. Make sure you store all the output data from your runs so you can track your trends and history.
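Here's a sketch of that bookkeeping: after each run, append a small summary record (build number, date, key metrics) to a history file, then compare against the previous run and flag anything that moved more than some threshold. The metric names, the JSON layout, and the 10% threshold are all assumptions made for illustration, not a prescribed format.

```python
# Illustrative trend tracking: append each run's summary metrics to a JSON
# history file and warn when a metric degrades more than 10% versus the prior
# run. Metric names, threshold, and layout are assumptions.
import json
import os
from datetime import datetime

HISTORY_FILE = "perf_history.json"
DEGRADATION_THRESHOLD = 0.10  # flag anything more than 10% worse

def record_run(build, metrics):
    history = []
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE) as f:
            history = json.load(f)

    if history:
        previous = history[-1]["metrics"]
        for name, value in metrics.items():
            old = previous.get(name)
            # For these example metrics (times, memory), higher is worse.
            if old and value > old * (1 + DEGRADATION_THRESHOLD):
                print(f"WARNING: {name} degraded {value / old - 1:.0%} "
                      f"since build {history[-1]['build']}")

    history.append({"build": build,
                    "date": datetime.now().isoformat(),
                    "metrics": metrics})
    with open(HISTORY_FILE, "w") as f:
        json.dump(history, f, indent=2)

# Example usage with made-up numbers from tonight's run:
record_run("1.4.120", {"avg_page_time_ms": 480,
                       "peak_memory_mb": 910,
                       "db_batch_time_ms": 35})
```

A build server can run something like this at the end of every perf pass, which also gives you the history you need for the trend comparisons discussed in the regression-testing section above.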

Change only one variable at a time. Compare apples to apples!

It’s critical you take extraordinary care with the execution of your performance testing scenarios! You need to ensure you’re only changing one variable at a time during your test passes, or you won’t understand the impact of your changes.

For example, don’t change your database server’s disk configuration at the same time you push a new build to your test environment. You won’t know if performance changes were due to the disk change or code changes in the build itself.

In a similar vein, ensure no other folks are interacting with the server during your performance run. I alluded to shared servers earlier; it’s great to share expensive servers for multiple uses, but you can’t afford for someone to be running processes of any shape or form while you’re doing your performance passes.

Profiling: Taking the simple route for great information

All the work above can seem extraordinarily intimidating. There's a lot to take into account when moving through some of the more heavyweight scenarios I laid out in my introductory post.

That said, you can look to simpler performance profiling as a means to get great insight into how your application is behaving. Profiling lets you take one scenario, or a very small set of them, and see how your application behaves in that slice. Depending on the tooling, you can see performance results all the way back at the browser, dive into performance metrics on the server (think CPU or disk usage, for example), and you may even be able to dig down into the application's codebase to see detailed metrics for specific components of the system.

Profiling is a great way to start building a history of your application’s performance. You can run regular profiling tests and compare the historical performance to ensure you’re not ending up with performance regressions.
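Browser- and server-side profilers vary by stack, but the code-level idea is the same everywhere: run one scenario and see where the time goes. Here's a minimal sketch using Python's standard-library cProfile; run_report_scenario and its helpers are hypothetical stand-ins for whatever slice of your own application you're profiling.

```python
# Minimal code-level profiling sketch: run one scenario under cProfile and
# print the ten functions with the most cumulative time.
# run_report_scenario() and its helpers are hypothetical stand-ins.
import cProfile
import pstats
import time

def load_data():
    time.sleep(0.2)           # pretend this is a slow database call
    return list(range(10000))

def render_report(rows):
    return sum(rows)          # pretend this is report rendering

def run_report_scenario():
    return render_report(load_data())

profiler = cProfile.Profile()
profiler.enable()
run_report_scenario()
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```

Save the printed stats (or the raw profile data) from each run and you have exactly the kind of history described above.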

Start small, start smart

As you’ve seen in this post, performance testing can be particularly complex when you’re looking to ensure high performance, reliability, and scalability. You need to approach the effort with good planning, and you need to ensure you’re not changing variables as you move through the testing.

Make sure your performance efforts get you the information you need. Start with small environments and scenarios, ensure you’ve clearly laid out your goals and expectations, and keep a careful eye out as you’re running your tests.

Wednesday, January 04, 2012

31 Days of Testing—Day 23: Acceptance Tests & Criteria in the Real World

UPDATED: I goofed and Andy caught it, thankfully. They’re using Watir, not WatiN in their work. I knew that and still fat-fingered the post. Fixed!

Index to all posts in this series is here!

Today’s post is by Andrew Vida, another smart pal in the Heartland region. I’ve chatted with Andy a number of times at various conferences, and I’ve enjoyed hearing about the work he and Bramha Ghosh do at Grange Insurance in Columbus, OH.

We three have spent a pretty good amount of time moaning about our shared pain in getting great, reliable, valuable functional test suites in place. Andy and Bramha are working in Ruby and Watir, but their issues are my issues, and they're the same issues seen in any technology: dealing with data, environments, timing, and of course the inevitable hardest part: the "soft" problems of ensuring clear communication between folks on the project team.

Andy offered up the following article for my series based on the work they've done trying to get a smooth flow around well-defined acceptance criteria. This is a perfect follow-on to yesterday's post by Jon Kruger!


Using Acceptance Tests to Define Done

Have you ever been on a team and been asked, "What is the definition of done?" You respond by saying, "When all of your automated tests pass, and there are no bugs, then you have satisfied the acceptance criteria. Done!" To which someone responds, "Well, how do I define the acceptance criteria?" Good question!

Understanding the Feature

First things first - you have to understand what feature you'll be building. Building the right product and building the product right takes communication and collaboration between your product owner and your team.

The reason for all of the collaboration is that we're trying to build a shared understanding of what needs to be done and also produce examples that are easy to maintain. There are many ways to work collaboratively and ultimately, you have to decide what works best for your team.

The team I'm currently on has found that smaller workshops work best for us. Those workshops, otherwise known as "Three Amigos", include a business analyst, a developer and a tester who share a similar understanding of the domain.

Let's hypothetically say you're discussing a shopping cart feature for your site. Start by defining the goals of this feature. By starting with the goal, you'll let everyone know why they're spending their time on implementing the feature. If you can't come up with a good reason why, then maybe the product owner is wasting everyone's time.

We've used the Feature Injection template from Chris Matts and Liz Keogh to help us successfully describe why:

As a <type of stakeholder>
I want <a feature>
So that <I can meet some goal>

Here's our feature description:

As an online shopper
I want to add items to my shopping cart
So that I can purchase them

Determining Acceptance Criteria

Next, your team needs to determine what the system needs to do to meet those goals: the Acceptance Criteria.

In your Three Amigos meeting, be sure to ask questions to clear up assumptions, such as "Are there any products that cannot be purchased online?" or "Does the shopper need to be authenticated to purchase?"

Remember, the scope of the feature should stay high level; we only want to identify what the application needs to do, not how it's implemented. Leave that part to the people who know how to design software. The team determined that the following are in scope:

  • Only authenticated shoppers can add items to the shopping cart.
  • Cannot add refrigerators to shopping cart.
  • Only 25 items can be added.
  • Shopper can remove items from shopping cart.
  • Shopper can change quantity of items after adding it to the cart.

Hey, now we have some acceptance criteria!

Acceptance Criteria lead to Acceptance Tests

We've used communication and collaboration to determine why a feature is necessary and what the system needs to do at a high level, so now we can come up with some examples to test our acceptance criteria.

To do this, we'll write some Cucumber scenarios.  We've chosen Cucumber for all of the reasons mentioned in Tim Wingfield's post on Day 15. If you haven't read it, go back and check it out.  It's an excellent post on the benefits of employing Cucumber.

Here are a few scenarios that were created:

Given the shopper is a guest
When they try to add an item to their shopping cart
Then they will receive the error "Only authenticated shoppers can add items to their shopping cart."
 
Given an authenticated shopper
When they click the "Add Item to Cart" button
Then they will have an item in their shopping cart
 
Given an authenticated shopper with an item in their shopping cart
When they click the "Remove Item" button
Then that item is no longer in their shopping cart

These are only a few of the examples that were developed as part of the Three Amigos meeting. On our team, the output of the Three Amigos is a Cucumber feature file. We now have a shared understanding and a definition of done! We can pass our failing acceptance tests to the Dev team to begin their work. They will start by creating failing unit tests and writing just enough code to make them pass. Once those are passing, they can run the acceptance tests. Once the acceptance tests pass, the feature is complete. We're done! Those acceptance tests will be added to the regression suite to be run anytime to ensure that the feature remains done. Now the feature can be demonstrated to the product owner at the next review.
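For illustration only, here's roughly what wiring one of those scenarios to executable steps looks like. Andy's team does this with Cucumber in Ruby driving Watir; the sketch below uses Python's behave library and a hypothetical ShoppingCart object instead, purely to show the mechanics of turning Given/When/Then lines into a failing acceptance test.

```python
# Illustrative step definitions (Python/behave) for the guest-shopper scenario.
# The team in this post uses Cucumber in Ruby with Watir; ShoppingCart here is
# a hypothetical stand-in for driving the real application.
from behave import given, when, then

class ShoppingCart:
    """Hypothetical application facade used only for this sketch."""
    def __init__(self, authenticated):
        self.authenticated = authenticated
        self.items = []
        self.last_error = None

    def add_item(self, item):
        if not self.authenticated:
            self.last_error = ("Only authenticated shoppers can add items "
                               "to their shopping cart.")
            return
        self.items.append(item)

@given("the shopper is a guest")
def step_guest_shopper(context):
    context.cart = ShoppingCart(authenticated=False)

@when("they try to add an item to their shopping cart")
def step_try_add_item(context):
    context.cart.add_item("widget")

@then('they will receive the error "{message}"')
def step_expect_error(context, message):
    assert context.cart.last_error == message
```

Driven against the real application these steps start out red, and they only go green when the feature actually works, which is exactly the definition of done described above.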

What we've just done is taken a trip around the Acceptance Test Driven Development cycle.  Just remember, it's not about the tools or the technology, but rather the communication and collaboration.  Our ultimate goal is to deliver high quality software that functions as the product owner intended.  By including QA in the entire process, we can eliminate many of the problems that plague us earlier so that they don't make it to production.  Quality is not just a QA function, it's a team function.

Tuesday, January 03, 2012

31 Days of Testing—Day 22: Why Collaboration Matters (A Real World Example)

Updated: Index to all posts in this series is here!

Today’s post is reposted from Jon Kruger’s blog. Jon is a tremendously smart, passionate indie working out of Columbus, Ohio. I was lucky enough to work with Jon some years back, and I’ve always had great regard for his views and thoughts.

Jon’s post today really hit home for me because it’s all about communication and collaboration early in the cycle. I can’t jump up and down enough about how critical this is--and Jon’s post is a real-world example of why it’s so important.

I read Jon's blog this morning and immediately pinged him on IM to see if he'd let me drop his article into my #31DaysOfTesting series. Thankfully he agreed!

Follow Jon on Twitter, and definitely bookmark or subscribe to his blog. Lots of great stuff in both spots!


Just Another Run of the Mill Wednesday

On my current project, we release every 2 weeks. We do the push to production on Saturday, so we set a deadline of Wednesday night for everything to be developed and tested so that we can have two days for demos and UAT.

I remember a certain Wednesday a couple of months ago where things were chaotic to say the least. We looked at the board on Wednesday in the early afternoon and there were 20 items where testing was not complete. We were running around trying to make sure that everything got tested. The entire development team was helping out with testing. Many people stayed past dinnertime to get everything done.

This past Wednesday was much different. Everyone was very relaxed. There was only one item on the board that was still being tested. We were all working on getting stuff ready for the next iteration. And oh by the way, one of the QA testers was out on vacation and another one had been moved to another project.

I immediately thought back to that chaotic Wednesday a few months ago and thought about everything that has happened since then. We certainly had come a long way to get to the point where things were much more relaxed. So what happened?

The Three Amigos

Before development can start on a feature, we have a “three amigos” meeting where developers, business analysts, and QA people get together and decide on the acceptance criteria for the feature. This helps us all get on the same page and make sure that we know what we’re building. It also gets the QA team involved very early in the process, so when it comes time for them to manually test the feature, they already know it inside out.

Automating acceptance tests

The outcome of the three amigos meeting is acceptance criteria. Developers take these and automate them whenever possible (we use a combination of unit tests and acceptance tests using SpecFlow). The development workflow now looks something like this:

  • Work with the QA team to write out the acceptance tests in SpecFlow
  • Develop all of the components needed to make the feature work, writing unit tests along the way
  • Try and get the acceptance tests to pass, fixing any problems we find along the way

When I’m working on the “try and get the acceptance tests to pass” phase, I’m going to find pretty much all of the coding errors that we made during development. The development ticket is still marked as “In Development” at this point, which is very important. We all take quality seriously, both QA testers and developers. I’m not going to hand it over to be tested by the QA team until I can get all of the acceptance tests to pass.

Almost no bugs

Because we knew what we were building up front and because we automated pretty much all of the testing, the QA team is finding very few bugs in the new features that we’re developing. One of our testers brought this up in our retrospective this past week and mentioned how they got everything tested so much faster because they weren’t finding bugs, writing up bugs, waiting for bugs to be fixed, and retesting bug fixes.

We had looked at the schedule earlier in the week and we had thought that developers might have to help out with testing because one of the testers was on vacation and they had some items to test that we thought would take a long time. In the end, no developers had to help with testing and testing got done ahead of schedule!

Everyone talks about how bugs are a waste of time, how they slow you down, etc., but it was really cool to see it play out. Yeah, getting those acceptance tests to pass takes a little extra time, but now I can hand a completed feature over to QA and have a good chance of not having any bugs. We had two developers working for a week on the feature that we completed, and we did it with no bugs. Not only that, we have automated acceptance tests that will do our regression testing for us.

Recap

A lot of the changes that we’ve made seem to be relatively minor, but they’ve produced huge dividends. Much of it comes down to discipline, not cutting corners, communicating effectively, and taking pride in your work. I’m really excited about what we’re going to be able to do from here and I expect to have even more stories to tell in the near future.
