When can I start performance tuning? How do I monitor ____?

November 16, 2007 – 10:02 am

It seems (at least to me) that often the first thing a performance engineer, especially an inexperienced one, wants to do is start tuning the application: change settings, configurations, and so on to make it go faster. Who doesn’t want to be the “superstar” who saved the company by enabling some feature or setting that improved capacity or performance by 200%? I have even seen occasions where testers sit down to write a performance test plan and practically the first thing they do is identify all of the “tunables” for every part of the application. Too often, after a load test has run, I hear questions that begin with, “What if we change…”

In both the planning of tests and the analysis of results, the focus should not be on tuning, but rather on understanding performance behavior and its causes. Tuning systems should not be viewed as a task to be undertaken, but rather as a result of testing and identifying problems.

In a similar vein, there is also a mysterious force that pushes performance engineers to want to monitor anything and everything that they can. What is the execution time for this Java method? What is the HTTP session size? What is the cache hit ratio for all of my database queries? Which query is the slowest? This is, if you want my unvarnished opinion, nuts. You will drive yourself crazy if you try to monitor too much all at once. When running a test, if you are monitoring 800 metrics on 12 different servers, what on earth are you going to do with all of the data? 90% of it will probably be useless.

So instead, when it comes to tuning and monitoring, this is what I suggest:

  1. Start at a high (system) level of monitoring - the basics: CPU, Memory, I/O
  2. Understand the performance and scalability behavior
  3. Identify where the bottlenecks are, implementing additional monitoring, if necessary
  4. Create hypotheses for root cause of performance and/or scalability problems
  5. Identify candidate fixes for each hypothesis, which might involve “tuning” or implementing additional monitoring
  6. Test hypotheses

Using this approach, the tuning and monitoring are well thought out and will have a better chance of success. As you gain experience with an application and learn more, the monitoring will be more meaningful and any tuning will do more good than harm.
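To make steps 1 through 3 a little more concrete, here is a minimal sketch in Python. It assumes you have already collected average CPU, memory, and I/O utilization at each load level of a test. The `Sample` type, the `first_saturated` helper, the 85% threshold, and the numbers themselves are all made up for illustration; they are not taken from any real tool or test.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One load-test step with high-level (system) metrics. Hypothetical structure."""
    users: int      # offered load for this step (concurrent users)
    cpu_pct: float  # average CPU utilization, percent
    mem_pct: float  # average memory utilization, percent
    io_pct: float   # average disk I/O utilization, percent


def first_saturated(samples, threshold=85.0):
    """Return (users, resource) for the lowest load at which the busiest
    resource reaches the threshold, or None if nothing saturates.

    The 85% default is an assumed rule of thumb, not a universal limit.
    """
    for s in sorted(samples, key=lambda s: s.users):
        readings = {"cpu": s.cpu_pct, "memory": s.mem_pct, "io": s.io_pct}
        resource, value = max(readings.items(), key=lambda kv: kv[1])
        if value >= threshold:
            return s.users, resource
    return None


# Made-up results from three load levels.
samples = [
    Sample(users=50, cpu_pct=22.0, mem_pct=40.0, io_pct=10.0),
    Sample(users=100, cpu_pct=45.0, mem_pct=42.0, io_pct=31.0),
    Sample(users=200, cpu_pct=91.0, mem_pct=47.0, io_pct=55.0),
]

print(first_saturated(samples))
```

With these numbers the CPU is the first resource to saturate, at 200 users. That is only a starting point for a hypothesis (step 4), and it tells you where more detailed monitoring might be worth adding.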

What are your thoughts?

  1. One Response to “When can I start performance tuning? How do I monitor ____?”

  2. Charlie,

    I generally agree STRONGLY with all of your comments.

    I would like to add that another issue is the need to have an understanding of what our quantitative performance results mean. As we test a system, we develop an understanding of what values to expect. Understanding one system can aid in understanding future systems. Some measurements are comparatively easy to understand, e.g., CPU utilization. Other measurements might be hard to understand or misleading, even for seasoned engineers.

    For instance, I was talking to an experienced computer architect about cache miss rates. This person suggested an instruction-cache miss rate of 5% was not very high, which is a reasonable conclusion if you are familiar with DATA cache miss rates, but it is not correct for contemporary instruction caches.

    A common instruction cache line size today is 32 bytes. This means that if each instruction requires 4 bytes, 8 instructions fit in a cache line. Thus, if we are executing straight-line code (no branches), the highest miss rate we will see is if we miss each time we enter a new cache line, which happens 1/8th of the time. This miss rate is 12.5%.

    Of course, because of branches, we enter new cache lines more often than once every 8 instructions. But, because of loops, we end up with long periods of execution during which we will see no or very few instruction cache misses.

    A high instruction cache miss rate for the processor above is about 2-4%. I follow the rule that this can be reasonably compared to a DATA cache miss rate by multiplying by the number of instructions which fit in a line. In this case, the result is 16-32% for a data cache, which falls into the range of what we would commonly call a high miss rate for an L1 data cache.
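The arithmetic above can be written out as a short Python sketch. It uses the 32-byte line and 4-byte instruction from the example; the function names are hypothetical, and the scaling step is the rule of thumb described above rather than a precise model of any particular processor.

```python
def instructions_per_line(line_bytes: int, instruction_bytes: int) -> int:
    """Number of fixed-size instructions that fit in one cache line."""
    return line_bytes // instruction_bytes


def straight_line_miss_rate(line_bytes: int, instruction_bytes: int) -> float:
    """Worst-case miss rate for straight-line code (no branches):
    one miss each time execution enters a new cache line."""
    return 1 / instructions_per_line(line_bytes, instruction_bytes)


def data_cache_equivalent(icache_miss_rate: float,
                          line_bytes: int,
                          instruction_bytes: int) -> float:
    """Rule of thumb: scale an instruction-cache miss rate by the number
    of instructions per line to compare it with a data-cache miss rate."""
    return icache_miss_rate * instructions_per_line(line_bytes, instruction_bytes)


# 32-byte lines, 4-byte instructions, as in the example above.
print(f"{straight_line_miss_rate(32, 4):.1%}")  # 12.5%

for rate in (0.02, 0.04):
    print(f"{rate:.0%} -> {data_cache_equivalent(rate, 32, 4):.0%}")
```

The two loop iterations reproduce the 16% and 32% figures: a 2-4% instruction cache miss rate, multiplied by 8 instructions per line.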

    Cheers!

    -Todd


    Todd Bezenek
    Computer Architect and Performance Engineer

    By bezenek on Nov 26, 2007
