So I wasn't intending to blog again about The Design of Everyday Things by Don Norman but last night I was reading the final few pages and got to a section titled Easy Looking is Not Necessarily Easy to Use. From that:

How many controls does a device need? The fewer the controls the easier it looks to use and the easier it is to find the relevant controls. As the number of controls increases, specific controls can be tailored for specific functions. The device may look more and more complex but will be easier to use. We studied this relationship in our laboratory ... We found that to make something easy to use, match the number of controls to the number of functions and organize the panels according to function. To make something look like it is easy, minimize the number of controls.

How can these conflicting requirements be met simultaneously? Hide the controls not being used at the moment. By using a panel on which only the relevant controls are visible, you minimize the appearance of complexity. By having a separate control for each function, you minimize complexity of use. It is possible to eat your cake and have it, too.

Whether with cake in hand, mouth, or both, I would note that easy saying is not necessarily easy doing. There's still a considerable amount of art in making that heuristic work for any specific situation.
One aspect of that art is deciding what functions it makes sense to expose at all. Fewer functions means fewer controls and less apparent complexity. Catherine Powell's Customer-Driven Knob was revelatory for me on this:
Someone said, "Let's just let the customer set this. We can make it a knob." Okay, yes, we could do that. But how on earth is the customer going to know what value to choose?

As in my first post about The Design of Everyday Things, I find myself drawn to comparisons with The Shape of Actions. In this case, it's the concept of RAT, or Repair, Attribution and all That, the tendency of users to adapt themselves to accommodate the flaws in their technology.
When I wrote about it in The RAT Trap I didn't use the word design once, although I was clearly thinking about it:
A takeaway for me is that software which can exploit the human tendency to repair and accommodate and all that - which aligns its behaviour with that of its users - gives itself a chance to feel more usable and more valuable more quickly.

Sometimes I feel like I'm going round in circles with my learning. But so long as I pick up something interesting - a connection, a reinforcement, a new piece of information, an idea - frequently enough I'm happy to invest the time.
I would believe, without any evidence, that a majority of the test community and product development companies have matured in their view on testing. At conferences you less frequently see the argument that testing is not needed. From my own experience and from what I perceive in the local market, there are often new assignments for testers. Many companies try to hire testers or bring in consulting testers, at least when looking back over the past few years up until now.
At many companies there is an ever increasing focus on and interest in Continuous Deployment. Sadly, I see troublesome strategies for testing in many organisations. Some companies intend to focus fully on automation, even letting go of their so-called manual testers. Other companies focus on automation by not allowing testers to actually test and explore. This troubles me. Haven’t testers been involved in the test strategy? Here are a few of my pointers, arguments and reasoning.
Automation Snake oil
In 1999 James Bach wrote the article Automation Snake Oil [see reference 1], where he brings up a thoughtful list of arguments and traps to be avoided. Close to 17 years later, we see the same problems. In many cases they have increased because of the Continuous Deployment ideas, but also because of those from Agile development. That is, if you ignore all the new ideas gained in the test domain as well as all research done.
The miracle status of automation is not a new phenomenon. Together with the lure of saving time and cost, it is seductive. In some cases the savings will probably be real, but automation is not a replacement for thinking people. Instead it could be an enabler for speed and quality.
Testing vs. Checking
In 2009, Michael Bolton wrote an article that clarified a distinction between Testing and Checking. Since then the definition has evolved. The most recent article in the series is Testing and Checking Refined [see reference 2]. Most of the testers I know and debate with are aware of this concept and agree with the difference or at least acknowledge the concept.
If you produce test strategies in a CI environment that put an emphasis on automation, and if that means mostly doing checking and almost no testing (as in exploration), then you won’t find the unexpected. Good testing includes both.
Furthermore, when developing a new feature, are you focusing on automating checks that fulfil the acceptance criteria, or do you try to find things that the team has not considered? If you define the acceptance criteria and then only check whether they are fulfilled, it will only take you a small part of the way toward good quality. You might be really happy with how fast it goes to develop and check (not test) the functionality. You might even be happy that you can repeat the same tests over and over. But I guess you failed to run that one little test that would have identified the most valuable thing.
Many years ago a tester came to me with a problem. He said, “We have 16000 automated tests, still our customers have problems and we do not find their problems”. I told him that he might need to change strategy and focus more on exploration. Several years later another tester came to me with the same problem, from the same product and projects. He said, “We have 24000 automated tests, still our customers have problems and we do not find their problems!” I was a bit surprised by the persistence in following the same strategy for automation while at the same time expecting a different outcome.
In a recent argument, a development manager and Continuous Deployment enthusiast explained their strategy and its emphasis on automation. They put little focus on testing and exploration, mostly hiring developers who needed to automate tests (or rather checks). I asked how they do their test design and how they know what they need to test. One of my arguments was that they limited their test effort based on what could be automated.
We know that there is an infinite number of possible tests. If you have done some research, you have an idea of what different stakeholders value and what they are afraid will happen. If that is so, then you have an idea of which tests would be valuable to do or which areas you wish to explore. Out of all those tests, there are probably some that you only want to run once, where you want to investigate something that might be a problem, learn more about the system's behavior or try a specific, very difficult setup or configuration of the system. This is not something that you would want to automate because it is too costly and it is enough to learn about it just once, as far as you know. There are probably other tests that you want to repeat and run more often, but most probably with variation in new dimensions. These could be tests that focus on specific risks or functionality that must work at all times. Out of all those that you actually want to run several times, there is a part that you plan and want to automate. Out of those that you have planned to automate, only a fraction can be automated. Since automation takes a long time and is difficult, you have probably only automated a small part of those.
If you are a stakeholder, how can you consider this to be ok?
Rikard Edgren visualized the concept of what is important and what should be in focus in a blog post called “In search of the potato” [see reference 3].
His main point is that what is valuable and important is not found only in the specification or requirements. You need to go beyond that.
Another explanation around the same concept of the potato is that of mapping the information space by knowns and unknowns.
The majority of test automation focuses on checking an aspect of the system. You probably want to make repeatable tests on things that you know or think you know, thus the Known Knowns. In making this checking repeatable you will probably save time in finding things that you thought you knew, but that might change over time as the system evolves, thus evaluating the Unknown Knowns. In this area you can specify what you expect, that is, what a correct result would be. This is limited by the Oracle problem; more on that below.
If you are looking beyond the specification and the explicit, you will identify things that you want to explore and want to learn more about: areas for exploration, specific risks or just an idea you wish to understand. These are the Known Unknowns. You cannot clearly state your expectations before investigating here. You cannot, for the most part, automate the Known Unknowns.
While exploring/testing, while checking or while doing anything with the system, you will find new things that no one so far had thought of, thus things that fall into the Unknown Unknowns. Through serendipity you find something surprisingly valuable. You rarely automate serendipity.
You most probably dwell in the known areas for test automation. Would it be ok to ignore things that are valuable that you do not know of until you have spent enough time testing or exploring?
The Oracle Problem
A problem that is probably unsolvable is that there are no (or at least very few) perfect or true oracles [see reference 4, 5, 6].
A “True oracle” faithfully reproduces all relevant results for a SUT using independent platform, algorithms, processes, compilers, code, etc. The same values are fed to the SUT and the Oracle for results comparison. The Oracle for an algorithm or subroutine can be straightforward enough for this type of oracle to be considered. The sin() function, for example, can be implemented separately using different algorithms and the results compared to exhaustively test the results (assuming the availability of sufficient machine cycles). For a given test case all values input to the SUT are verified to be “correct” using the Oracle’s separate algorithm. The less the SUT has in common with the Oracle, the more confidence in the correctness of the results (since common hardware, compilers, operating systems, algorithms, etc., may inject errors that effect both the SUT and Oracle the same way). Test cases employing a true oracle are usually limited by available machine time and system resources.
Quote from Douglas Hoffman in A Taxonomy for Test Oracles [see reference 6].
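The sin() example in the quote can be sketched in a few lines. This is only an illustration under stated assumptions: `math.sin` stands in for the system under test, and `taylor_sin` and `compare_with_oracle` are hypothetical names for a separately written oracle and a comparison helper. Both still share the same interpreter and hardware, which is exactly the kind of common ground Hoffman warns can hide errors.

```python
import math

def taylor_sin(x: float, terms: int = 20) -> float:
    """Oracle: sin(x) via range reduction and a Taylor series (hypothetical helper)."""
    # Reduce x into [-pi, pi] so the series converges quickly.
    x = math.remainder(x, 2 * math.pi)
    total = 0.0
    term = x  # first term of the series: x^1 / 1!
    for n in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2n + 2)(2n + 3)).
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def compare_with_oracle(inputs, tolerance=1e-9):
    """Return the inputs where the SUT (math.sin) and the oracle disagree."""
    return [x for x in inputs if abs(math.sin(x) - taylor_sin(x)) > tolerance]

# The same values are fed to the SUT and to the oracle.
inputs = [i / 100.0 for i in range(-1000, 1001)]
mismatches = compare_with_oracle(inputs)
print(f"{len(inputs)} inputs compared, {len(mismatches)} mismatches")
```

Even in this small case the comparison needs a tolerance, and it only covers the inputs that were fed in. For most real systems no such independent implementation exists at all.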
The traditional view of a system under test is shown in figure 1 below.
In reality, the situation is much more complex, see figure 2 below.
This means that we might have a rough idea about the initial state and the test inputs, but not full control of all surrounding states and inputs. We get a result of a test that can only give an indication that something is somewhat right or correct. The thing we check can be correct, but everything around it that we do not check or verify can be utterly wrong.
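A small, entirely hypothetical sketch of that last point: the function `apply_discount`, the cart structure and the numbers are invented for illustration and do not come from any real system. The automated check passes, while state that nobody asserted on has gone wrong.

```python
def apply_discount(cart: dict, percent: float) -> float:
    """Hypothetical function under test: returns the discounted total."""
    total = sum(cart["prices"]) * (1 - percent / 100)
    # Unintended side effect that no check looks at: the cart is emptied.
    cart["prices"].clear()
    return total

cart = {"prices": [40.0, 60.0]}
result = apply_discount(cart, 10)

# The automated check covers only the value it was told to expect...
check_passed = abs(result - 90.0) < 1e-9
# ...while the surrounding state, which nobody asserted on, is wrong.
cart_still_intact = cart["prices"] == [40.0, 60.0]
print(check_passed, cart_still_intact)
```

The check reports success because its oracle only knows about the returned total. A person exploring the system would probably notice the empty cart fairly quickly.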
So when we are saying that we want to automate everything, we are also saying that we put our trust in something that is lacking perfect oracles.
With this in mind, do we want our end-users to get a system that could work sometimes?
Spec Checking and Bug Blindness
In an article from 2011, Ian McCowatt expresses his view on a universe of behavior connected to Needed, Implemented and Specified, based on the book “Software Testing: A Craftsman’s Approach” by Paul Jorgensen.
For automation, I would expect the focus to be on areas 5 and 6. But what about unimplemented specifications in areas 2 and 3? Or unfulfilled needs in areas 1 and 2? Or unexpected behaviors in areas 4 and 7? Undesired behaviors in areas 6 and 7 will be partly covered, but is that enough?
As a stakeholder, do you think it is ok to limit the overall test effort to where automation is possible?
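The seven areas can be written out as set operations. This is a sketch with made-up behaviour names; the mapping of area numbers to regions is inferred from how the paragraph above uses them (for example, areas 2 and 3 being specified but not implemented), not taken from the original figure.

```python
# Hypothetical behaviours, named only for illustration.
needed = {"login", "export", "search", "undo"}
implemented = {"login", "search", "legacy_import", "debug_menu"}
specified = {"login", "export", "legacy_import", "audit_log"}

areas = {
    1: needed - implemented - specified,      # needed only
    2: (needed & specified) - implemented,    # needed and specified, not built
    3: specified - needed - implemented,      # specified only
    4: (needed & implemented) - specified,    # needed and built, never specified
    5: needed & implemented & specified,      # all three agree
    6: (implemented & specified) - needed,    # built to spec, but not needed
    7: implemented - needed - specified,      # built, neither needed nor specified
}

# Checks derived from the specification can only look where a spec and an
# implementation both exist, that is, areas 5 and 6.
checkable = areas[5] | areas[6]
out_of_reach = set().union(*(areas[n] for n in (1, 2, 3, 4, 7)))

for number, behaviours in areas.items():
    print(number, sorted(behaviours))
print("reachable by spec-based checks:", sorted(checkable))
print("outside their reach:", sorted(out_of_reach))
```

In this toy universe two of the seven behaviours are reachable by spec-based checks. The proportions are arbitrary, but the shape of the argument is the same: the other five areas need someone to go looking.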
It seems like we have been repeating the same things for a long time. This article is for those of you who are still fighting battles against test strategies that say “automate everything”.
- Test Automation Snake Oil, by James Bach – http://www.satisfice.com/articles/test_automation_snake_oil.pdf
- Testing and Checking Refined, by James Bach & Michael Bolton – http://www.satisfice.com/blog/archives/856
- In search of the potato, by Rikard Edgren – http://thetesteye.com/blog/2009/12/in-search-of-the-potato/
- The Oracle Problem and the Teaching of Software Testing, by Cem Kaner – http://kaner.com/?p=190
- On testing nontestable programs, by Elaine J. Weyuker – http://www.testingeducation.org/BBST/foundations/Weyuker_ontestingnontestable.pdf
- A Taxonomy for Test Oracles, by Douglas Hoffman – http://www.softwarequalitymethods.com/Papers/OracleTax.pdf
- Spec Checking and Bug Blindness, by Ian McCowatt – http://exploringuncertainty.com/blog/archives/253
Automating Shared Infrastructure Impact Analysis: Why Monitoring Backend Jobs is as important as monitoring applications
This posting illustrates how to effectively automate shared infrastructure analysis to support both your back-end jobs and your applications.
Last week Gerald, one of our Dynatrace AppMon super users, sent me a PurePath as part of my Share Your PurePath program. He wanted to get my opinion on the high I/O time they sporadically see in some of the key transactions on their Adobe based document management system. The hotspot he tried to understand was easy to spot in the PurePath for one of the slow transactions: Creating a local file took very long!

createFileExclusively takes up to 18s in one of their key document management system transactions

In order to find out which file takes that long to create, I asked Gerald to instrument File.createNewFile. This captured the actual file name for all transactions. It turned out we were talking about files put into a temporary directory on the local D: drive.

Instrumenting createNewFile makes it easy to see which files were created across many transactions that got executed
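To make the idea of instrumenting file creation concrete, here is a stand-in sketch in Python. It is not how Dynatrace instruments File.createNewFile; `timed_create_exclusive` is a hypothetical wrapper that records the file name and duration of each exclusive create, which is the kind of data the instrumentation surfaced.

```python
import os
import tempfile
import time

def timed_create_exclusive(path: str, log: list) -> bool:
    """Create `path` only if it does not exist, recording name and duration."""
    start = time.perf_counter()
    try:
        # O_CREAT | O_EXCL fails if the file already exists, similar in
        # spirit to an exclusive create.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        created = True
    except FileExistsError:
        created = False
    log.append((path, created, time.perf_counter() - start))
    return created

log = []
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.tmp", "b.tmp", "a.tmp"):
        timed_create_exclusive(os.path.join(tmp, name), log)

# Slowest creations first, so an outlier file or directory stands out.
for path, created, seconds in sorted(log, key=lambda e: e[2], reverse=True):
    print(f"{seconds * 1000:8.3f} ms  created={created}  {os.path.basename(path)}")
```

Grouping such records by directory is what points at a single slow location, as the temporary directory on D: did here.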
Now – this itself didn’t explain why creating these files in that directory was slow. As a next step, we looked at process and host metrics of that machine delivered by the Dynatrace Agent. I wanted to see whether there is anything suspicious.
The Process Health indicated that there is some very aggressive memory allocation going on within that JBoss. Multiple times a minute the Young Generation Heap spikes to almost 5GB before Garbage Collection (GC) kicks in. These spikes also coincide with high CPU Utilization of that process – which makes sense because the GC needs to clean up memory and that can be very CPU intensive. Another thing I noticed was a constantly high number of active threads on that JBoss instance, which correlates with the high volume of transactions that are actively processed:

Process Health metrics give you a good indication of whether there is anything suspicious going on, such as strange memory allocation patterns, throughput spikes or problems with threading.

Looking at the Host Health Metrics just confirmed what I’d seen in the process metrics: CPU spikes caused by high GC due to these memory allocations. I blamed the high number of Page Faults on the same memory behavior. As we were dealing with high Disk I/O, I looked at the disk metrics – especially for drive D:. It seemed, though, that there was no real problem. Also, consulting with the Sys Admin brought no new insights.

Host Health metrics are a good way to see whether there is anything wrong on that host that potentially impacts the running applications, e.g.: constraints on CPU, Memory, Disk, or Network.

As there was no clear indication of a general I/O issue with the disks, I asked Gerald to look at file handles for that process and on that machine. Something had to block or slow down file I/O when trying to create files in that directory. Maybe other processes on that same machine that are currently not monitored by Dynatrace, or some background threads on that JBoss, had too many open file handles, leading to the strange I/O waiting times of the core business transactions?

Background Scheduler to be blamed!

It turned out that looking into background activity was the right path to follow! Gerald created a CPU Sample using Dynatrace on that JBoss instance. Besides capturing PurePaths for active business transactions, it is really handy that we can also just create CPU Samples, Thread Dumps or Memory Dumps for the process that has a Dynatrace AppMon Agent injected.

The CPU sample showed a very suspicious background thread. The following screenshot of the Dynatrace CPU Sample highlights that one background thread is not only causing high File I/O through the File.delete operation. It also causes high CPU through Adobe’s WatchFolderUtils.deleteMarkedFiles method, which ultimately deletes all these files. Some “Google’ing” helped us learn that this method is part of a background job that iterates through that temporary directory on D:. The job tries to find files that match a certain criterion, marks them, and eventually deletes them.

The CPU Sample helped identify the background job that causes high CPU as well as blocked I/O access to that directory on drive D:
A quick chat with the Adobe administrator resulted in the following conclusions:
- The File Clean Task is scheduled to run every minute – probably too frequent!
- Very often this task can’t complete within 1 minute which leads to a backup and clash of the next clean up task
- Due to some misconfiguration, the cleanup job didn’t clean up all files it was supposed to. That led to many “leftovers” which had to be iterated by the Watch Utility every minute, leading to even longer task completion times.
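The clash described in these conclusions can be sketched as a small simulation. The function name `find_overlaps` and the task durations are hypothetical; only the one-minute interval comes from the conclusions above.

```python
def find_overlaps(interval_s: float, durations_s: list) -> list:
    """Return indices of runs that are due while an earlier run is still going."""
    overlaps = []
    busy_until = 0.0
    for i, duration in enumerate(durations_s):
        scheduled = i * interval_s
        if scheduled < busy_until:
            overlaps.append(i)
        # Assume a clashing run starts anyway, so track the latest finish time.
        busy_until = max(busy_until, scheduled + duration)
    return overlaps

# Hypothetical durations in seconds, growing as leftover files accumulate.
durations = [40, 55, 70, 90, 120]
print(find_overlaps(60, durations))  # prints [3, 4]
```

Once a single run takes longer than the interval, every later run starts on top of the previous one, which matches the backup the administrator described.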
The longer the system was up and running, the more files were “leftover” in that folder. This led to even more impact on the application itself as file access to that folder was constantly blocked by that background job.

Dynatrace: Monitoring redefined

While this is a great success story, it shows that it is very important to monitor all components that your applications share infrastructure with. Gerald and his team were lucky that this background job was actually part of the regular JBoss instance that ran the application and that it was monitored with a Dynatrace AppMon Agent. If the job had been running in a separate process or even on a separate machine, it would have been harder to do root cause analysis.
Exactly for that reason we extended the Dynatrace monitoring capabilities. Our Dynatrace OneAgent not only monitors individual applications but also automatically monitors all services, processes, network connections and their dependencies on the underlying infrastructure.

Dynatrace automatically captures full host and process metrics through its OneAgent technology
Applying artificial intelligence on top of that rich data allows us to find such a problem automatically without having experts like Gerald or his team perform forensic steps.
The following screenshot shows how Dynatrace packages such an issue in what we call a “Problem Description” including the full “Problem Evolution”. The Problem Evolution is like a film strip where you can see which data points Dynatrace automatically analyzed to identify that this is part of the root cause. The architectural diagram also shows you how your components relate and cross impact each other until they impact the bottom line: the end user, performance or your SLAs.

Dynatrace automates problem and impact analysis and allows you to “replay” Problem Evolution to better understand how to address the issue.

If you are interested in learning more about Dynatrace, the new OneAgent and its Artificial Intelligence, simply try it yourself. Sign up for the SaaS-based Dynatrace Free Trial.
If you want to explore Dynatrace AppMon, go ahead and get your own Dynatrace AppMon Personal License for your On-Premise installation.