
Feed aggregator

The Checking and Testing Debate Explained: Everything You Need to Know…

Gurock Software Blog - Wed, 05/24/2017 - 02:02


This is a guest posting by Simon Knight. Simon Knight works with teams of all shapes and sizes as a test lead, manager & facilitator, helping to deliver great software by building quality into every stage of the development process.

The terms “testing” and “checking” tend to get used interchangeably for activities performed by development teams to verify the readiness and/or completeness of software products, so it’s easy to assume they mean the same thing. However, as with most words in the English language, both testing and checking are in fact multi-faceted terms, layered with meaning and nuance depending on your context and audience.

For example, if you Google the question “what is software testing?”, you’ll get back a facsimile of the ISTQB definition:

“Software testing is a process of executing a program or application with the intent of finding the software bugs. It can also be stated as the process of validating and verifying that a software program or application or product: meets the business and technical requirements that guided its design and development, works as expected and can be implemented with the same characteristic.”

That settles the matter then, right?


You’d think… But on the other hand, Context Driven Testing (CDT) figurehead Michael Bolton asserts that testing is “a process of exploration, discovery, investigation, and learning.” Whereas according to him, checking is “a process of confirmation, verification, and validation.”

In case that’s not sufficiently clear in the context of the checking-versus-testing debate, checking may typically mean activities we can program a computer or robot to do; it’s low-skill, repetitive work. Testing on the other hand, is exploratory and dynamic, normally requiring human intellect, often utilizing a dedicated and skilled resource.

Thanks, I feel much more informed now. So, what’s all the fuss about?


Some people within the wider (i.e. non-CDT) testing community think that the distinction is used as a form of intellectual bullying. The way the difference between testing and checking is debated, taught or otherwise discussed in project teams or within organizations has been perceived to be uncharitable in nature.

Specifically, Bret Pettichord, co-author of the classic software testing book Lessons Learned in Software Testing (mentioned in a previous blog here), and Marlena Compton decided on May 18th to start a fire on Twitter with the tweets below:

Go and check out the full thread. It makes for an interesting read.

Is it important enough to argue about?


Although it may be counter-productive to bring it up in a conversation with Joe Developer, the distinction between checking and testing is an important one. Understanding the difference between tests that help you learn something new, versus tests that confirm something you already knew (checks) can help to steer a successful testing strategy.

The point of the Twitter debate (in my humble opinion – treading on dangerous/controversial ground here…) was to call out some people in the industry, a minority hopefully, who might use that distinction as a kind of blunt instrument with which to beat developers during discussions around what to test and how: Consultants and Trainers using “testing” and “checking” as buzzwords to drum up business; overzealous CDT followers looking to raise their profile and gain popularity with peers by using trending terminology.

So why do I need to know about this?


What you need to know about testing and checking is that they’re both valuable strategies in a balanced approach to carrying out your testing.

Test to explore, learn about and detect new bugs in your product. Test to ensure you built the right thing, and that you built it right.

Check to verify that what you think you know about the product is still true. Check to ensure nothing has changed or regressed since you last tested it.

Distinguish between testing and checking to the extent it serves your test strategy well. Use checking as a heuristic to help you determine when, where and how to automate. Use testing for discovery, learning and creativity. Find a balance between the two, and communicate that to your people.

But for goodness sake, be careful how you talk about your testing. By all means replace the word “test” with “check” when referring to your automation stack if it helps. But don’t try and make everyone else do it. And don’t pick other people up when they don’t do it, unless there’s a very compelling reason to do so.

Precise language is important. Relationships more so. Your team will thank you.

Categories: Companies

Dealing With Optimistic Concurrency Control Collisions

Jimmy Bogard - Wed, 05/24/2017 - 00:06

Optimistic Concurrency Control (OCC) is a well-established solution for a rather old problem - handling two (or more) concurrent writes to a single object/resource/entity without losing writes. OCC works (typically) by including a timestamp as part of the record, and during a write, we read the timestamp:

  1. Begin: Record timestamp
  2. Modify: Read data and make tentative changes
  3. Validate: Check to see if the timestamp has changed
  4. Commit/Rollback: Atomically commit or rollback transaction

Ideally, steps 3 and 4 happen together to avoid a dirty read. Most applications don't need to implement OCC by hand, and you can rely either on the database (through snapshot isolation) or on an ORM (Entity Framework's concurrency control). In either case, we're dealing with concurrent writes to a single record by chucking one of the writes out the window.
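
If it helps to see the validate-and-commit step in code, here is a minimal sketch using plain JDBC against a hypothetical account table that carries a version column instead of a timestamp. The table, column, and method names are invented for illustration; this is not the Entity Framework or snapshot-isolation machinery mentioned above.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical schema: account(id, balance, version).
public class OptimisticUpdate {

    // Conditional write: the UPDATE only succeeds if the version we read
    // earlier is still current. Returns true when our write "won", false
    // when another writer got there first (an OCC collision).
    public static boolean tryUpdateBalance(Connection conn, long id,
                                           long newBalance, long expectedVersion)
            throws SQLException {
        String sql = "UPDATE account SET balance = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, newBalance);
            ps.setLong(2, id);
            ps.setLong(3, expectedVersion);
            // Zero affected rows means the version check failed, i.e. a collision.
            return ps.executeUpdate() == 1;
        }
    }
}

A return value of false is the collision signal that the rest of this post is about handling.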

But OCC doesn't tell us what to do when we encounter a collision. Typically this is surfaced through an error (from the database) or an exception (from infrastructure). If we simply do nothing, the easiest option, we return the error to the client. Done!

However, in systems where OCC collisions are more likely, we'll likely need some sort of strategy to provide a better experience to end users. In this area, we have a number of options available (and some we can combine):

  • Locking
  • Retry
  • Error out (with a targeted message)

My least favorite is the first option - locking, but it can be valuable at times.

Locking to avoid collisions

In this pattern, we'll have the user explicitly "check out" an object for editing. You've probably seen this with older CMS's, where you'll look at a list of documents and some might say "Checked out by Jane Doe", preventing you from editing. You might be able to view, but that's about it.

While this flow can work, it's a bit hostile for the user, as how do we know when the original user is done editing? Typically we'd implement some sort of timeout. You see this in cases of finite resources, like buying a movie ticket or sporting event. When you "check out" a seat, the browser tells you "You have 15:00 to complete the transaction". And the timer ticks down while you scramble to enter your payment information.

This kind of flow makes better sense in this scenario, when our payment is dependent on choosing the seat we want. We're also explicit to the user who is locking the item with a timeout message counter, and explicit to other users by simply not showing those seats as available. That's a good UX.

I've also had the OTHER kind of UX, where I yell across the cube farm "Roger are you done editing that presentation yet?!?"

Retry

Another popular option is to retry the transaction, steps 1-4 above. If someone has edited the record from under us, we just re-read the record including the timestamp, and try again. If we can detect this kind of exception, from a broad category of transient faults, we can safely retry. If it's a more permanent exception, validation error or the like, we can fall back to our normal error handling logic.

But how much should we retry? One time? Twice? Ten times? Until the eventual heat death of the universe? Well, probably not that last one. And will an immediate retry result in a higher likelihood of success? And in the meantime, what is the user doing? Waiting?

With an immediate error returned to the user, we leave it up to them to decide what to do. Ideally we've combined this with option number 3, and give them a "please try again" message.

That still leaves the question - if we retry, what should be our strategy?

It should probably be no surprise here that we have a lot of options on retries, and also a lot of literature on how to handle them.

Before we look at retry options, we should go back to our user - a retry should be transparent to them, but we do need to set some bounds here. Assuming that this retry is happening as the result of a direct user interaction where they're expecting a success or failure as the result of the interaction, we can't just retry forever.

Regardless of our retry decision, we must return some sort of result to our user. A logical timeout makes sense here - how about we just make sure that the user gets something back within time T. Maybe that's 2 seconds, 5 seconds, 10 seconds, this will be highly dependent on your end user's expectation. If they're already dealing with a highly contentious resource, waiting might be okay for them.
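
As a rough sketch of that time-budget idea (illustrative only; the concrete retry patterns come in the next post, and ConcurrentModificationException here is just a stand-in for whatever collision exception your data layer actually throws):

import java.time.Duration;
import java.time.Instant;
import java.util.ConcurrentModificationException;
import java.util.concurrent.Callable;

public class BoundedRetry {

    // Retries the operation until it succeeds or the time budget is spent,
    // so the user always gets an answer within roughly time T.
    public static <T> T withTimeBudget(Callable<T> operation, Duration budget) throws Exception {
        Instant deadline = Instant.now().plus(budget);
        while (true) {
            try {
                return operation.call();
            } catch (ConcurrentModificationException collision) {
                if (Instant.now().isAfter(deadline)) {
                    throw collision; // budget exhausted: surface the error to the caller
                }
                Thread.sleep(50); // small fixed pause; real code might add jitter or backoff
            }
        }
    }
}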

The elephant

One option I won't discuss, but is worth considering, is to design your entity so that you don't need concurrency control. This could include looking at eventually consistent data structures like CRDTs, naturally idempotent structures like ledgers, and more. For my purposes, I'm going to assume that you've exhausted these options and really just need OCC.
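
To make the "design away the conflict" idea slightly more concrete, here is a minimal, hypothetical sketch of the ledger approach: instead of two writers overwriting one balance field (and colliding), each writer appends its own entry and the balance is derived, so there is no shared field to fight over. This is only an illustration of the idea, not a recommendation for any particular data store.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical account ledger: concurrent appends never overwrite each other,
// so no optimistic concurrency check is needed for writes.
public class AccountLedger {

    private final List<Long> entries = new CopyOnWriteArrayList<>();

    // Two concurrent deposits simply become two entries; neither write is lost.
    public void append(long amountInCents) {
        entries.add(amountInCents);
    }

    // The balance is derived from the entries rather than stored and overwritten.
    public long balanceInCents() {
        return entries.stream().mapToLong(Long::longValue).sum();
    }
}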

In the next post, I'll take a look at a few retry patterns and some ways we can incorporate them into a simple web app.

Categories: Blogs

Embracing the Valley’s New Mantra: Sustainable Growth

Testlio - Community of testers - Wed, 05/24/2017 - 00:00

When my husband Marko and I founded Testlio five years ago, we didn’t realize how our own frugality would influence the growth of our company. I’ve constantly stressed the need to keep our burn rate – the amount of capital we spend monthly – as low as possible and to understand the value of everything we purchase. As tech investors have begun valuing sustainable growth over growth at all costs, this strategy has started paying off.

Sustainable growth hasn’t always been in vogue, though. Watching the rise of companies like Uber and Facebook in the late 2000s made you feel like the best way to reach success was to identify a common problem and pour money on it. Sometimes that gives companies enough time to get their business fundamentals in order, but these days it’s increasingly common to see companies raising tens or hundreds of millions go up in smoke.

Startups haven’t always had to think critically about why they’re spending because they’ve never been in a market where they couldn’t continue raising money at higher and higher valuations. That used to be a way to extend your runway and develop a go-to-market strategy, but investors no longer want to see the vanity metrics and ballooning overhead that startups used to be able to get away with. Building a stellar product is hard, but mastering the new rules of startup success doesn’t have to be. Here’s how you can grow your business the right way.

Slowing our burn

Keeping your burn rate as low as possible seems pretty obvious, but you would be surprised how many startups lose sight of this in favor of other goals. As a rule of thumb, most A-series startups aim to keep their burn rate below $10k per employee per month. At our size, we’re well below this benchmark. Keeping our burn low lets us expand our team while giving us as many options down the road as possible. No amount of confidence in your own plan can insure you against the unpredictability that comes with launching and growing a startup. Our runway – the amount of time we have before becoming profitable or seeking new funding – is long enough to weather a change in the funding climate, should there be one.

Building a global team

Distributing our employees across the U.S. and Europe is another key to our success. Estonia is part of our corporate culture, and setting up two offices allows us to take advantage of the competitive skill sets both locations have to offer. Estonia is a growing startup hub in Europe (Skype was born there), so we recruit in Tallinn for our engineering and product teams. It’s quickly becoming one of the most competitive talent markets in Europe. Estonians respect software engineers without treating them like rock stars, making it the ideal spot to assemble a strong technical team. Likewise, having our business development in the U.S. helps us grow and manage our trajectory as efficiently as possible.

Experimenting with caution

2017 is an exciting time at Testlio, but we still have to be deliberate about our path to success and the steps we need to take going forward. When we look for new ways to connect with our customer, we test each new method individually instead of flooding the zone with different approaches. That way we can isolate what’s working, understand what’s not, and know how to maximize our spend during this crucial phase. We’ve also experimented with different approaches to customer success. We initially envisioned a dedicated CS team, but quickly realized that our sales executives were closest to our customers and best suited to support them. Since then, we’ve made customer success part of our sales team’s DNA and have seen an uptick in positive feedback from customers.

There’s nothing easy about launching a company, building a product users love, and doing everything I’ve described above. But as I’ve told my team, success requires hard work — there’s no getting around it. Startups have a new mantra – sustainable growth – and we’re doing our best to walk the walk.

Categories: Companies

The Dynatrace 2017 EMEA Partner of the Year award winners, headed by YMOR and Omnilogy

The Dynatrace Partner Summit is one of the most anticipated events of the year for us, bringing together 300 of our top partners across primarily Europe, but this year also Asia and North America.

John van Siclen, CEO of Dynatrace opened the summit with a statement that reflects the importance of the Dynatrace partner community:

“We thank our partners for our market leadership in the past, and we look forward to extending this, with you, in the future.”

The awards dinner at the end of day 1 is where our partners are recognised for highest revenue, marketing excellence, certification, innovation, and services leadership. Michael Allen, VP EMEA and event chair, shared:

“We have grown the partner summit from 50 partners in Berlin only a few years ago to over 300 today. Our partners are essential to the growth of our business, and these awards are a great opportunity to recognise our partners that invest in Dynatrace and the future of APM.”

Partner award winners 2017

Congratulations to all our partner nominees and winners. Here is the complete list.

Partner of the Year – EMEA 2017

Winner: YMOR

Nominees:
• T-Systems
• Ymor
• NetQuality
• GFI Informatica

With the largest number of transactions and the most revenue contribution, YMOR won the overall EMEA Partner of the Year award for the first time.

Service Provider of the Year – EMEA 2017

Winner: T-Systems

Nominees:
• T-Systems
• GFI Informatica
• DXC Technology (formerly CSC)

Congratulations to T-Systems who wins the Services partner of the year award as a repeat category winner from previous years.

Solution Innovation Award – EMEA 2017

Winner: Atos

Nominees:
• Atos
• DXC
• Evolane

This year’s award goes to a partner that has built a multi-cloud fabric and orchestration system with Dynatrace Managed technology and associated managed services offering at the heart.

Marketing Excellence Award – EMEA 2017

Winner: Innovation Strategies

Nominees:
• CTG
• Innovation Strategies
• Moviri

The 2017 award goes to another partner that is new for 2017, who hit the ground running with a major investment in skilling up their teams and launched the Dynatrace Partnership with a hugely successful customer seminar.

Training and Certification Award – EMEA 2017

Winners: Quenta & Amasol

Nominees:
• Quenta
• Amasol
• GFI Informatica

Quenta promised to get their entire team certified by the beginning of the year, and they did, with nine new certifications. Amasol is the only partner in Europe to have certifications for all of our capabilities.

RFO Award Winners

In the category of RFO (regional franchise offices) we are pleased to announce the following award winners.

Best of the Year Award – RFO EMEA 2017

Winner: Omnilogy

Nominees:
• Mediro
• Red Ocean
• Omnilogy

The award criteria include revenue and new business. Congratulations to Omnilogy, the overall best RFO partner.

Marketing Excellence Award – RFO EMEA 2017

Winner: Matrix

Nominees:
• Matrix
• Bakotech
• Provice

Awarded to the RFO who has executed a mix of multiple go-to-market initiatives, press releases, case studies, customer videos, lead generation campaigns, and actively engaged with local media partners, demonstrating tangible results.

Training and Certification Award – RFO EMEA 2017

Winner: Mediro

Nominees:
• Performance Expert
• Red Ocean
• Mediro

Congratulations to Mediro, with 14 new technical certification exams this year, ensuring that all new hires had an Associate certification and all experienced hires were Professional certified.

Net New Logos – RFO EMEA 2017

Winner: Asseco

Nominees:
• Mediro
• Omnilogy
• Asseco

Recognition for the partner that drove the most net new logo growth.

Please join us in sharing the good news:

<Click to Tweet> .@Ymor_nieuws is @dynatrace “2017 “Partner of the Year”  http://buff.ly/2rQtra6

<Click to Tweet> .@TSystems_MMS recognized as @dynatrace EMEA 2017 “Service Provider of the Year” http://buff.ly/2rQtra6

<Click to Tweet> .@Atos receives @dynatrace EMEA 2017 “Solution Innovation Award” http://buff.ly/2rQtra6

<Click to Tweet> .@innovationspain receives @dynatrace EMEA 2017 “Marketing Excellence Award” http://buff.ly/2rQtra6

<Click to Tweet> .@dynatrace presents @QuentaSolutions with EMEA 2017 “Training and Certification Award” http://buff.ly/2rQtra6

<Click to Tweet> .@amasolAG receives @dynatrace EMEA 2017 Training and Certification Award http://buff.ly/2rQtra6

<Click to Tweet> .@omnilogypl  recognized as EMEA 2017 Regional Franchise Office “Best of the Year Award” by @dynatrace http://buff.ly/2rQtra6

<Click to Tweet> .@HurnIct (Mediro) receives EMEA 2017 Regional Franchise Office “Training and Certification Award” from @dynatrace http://buff.ly/2rQtra6

<Click to Tweet> .@dynatrace presents Matrix (Israel) with EMEA 2017 Regional Franchise Office “Marketing Excellence Award” http://buff.ly/2rQtra6

<Click to Tweet> .@dynatrace presents @asseco_see with EMEA 2017 Regional Franchise Office “Net New Logos” partner award http://buff.ly/2rQtra6

The post The Dynatrace 2017 EMEA Partner of the Year award winners, headed by YMOR and Omnilogy appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

4 Ways QA Can Supercharge Your Release Cycle

Testlio - Community of testers - Tue, 05/23/2017 - 19:35

Testing is often seen as the biggest hindrance to release. QA is known as the number one bottleneck. It’s the phase that’s always in the way.

That’s not really fair, is it?

First off, there may be organizational shifts that have yet to occur (ones that would reduce delays), things like breaking down silos, improving collaboration, clarifying requirements. Also, once a critical bug is found, isn’t that a development problem, rather than a QA problem?

Luckily, the agile approach to software development not only combines project phases into short cycles, but it also helps alleviate the blame game by bringing people of different roles together.

The fact is that QA has changed a lot recently. Testing early and often, running automated scripts, and other ways of optimizing the testing process are making it so that QA is (hopefully) no longer associated with the word “bottleneck.”

It would seem now that analyzing the market, getting real customer validation and feedback, and creating project requirements are the bigger concerns when it comes to speed.

Teams who work together finish together—meaning that there are definitely strategic ways to obliterate QA bottlenecks and get products ready for release on schedule. Here are some of the QA practices that contribute to product readiness and obliterate the idea that testing causes lag.

Collaborate early on with user stories

In a fully collaborative, test left approach, user stories are the first place that developers and QAs intersect. Typically written by product owners and based on documented project requirements, user stories are the initial basis for work done by testers and by developers.

User stories are not intended to tell coders exactly how to code or to tell testers exactly what to test. Rather, they detail what the user will do with the small facet of an application currently being worked on.

From there, user stories are used to write acceptance tests, which are clearer definitions of the tests that must be run in order to determine whether requirements have been met. Using these, developers know which functions are required and what to code.

A test-driven development approach is increasingly common in agile because it makes use of strategic automation, brings development and testing together, and ensures the maximum amount of code coverage, since code is written to pass the tests, rather than tests being written to pass (or fail) the code.

Have software testers (not developers) do unit testing

Unit testing allows for the testing of small, disconnected components of software. Units should exercise one action, and never involve complex steps or multiple features. Because unit tests are written in code, they are typically done by developers and not testers.

Continuing to leave unit testing to developers can actually cause delays because developers tend to view them as non-critical tasks. Developers may be more likely to move on to the next item to code rather than write a script that will test something they’ve just completed in isolation.

But getting these tests done is a must for QA engineers, who need to verify the success of small components as early as possible, so that at the end of the cycle testing is being done end-to-end.

That’s why one of the best practices a QA team can adopt is to learn to do their own unit testing. They’ll increase their skills by learning to code, ensure that they get the results they need to move forward, and free up development to work on other tasks.

In addition, since the unit tests will be written by someone other than those who wrote the code, the tests themselves are likely to be more critical and thus more effective.
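
As a small illustration of the kind of unit test a tester might write, here is a sketch using JUnit 5 against a hypothetical DiscountCalculator class. Both the class and the business rule (10% off orders of 100.00 or more) are invented for the example; the point is only that the tests state expectations independently of how the code was written.

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

// Hypothetical unit under test: a flat 10% discount on orders of 100.00 or more.
class DiscountCalculator {
    double discountedTotal(double orderTotal) {
        if (orderTotal < 0) {
            throw new IllegalArgumentException("order total cannot be negative");
        }
        return orderTotal >= 100.0 ? orderTotal * 0.9 : orderTotal;
    }
}

class DiscountCalculatorTest {

    @Test
    void tenPercentDiscountAppliesAtOneHundredOrMore() {
        assertEquals(90.0, new DiscountCalculator().discountedTotal(100.0), 0.001);
    }

    @Test
    void smallOrdersPayFullPrice() {
        assertEquals(50.0, new DiscountCalculator().discountedTotal(50.0), 0.001);
    }

    @Test
    void negativeTotalsAreRejected() {
        assertThrows(IllegalArgumentException.class,
                () -> new DiscountCalculator().discountedTotal(-5.0));
    }
}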

Stay on top of changes

One reason that QA can fall behind is because they are simply not always in the loop.

It’s not uncommon for QA to be handed code that doesn’t meet requirements. Maybe requirements changed due to time constraints. Maybe there was a push back of one small piece of functionality, or it was decided that user feedback would be collected on the smallest iteration of a new feature before developing it more fully.

If QA engineers feel like they’re routinely in the dark, then they probably also are routinely behind. Unexpected changes can force adaptations to test plans and test scripts, or can drag out the exploratory phase.

Agile teams notoriously keep documentation processes lean, but any resulting gaps need to be filled somehow.

Maybe it’s with better standups or daily check-ins with representatives from different teams. Fortunately, writing unit tests can help with communication barriers too because testers will know about changes as soon as possible, rather than be handed code that doesn’t match up to their test plans later down the line.

Use the same tools as development

Breaking down barriers between QA and development is essential for continuous testing and continuous delivery. Sometimes those barriers are procedural. Sometimes they’re a problem of project language or skillset differences.

And sometimes those barriers are more tangible. Namely, tools.

When QA and development work in separate environments, knowledge transfer is that much harder. Whether they work inside a platform that handles just about everything or use an integrated development system that integrates with just about everything, the point is that QA and development can work together. This can help streamline collaboration on:

  • Bug fixes
  • Test automation
  • Cycle pivots
  • Requirements changes

Our Testlio developers have worked hard to build extensive integrations into our test management platform, so that even with a remote community of testers on board, there is no barrier to communication or collaboration.

Involved testers are happy testers. Valued testers are happy testers. Testers who have contributed to the meeting of project goals and deadlines are happy testers. When QA teams are supported in the pursuit of practices that obliterate bottlenecks and increase velocity, then not only can businesses get to market faster, but they can also contribute to the skill growth of their employees.

Speed doesn’t have to mean high-stress scrambling at the end. It’s something that’s achieved all cycle long.

For testing that gets your product ready for release on schedule, get in touch with us for a demo.

Categories: Companies

Hunting and Fixing AWS DynamoDB Client Scalability Issues on Tomcat

As a performance consultant, I get called on to address various performance issues. One of our recent scalability issues happened on a micro service exposing a REST API. The micro service runs on Apache Tomcat, on an AWS EC2 instance, in a VPC. It uses ehcache as an in-memory cache and DynamoDB as the persistent data source. DynamoDB gets updates from the source system via a data pipeline built on Kinesis and Lambda functions.

In this blog, I’ll walk through the steps taken by our performance engineer Melchor to analyze this scalability issue in our Performance Test environment, which tools were used, and how this problem was resolved. I hope you find this useful!

API Scalability Analysis in Performance Test Environment

Price API has a very tight SLA: 99% of requests processed within a minute must show a response time of < 20ms under a load of 2000 tps (transactions per second). To keep the number of EC2 instances and the number of DynamoDB calls low, we opted for memory-optimized EC2 instances and increased the JVM heap size to 100 GB to cache 80-90% of SKU price details in the JVM. In performance tests, we noticed that we can only meet this SLA if all requests are served from the cache. During cache warmup, or in case more than 10% of items are not found in the cache (cache misses), the service would miss its SLA.

The following diagram visualizes our service flow. Consumer APIs call the Price API to look up prices for multiple items and different location IDs. The Price API checks whether the requested data is in ehcache. If not, it pulls the data from DynamoDB using the AWS DynamoDB client library.

Service Flow when a consumer issues requests to the Price API Micro Service on AWS
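
For readers who like code, here is a rough sketch of that cache-aside flow, with a plain ConcurrentHashMap standing in for ehcache and a stubbed-out DynamoDB lookup. All names are hypothetical and simplified; the real Price API batches SKUs and calls DynamoDB asynchronously, as described later in this post.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PriceLookup {

    // Stand-in for the ehcache instance, keyed by SKU + location.
    private final Map<String, Double> cache = new ConcurrentHashMap<>();

    public double priceFor(String sku, String locationId) {
        String key = sku + ":" + locationId;
        // Cache hit: answer entirely from memory, which is what keeps the API within its 20ms SLA.
        Double cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        // Cache miss: fall back to DynamoDB, then populate the cache for subsequent requests.
        double price = loadFromDynamoDb(sku, locationId);
        cache.put(key, price);
        return price;
    }

    private double loadFromDynamoDb(String sku, String locationId) {
        // Placeholder for the AWS DynamoDB client call described in this post.
        throw new UnsupportedOperationException("wire up the DynamoDB client here");
    }
}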

To monitor individual service health, we log entry and exit calls of each service invocation in Splunk. We can see how much time is spent in Price API and DynamoDB calls. We also look at AWS CloudWatch metrics to validate response time from DynamoDB. We ensure that DynamoDB query time is between 3-4ms and that DynamoDB has enough read/write capacities as well.

Application Monitoring with Dynatrace AppMon

The Price API team also leverages Dynatrace AppMon for end-to-end transactional tracing and deep dive diagnostics. It is the team’s tool of choice because Dynatrace AppMon is already used for live performance monitoring of all our services in production. Melchor used Dynatrace AppMon to analyze a spike in response time above the accepted SLA, as neither CloudWatch nor the custom-built logging via Splunk provided an answer to the Price API team.

Next, let’s walk through the steps in Dynatrace AppMon to see how we identified high response time and its root cause. In case you want to try it on your own I suggest you:

Issue #1: Time Spent in RandomUUID method of servlet API

Once Dynatrace AppMon collects data you can decide whether to analyze in the Dynatrace AppMon Diagnostics Client or go directly to the Dynatrace AppMon Web interface. In the last blog on Hybrid Cloud Patterns we showed how we analyzed our PurePaths in the Web Interface.

In today’s example, we stick with the Dynatrace AppMon Diagnostics Client as we will perform thread dump analysis which is better to be done in that user interface.

Step 1: Analyze PurePaths to understand hotspots

Dynatrace AppMon captures every single PurePath of every single request that was executed. In our scenario we rely on the 100% transactional coverage because most of our transactions we consider slow (>20ms) are considered fast by other tools in the APM space. Other tools would therefore not capture all the details we need to optimize our critical transactions.

In Dynatrace AppMon we typically start by opening and looking at a PurePath. In the PurePath Tree there is a neat option that is called “show all nodes”. Now we not only see critical methods based on Dynatrace’s hotspot algorithm but we get to see every method executed including its execution time contribution and whether that time was spent on CPU, Sync, Garbage Collection or I/O. The following screenshot shows that extended PurePath Tree and it is easy to spot that the method taking most of the time was the nextBytes method. This method already spent 53.33ms getting a randomUUID in our servlet execution, without even reaching the business API code. Remember – our API SLA is 20ms – so we are already more than twice over the limit. We can also observe that nextBytes spends 95% of its time waiting to enter a synchronized code block instead of actually executing code!

The PurePath Tree shows complete transaction flow, executed methods and how long they took to execute. Easy to spot the problematic 53ms execution time of the servlet secureRandom class which also happens to be 95% synchronization time.

Step 2: Thread Diagnostics to understand dependencies

At this point, we decided to take thread dumps and determine why nextBytes method in SecureRandom class is taking that much time in sync.

Fortunately, Dynatrace AppMon comes with a built-in thread dump analysis feature. Thread dumps can either be triggered on demand, scheduled or triggered by an event. After we executed a thread dump we could immediately see what all threads were doing, and whether they are blocked by other threads.

Dynatrace AppMon comes with a built-in thread diagnostics feature to analyze what threads are doing and how they are cross impacting each other.

It turned out that many Tomcat http-nio and threadPoolTaskExecutor (used for calling DynamoDB asynchronously) threads were blocked because of a single thread executing nextBytes, which is a thread safe synchronized method. All the incoming traffic will pass through this bottleneck since getting a secure SSL connection will use nextBytes (synchronized method) to obtain a secure random thus blocking Tomcat threads.

Also all async threads that call DynamoDB (threadPoolTaskExecutor) will end up blocked since AWS DynamoDB client library requires a randomUUID, and will use the same secure random implementation defined in the java.security of Tomcat.

The Price API also accepts multiple SKUs in one HTTP request, but queries DynamoDB for each SKU in single get requests (sounds like the classical N+1 Query pattern that Andreas Grabner has been talking about). During the service's warm-up phase, or when we see more than 10% cache misses, the number of nextBytes method invocations increases exponentially by both Tomcat’s and the async threadPoolTaskExecutor threads. Since nextBytes is a synchronized thread safe method we see a huge increase in wait time for all other concurrent invocations of nextBytes. Similar to the PurePath tree, we can also analyze the full call stack for each thread in the dump – showing us who is really calling into these synchronized methods.

When analyzing Thread Dumps we also get to see the full stack trace for every thread. This helps to understand who calls nextBytes.

Dynatrace also provides a useful feature of “decompiling source code”. Right from the PurePaths, or from the Thread Dumps, we can get the decompiled version of every single method on the call stack. The following shows us the synchronized nextBytes method:

Dynatrace provides a neat feature called “decompile source code”. Makes it easier to understand what happens within methods we do not have source code access to.

Solution to Issue #1: Time Spent in RandomUUID method of servlet API

We did some digging in the source code of JDK 1.6. Turns out that SecureRandom will seed itself from /dev/random or /dev/urandom. We used strace to identify which source was used in our case: it was /dev/random. If you want to learn more about this please find more details in the following two links: https://linux.die.net/man/4/random, http://man7.org/linux/man-pages/man4/random.4.html

How we Fixed it

In the $JAVA_HOME/jre/lib/security/java.security configuration file we changed the securerandom source from /dev/random to /dev/./urandom, which is much faster and does not block the threads as easily as /dev/random does.

securerandom.source=file:/dev/./urandom

This can also be achieved by adding the following parameter in the JVM command line

-Djava.security.egd=file:/dev/./urandom

This change allowed our API to operate within the defined 20ms SLA because we completely eliminated the synchronization overhead!

Issue #2: AWS DynamoDB client metadata cache

After fixing the RandomUUID bottleneck we soon started to see blocked threads again. This time for a different reason. The approach to identify it was similar though.

Step 1: Thread Diagnostics

We went back to creating Thread Dumps using Dynatrace AppMon, which quickly showed us why threads were getting blocked. This time it was due to the add method of the ResponseMetadataCache class in the AmazonDynamoDB client library.

The high level thread dump analysis showed us that more than 50% of our threads were in blocking state. Looking at the stack trace showed us that the calls ending up waiting originate in the AmazonHttpClient library.

Solution to Issue #2: Time Spent in AWS DynamoDB client

The default behavior of the Amazon AWS Http Client libraries is to cache response metadata for troubleshooting. For more details check out setCacheResponseMetadata in the AWS Doc.

We set this behavior to false to prevent the bottleneck when making calls to DynamoDB through the Amazon client library.
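
A minimal sketch of that kind of change, assuming the AWS SDK for Java v1 ClientConfiguration and client builder (the factory class and method names here are ours, not the Price API’s actual code):

import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

public class DynamoClientFactory {

    // Builds a DynamoDB client with response-metadata caching turned off,
    // so the shared response metadata cache is no longer a synchronization point.
    public static AmazonDynamoDB build() {
        ClientConfiguration config = new ClientConfiguration()
                .withCacheResponseMetadata(false); // default is true
        return AmazonDynamoDBClientBuilder.standard()
                .withClientConfiguration(config)
                .build();
    }
}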

Our code change to change the default cache behavior of the Amazon DynamoDB Client Library.

Performance after both fixes

After implementing the two fixes described above, Price API could handle peak load within SLA. Thread Dumps during the test showed no blocked threads as well.

No more blocking threads after applying both fixes

And the PurePaths also looked much better!

Transaction Response Time was now within our SLAs as validated through the PurePaths

Thanks again to Melchor for sharing this story. It shows us that good performance engineers not only understand how to analyze performance issues, but also work with the underlying frameworks and the engineering team to come up with the right solution. It also showed us that even though we built custom log-based monitoring we could only find and fix it thanks to Dynatrace.

If you want to learn more about how Dynatrace can help feel free to get your own Dynatrace AppMon Personal License or try our Dynatrace SaaS offering with Full Stack Cloud and Container monitoring support.

The post Hunting and Fixing AWS DynamoDB Client Scalability Issues on Tomcat appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Cross Functional Code Reviews

Testing TV - Tue, 05/23/2017 - 16:48
Although nearly every development team uses some form of code review, code reviews are frequently used only among developers. Other developers are certainly a valuable audience for your code, but nondevelopers can also add value by applying their own perspectives to the work as early in the process as possible. This session discusses the benefits […]
Categories: Blogs

Unit Testing: Fakes, Mocks and Stubs

Software Testing Magazine - Tue, 05/23/2017 - 10:12
When you perform unit testing, there are many situations where you don’t have the full code or the right context that is needed to execute it. It might be that part of the code is not written...

Categories: Communities

Strong growth and the largest market share puts Dynatrace at #1. Again.

Gartner, Inc., a leading IT research and advisory firm, has ranked Dynatrace as the number one global Application Performance Monitoring (APM) solution provider, once again. This ranking is based on 2016 market share revenue identified in Gartner’s report: “Market Share: All Software Markets, Worldwide, 2016” for Performance Analysis: APM in the IT Operations Market.

Here’s our interpretation of what this all means for Dynatrace and our customers:

Dynatrace is the stand out market share leader

Thanks to another stellar year, we’ve seen our revenue rise above USD 400M which makes us nearly double the size of the second largest APM player.

For anyone yet to see the overall result, here’s a quick overview of revenue and YoY growth across the industry:

Profitability fuels our R&D and industry firsts

What the market share report doesn’t cover is the profitability that fuels our #1 position.

No other APM provider invests more in R&D than us. While our competitors invest heavily in large scale marketing and advertising campaigns, our 500-strong full time technical team is pioneering the most innovative monitoring capabilities on the planet.

Dynatrace is the only APM company that can claim:

#1  A platform powered by AI for the last four years – light years ahead of our competitors who are only just starting to talk about integrating AI capabilities in the near future

#2  A full stack, single agent solution that’s fully automated

#3  The industry’s first digital virtual assistant – davis

#4  The deepest and broadest cloud and technology partnerships – Pivotal, Docker, Red Hat, AWS to name just a few

#5  Full visibility of every single user, across every application, anywhere in the digital ecosystem.

An agile business model fit for the future

Importantly, under private equity, we’ve thrived with an EBITDA of 30%, or USD 120M in profit.

This sustainable revenue means we can invest heavily in R&D, which in turn allows us to continually redefine our platform so that it’s ready for the future challenges and complexities our customers face.

The world’s leaders choose Dynatrace.

We ran some numbers recently and we’re proud to say that:

Thank you

A big thank you to all our loyal customers who choose Dynatrace for its ability to continually lead and innovate the APM space. We celebrate this win with you.

The post Strong growth and the largest market share puts Dynatrace at #1. Again. appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

GTAC Diversity Scholarship

Google Testing Blog - Mon, 05/22/2017 - 16:03


by Lesley Katzen on behalf of the GTAC Diversity Committee


We are committed to increasing diversity at GTAC, and we believe the best way to do that is by making sure we have a diverse set of applicants to speak and attend. As part of that commitment, we are excited to announce that we will be offering travel scholarships again this year.
Travel scholarships will be available for selected applicants from traditionally underrepresented groups in technology.

To be eligible for a grant to attend GTAC, applicants must:
  • Be 18 years of age or older.
  • Be from a traditionally underrepresented group in technology.
  • Work or study in Computer Science, Computer Engineering, Information Technology, or a technical field related to software testing.
  • Be able to attend core dates of GTAC, November 14th - 15th 2017 in London, England.
To apply:
You must fill out the following scholarship form and register for GTAC to be considered for a travel scholarship.
The deadline for submission is July 1st. Scholarship recipients will be announced on August 15th. If you are selected, we will contact you with information on how to proceed with booking travel.

What the scholarship covers:
Google will pay for round-trip standard coach class airfare to London for selected scholarship recipients, and 3 nights of accommodations in a hotel near the Google King's Cross campus. Breakfast and lunch will be provided for GTAC attendees and speakers on both days of the conference. We will also provide a £75.00 gift card for other incidentals such as airport transportation or meals. You will need to provide your own credit card to cover any hotel incidentals.

Google is dedicated to providing a harassment-free and inclusive conference experience for everyone. Our anti-harassment policy can be found at:
https://www.google.com/events/policy/anti-harassmentpolicy.html

Categories: Blogs

User session analysis and search enhancements

As Dynatrace user session analysis has been in beta release for some time now, we’ve had a chance to address your feedback and improve the views and workflow for some of the use cases. Over the past weeks, we’ve made numerous small, but valuable, improvements to Dynatrace user session analysis and search views. This post brings you up-to-date with the latest changes.

User tag and error events

One of the most powerful features of Dynatrace is its ability to identify users based on user session tagging. This is done either via our JavaScript API or, as you may have seen in an earlier post, by defining user session tags based on page metadata. A new entry type called Events has been added to the timeline. In contrast to user actions like load actions and XHR actions, events don’t have performance-relevant timing information, such as user action duration. In addition to error events, Dynatrace logs user tag events that indicate when a user ID was added during a user session. More user tag events will be added in the coming weeks. Error events are currently limited to JavaScript errors that don’t occur during a user action. Such standalone JavaScript errors are errors that occur outside of any user action. For example, if a page has loaded and the load action is therefore complete, JavaScript code can still be executed and JavaScript errors can still occur (see example below).

Segmentation of user sessions of a single user

When analyzing a single user and their associated user sessions, you likely want to understand the differences between the user sessions, or maybe just find out which of the user sessions meet a specific set of search criteria. For example, you might want to find out which of 25 user sessions from a specific user were made using a mobile device, or which user sessions resulted in JavaScript errors. You can do this either by searching for the user sessions or by selecting errors/crashes from the drop list. You can then analyze errors and crashes on the user session timeline (see example below).

user session analysis

You can optionally select up to two filter attributes from the drop lists (see screen resolutions and browser versions selected in the filter drop lists in the example below). By sorting, or simply by reviewing the values in the two columns, you can decide which user session you want to analyze further.

user session analysis

User action count has been added as a new attribute so that you can easily identify user sessions that have a high number of actions.

user session analysis

User session list improvements

Two new columns have been added to the user list to provide you with a quick impression of the average user session duration per user and the average number of “clicks” (i.e., user actions) that were performed by the same user across all that user’s sessions. You can sort the list based on user session duration to find out which users spend the most and least amount of time using your applications.

user session analysis

As the number of filter attributes has increased, we’ve restructured the filter list and grouped the different filter criteria into intuitive categories (see below).

user session analysis

Filtering and searching user sessions

For finer, more granular segmentation of user sessions, Dynatrace now supports four new criteria for filtering and analyzing user sessions based on user action data and session duration.

user session analysis

Session duration

Filter for user sessions that have the shortest duration, or whose duration is within or longer than a specific value or value range (see examples below).

user session analysis user session analysis

User action count

Filter user sessions based on a specific number of user actions (see example below).

user session analysis

User action duration

Filter user sessions that have at least one user action where the user action duration is faster than, within, or slower than a given value or value range (see example below).

user session analysis

User action name

Filter user sessions that include a specific user action. You can add more than one user action to the filter. You can search for all user sessions that include at least one instance of any one of multiple user actions in their click paths (see example below).

user session analysis

Combine user session filters

You can combine multiple user session filters to create complex filters that return only those user sessions that fulfill all filter criteria. The example below shows a search for user sessions that were accessed via a web application on a desktop browser and have a session duration longer than 5 minutes. Further, the user action count for these user sessions must be 5 and there must be at least 1 user action longer than 5 seconds. Also, a user action called click on “Login” on page /orange.jsf must occur in the user session for the session to be a match.

user session analysis

Of course, you can then chart the results based on different criteria (see example below).

user session analysis

The post User session analysis and search enhancements appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Forwarding address

Rico Mariani's Performance Tidbits - Mon, 05/22/2017 - 13:31

You can find me posting here now:

https://www.facebook.com/perftidbits

Categories: Blogs

The A-Z of XP

Hiccupps - James Thomas - Sat, 05/20/2017 - 21:54

After I blathered on and on about how much I'd enjoyed Ron Jeffries' Extreme Programming Adventures in C# the Dev Manager offered to lend me his copy of Extreme Programming Explained by Kent Beck.

Some background from Wikipedia:
Extreme programming was created by Kent Beck during his work on the Chrysler Comprehensive Compensation System (C3) payroll project. Beck became the C3 project leader in March 1996 and began to refine the development methodology used in the project and wrote a book on the methodology (in October 1999, Extreme Programming Explained was published).

So I took the book (it's the first edition) and I enjoyed it too, but differently. I might say that if Adventures is a road trip, Explained is a road atlas.

One of the things I liked about Explained (that it shares with Adventures) is the suggestion that only you can really decide whether XP can work in your context, and how. Also that Beck is prepared to offer you suggestions about when it might not.

But the world probably doesn't need any more reviews of this book so instead I'll note that I was a little surprised at the degree of upfront formality (which isn't to say that I don't think formality can license freedom to express yourself); sufficiently surprised that I mapped it to help navigate the rest. (And, yes, that's a map from an atlas.)


Image: Amazon
Categories: Blogs

Software Testers Diary: Building Credibility in the Development Room

Gurock Software Blog - Fri, 05/19/2017 - 18:26


This is a guest posting by Carol Brands. Carol is a Software Tester at DNV GL Software. Originally from New Orleans, she is now based in Oregon and has lived there for about 13 years. Carol is also a volunteer at the Association for Software Testing.

It has been a few months, and I still have anxiety about being the only Software Tester in the development team room. There are 8 of them, and 1 of me. I’m worried that I won’t be able to prove my worth, or that I won’t be able to keep up. I’m concerned that this whole experiment will fail and I’ll only prove how useless testers are instead. I think more than I should about ‘proving’ myself, especially as a non-coding tester.

Transition to The Team Room


When I was working in my own office, I didn’t really need to know much about how the program worked. In my new job, Software Testers are expected to know what the program is and isn’t supposed to do. Furthermore, the Testers are required to state why the current behavior didn’t meet expectations if there is a problem. Learning the technical details came naturally to me. I always enjoyed learning how things worked, so that I could do a better job of understanding the program and explaining what was going wrong.

Things are different working in the team room. Our development team has been working with a technology that’s relatively new to them, much less to me. We used to use relational databases and synchronization on our flagship client-server system. On our new projects, we’re using a document database, message queuing, and an event store to transmit and store data. Before I started working in the development room, I worried a lot more about ‘is it working’ or ‘is it not working’. Now that I’m here, it seems a lot more important to be able to fluently talk with developers about how things work.

Learning Experiences


One of my biggest focuses has been learning how to use the tools available to me to work through unexpected behaviors. I typically start by looking at the event log generated by the system. When I report a problem, the logs are the first thing the developers will ask for. I’ve also worked hard at paying attention to the other things the developers walk through when examining a problem. I’ve learned how to examine the document database and figure out which documents store data that will be displayed, and which ones are for data that’s used internally. I learned how to examine the message queue audit log to identify what messages have been sent, and what content and header information to expect in those messages. Finally, I’ve learned how to use the event-store to identify all the actions that have been taken against a particular object.

I didn’t know today was going to be the day that I proved my abilities with these new tools. It happened without even thinking about it. I found a good juicy bug, a result in the UI wasn’t what it was supposed to be, so I traced it as far as I could on my own. I called over the product owners who have been reviewing my work, in this case the Lead Developer and Architect. I showed them the problem in the UI, then I shared the error log that indicated the wrong messages had been received. I guided them to the bad results that I found in the database, then I showed them what was in the message queue audit log. I examined the message headers to get more clues about what had gone wrong, and traced the messages in the event-store. I overheard the Architect tell the Lead Developer, “Wow, you see her fly through these tools? She knows what she’s doing with this stuff.” with the Lead Developer responding “Yeah, this is impressive.” Something about that conversation happening right in front of me gave me a huge boost of confidence. By the time we were done, it was obvious they knew I had found something good, and they were really impressed.

I moved through the tools quickly because I had spent hours figuring it all out before showing them; I don’t know if they knew that. Even so, being able to find problems and chase them deeper than the UI feels like a remarkable feat. Something that a useful tester would do. I might keep pushing myself to do even better, because I’ll always want to prove myself, but I’m feeling a lot less worried about whether I’m useful today.

Categories: Companies

OneAgent & Security Gateway release notes for version 119

OneAgent Java
  • Spring Web Services client
.NET
  • Service Fabric SDK 2.5 support
  • Heap and other ETW metrics for .NET Core
Node.js
  • Node.js now reports the loaded packages and versions. To view Node.js package and version detail, expand the Properties section on any Node.js process page.
General improvements and fixes
  • SAP technology discovery
  • Server URL for plugin upload is now stored in a persisted configuration
  • Improved aging of crash reports significantly reduces OneAgent CPU consumption on machines that have a high number of crash reports
  • OneAgent updates no longer fail when network monitoring can’t be terminated due to pending I/O operations
  • In addition, this version of OneAgent includes many fixes to process grouping, monitored entity reporting, and data presentation consistency
Security Gateway
  • Custom certificates support for Windows
  • OpenStack: Project status is now reported

The post OneAgent & Security Gateway release notes for version 119 appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Cannot record with Chrome 58 (Common Name support dropped from SSL certificates)

Web Performance Center Reports - Fri, 05/19/2017 - 17:23
When Load Tester records a testcase, it impersonates the website – the browser thinks it is talking to the website but it is actually talking to Load Tester’s recording proxy. For this to work with SSL, Load Tester has to provide a certificate that impersonates the website. These are auto-generated on demand and stored within the project workspace. Starting in version 58, Chrome has dropped support for the Common Name (CN) field in a certificate. In this case, the Common Name field tells the browser what domain name the certificate applies to. This field has been deprecated and replaced by the … Continue reading »
Categories: Companies

Business impact analysis now provided with each detected problem

Automatic and multidimensional performance baselining of all your service requests is a powerful way to quickly detect abnormal behavior in your environment. While baseline violations serve as great trigger events for initiating deeper problem analysis, not all potential real user impact can be detected solely based on statistical alerts. This is why each problem discovered by Dynatrace now includes a Business impact analysis section that quickly shows you the impact of the problem on your customers and service calls.

Imagine that one of your backend services begins to slow and that Dynatrace automatically detects and reports on the slowdown. Separately, Dynatrace detects that a process has been shut down for several minutes. However, because your applications and entry point services are currently handling these issues gracefully, you’ll likely miss out on the critical insight that your customers are unable to log into your application or buy your products. With the newly introduced automatic business impact analysis however, a complete analysis of all affected backend service calls brings this and other such issues to your attention so that you can resolve them quickly.

How business impact analysis works

Business impact analysis analyzes all transactions, from the problem-affected nodes up to their entry points (either a customer-facing web application or an entry point service), and collects details regarding all potential impact to your customers.

Note: Immediately following the detection of a problem, business impact analysis collects and analyzes all transactions to provide you with mission-critical information about the problem’s real user impact on your customers.

During business impact analysis, Dynatrace collects and counts the number of distinct real users who have faced the problem so far. For example, the Response time degradation problem below shows a business impact of 350 Impacted users and 1.8k Affected service calls.

Business impact analysis

Click the Show more link within the Business impact analysis tile to view more detail about the affected transactions (for example, the application actions and service methods that triggered the transactions). For example, if a problem heavily affects a login user action, that finding will be displayed here (see example below).

Business impact analysis

Click an affected service call link to focus on the service flow that contains the problem-affected transactions (see example below). Service flow enables you to perform further analysis of each individual transaction within PurePath view.

business impact analysis

Limitations

Business impact analysis isn’t triggered for all detected problems. For example, business impact analysis isn’t included with problems detected with synthetic tests, as the result would be only a single transaction. Business impact analysis is also not included with infrastructure-only events, such as CPU spikes, because no relevant transactions exist for analysis. The Business impact analysis tile is only displayed when relevant transactions are available for analysis.

Conclusion

The newly introduced Business impact analysis tile will grab your attention as it quickly shows the potential business impact of each slowdown and error rate increase problem. The results give you the information you need to, for example, ignore issues that only affect a single user while focusing your efforts on business relevant problems that affect hundreds of real users.

Business impact analysis also helps in uncovering problems that would otherwise be impossible to detect based on statistical evidence alone (for example, customers being unable to log into your application).

This feature demonstrates once again how the collection of performance metrics from each individual transaction, combined with the real-time topology information provided by OneAgent, is essential to gaining deep insight into abnormal performance issues within your infrastructure.

The post Business impact analysis now provided with each detected problem appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

7 Things to Know about SaaS Application Testing

Testlio - Community of testers - Fri, 05/19/2017 - 17:00

The Software-as-a-Service model is still on fire, and it will only adapt and evolve from here. Cloud computing allows providers to deliver software in an unfussy, low-friction way. As the saying goes, “nothing to download, nothing to install.” Granted, not every SaaS app is a web app, but most of them are, so that will be the focus here.

Users simply enter their credit card deets, et voilà: access!

That level of ease is somewhat of a double-edged sword, however, because development and QA teams are continually accountable. There’s a big difference between deploying once a week and once a year (as was common with on-premise).

SaaS application testing creates some unique challenges and opportunities for testers. Many of these require more technical expertise, while others push the development of softer skills. Here’s what QA engineers for SaaS platforms need to focus on to enhance quality.

The SaaS model naturally creates competition

Remember in like…2009 when everyone was saying, “There’s an app for that”? Now it feels like we can walk around saying “There’s a SaaS for that” and be absolutely correct.

The SaaS model has so many benefits for providers. They can more affordably deploy software because everything is in the cloud. They can create, deploy, and sell everything online as a one-man show, in fact. The barrier to entry is very low.

At the same time, this model is easier for users. They can jump on board in a matter of minutes. Guess what? That means they can churn in a matter of minutes too. Users can easily be swayed by another tool that they heard about from a colleague or found in a Facebook ad. It’s much harder for an entire organization to switch products than it is for a single user, but it’s still far easier than it would be if the product were sold as a one-time purchase rather than a subscription, since that model requires a more substantial investment and creates “lock-in.”

So, SaaS has led to increased competition in every software category, meaning companies need to be ever more vigilant about providing a quality experience every second of the day.

Less testing of software elements but more demand on software testing

There are certain things that don’t have to be tested with SaaS applications because they don’t exist:

  • Installation on a client or server
  • Support for multiple versions
  • Support for different platforms and backends

Even though there are fewer facets and components to test with SaaS, there’s actually greater demand on testing. This is due to frequent releases and the customer expectation of quick fixes. If something isn’t working, a user is likely to check back a couple of hours later, expecting it to work.

Test cycles have to be short, fast and partially automated

Speed is critical in SaaS application development and testing. The agile methodology answers this need for speed by breaking releases down into much smaller components and testing as early in the iteration as possible.

Short cycles allow enhancements to reach customers much sooner, so QA engineers are constantly pushed to innovate their processes in order to stay on-cycle with development. Unit testing, simulating incomplete components with service virtualization, and automating regression tests and critical service calls are all ways that QA can test in parallel rather than wait until the end of a cycle, which is a real killer when it comes to SaaS.

Performance testing ensures reliability in a shared database

With on-premise software, a user’s experience is tied most closely to his own environment and the behavior of his own organization. With the SaaS model, his experience could be affected by people on the other side of the world.

That’s why performance testing is critical for SaaS. The requirements should be spelled out crystal clear, and never left vague:

  • What is the workflow of each task and how complex is it?
  • What is the expected delivery speed of each web service?
  • How many customers use the platform, and how often?
  • What is expected of all integrated applications?

Either via automated simulations or manually orchestrated events, QA teams must recreate maximum-use scenarios to conduct stress testing and endurance testing.

QA can help validate the ability to scale

Not only can performance testing validate the reliability of the application for the existing user base, but it can also help determine the platform’s ability to scale.

Communication within an organization is key. Is there a new promotion expected to bring in a massive influx of users? The QA team needs to be aware. The throughput of various workflows must be tested, as well as the capacity of the entire system.

Testers must become experts in customer experience

Because of the increased competition, the success of a SaaS platform comes down not just to user experience but to the overall customer experience. In terms of UX, the app must flow; it must be intuitive, easy to navigate, and enjoyable to use.

But testers must think in terms of CX as well. Is the overall experience consistent? Is the app delivering on the promise made by the brand? This can come into play with the in-app support experience, the UX copy and the brand voice on any explainers, walk-throughs, or in-app messages, and any external emails or notifications triggered by the app.

Testers must not only examine whether these elements pass or fail but also evaluate how well they serve the customer.

Actual (not theoretical) usage should drive test prioritization

By capturing and using customer analytics, QA teams can know exactly which browsers and versions they need to support. Different browsers are used in different parts of the world and by different audiences. Discovering which ones are popular amongst your own user base is invaluable for focusing testing efforts.

Making use of real user data is critical for enhancing the real user experience. Testers should use available metrics, such as performance and error rates, to prioritize the testing of problem areas.

It’s also possible to uncover users’ favorite features, meaning that testers can identify the areas that deserve the most attention when writing automated scripts and/or exploring manually.

Overall platform goals: keeping it lean or time to grow?

No matter what sort of application QA engineers are testing, it’s important to understand the business goals for the platform, to know what it is, what it isn’t, and who it serves.

But particularly in an arena with so much competition, and one in which providers can easily pivot, it’s important that any suggestions or enhancements be made in complete alignment with the goals of the entire organization.

Understanding whether now is the time to stay minimal or add capabilities can make it more likely that QA efforts are contributing the maximum amount of value.

Quality assurance is, like any field, ever-changing, and the explosion of SaaS applications in all industries has caused some rapid changes recently. With SaaS, there’s an extra layer in the focus of QA: the need to consider the user base and platform experience as a whole, rather than just the functionality in small environments.

The heightened competition is a direct reflection of the model itself, and excellent QA is a secret weapon used by smart companies to help combat it.

Does your SaaS application deliver on the promise of an awesome user experience every day for every user? Get in touch with us for a demo to see how we can help.

 

Categories: Companies

IIB Integration With UrbanCode Deploy

IBM UrbanCode - Release And Deploy - Thu, 05/18/2017 - 23:20

IBM Integration Bus uses Enterprise Service Bus technology to allow for communication between various business applications. IBM UrbanCode Deploy provides an Automation integration that works with both versions 9 and 10 of IIB. See Installing Plug-ins In UrbanCode for help installing plug-ins in IBM UrbanCode Deploy.

This post will address two main topics regarding IIB and UCD. First, we will go over some concepts related to IIB, namely the hierarchy of resources that exists in IBM Integration Bus. Once we have a better understanding of the concepts behind IIB, we will demonstrate some common use cases and plugin steps that motivated the integration with UrbanCode Deploy.

IBM Integration Bus Concepts

Queue Managers

Queue managers handle queues that store messages until an application is ready to process them. IIB version 9 requires that an Integration Node always be associated with a Queue Manager, while IIB version 10 does not, because WebSphere MQ is no longer a prerequisite for running IBM Integration Bus as of version 10. This allows you to develop and deploy applications directly in IIB, without WebSphere MQ. The distinction is evident in the plugin: when connecting to a version 9 integration node you must first specify a queue manager, while with version 10 you may connect directly to the integration node.

A distinction must be made here relating to MQ Nodes. Multiple types of MQ Nodes exist, including MQInput and MQOutput nodes. An MQInput node can be used to receive messages stored on a queue of an MQ Queue Manager, while MQOutput nodes are used to send messages to a queue. So, when deploying message queues that are defined using MQ Nodes, a Queue Manager is required.

Integration Nodes (Brokers)

Message flows define a series of steps to be run when an integration node receives a message. Integration nodes host message flows and route messages as defined in the message flow.

Integration Servers (Execution Groups)

Integration servers are groups of message flows managed by an integration node. The integration node makes sure that the message flows operate in separate address spaces when they are on separate integration servers.

Broker Archive Files

As the focus of this post pertains to integrating IIB with UrbanCode Deploy, the details behind BAR file creation will not be covered. However, the knowledge center fully explains this concept in the Creating a BAR File page in the IBM Knowledge Center.

Note that the terms ‘integration node’ and ‘broker’ are used interchangeably, just as the terms ‘integration server’ and ‘execution group’ are interchangeable.

Integrating IIB with UrbanCode Deploy

Provisioning Your Environment

Sometimes, the most difficult part of automating a deployment is determining the structure of your resource tree within UCD. Settling on an efficient scheme for what to treat as a resource or a component can save time and reduce complexity.

A good recommendation is to follow the structure outlined within IIB. Note that if you aren’t using WebSphere MQ, it is not necessary to treat a Queue Manager as a resource. However, if you are, the “IBM WebSphere MQ Explorer” may be a good guide as to what resources need to be created.

As an alternative, you may employ the ‘mqsilist’ command line script located in your IIB’s installation directory to outline your resource tree. Running the mqsilist command by itself will list all integration nodes and the queue manager that they are associated with. You may list execution groups managed by a given broker by supplying the broker name as a command line argument to the script (i.e. ‘mqsilist brokerName’). It may be necessary to run the ‘mqsiprofile’ script first, which will export some necessary environment variables. Please see the mqsilist Command page in the IBM Knowledge Center for more information on this command.
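
For example, on a Linux agent the sequence might look like this (the installation path and the broker name BRK1 are placeholders for your own environment):

  . /opt/ibm/iib-10.0.0.8/server/bin/mqsiprofile   # exports the environment variables the IIB commands need
  mqsilist                                         # lists every integration node and its associated queue manager
  mqsilist BRK1                                    # lists the execution groups managed by the broker BRK1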

The Navigator pane of the IBM WebSphere MQ Explorer.

Generally an IIB resource tree will involve the queue manager at the top level (if one is being used), the integration nodes associated with that queue manager, the execution groups belonging to each broker, and then the BAR files that are to be deployed to each corresponding integration server.

As a disclaimer: however you decide to provision your resource tree, I strongly recommend modeling integration servers (execution groups) as their own components. As an example, consider the following use case. A user has three execution groups in their IIB environment (EG1, EG2, and EG3). Each execution group belongs to the integration node (broker) named BRK1. They also have ten BAR files to deploy amongst these execution groups (BAR1 to EG1, BAR2 to EG2, BAR3 to EG3, BAR4 to EG1, BAR5 to EG2, BAR6 to EG3, etc…). However, before each deployment the user must make some configuration changes to the execution group being deployed to and then restart that execution group.

With each execution group as its own separate component, we will have three components within UCD. Each execution group component will have its own component process to set the properties on the execution group and then restart it. We will also have ten components, one for each of our BAR files. These components will exist under the execution group component that they are to be deployed to within the resource tree, so that they may access its properties. The BAR file components will each have a component process to handle the deployment logic.

Now, within our UCD application we can create an application process to run all of the execution group component processes that set properties and restart. This logic may run in parallel using the “Install Multiple Components” step. After the execution groups are properly restarted, we can have a second application process run all of the deployments in parallel.

Imagine if this user didn’t have components for their execution groups, and instead used properties on their BAR file components to determine which execution group each BAR file would be deployed to. There would be no way to execute any logic that is specific to the execution group itself. The user could add a component process to restart the execution group referenced by the property on their BAR file component and run all of those in parallel. However, this would mean that ten different components would run at the same time, restarting potentially overlapping execution groups. The BAR1 component would begin execution to restart EG1, while at the same time the BAR4 component would begin restarting the very same execution group. Whichever process finished first would succeed, and the other would fail in the middle of execution. These race conditions can easily be avoided by allocating a dedicated component for each execution group within your environment.

A resource tree provisioned with queue managers, integration nodes, integration servers, and BAR files.

The idea behind provisioning your resources in such a way is so that properties from each of the resources can propagate downwards. So, when running a deployment using one of your BAR file components it has access to properties on the parent execution group, broker, and queue manager.

To save time you may choose to create some general components to represent each type of resource (queue manager, broker, and execution group) and configure each with properties that correspond to that resource in IIB.

We will specify the following resource properties on our resources. (Note that you may specify any property names you want; these are just the default values for the fields in the IIB plugin. These properties will be referenced directly in the plugin step fields of your component processes.)

  • iib.queueManager
  • iib.brokerName
  • iib.executionGroup

For demonstration purposes I have created the three components: MQ Queue Manager, IIB Broker, and IIB Execution Group. This way we can define the iib.queueManager property on the MQ Queue Manager component, the iib.brokerName property on the IIB Broker component, and the iib.executionGroup property on the IIB Execution Group component.

Resource property definition on the IIB Execution Group component.

Now that we have our IIB Execution Group component in place, if you want to add a new execution group named “NewEG” to your resource tree, you may navigate to the “Resources” tab in UCD and add a new execution group component. The name that you specify for the newly added resource should also be used as its iib.executionGroup resource property value, to keep all of your resources nicely organized by name.

Adding a new execution group to the resource tree.

Specify the name of your execution group as your ‘iib.executionGroup’ property.

The same procedure may be used to create your queue manager and integration node resources, with careful consideration for the resource tree hierarchy. The BAR files that we want to deploy will require a different strategy to model within UCD. The strategy used to add our queue manager, execution group, and broker resources to the resource tree won’t be sufficient: each BAR file needs its own separate component in UCD, since each deployable BAR file will require its own set of component versions to be deployed.

The easiest way to go about creating BAR file components would be to employ a component template. This tutorial employs a simple component template using the ‘File System’ source configuration type, directly supplying the path to the individual BAR file on the agent system.

A simple component template for creating BAR file components.

This template can now be used to model all BAR files that will be used for deployment. Adding a component property to the template to refer to the name of the BAR file will allow us to dynamically resolve this name during runtime. Please see the section devoted to executing BAR file deployments entitled Override BAR Properties and Deploy for configuring these component properties.

Creating a new BAR file component using the component template.

After all BAR files have been accounted for, the components may be added to the resource tree under the desired execution group resources (follow the guidelines for the resource tree that we determined earlier.)

Completed resource tree with BAR files added to parent execution groups.

Three additional resource properties that you will want to set are iib.mqsiprofile, iib.version, and iib.jarPath. Mentioned earlier in this post is the mqsiprofile executable script that exists in your IIB installation directory. This script sets some environment variables needed to run the IIB command executables, and it must also be run before using the IBM Integration API. For this reason you will notice a field called ‘MQSIPROFILE Executable’ on all IIB plugin steps that use the IBM Integration API. In this field you can provide the path to your mqsiprofile, which is located by default in your IIB installation directory, so that your command environment is properly configured. Setting this location as a resource property means you don’t have to specify it on every step. For this example I’ve set the ‘iib.mqsiprofile’ property on the agent resource, which can be referenced in plugin step fields as ${p:resource/iib.mqsiprofile}. The iib.version property is also necessary to determine the version of the IBM Integration API to use when running plugin steps. This property may also be defined at the agent resource level.

For this example I will also set the JAR path property at the agent resource level. Another possible place for this property would be on the environment since it references a path specific to your IIB environment, as does the iib.mqsiprofile property. For specific details as to what JAR files must be specified for your version of IIB please see the section labeled IBM Integration API JAR Files.

Setting the ‘iib.jarPath’ property at the agent resource level.
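
With these resource properties defined, the plugin step fields can simply reference them instead of hard-coded values. A sketch of how such a step might be filled in (the ${p:resource/...} syntax is UrbanCode Deploy’s standard property reference; apart from ‘MQSIPROFILE Executable’, which is quoted above, the field labels here are illustrative and may differ slightly between plugin versions):

  MQSIPROFILE Executable : ${p:resource/iib.mqsiprofile}
  Version                : ${p:resource/iib.version}
  Integration API Jar(s) : ${p:resource/iib.jarPath}
  Broker Name            : ${p:resource/iib.brokerName}
  Execution Group        : ${p:resource/iib.executionGroup}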

Running Plugin Steps

Now that we have a proper resource tree, we can begin configuring steps to run using our resources. This tutorial will focus on some of the most common steps in the IIB plugin. See a complete list of all supported steps of the IIB plugin here. The commonly used steps that we are going to cover include Create Integration Node, Create Execution Group, Set Execution Group Properties, Override Bar Properties, and Deploy.

Before we begin, we will want to determine the scope of each step, which means identifying the properties it requires. We will do this for each step as we continue through this post.

The plugin itself contains two different classifications of steps that can be run. There are those plugin steps that act as command line wrappers for the IBM Integration Bus Commands. These steps execute the command scripts that exist within your IIB installation location under the bin directory.

When running steps that rely on the IBM Integration Bus commands you must specify your IIB installation directory. The field in the plugin step is labeled “IIB Installation Directory.” These steps include:

  • Create Integration Node
  • Delete Integration Node
  • Start Integration Node
  • Stop Integration Node

IBM Integration API JAR Files

The other classification of plugin steps relies on the Java IBM Integration API and does not require the installation directory to function. However, some additional properties are required for each of these steps. With these plugin steps you will need to specify the location of the IBM Integration API JAR files from your IIB server. The default location of these files differs based on the version of IIB: with IIB version 9 they are located in the java/lib directory of your WebSphere MQ installation directory, and with IIB version 10 they are located in the classes directory of your IIB server installation directory.

The JAR files required also differ based on the version of IIB. Here are the required JAR files that must be specified based on version:

  • IIB Version 9: ConfigManagerProxy.jar, com.ibm.mq.jar
  • IIB Version 10: IntegrationAPI.jar, ibmjsseprovider2.jar

Steps that require access to these JAR files to utilize the IBM Integration API include:

  • Set Broker Properties
  • Create Execution Group
  • Restart Execution Groups
  • Set Execution Group Properties
  • Set Message Flow Properties
  • Create Or Update Configurable Service
  • Delete Configurable Service
  • Deploy
  • Start Message Flows
  • Stop Message Flows
  • Override Bar Properties

Create Integration Node

Note that this step makes use of the MQSI script files located in your IIB installation directory, so we should create another resource property to specify the location of this directory. Where you define this property depends on your specific use case; the most likely location is on the agent resource, since we are specifying a directory on the agent itself within our IIB environment. A few other steps in the plugin use the MQSI scripts and rely on this property as well, namely Delete Integration Node, Start Integration Node, and Stop Integration Node, as noted earlier in this post.

Adding the iib.installDir property to the agent resource.

If you are running IIB version 9, you must also specify the Service User ID and Service Password of the user that the new broker will run under. Earlier in this post we dramatically simplified things by creating a general component to represent our integration node; if you missed it, see the section Provisioning Your Environment.

First, we will add a new operational component process to the general integration node component created earlier. The logic for this new process will handle the creation of the integration node.

Add the component process step to create the integration node.

After this component process is added to the generic IIB Broker component, the component can now be used to create all future integration nodes in your resource tree. Note that in the future you may add new component processes to the generic component that will automatically be added to the broker components that you have created.

Add the new component to your resource tree under the Queue Manager that it belongs to if applicable.

Choose the generic “IIB Broker” component and specify the name of your new broker to create.

The remaining setup is simplified by the fact that we’ve set up all of the required properties to run the step on our resources. The final steps are to add your components to the application, create an application process, and add your resources to the environment within that application. For this you will want to create an application and add the ‘IIB Broker’ component to it.

Adding your IIB Broker and IIB Execution Group components to an application.

Next, you will need an application process to execute the ‘Create Integration Node’ component process. Since we’ve created the component process as an operational process, it does not require a version to be specified on the component when executing. Operational processes are located in the process designer under the folder pertaining to their component. Note that when this application process is executed it will run the step for each ‘IIB Broker’ component that has been added to your environment. If you only want to create specific integration nodes in this environment, you can tag those components and specify the tag in the ‘Limit to Resource Tag’ field. Another option is to add only the desired IIB Broker components to your environment.

Application process created to execute the ‘Create Integration Node’ component process.

Finally, you will want to create an environment within that application and add your resources to it. In this instance I’ve added my agent, and all resources under it are also added to the environment.

Adding resources to your environment.

When your application is ready, you can run the application process in your environment from the ‘Environments’ tab in the application.

Create Execution Group

The setup for the Create Execution Group step will be very similar to the above step. The big difference is that you will have to specify the path of your IBM Integration API JARs. For this example I will set the JAR path at the top level of my resource tree, which in this case is my agent. Another good place for this property would be on the environment since it is going to reference a path specific to your IIB environment.

Setting the ‘iib.jarPath’ property at the agent resource level.

After the jarPath property is configured, you can create a new operational component process on your ‘IIB Execution Group’ component to run the Create Execution Group step.

Configure the Create Execution Group step on your IIB Execution Group component.

The remainder of setup for this step involves adding the IIB Execution Group component to your application and environment, and creating an application process to run the operational process. This part of the tutorial was covered in the ‘Create Integration Node’ subsection of the Running Plugin Steps section of this post. Note that you will need to add a new ‘IIB Execution Group’ component to your specific ‘IIB Broker’ component.

Add a new IIB Execution Group component resource to the broker component resource that you want to create the execution group under.

Set Execution Group Properties

We won’t go into depth on the setup for this step since it is essentially the same as the ‘Create Execution Group’ step explained above. Instead, we will focus on how to determine which properties can be set on the execution group. The Integration Server HTTP Listener Properties page in the knowledge center provides some insight and examples of the properties you may change on the execution group. In addition, your IIB installation directory provides a helpful executable called mqsireportproperties that you can use to explore all properties of the execution group. You will have to specify the broker that the execution group exists under, as well as the execution group name itself. Please see the mqsireportproperties Command page in the IBM Knowledge Center for usage.

List all objects of the execution group that you can configure properties for.

You can use the same executable to list all properties of a given object for that execution group.

Run the mqsireportproperties executable to list all execution group properties.

In this case I am listing all general properties for the execution group. You may choose any of the objects to report the properties of, such as the HTTPConnector object. When you have discovered a property that you wish to change, you may set its value in the ‘Properties’ field of the step using the syntax ‘ObjectName/PropertyName=Value’. For instance, if I wanted to change the maximum number of HTTP connections for my execution group to 128, I would specify ‘HTTPConnector/maxThreads=128’.
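
Assuming the broker BRK1 and the execution group NewEG used earlier in this tutorial (and that mqsiprofile has already been run), the exploration might look like the following; check the mqsireportproperties Command page for the full option list:

  mqsireportproperties BRK1 -e NewEG -o AllReportableEntityNames -r   # list every object you can configure
  mqsireportproperties BRK1 -e NewEG -o HTTPConnector -r              # list the HTTPConnector properties

The resulting value for the step’s ‘Properties’ field would then be HTTPConnector/maxThreads=128, as described above.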

Note that not all properties are configurable, and the request to change a non-configurable property will result in a rejected response from the broker. The knowledge center section on Changes to properties that are associated with integration servers provides some examples as to what properties may be configured but doesn’t cover all of them.

Broker rejecting a request to change a property that is not configurable.

Override Bar Properties and Deploy

The Override Bar Properties step is executed before a deployment in order to change configurable properties on a broker archive file. If you are looking to create a new BAR file for deployment, please see the mqsicreatebar Command page in the knowledge center.

Customizable BAR file properties exist in the deployment descriptor of the BAR file, its META-INF/broker.xml configuration file. For a list of properties in your BAR file you may use the mqsireadbar executable script located in your IIB installation directory. For details about using this command executable please see mqsireadbar Command. The following is an example of reading the pager.bar sample BAR file provided with a successful IIB version 9 installation.

Executing the mqsireadbar command using the pager.bar file.

In this example, if I were to change the queueName of the TEXTMESSENGER queue to MYMESSENGER, I would specify ‘TextMessenger#TEXTMESSENGER.queueName=MYMESSENGER’. The Override BAR Properties step allows you to either specify individual property names and values (name=value) or to specify a properties file. The properties file must consist of a series of individual property overrides.

Example properties file to override properties on the ‘pager.bar’ file.
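
As a hedged sketch of what such a file might contain (pager_overrides.txt is the file name used later in this post; the single override shown comes from the mqsireadbar example above, and any additional overrides would follow the same one-per-line name=value form):

  TextMessenger#TEXTMESSENGER.queueName=MYMESSENGER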

For the creation of the BAR file component we will use the IIB BAR File template that we created in the Provisioning Your Environment section of this post. If you plan to override BAR file properties with each deployment it may make sense to add a property to your template that will allow you to easily specify overrides during component creation. To do this add a new component property definition to your IIB BAR File component template.

Add a component property to specify BAR file overrides during component creation.

Now, using the component template to create the BAR file component will allow us to provide BAR property overrides if they are necessary. As stated earlier, you can either specify individual property overrides or you may choose to specify a file. A properties file will be the best option if your property overrides change from version to version. For this example I have set up a directory on my IIB server’s file system to hold any BAR files and properties files used for BAR overrides. If you wish to use a different type of source configuration to specify the location of your BAR files and properties files, you can change the Source Configuration Type field on your IIB BAR File template. Doing so will allow you to use any of IBM UrbanCode’s source configuration plugins to import your files. You may find the proper plugin to suit your source configuration needs in the Source Config Plugins section of the UrbanCode Plugin page.

Creating a new BAR file component using the IIB BAR File component template.

Once we have our BAR component created we can add it to our resource tree under the desired execution group that it will be deployed to from the ‘Resources’ tab of our UrbanCode Deploy web console. For this example I will add the ‘pager.bar’ to our newly created ‘NewEG’ execution group.

Adding the BAR file component to your resource tree.

The placement of the component in the resource tree gives the pager.bar component access to the resource properties on the execution group, broker, and queue manager, as well as the component properties defined on the pager.bar component itself. These resource properties were defined in the Provisioning Your Environment section of this post.

Now we will need to create the component process for the deployment. Note that this should be created as a ‘Deployment’ type process and not an ‘Operational’ process, since a component version must be specified when it is executed. We can create this new component process on our IIB BAR File component template so that we only have to define it once for all BAR file components. In this sample I include the steps for both overriding the BAR file properties and deploying the BAR file in the same component process.

The logic for our deployment component process will be as follows:

  1. Copy the pager.bar and pager_overrides.txt file from our component version to the agent’s working directory. This is handled by the “Download Artifacts” step of the built in IBM UrbanCode Deploy Versioned File Storage plugin. Please see the Documentation Page for this plugin if you have any issues.

    Note that the properties in the ‘Includes’ field are specific to that BAR file component, and will include the BAR file and properties file.

  2. Override the properties of our BAR file as specified in our properties file. This uses the ‘Override Bar Properties’ step of the IIB plugin for UCD. Since the BAR file and the override properties file have both been copied to the agent’s working directory, they may now be referenced directly in the plugin step fields.

    Override BAR file properties.

  3. The final step is to deploy our BAR file to the execution group.

    The ‘Deploy’ step of the IIB Plugin.

After the BAR file component is created and properly configured we must add it to the application and create a new application process to execute the deployment component process. Assuming we will have multiple BAR files to deploy to multiple execution groups it makes sense to use the “Install Multiple Components” application process step. For usage information on this step please see the Install Multiple Components page in the IBM Knowledge Center.

To utilize the Install Multiple Components step we will have to tag each BAR file component that we’ve created. Luckily, we created these components using our IIB BAR File component template; if we tag the component template, all components created using that template will be tagged as well. In this example I will create a new tag for my BAR file components called “BARFILE”.

Added a new tag to the component template.

All BAR file components now have the ‘BARFILE’ tag.

As the last piece of setup for running our deployment, we will create a new application process on our application, which I am going to call ‘Run Deployment’. This new process will consist of only one application process step, ‘Install Multiple Components’. To run the deployments on each of our BAR file components, we will specify our newly created component tag and the name of the component process to run.

Application process step to run all BAR file deployments.

Before running the process, we must make sure that we have a component version imported into our BAR file component. To do this navigate to the ‘Versions’ tab of your BAR file component and click ‘Import New Versions’. This will use your configured source configuration to import artifacts into a new component version that may be used to run our new application process. As mentioned earlier in this section, please see the Source Config Plugins section of the UrbanCode Plugin page for all available source configurations.

Importing a new component version into the pager.bar BAR file component.

After we have a new component version we can use that version to run our deployment. Navigate to your application and click the arrow to request a new process. When you click the “Choose Versions” link you will be able to choose from any available versions for your BAR file components.

Run a new application process to deploy your BAR files.

Please feel free to ask questions based on any of the content covered in this post.

Categories: Companies

Correlating JavaScript Errors with Slow CDN Performance

JavaScript errors can happen for many different reasons: special behavior of certain browsers that weren’t tested, a real coding mistake that slipped through the delivery pipeline, poorly handled timeouts, and I am sure the list goes on.

In this blog we discuss yet another reason, which was brought to my attention by Gerald Madlsperger, one of our long-term Dynatrace AppMon super users: a CDN server issue resulting in non-delivered static resource files (CSS), leading to spikes in JavaScript errors.

Let’s review the steps that Gerald took to identify and analyze the issue, and learn which metrics he looked at on both the end-user and server sides. And, because not everyone is fortunate enough to have an expert like Gerald in their team, we show you how Dynatrace automates these steps through our Problem Pattern Detection and with Artificial Intelligence.

The Impact: JavaScript Exception Spikes

The problem that Gerald dealt with was visible in a daily spike of JavaScript errors for a particular web application. The spike always occurred at the same time – between 10:30 and 10:35 a.m.:

Charting the number of JavaScript errors captured through real user monitoring from Dynatrace AppMon.

The impact was seen across every browser and every geo location, so it was not something that they simply missed in testing, nor was it a problem related to connection or timeout issues in a certain geo location.

The Problem: Object not found errors

To learn more about these errors Gerald compared the type of errors occurring prior to the spike and those that occurred during the spike. He wanted to see whether there was a certain pattern, or type of JavaScript error, that occurred more often within that time frame, hoping that this would take him one step closer to the root cause.

It turned out the JavaScript errors that occurred more frequently during these five minutes were all around HTML objects that couldn’t be found on the page by some of the JavaScript code:

Comparing JavaScript errors that happen in two different time frames makes it easy to see which errors are causing the spike.

Therefore, the problem was not necessarily bad JavaScript code, but most likely related to components that were missing or couldn’t be loaded on the page.

Root Cause: Slow CDNs caused by bad CRON job

Gerald’s next step was to drill down to some of these real end-user browser sessions. He wanted to see whether there was anything else abnormal about them. Turns out that most of these users had one thing in common: a very slow responding CDN Server:

User Action PurePaths show that content from their CDN servers was extremely slow to download.

As a final step, Gerald created the following chart, which correlates the number of JavaScript errors with the download time from that CDN server. Now it was clear that every day from 10:30 to 10:35 a.m. there was a download-time spike on the CDN server that correlated with the spike in JavaScript errors:

Clear correlation between slow CDN download times and spikes in the number of JavaScript errors.

CRON jobs were to blame

After discussing this data with the systems engineers, it turned out that two of their CDN servers ran the same CRON job for log file rotation at the exact same time every day. This resulted in a brief outage of the CDN. That outage caused delayed or failed loading of static CSS files, which in turn caused the JavaScript code to generate “object not found” errors.
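
The post doesn’t describe the fix that was applied, but one obvious mitigation is simply to stagger the two rotation schedules so the CDN nodes never go down at the same moment. A purely hypothetical crontab sketch (the script path and times are placeholders):

  # CDN node 1: rotate logs at 10:30
  30 10 * * * /usr/local/bin/rotate-cdn-logs.sh
  # CDN node 2: rotate logs 15 minutes later so both nodes are never offline together
  45 10 * * * /usr/local/bin/rotate-cdn-logs.sh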

Better rely on Dynatrace in case Gerald is not there for you!

First: hats off to Gerald for doing a great job digging through the Dynatrace AppMon data. Also, thanks for sharing the dashboards, which are useful when dealing with CDNs or third parties.

While Dynatrace AppMon collects all this data to make troubleshooting of these problems easier, it requires you to know how to navigate the data. Because of scenarios shared by Gerald and others over the years, we have made significant investment in automating error, problem and root-cause detection.

In the latest versions of Dynatrace AppMon (sign up for your lifetime AppMon Personal License) we automate problem pattern detection, and highlight the “Top Findings” for both End-User and Server-Side Performance Hotspots in the Dynatrace AppMon Web Interface:

Dynatrace AppMon automatically shows you the top findings on why end-user or server-side performance is impacted.

In the Dynatrace SaaS/Managed platform (sign up for our Dynatrace SaaS trial) we went a step further by running all this data through our Artificial Intelligence Engine. A problem like the one Gerald detected would pop up in a Problem Ticket, and include the information on the Impact and Root Cause. This allows you to analyze and fix these problems, even if you don’t have an expert like Gerald on your team. It just means that you can spend more time on innovating, rather than bug hunting.

Dynatrace Artificial Intelligence automatically shows you the Impact and Root Cause of any type of End User, Server Side, or Infrastructure issue.

If you have stories like this one that you want to share with your peers, please let us know. Send me an email to share your PurePath or your best artificially detected problem pattern.

The post Correlating JavaScript Errors with Slow CDN Performance appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Knowledge Sharing

SpiraTest is the most powerful and affordable test management solution on the market today