
dynaTrace Blog on Performance, Scalability and Architecture
Every user, every app, everywhere. Actionable insights for optimizing your digital ecosystem.

OpenStack monitoring beyond the Elastic Stack – Part 2: Monitoring tool options

Tue, 06/27/2017 - 15:34

This article is the second part of our OpenStack monitoring series. Part 1 explores the state of OpenStack, and some of its key terms. In this post we will take a closer look at what your options are in case you want to set up a monitoring tool for OpenStack.

The OpenStack monitoring space: Monasca, Ceilometer, Zabbix, Elastic Stack – and what they lack

Monasca

Monasca is the OpenStack Community’s in-house project for monitoring OpenStack. Defined as “monitoring-as-a-service”, Monasca is a multi-tenant, highly scalable, fault-tolerant open source monitoring tool. It works with an agent and it’s also easily extendable with plugins. After installing it on the node, users have to define what should be measured, what statistics should be collected, what should trigger an alarm, and how they want to be notified. Once set, Monasca shows metrics like disk usage, CPU usage, network errors, ZooKeeper average latency, and VM CPU usage.
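To make the agent-and-API model concrete, here is a minimal sketch of the kind of JSON document a metric becomes before it is pushed to Monasca's REST API. The endpoint path and field names follow the Monasca API v2.0 reference, but treat the details as assumptions rather than a tested integration:

```python
import time

def build_monasca_metric(name, value, dimensions):
    """Build a metric document in the shape Monasca's POST /v2.0/metrics
    endpoint expects (field names per the Monasca API reference)."""
    return {
        "name": name,                          # e.g. "cpu.user_perc"
        "dimensions": dimensions,              # key/value tags, e.g. hostname
        "timestamp": int(time.time() * 1000),  # milliseconds since epoch
        "value": float(value),
    }

metric = build_monasca_metric("disk.space_used_perc", 87.5,
                              {"hostname": "compute-01", "device": "/dev/vda1"})
# A real deployment would POST this JSON (with a Keystone auth token) to the
# Monasca API; here we only inspect the document.
print(metric["name"], metric["value"])
```

The agent builds and ships documents like this automatically once its plugins are configured; the alarm definitions then evaluate the incoming series server-side.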

Ceilometer

Even though it’s a bit far-fetched to say that Ceilometer is an OpenStack monitoring solution, I decided to put it in this list because many people refer to it as a monitoring tool. The reality is, Ceilometer is the telemetry project of the OpenStack Community, aiming to measure and collect infrastructure metrics such as CPU, network, and storage utilization. It is a data collection service designed for gathering usage data on objects managed by OpenStack, which are then transformed into metrics that can be retrieved by external applications via APIs. Also, Ceilometer is often used for billing based on consumption.
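As an illustration of how an external application (a billing tool, say) retrieves Ceilometer data, here is a hypothetical sketch that builds a statistics query against the Ceilometer v2 API. The endpoint shape follows the Ceilometer API reference, but the base URL and project id are placeholder assumptions:

```python
# Hypothetical sketch: an external tool pulling per-project usage statistics
# from the Ceilometer v2 API (GET /v2/meters/<meter>/statistics).
def ceilometer_statistics_url(base_url, meter, project_id, period_s=3600):
    """Build the URL + query string for meter statistics filtered to one
    project, bucketed into `period_s`-second intervals."""
    query = (f"q.field=project_id&q.op=eq&q.value={project_id}"
             f"&period={period_s}")
    return f"{base_url}/v2/meters/{meter}/statistics?{query}"

url = ceilometer_statistics_url("http://ceilometer:8777", "cpu_util",
                                "demo-project")  # placeholder project id
print(url)
```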

Zabbix

Zabbix is an enterprise open source monitoring software for networks and applications. It’s best suited to monitor the health of servers, network devices, and storage devices, but it doesn’t collect highly granular or deep metrics. Once installed and configured, Zabbix provides availability and performance metrics of hypervisors, service endpoints, and OpenStack nodes.

Elastic Stack

Perhaps the most widely used open source monitoring tool which also works well with OpenStack is the Elastic Stack (aka ELK Stack). It consists of three separate projects – Elasticsearch, Logstash, and Kibana – and is driven by the open source vendor Elastic.

The Elastic philosophy is simple: couple strong search capabilities with strong visualization, and the result is outstanding analytics. Their open source analytics tool – now rivaling big players like Microsoft, Oracle, and Splunk – supports OpenStack too.

Monitoring OpenStack with Elastic starts by installing and configuring the Elastic Stack’s log collector tool, Logstash. Logstash is the server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to Elasticsearch for indexing. Once installed and configured, Logstash starts to retrieve logs through the OpenStack API.
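To show what "ingests and transforms" means in practice, here is a sketch of the kind of structure a Logstash grok filter extracts from a standard OpenStack (oslo.log) line before shipping it to Elasticsearch. The regex approximates the common default log format; a production pipeline's actual grok pattern may differ:

```python
import re

# Approximate pattern for a default oslo.log line:
# "<timestamp> <pid> <LEVEL> <module> <message>"
OSLO_LOG = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<module>\S+) (?P<message>.*)"
)

def parse_openstack_log(line):
    """Turn a raw OpenStack log line into a structured event dict."""
    m = OSLO_LOG.match(line)
    return m.groupdict() if m else None

sample = ("2017-06-27 15:34:01.123 2841 INFO nova.compute.manager "
          "[req-abc] Instance spawned successfully.")
event = parse_openstack_log(sample)
print(event["level"], event["module"])
```

Once events carry structured fields like `level` and `module`, Elasticsearch can index them and Kibana can aggregate on them.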

Through the API, you get good insights into OpenStack Nova, the component responsible for provisioning and managing the virtual machines. From Nova, you get the hypervisor metrics, which give an overview of the available capacities for both computation and storage. Nova server metrics provide information on the virtual machines’ performance. Tenant metrics can be useful in identifying the need for change with quotas in line with resource allocation trends. Logstash also monitors and logs RabbitMQ performance.

Finally, you want to visualize all the collected OpenStack performance metrics. Kibana is a browser-based interface that allows you to build graphical visualizations of the log data based on Elasticsearch queries. It allows you to slice and dice your data and create bar, line or pie charts and maps on top of large volumes of data.
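Under the hood, every Kibana chart is an Elasticsearch aggregation query. As a sketch, this builds the request body for "error count per OpenStack service, per hour"; the index field names (`level`, `module.keyword`, `@timestamp`) are assumptions about how the Logstash pipeline indexed the events:

```python
# Build an Elasticsearch aggregation body: errors per service per hour.
def error_rate_query(hours=24):
    return {
        "query": {"bool": {"filter": [
            {"term": {"level": "ERROR"}},
            {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
        ]}},
        "aggs": {
            "per_service": {
                "terms": {"field": "module.keyword"},
                "aggs": {"per_hour": {"date_histogram": {
                    "field": "@timestamp", "interval": "1h"}}},
            },
        },
        "size": 0,  # only the aggregations, not raw hits
    }

body = error_rate_query()
print(body["aggs"]["per_service"]["terms"]["field"])
```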

What open source OpenStack monitoring tools lack

Monitoring OpenStack is not an easy task. Getting a clear overview of the complex application ecosystem built on OpenStack is even more difficult. The above-mentioned tools provide good visibility into different OpenStack components and use cases. However, they clearly have several disadvantages:

  • They are unable to see the causation of events
  • They fail at understanding data in context
  • They rely heavily on manual configuration

Because they are missing the big picture, companies often implement different monitoring tools for different silos. However, they quickly realize that with dozens of tools they are unable to identify the root cause of a performance issue. In these circumstances, how could they reduce MTTR and downtime? And with a number of separate tools, how could they ever see performance trends or predict capacity needs?

By using different monitoring tools for different use cases, companies miss out on exactly the monitoring capabilities that today's complex business applications require.

Okay, so how is all of this possible with OpenStack? Is there any intelligent OpenStack monitoring tool? In the next part we investigate this by focusing on the Dynatrace way of monitoring OpenStack. Stay tuned!

The post OpenStack monitoring beyond the Elastic Stack – Part 2: Monitoring tool options appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Weekly Wrap – Red Hat Summit, Velocity highlights, Trainmageddon, AI Webinar and more

Mon, 06/26/2017 - 07:25

Round 2 of the weekly summary, and we have just as much digital performance news as last week. In this week's summary we cover (click to jump to the post):

Dynatrace at Red Hat Summit EMEA

The Red Hat Partner Summit drew over 700 partners, system integrators, and resellers to Munich, and we were there showcasing our AI monitoring power. Martin Etmajer, featured centre above, knocked them out with his presentation, “Close the OpenShift Monitoring Gap with Dynatrace“. Unfortunately it wasn't filmed, but check out the links below for all the quality OpenShift content.

The Dynatrace AI webinar series creates record attendance!

Daniel Kaar, Technology Strategist at Dynatrace, has delivered two record-setting webinars, in Asia and Europe. Next week he will re-run the session for US time zones. Not to be missed! The webinar covers:

  • Why AI is necessary in today’s world of complex environments and deployments
  • Why current monitoring is not sufficient
  • Better insights into SAP, including support for SAP HANA DB
  • How AI operates when doing root cause and business impact analysis

Register now, and if you are reading this after the session date, don't worry: you can watch it on demand.

Latest Blogs:

Live from Velocity San Jose 2017

Andi Grabner gives us the highlights from Velocity, including catching up with performance guru Steve Souders (pictured above). In this post Andi provides an overview of his favourite presentations from Verizon, Netflix, Microsoft, Google and more. It's a feature-rich post that everyone should read. Read more.

Trainmageddon: When the machines stop working, people get upset.

As UK commuters discovered this week the “simple” act of purchasing a train ticket is anything but simple. In fact, from an IT viewpoint it’s a hyper-complex transaction with many potential failure points. But if failure isn’t an option can technologies like artificial intelligence avert disaster?

Customer Corner – Nordstrom, Citrix, Red Hat and more

We’re proud to share the experiences our customers have working with Dynatrace. From COOP Denmark to Nordstrom, Citrix to Raymond James, and thousands of other leading enterprises, our customers’ success reflects the value our “monitoring redefined” mindset delivers to their daily operations. http://buff.ly/2tuk5SS

Dynatrace wins prestigious “Success for All” Sally award for its Champion Playbook

The “Success for All” Sally award from @GainsightHQ validates the Dynatrace approach to ensuring customer success: being transformationally and cross-functionally aligned around positive customer outcomes, with our customer success managers serving as personal customer advocates and strategic drivers for technology adoption, success and value achievement. http://buff.ly/2tCaDgw

Latest Videos

Online Perf Clinic – Power Web Dashboarding with Dynatrace AppMon

30 min demo – Monitoring Redefined – Unified Monitoring

Online Perf Clinic – Advanced Real User Monitoring: Agentless monitoring and SaaS vendor RUM

Featured Video: Dynatrace UFO

Whilst we didn't publish this video last week, if you haven't seen it yet, you'll want to take a look.

Perform 2017 – Register for a Perform event near you.

We are on the road running Perform in more than 15 cities around the world.



Dynatrace wins prestigious “Success for All” Sally award for its Champion Playbook

Thu, 06/22/2017 - 15:31

Every day, we help our customers deliver the best experiences and success for their customers. That’s a big part of what digital performance is all about. But that’s not all we do. We take customer success very seriously for our own clients, too. We help them reach and exceed their goals—through our own signature program—the Champion Playbook.

For years now, Gainsight has led the way for companies — in many different industries — to redefine how they measure and exceed expectations by managing customer success. They also lead the way in recognizing the new digital business paradigm. According to Forbes, they’re at the “center of the market” for customer success and “increasingly the focus of dedicated teams at businesses that seek to monitor and improve customer relationships…”.

Recently Gainsight recognized Dynatrace as the top success plan leader among their corporate customers. We received their “Success for All” Sally award at the recent Pulse 2017 customer success event, in front of an audience of more than 4,000 VP and C-level executives who direct some of the largest Customer Success organizations around the world.

Each year at their Pulse event, the Gainsight team goes through a rigorous selection process to recognize cutting-edge customer success leaders. According to their CEO, Nick Mehta, “The Sally Awards aren’t given—they’re earned. Winning one means your company is transformationally and cross-functionally aligned around positive customer outcomes.” Winners in other categories included such well-known names as Adobe, Angie’s List, Blackbaud, HubSpot and Concur.

Nick Mehta presenting the award to Dynatrace's own Jim Bowering, Director of CSM for North America, and Tracy Streetman, CSM Business Operations Analyst

Champion Playbook, Customer Success Managers: a winning combination

Dynatrace was selected for its development and use of the Champion Playbook success plan, a program that supports and drives the best customer outcomes in digital performance led by our Customer Success Managers (CSMs). CSMs are our customers’ personal advocates and strategic drivers for adoption, success and value achievement.

CSMs start by building a close relationship with our customers’ in-house performance monitoring, development and business leadership. Next, they examine the customer’s current state in accelerating innovation, optimizing customer experiences and modernizing operations. Using the Champion Playbook as their guide, Dynatrace CSMs work with customers to expand internal performance culture by sharing proven strategies to highlight value and speed adoption of new ways of doing business.

The whole program is based on our unparalleled and extensive experience working with top companies to build highly successful digital businesses. We’ve taken that knowledge and created a well-defined playbook for working together with our customers, and applying best practices and innovative processes to their specific needs. Together we set and achieve digital performance goals to reach optimum adoption, greater value and constant awareness of new opportunities for improvement.

At Dynatrace we know that having the best technology is important, but following the path to success with that technology also requires the right approach to organizational culture, strategy, people and processes. This award is an enormous validation of the Champion Playbook, our practical and proven way of working with our customers. It’s just the beginning, and, I can’t wait to see what we will accomplish in the future—together.



Trainmageddon: When the machines stop working, people get upset.

Thu, 06/22/2017 - 15:18

For the train companies of the United Kingdom, today was a tough one. How’s this for a headline (which we all gazed at during our morning coffee break):

If you haven't heard, you can read all about how the poor train companies of the UK copped a battering from commuters on social and traditional media over a ticket machine malfunction. I feel for the commuters, but my sympathy today lies with the train companies. Well, except for the ticket collectors who obviously didn't get the memo and handed out fines to those who boarded without a ticket!


Here’s why it’s so hard to be consistently perfect

Purchasing a single ticket is actually a hyper-complex transaction from an IT point of view.

The complexity stems from all of the following (and this list isn't even exhaustive!):

  1. front end software the customer uses at the attempted purchase
  2. third party payment gateway that processes the payment
  3. integration between the machine (or device), the software, and the gateway
  4. security certificate required to make a secure transaction
  5. credit check application
  6. hosting environment in which all this runs
  7. interconnection between the transactions that cross different hosting environments – from the end user and the train station through to the back-end applications.

Yet all the customer cares about is their experience at the machine – it needs to be perfect, or if there’s a problem, it needs to be resolved in seconds…so they can board their train on time.

Not like this:

So what happened today in the UK?

We may not find out for sure what happened but from our experience, monitoring millions, if not billions of transactions a day, there are three common areas where problems can arise. When it comes to IT complexity, rapid release cycles and digital experience, typically problems centre on:

Human error – Oops did I do that? 

Software needs updating. Software updates are mostly written by humans, and when multiple humans are working together, it’s not uncommon for mistakes to be made. Even if you have the most stringent pre-production testing, issues can still arise once you push to production because you can never accurately replicate what software will do in the wild.

As our champion devops guru Andreas Grabner always preaches in his talks – #failfast. If the issue relates to change that was made, roll it back, fast.

In this case, with today's outage, I doubt it was a software update in the core operating system. I'd expect it was a third-party failure, which incidentally might have had its own update. But more on this in point 3.

Security

Not one to speculate, but when a software failure causes mass disruption to people, it would be fairly normal to assume some sort of planned security attack. But again, I doubt it.

Delivery chain failure

The most likely cause for the train machine failure is simply a failure somewhere in the digital delivery chain. Considering a single transaction today runs across 82 different technologies – devices, networks, third-party software applications, hosting environments, and operating systems – it doesn't take much for a single failure to cause a complete outage. Understanding where that failure is, so that you can quickly resolve it, is critical. Referencing what I said in point 1, it's probable that a simple update to any of these 82 different technologies caused a break in the chain. Or maybe one of these third parties had their own outage.
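The fragility of a long delivery chain is easy to quantify with back-of-the-envelope arithmetic. The 99.9% per-component availability below is an illustrative assumption, not data from the incident:

```python
# A transaction crossing 82 technologies in series is only as available as
# the product of its parts. Assume (illustratively) 99.9% per component:
components = 82
availability = 0.999 ** components
print(f"chain availability: {availability:.1%}")  # roughly 92%
print(f"expected downtime per month: {(1 - availability) * 30 * 24:.0f} hours")
```

Even with every component at "three nines", the chain as a whole spends a substantial fraction of the month degraded somewhere.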

And that’s where AI comes in.

This is why you need AI-powered application monitoring, with the ability to see the entire transaction across every single one of these different technologies. And not just across the transaction: you need the ability to go deep, from the end-point machine to the host infrastructure, the line of code, and the interconnections between all the services and processes. It's the only way you can identify the root cause of the problem – in minutes, not hours or days.

The days of eyeballing charts and having war-room discussions with IT teams are definitely over. Software rules our lives, and it simply cannot fail. Otherwise digital businesses face a day like this on social media:

What if the machine fixed the machine?

With the ability to see the immediate root cause of a problem, it's not improbable for the machine to learn to course-correct itself, in the same way that a load balancer can direct traffic to an under-utilised host when servers are overloaded. If you can detect an issue in the delivery chain, the machine can set about self-correcting with an alternative path. If the payment gateway fails, for instance, it could automatically redirect to a new hosted payment gateway. Our chief technical strategist Alois Reitbauer demoed just this scenario (okay, a simpler version) at Perform 2017. So it's not that far off.



A Tale of Two Load Balancers

Wed, 06/21/2017 - 17:46

It was the best of load balancers, it was the worst of load balancers, it was the age of happy users, it was the age of frustrated users.

I get to see a variety of interesting network problems; sometimes these are first-hand, but more frequently now these are through our partner organization. Some are old hat; TCP window constraints on high latency networks remain at the top of that list. Others represent new twists on stupid network tricks, often resulting from external manipulation of TCP parameters for managing throughput (or shaping traffic). And occasionally – as in this example – there’s a bit of both.

Many thanks to Stefan Deml, co-founder and board member at amasol AG, Dynatrace’s Platinum Partner headquartered in Munich, Germany. Stefan and his team worked diligently and expertly with their customer to uncover – and fix – the elusive root cause of an ongoing performance complaint.

Problem brief

Users in North America connect to an application hosted in Germany. The app uses the SOAP protocol to request and deliver information. Users connect through a firewall and one of two Cisco ACE 30 load balancers to the first-tier WebLogic app servers.

When users connect through LB1, performance is good. When they connect through LB2, however, performance is quite poor. While the definition of “poor performance” varied depending on the type of transaction, the customer identified a 1.5MB test transaction that helped quantify the problem quite well: fast is 10 seconds, while slow is 60 seconds – or even longer.

EUE monitoring

Dynatrace DC RUM is used to monitor this customer’s application performance and user experience, alerting the IT team to the problem and quantifying the severity of user complaints. (When users complain that response time is measured in minutes rather than seconds, it’s helpful to have a solution that validates those claims with measured transaction response times.) DC RUM automatically isolated the problem to a network-related bottleneck, while proving that the network itself – as qualified by packet loss and congestion delay – was not to blame.

Time to dig a little deeper

I’ll use Dynatrace Network Analyzer – DNA, my protocol analyzer of choice – to examine the underlying behavior and identify the root cause of the problem, taking advantage of the luxury of having traces of both good and poor performing transactions.  I’ll skip DNA’s top-down analysis (I’m assuming you don’t care to see yet another Client/Network/Server pie chart), and dive directly into annotated packet-level Bounce Diagrams to illustrate the problem.

(DNA’s Bounce Diagram is simply a graphic of a trace file; each packet is represented by an arrow color-coded according to packet size.)

First, the fast transaction instance:

Bounce Diagram illustrating a fast instance of the test transaction through LB1; total elapsed time about 10 seconds.

For the fast transaction, most of the 10-second delay is allocated to server processing; the response download of 1.5MB takes about 1.7 seconds – about 7Mbps.

Here’s the same view of the slow transaction instance:

Bounce Diagram illustrating a slow instance of the test transaction through LB2; total elapsed time about 70 seconds.

There are two distinct performance differences between the fast transaction – the baseline – and this slow transaction. First, a dramatic increase in client request time (from 175 msec. to 52 seconds!); second, a smaller but still significant increase in response download time, from 1.7 seconds to 7.7 seconds.

The MSB (most significant bottleneck)

Let’s first examine the most significant bottleneck in the slow transaction. The client SOAP request – only 3KB – takes 54 seconds to transmit to the server, in 13 packets.

The packet trace shows the client sending very small packets, with gaps of about 5 seconds between. Examining the ACKs from LB2, we see that the TCP receive window size is unusually small; 254 bytes.

Packet trace excerpt showing LB2 advertising a window size of 256 bytes.

Such an unusually small window advertisement is generally a reliable indicator that TCP Window Scaling is active; without the SYN/SYN/ACK handshake, a protocol analyzer doesn’t know whether scaling is active, and is therefore unable to apply a scale factor to accurately interpret the window size field.

The customer did provide another trace that included the handshake, showing that the LB response to the client’s SYN does in fact include the Window Scaling option – with a scale factor of 0.

The SYN packet from LB2; window scaling will be supported, but LB2 will not scale its receive window.

Odd? Not really; this simply means that LB2 will allow the client to scale its receive window, but doesn’t intend to scale its own. The initial (non-scaled) receive window advertised by the LB is 32768. (It’s interesting to note that given a scale factor of 7, a receive window value of 256 would equal 32768.)

Once a few packets have been exchanged on the connection, however, LB2 abruptly reduces its receive window from 32768 to 254 – even though the client has only sent only a few hundred bytes. This is clearly not a result of the TCP socket’s buffer space filling up. Instead, it’s as if LB2 suddenly shifts to a non-zero scale factor (perhaps that factor of 7 I just suggested), even though it has already established a scale factor of zero.
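For reference, a scaled window is the advertised 16-bit value shifted left by the scale factor negotiated in the handshake (RFC 7323). The numbers from this trace make the suspected mix-up easy to see:

```python
# TCP window scaling in one line: effective window = advertised << factor,
# where the factor is fixed once, in the SYN handshake (RFC 7323).
def effective_window(advertised, scale_factor):
    return advertised << scale_factor

# With the scale factor of 0 that LB2 negotiated, 254 really means 254 bytes:
print(effective_window(254, 0))   # 254
# Had LB2 negotiated a factor of 7, the same field would mean ~32KB, which is
# likely what it "believes" it is advertising:
print(effective_window(256, 7))   # 32768
```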

Pop quiz: What to do with tiny windows?

Question: what should a TCP sender do when the peer TCP receive window falls below the MSS?

Answer: The sender should wait until the receiver’s window increases to a value greater than the MSS.

In practice, this means the sender waits for the receiver to empty its buffer. Given a receiver that is slow to read data from its buffer – and therefore advertises a small window of less than the MSS – it would be silly for the sender to send tiny packets just to fill the remaining space. In fact, this undesirable behavior is called the silly window syndrome, avoided through algorithms built into TCP.
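The rule above can be stated in a couple of lines; the MSS value here is the common Ethernet default, used only for illustration:

```python
MSS = 1460  # typical Ethernet maximum segment size (illustrative)

def should_send(usable_window, mss=MSS):
    """Silly window syndrome avoidance, reduced to its core idea:
    don't transmit while the peer advertises less than one full segment."""
    return usable_window >= mss

print(should_send(254))    # False: wait for the window to reopen
print(should_send(32768))  # True: at least one full segment fits
```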

For this reason, protocol analyzers and network probes should treat the occurrence of small (<MSS) window advertisements the same as zero window events, as they have the same performance impact.

When a receiver's window is at zero for an extended period, a sender will typically send a window probe packet attempting to "wake up" the receiver. Of course, since the window is zero, no usable payload accompanies this window probe packet. In our example, the window is not zero, but the sender behavior is similar; the LB waits five seconds, then sends a small packet with just enough data (254 bytes) to fill the buffer. The ACK is immediate (the LB's ACK frequency is 1), but the advertised window remains abnormally small. We can conclude that the LB believes it is advertising a full 32KB buffer, although it is telling the client something much different.

After about 52 seconds, the 3K request reaches LB2, after which application processing occurs normally. It’s a good thing the request size wasn’t 30K!
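A back-of-the-envelope model using the approximate figures from the trace (3KB request, 254-byte window, ~5-second gaps) reproduces the 13 packets and lands in the same ballpark as the observed ~52-second stall:

```python
import math

# Rough model of the stall: the request trickles out in window-sized chunks,
# with the LB pausing ~5 seconds before accepting each subsequent one.
request_bytes = 3 * 1024
window_bytes = 254
gap_seconds = 5

chunks = math.ceil(request_bytes / window_bytes)   # 13 packets, as observed
estimated_stall = (chunks - 1) * gap_seconds       # no pause before the first
print(chunks, "chunks, roughly", estimated_stall, "seconds")
```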

The NSB (next significant bottleneck)

As is quite common, there’s another tuning opportunity – the NSB. This is highlighted by DC RUM’s metric called Server Realized Bandwidth, or download rate. The fast transaction transfers 1.5MB in about 1.6 seconds (7.5Mbps), while the slow transaction takes about 8 seconds for the same payload (1.5Mbps).

Could this be receiver flow control, or a small configured receive TCP window? These would seem reasonable theories – except that we’re using the same client for the tests. A quick look at the receiver’s TCP window proves this is not the case, as it remains at 131,072 (512 with a scaling factor of 9).

DNA’s Timeplot can graph a sender’s TCP Payload in Transit; comparing this with the receiver’s advertised TCP window can quickly prove – or disprove – a TCP window constraint theory.

Time plot showing LB2’s TCP payload in transit (bytes in flight) along with the client’s receive window size.

The maximum payload in transit for the slow transaction is about 32KB; given that the client’s receive window is much larger, we know that the client is not limiting throughput.

Let’s compare this with the fast transaction as it ramps up exponentially through TCP slow start:

Time plot showing LB1’s payload in transit as it ramps up through slow start.

It becomes clear that LB1 does not limit send throughput – bytes in flight – to 32KB, instead allowing the transfer to make more efficient use of the available bandwidth. We can conclude that some characteristic of LB2 is artificially limiting throughput.

Fixing the problems

For the MSB (most significant bottleneck), Cisco has identified a workaround (even if they might have slightly misstated the actual problem):

CSCud71628—HTTP performance across ACE is very bad. Packet captures show that ACE drops the TCP Window Size it advertises to the client to a very low value early in the connection and never recovers from this. Workaround: Disable the “tcp-options window-scale allow”.

For the NSB (next significant bottleneck), the LB configuration defaults to a TCP send buffer value of 32768 bytes. Modifying the parameter set tcp buffer-share from the default 32768 to 262143 (the maximum permitted value) allowed LB2 throughput to match that of LB1.

Wait; do you see the contradiction here? If we disable TCP window scaling, that would limit the effective TCP buffer to 65535, limiting the download transfer rate to under 4Mbps (given the existing link’s 130ms round-trip delay).
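The arithmetic behind that ceiling is the bandwidth-delay product: TCP can have at most one receive window of data in flight per round trip, so the window divided by the RTT caps throughput:

```python
# Throughput ceiling imposed by a fixed TCP window over a given round trip.
def max_throughput_mbps(window_bytes, rtt_seconds):
    return window_bytes * 8 / rtt_seconds / 1e6

# Unscaled 64KB window over the link's 130 ms RTT: ~4 Mbps, as stated above.
print(f"{max_throughput_mbps(65535, 0.130):.1f} Mbps")
# The 32KB in-flight limit observed on LB2 caps the transfer around 2 Mbps.
print(f"{max_throughput_mbps(32768, 0.130):.1f} Mbps")
```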

But this was the spring of hope; it seems that changing the tcp buffer-share parameter also solved the window scaling problem, without having to disable that option. This suggests a less-than obvious interaction between these parameters – but with happy users, we’ll take that bit of luck.

Is there more?

There are always additional NSBs; this is a tenet of performance tuning. We stop when the next bottleneck becomes insignificant (or when we have other problems to attend to). For this test transaction, the SOAP payload is rather large (1.5MB); while the payload is encrypted, it could still be compressed to reduce download time; a quick test using WinZip shows the potential for at least a 50% reduction.
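A quick sanity check on the compression claim: SOAP payloads are verbose, repetitive XML, which deflates well. The payload below is a synthetic stand-in, not the customer's data, and compression would of course have to happen before encryption, since encrypted bytes don't compress:

```python
import zlib

# Synthetic SOAP-like payload: highly repetitive XML compresses dramatically.
payload = (b"<soap:Envelope><soap:Body><row><id>42</id>"
           b"<name>sample</name></row></soap:Body></soap:Envelope>") * 1000
compressed = zlib.compress(payload, 6)
ratio = len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%})")
```

Real payloads are less repetitive than this toy, but the 50%+ reduction suggested by the WinZip test is plausible for XML.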

While some of you will be quick to note that ACE has been discontinued, Cisco support for ACE will continue through January 2019.



PurePath visualization: Analyze each web request from end-to-end

Wed, 06/21/2017 - 16:43

Dynatrace OneAgent enables you to track each individual request from end to end. This enables Dynatrace artificial intelligence to automatically identify the root causes of detected problems and to analyze transactions using powerful analysis features like service flow and the service-level backtrace. Dynatrace enables you to efficiently find the proverbial needle in the haystack and focus on those few requests (out of tens of thousands) that you’re interested in. The next step is to analyze each of these requests separately to understand the flow of each transaction through your service landscape. Meet PurePath. PurePath technology is at the heart of what we do.

How to locate a single request

The first step in your analysis should be to locate those requests that you want to analyze at a granular level. Filters can be used to narrow down many thousands of requests to just those few requests that are relevant to your analysis. This can be achieved during problem analysis by following the root-cause analysis drill downs (select Problems from the navigation menu to start problem analysis), or manually by segmenting your requests using the advanced filtering mechanisms in service flow and outlier analysis. Ultimately, you’ll likely discover a handful of requests that require deeper analysis. This is where PurePath analysis comes in.

In the example below, the service easyTravel Customer Frontend received 138,000 service requests during the selected 2-hour time frame. This is an unwieldy number of requests to work with, so we need to narrow down these results.

We’re interested specifically in those requests that call the Authentication Service. There are only 656 of these. To focus on this subset of requests click the Filter service flow button after selecting the desired chain of service calls that you want to look at.


Notice the hierarchical filter; it shows that we are looking only at transactions where the easyTravel Customer Frontend calls the Authentication Service, which in turn calls the easyTravel-Business MongoDB.

Now we can see that 75% of the easyTravel Customer Frontend requests that call the Authentication Service also call the Verification Service. These are the requests we want to focus our analysis on. So let's add the Verification Service as a second filter parameter to further narrow down the analysis. To do this, we simply select the Verification Service and then click the View PurePath button in the upper right box.

Notice the filter visualization in the upper left corner of the example below. The provided PurePath list includes only those requests to the Customer Frontend that call both the Authentication Service and the Verification Service.

But the list is still too large—we only need to analyze the slower requests. To do this, let’s modify the filter on the easyTravel Customer Frontend node so that only those requests that have Response time > 500 ms are displayed.

As you can see below, after applying the response time filter, we’ve identified 4 requests out of 138,000 that justify in-depth PurePath analysis.
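The drill-down can be pictured as plain data filtering: keep only the requests whose call chain includes both services and whose response time exceeds 500 ms. The request records below are hypothetical stand-ins; Dynatrace evaluates these filters server-side:

```python
# Hypothetical request records, as the filters above would see them.
requests = [
    {"id": 1, "calls": {"AuthenticationService", "VerificationService"},
     "response_ms": 740},
    {"id": 2, "calls": {"AuthenticationService"}, "response_ms": 900},
    {"id": 3, "calls": {"AuthenticationService", "VerificationService"},
     "response_ms": 120},
]

required = {"AuthenticationService", "VerificationService"}
slow = [r for r in requests
        if required <= r["calls"] and r["response_ms"] > 500]
print([r["id"] for r in slow])  # [1]
```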

To begin PurePath analysis of this request, click the View PurePath button.

PurePath analysis of a single web request

Dynatrace traces all requests in your environment from end to end. Have a look below at the waterfall visualization of a request to the Customer Frontend. Each service in the call chain is represented here.

The top section of the example PurePath above tells us that the whole transaction consumes about 20 ms of CPU and spends time in one database. However, the waterfall chart shows much more detail. The waterfall shows which other services are called and in which order. We can see each call to the Authentication and Verification services. We also see the subsequent calls to the MongoDB that were made by both service requests. PurePath, like the Service Flow, provides end-to-end web request visualizations—in this case, that of a single request.

The bars indicate both the sequence and response time of each of the requests. The different colors make it easy to identify the type and timing of each call. This allows you to see exactly which calls were executed synchronously and which were executed in parallel. It also shows that most of the time of this particular request was spent on the client side of the isUserBlacklisted web service call. As indicated by the colors of the bars in the chart, the time is not spent on the server side of this web service (dark blue) but rather on the client side. If we were to investigate this call further, we would see underlying network latency.

By selecting one of the services or execution bars you can get even more detail. You can analyze the details of each request in the PurePath. In the example below, you can see the web request details of the main request. You can view the metadata, request headers, request parameters, and more. You can even see information about the proxy that this request was sent through.

PurePath

Notice that some values are obscured with asterisks. These are confidential values that the current user doesn’t have permission to view; they would be visible to a user with the appropriate permission.

The same is true for all subsequent requests made by the initial request. The image below shows the authenticate web service call. Besides the metadata provided on the Summary tab, you also get more detail about timings. In this case, we see that the request lasts 15 ms on the calling side but only 1.43 ms on the server side. Here again, there is significant network latency.
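The gap between the two timings is simple arithmetic: the client-side duration includes the server-side execution plus everything in between (serialization, queuing, and above all network transfer). A quick sketch with the numbers from this example:

```python
# Client-side vs. server-side timings of the authenticate call above (ms).
client_side_ms = 15.0
server_side_ms = 1.43

# Whatever time the server didn't spend is attributable to the path between
# the two tiers -- mostly network latency.
network_overhead_ms = client_side_ms - server_side_ms
print(f"{network_overhead_ms:.2f} ms spent outside the server")  # 13.57 ms spent outside the server
```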

Code execution details of individual requests

Each request executes some code, be it Java, .NET, PHP, Node.js, Apache web server, Nginx, IIS, or something else. The PurePath view enables you to look at the code execution of each and every request. Simply click a particular service and select the Code level tab.

Code level view shows you code-level method executions and their timings. Dynatrace tells you the exact sequence of events with all respective timings (for example, CPU, wait, sync, lock, and execution time). As you can see above, Dynatrace tells you exactly which method in the orange.jsf request on the Customer Frontend called the respective web services and which specific web service methods were called. The timings displayed here are the timings as experienced by the Customer Frontend, which, in the case of calls to services on remote tiers, represent the client time.

Notice that some execution trees are more detailed than others. Some contain the full stack trace while others show only the critical points. Dynatrace automatically adapts the level of information it captures based on importance, timing, and estimated overhead. Because of this, slower parts of a request typically contain more information than faster parts.

You can look at each request in the PurePath and navigate between the respective code level trees. This gives you access to the full execution tree.

Error analysis

Analyzing individual requests is often a useful way of gaining a better understanding of detected errors. In the image below you can see that requests to Redis started to fail around the 10:45 mark on the timeline.

By analyzing where these requests came from we can see that all of these requests originate in the Node.js weather-express service. We also see that nearly all failed Redis calls have the same Exception: an AbortError caused by a closed connection.

We can go one step further, down to the affected Node.js PurePaths. Below you can see such a Node.js PurePath and its code level execution tree. Notice that the Redis method call leads to an error. You can see where this error occurs in the flow of the Node.js code.

We can also analyze the exception that occurs at this point in the request.

Each PurePath shows a unique set of parameters leading up to the error. With this approach to analysis, PurePath view can be very useful in helping you understand why certain exceptions occur.
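The failure mode described above — commands aborted because the underlying connection is already closed — can be sketched with a few lines of code. For consistency with the other examples this sketch is in Python with a stubbed client (the actual service is Node.js, where the redis client raises an analogous AbortError); all class and key names here are hypothetical:

```python
class AbortError(Exception):
    """Raised when a command is aborted because the connection is closed.
    (Stand-in for the AbortError raised by the Node.js redis client.)"""

class StubRedisClient:
    """Hypothetical minimal client used only to reproduce the error pattern."""
    def __init__(self):
        self.connected = True

    def close(self):
        self.connected = False

    def get(self, key):
        if not self.connected:
            # Mirrors what the PurePaths showed: the command never reaches
            # Redis -- it is aborted on the client side.
            raise AbortError(f"GET {key} aborted: connection already closed")
        return None  # pretend the key is absent

client = StubRedisClient()
client.close()  # e.g. Redis restarted or the connection pool shut down
try:
    client.get("weather:vienna")
except AbortError as e:
    handled = str(e)
print(handled)
```

Because every failed call carries the same exception type and cause, grouping failures by exception (as Dynatrace does here) immediately points at the closed connection rather than at Redis itself.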

Different teams, different perspectives

Each PurePath tracks a request from start to finish. This means that PurePaths always start at the first fully monitored process group. However, just because a request starts at the Customer Frontend service doesn’t mean that this is the service you’re interested in. For example, if you’re responsible for the Authentication Service, it makes more sense for you to analyze requests from the perspective of that service.

Let’s look at the same flow once again, but this time we’ll look at the requests of the Authentication Service directly. This is done by clicking the View PurePaths button in the Authentication service box.

We can additionally add a response time filter. With this adjustment, the list now shows only those Authentication Service requests slower than 50 ms that are called by the Customer Frontend service (at times when the frontend request also calls the Verification Service).

Now we can analyze the Authentication Service without including the Frontend service in the analysis. This is useful if you’re responsible for a service that is called by services developed by other teams.

Of course, if required, we can use service backtrace at any time to see where this request originated.

We can then choose to once again look at the same PurePath from the perspective of the Customer Frontend service.

This is the same PurePath we began our analysis with. You can still see the Authenticate call and its two database calls, but now the call is embedded in a larger request.

The power of Dynatrace PurePath

As you can see, Dynatrace PurePath enables you to analyze systems that process many thousands of requests per minute, helping you to find the “needle in the haystack” that you’re looking for. You can view requests from multiple vantage points—from the perspective of the services you’re responsible for, or from the point of view of where a request originates in your system. With PurePath, you really do get an end-to-end view into each web request.

The post PurePath visualization: Analyze each web request from end-to-end appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

OpenStack monitoring beyond the Elastic Stack – Part 1: What is OpenStack?

Wed, 06/21/2017 - 16:40

We’ve been talking a lot about OpenStack over the past few months, and with good reason. Its explosive growth in popularity within the enterprise has enabled large, interoperable application architectures and, with this, a need for app-centric monitoring of the OpenStack cloud.

There are several open source monitoring tools for OpenStack out there, but are they mature enough for the challenges posed by its complexity? Can they effectively monitor hundreds of nodes, while simultaneously keeping an eye on hundreds of apps?

This post is the first of a 3-part series in which we will review:

  • What is OpenStack?
  • The OpenStack monitoring space: Monasca, Ceilometer, Zabbix and the Elastic Stack
  • A full stack view on Monitoring OpenStack with Dynatrace
What is OpenStack?

OpenStack is an open source cloud operating system used to develop private- and public-cloud environments. It consists of multiple interdependent microservices, and provides a production-ready IaaS layer for your applications and virtual machines. Launched in 2010 as a joint project of Rackspace and NASA, it turns seven this year, and it’s supported by many high-profile companies including AT&T, IBM, and Red Hat.

Though it still gets dinged for its complexity, OpenStack currently has around 60 components, also referred to as “services”, six of which are core components that control the most important aspects of the cloud. There are components for the compute, networking, and storage management of the cloud, for identity and access management, and also for orchestrating applications that run on it. With these, the OpenStack project aims to provide an open alternative to giant cloud providers like AWS, Google Cloud, Microsoft Azure, or DigitalOcean.

A few of the most common OpenStack components

The OpenStack components are open source projects continuously developed by its Community. Let’s have a brief look at the most important ones:

Nova (Compute API) – Nova is the brain of the OpenStack cloud, meaning that it provides on-demand access to compute resources by provisioning and managing large networks of virtual machines.

Neutron (Networking service) – Neutron focuses on delivering networking-as-a-service in the OpenStack cloud.

Keystone (Identity service) – Keystone is the identity service used for authentication and high-level authorization.

Horizon (Dashboard service) – OpenStack’s dashboard, providing a web-based user interface to the other services.

Cinder (Block Storage service) – The component that manages and provides access to block storage.

Swift (Object storage service) – Swift provides eventually consistent, redundant storage and retrieval of fixed digital content.

Heat (Orchestration service) – The orchestration engine, providing a way to automate the creation of cloud components.

Why the hype around OpenStack?

The reasons behind the explosive growth in OpenStack’s popularity are quite straightforward. Because it offers open source software for companies looking to deploy their own private cloud infrastructure, it’s strong where most public cloud platforms are weak.

Vendor-neutral API: Proprietary cloud service providers such as AWS, Google Compute Engine, and Microsoft Azure each have their own application programming interfaces (APIs), which means businesses can’t easily switch to another cloud provider, i.e. they are automatically locked into these platforms. In contrast, OpenStack’s open API removes the concern of proprietary, single-vendor lock-in and gives companies maximum flexibility in the cloud.

More flexible SLAs: All cloud providers offer Service Level Agreements, but these tend to be the same for all customers. In some cases, however, the SLA in your contract might be completely irrelevant to your business. Thanks to the many OpenStack service providers, though, it is easy to find the most suitable one.

Data privacy: Perhaps the biggest advantage of using OpenStack is the data privacy it offers. For some companies, storing certain data in public cloud infrastructure is prohibited by law. While a hybrid cloud makes it possible to keep sensitive data on premises, the potential for vendor lock-in and data inaccessibility still remains. Not with OpenStack. Here, all your data is on premises, secured in your own data center.

These are the reasons why companies like AT&T, China Mobile, CERN, and Bloomberg decided to become OpenStack users.

So what’s the state of OpenStack now?

I happened to overhear a comment at the OpenStack Summit Boston 2017 that I have not been able to get out of my head. Someone in the crowd claimed that “OpenStack will eat the world”. This might not be too far-fetched, as the figures of the newest OpenStack Foundation User Survey show.

Nothing demonstrates OpenStack’s growth more than the rapid development of new clouds, with 44% more deployments reported in this year’s survey than in 2016. OpenStack clouds around the world have also become larger: 37% of clouds now have 1,000 or more cores. And what speaks more to its maturity than the fact that two-thirds of deployments run in production environments?

Is OpenStack really going to eat the world? And if it is, who will make sure that application performance stays high?

In the second part of this blog series we will take a look at what the current options on the market are for monitoring OpenStack. Stay tuned!


Customer Corner – Nordstrom, Citrix, Red Hat and more

Wed, 06/21/2017 - 02:56

One of the more rewarding parts of my job is working with our team on customer stories. I always feel that a great case study has so much external and internal value that it should be shared as widely as possible. So, after just a few moments on our site and YouTube channel, I’ve collated a few standouts to share in the first Dynatrace “Customer Corner” blog post.

Nordstrom: From 8 weeks to 2 days for performance testing

Gopal Brugalette has headed up some very cool digital transformation initiatives at retail giant, Nordstrom. Here, he talks about how his team uses Dynatrace to shorten release cycles and pinpoint issues instantly, to stay ahead of the competition.

Coop: Largest retailer in Denmark avoids store closures

Jeppe Lindberg of Coop looks at how Denmark’s largest retailer avoided massive store closings on the launch day of a new loyalty app, thanks to Dynatrace’s built-in AI capabilities, and our ability to see every user, every application across the entire IT stack.

How Citrix uses Dynatrace for cloud systems insight

Nestor Zapata, Lead Systems Administrator at Citrix, highlights how he and the Citrix production teams use Dynatrace for faster application issue resolution and problem prevention, and how they make smarter, more efficient decisions around their cloud systems.

Red Hat and Dynatrace help close the OpenShift technology gap

Chris Morgan, Technical Director from Red Hat, on how the depth of integration, AI and machine learning capabilities set Dynatrace apart as an OpenShift partner.

Raymond James: “Dynatrace sees all transactions; AppDynamics samples”

Jeff Palmiero, APM Manager at Raymond James, explains the role Dynatrace plays in helping monitor customer experience and enabling them to take proactive actions, no matter how complex the technology ecosystem. In this video he explains why ONLY Dynatrace is capable of delivering the application analytics required.

Customers at the heart of Perform 2018

What’s great about this collection of stories is that most were captured on the fly, without prep, at last year’s global Perform 2017 event in Las Vegas. But that’s what you get when you attend our Perform event series – customer stories with big brands and innovative, down-to-earth leaders who are only too happy to share their knowledge and insights with you.

So, why not jump over to our new “save the date” page and make plans to join us for a super-sized, customer-focused Perform 2018 at The Bellagio in Vegas, from 29th to 31st January. It’s going to be fun, informative, hands-on and inspiring, and we’re expecting about 2,000 people to join us. Will you?


Live from Velocity San Jose 2017

Tue, 06/20/2017 - 18:32

Velocity has transformed over the last couple of years, much as organizations have transformed the way they build, scale, and maintain the software that powers their business. I remember the early days of Velocity, when it was all about Web Performance Optimization; then it moved on to Web Scale, DevOps, and Building Resilient Systems, and now we’ve arrived at this year’s theme: building and maintaining complex distributed systems!

Last year I was fortunate enough to get Steve Souders on our PurePerformance Podcast. Steve started Velocity and has been a major contributor to the Web Performance and DevOps community. Listen in to hear what motivated him to go on this journey: listen to “PurePerformance Cafe – Velocity 2016 with Steve Souders” on Spreaker.


Here are my highlights from both days, as I was doing some “live blogging”:

Thursday, June 22 – LIVE Update from Day 2 @ Velocity 2017

This morning I got in line for Speed Networking. As explained yesterday, it was a new concept I hadn’t seen before. But it’s REALLY COOL! I got to meet several new people in a short time frame whom I probably wouldn’t have met otherwise. I encourage every event organizer to think about this concept!

Keynotes & Sessions Day 2

Dave Andrews (Verizon | @daveisangry)

Giving us a glimpse into how Verizon builds against “cascading failure at scale(s)”. Besides load testing, monitoring, and traffic routing, it’s about containing potential problems. Some interesting insights into how they contain traffic issues at the local and regional level. More to learn from him on Twitter.

Dharma Shukla (Microsoft | @dharmashukla)

Giving us insights into Cosmos DB. Want to learn more? Check it out online!

Cliff Crocker (SOASTA | @cliffcrocker)

Talking about “The False Dichotomy of Finders vs Fixers”. We all have a lot of tools to find problems and highlight them, but we are not really good at actually fixing things. While this is a great business model for consulting companies that are “finders”, it won’t help you in the end!

Reminding us that a lot has changed in the tool and technology space, allowing us to build better tools that not only find problems but also provide better insights to fix them! There are also great new web technologies that deliver better end-user performance, e.g. Preconnect resource hints or Server Push.

Cliff Crocker reminding us about new tech developments

Dianne Marsh (Netflix | @dmarsh)

Talking about careers and how we have to look back in order to move forward! Giving us insights into her career path and lessons she learned as an individual contributor and as a manager. Reminded us about “Repeating Trends”.

  • Repeating Trends: Not all we thought was new is really new

OneAgent & Security Gateway release notes for version 121

Mon, 06/19/2017 - 07:07
OneAgent General improvements and fixes
  • Early Access Program for Linux PowerPC (Little-endian) hosts running on RedHat and CentOS. The EAP includes deep monitoring of Java and Node.js, as well as system, network, and plugin metrics and log analytics.
  • Garden injection for new garden-runc 1.2.0 release
  • Reporting of CPU usage for Windows protected processes
  • Changes to Erlang grouping. Programs with undiscovered modules will be grouped as ‘Erlang’
  • Plugins – Technology names in plugin.json are now case-insensitive
  • Enhanced remote diagnostics for Docker
Security Gateway
  • Security Gateway is now required for monitoring of large AWS accounts that include 700+ AWS service instances
  • Persisted custom config & trusted.jks for Windows
  • Support for VMware Cloud on AWS


The Weekly Wrap – Cloud Foundry, Market Share, Perform 2017, Interop and VictorOps

Sat, 06/17/2017 - 02:34

It was a huge week in digital performance last week. In what I hope is the first of a weekly series (depending on your feedback), here’s the lowdown.

Dynatrace is the first monitoring solution to provide full stack insight into Cloud Foundry

We are thrilled to announce that Dynatrace is the first monitoring solution to provide full stack insights into Cloud Foundry clusters — automatically and with no configuration. This includes monitoring of both Cloud Foundry cluster health for platform and resource optimization, and automatic monitoring of your deployed applications.


Dynatrace Ranks No. 1 in latest Gartner Market Share Analysis Report: Performance Analysis Software, Worldwide 2016

Gartner Market Share APM 2016

For the fifth consecutive year, Dynatrace has been ranked by Gartner, Inc., a leading IT research and advisory firm, as the number one global Application Performance Monitoring (APM) solution provider.

#Perform2017 kicks off in Europe with more than 850 attendees across 4 cities

This week saw the Perform roadshow hit London, Madrid, Rome, and Milan, with customer presentations from Travis Perkins, Coop, Virgin Money, and AWS.

Here are some select tweets from each event to share:
  • London

#perform2017 with #TravisPerkins #COOP #VirginMoney #AWS Standing room only! Superb stories featuring AI, Full Stack, Automation. pic.twitter.com/AbejqkpMUo

— Dynatrace (@Dynatrace) June 15, 2017

  • Madrid

Today it’s the racecourse’s turn, at @dynatraceEspana’s #Perform2017, and the room is packed. pic.twitter.com/rvEkTYZr7I

— Eugenio Sanz (@Eugeniobdi) June 1, 2017

  • Rome

Iconic location for #dynatrace #perform2017 in Rome, great work and thanks to all customers for joining us pic.twitter.com/yv0G7JfLf0

— Pieter Van Heck (@PieterVHeck) June 13, 2017

  • Milan

#Perform2017 in Milan. Thank you @dynatraceItalia pic.twitter.com/IPb8QWVEYl

— Moviri (@moviri) June 6, 2017

Japan embraces Dynatrace with more than 1000 demos delivered at Interop

It’s phenomenal to see the images and stories coming from Japan, where this week our great partner LAC took the Dynatrace full stack story to Interop. A country that prides itself on its technology innovation, plus a company that leads the market in AI, full stack, and automation, meant a record number of demos for our booth staff. Rumour has it over 1000 demos were delivered!

Verizon, AWS and Dynatrace accelerate time to market

In a joint case study between AWS and Dynatrace, Verizon shares how they implemented a comprehensive methodology for cloud migration.

VictorOps: Microservices Monitoring and Critical Incident Management

Hear how our partner VictorOps and Dynatrace work together to bring greater intelligence to microservices monitoring and critical incident management. http://bit.ly/2roIOuD

Is that it?

What a massive week. I didn’t even get a chance to mention DevOps London, AWS Public Sector Summit and Cloud Foundry Summit. But all good things have to come to an end.

Thank you to all our customers, booth staff, event organisers, and Dynatrace partners for a massive week. What do you think? Do you like the summary?


Optimize your dashboard with new filters and custom charts

Thu, 06/15/2017 - 20:53

With the latest release of Dynatrace, we’ve introduced a new way to configure custom charts that makes dashboard creation easier and more intuitive. We’ve also introduced a valuable new type of dashboard tile called a “blinking light” tile.

Create custom charts

Custom charts enable you to analyze any combination of monitoring metrics directly on your dashboard.

To create a custom chart
  1. Select Create custom chart from the navigation menu.
    Alternatively, you can select the Build custom chart tile in the Tile catalog.
  2. On the Build a custom chart page, select metrics related to the services, applications, processes, or hosts you want to monitor.
    For this example, we’ll select the metric Applications – Actions per session.
  3. Click the Build chart button.
  4. Give your chart an intuitive name.
  5. Adjust the aggregation and display options for the metric you’ve selected. To do this, click the metric name, as shown below.
  6. Once you’ve configured the metric, you can optionally add additional metrics by clicking the Add metric button.
  7. Once you’re satisfied with your new chart, click the Pin to dashboard button to add the chart to your dashboard.
New filters and metrics

Filters make it easy to configure unique combinations of metric data for display on your custom dashboard charts. In the latest release of Dynatrace, we enhanced the configuration of metrics and filters for custom charting. The following new metrics and filters can now be added to your custom charts:

  • Additional process filters
  • VMware metrics and filters
  • ESXi metrics and filters

The new metrics can be accessed via the metric drop-down list on the Build a custom chart page (see Step 3 above) or by clicking the Add metric button on the Custom chart page.


Note: You can still create workflow-related charts that focus on relevant subsets of the host-, service-, and database metrics in your environment. You can even combine custom metrics to create new charts that directly support your teams’ unique requirements. For full details, see Can I use filtering to create more sophisticated dashboard charts?

Blinking light tiles

We’ve introduced a new type of dashboard tile (see examples below). Blinking light tiles enable you to see at a glance how many entities are affected by open problems. Blinking light dashboard tiles focus on a single entity type (e.g., hosts, applications, or services). Each green hexagon on a blinking light chart represents a healthy entity (i.e., an entity that is not associated with an open problem). Red hexagons represent entities that are affected by an open problem.
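The tile’s color logic is straightforward: an entity renders red exactly when it has at least one open problem. A minimal sketch of that mapping, using hypothetical entity records:

```python
# Hypothetical monitored entities of a single type (e.g., hosts).
entities = [
    {"name": "host-01", "open_problems": 0},
    {"name": "host-02", "open_problems": 2},
    {"name": "host-03", "open_problems": 0},
]

def hexagon_color(entity):
    """Green hexagon = healthy; red hexagon = affected by an open problem."""
    return "red" if entity["open_problems"] > 0 else "green"

tile = [hexagon_color(e) for e in entities]
affected = tile.count("red")
print(tile, affected)  # ['green', 'red', 'green'] 1
```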

To add a blinking light tile to your dashboard
  1. From your home dashboard, click the Edit button to enter dashboard edit mode. Click the Add (+) button above the dashboard section within which you want the new tile to appear.
     
  2. Select the blinking light tile that’s dedicated to the entity type you want to monitor. Blinking light tiles are currently available for the following entity types: Hosts, Applications, Services, Data centers, Databases, and Web checks. Two different tile sizes are available. In the example below, the small Hosts tile is selected in the Infrastructure section of the tile catalog.
  3. Once pinned to your dashboard, you can click any blinking light tile to visit the corresponding entity list page and begin your analysis of any detected problems.
    You can toggle the size of blinking light tiles by clicking the Toggle size switch available within each tile’s context menu. To retain a blinking light tile on your dashboard and disable the visualization, set the Chart switch to the Off position.


Dynatrace is first monitoring solution to provide full-stack insight into Cloud Foundry

Wed, 06/14/2017 - 22:55

Dynatrace support for Cloud Foundry applications has been available for some time now, helping application teams better understand and optimize their distributed microservices environments. As we work tirelessly to provide you with full insights into your technology stack, I’m happy to announce that Dynatrace is the first monitoring solution to provide full-stack insights into Cloud Foundry clusters — automatically and with no configuration. This includes monitoring of both Cloud Foundry cluster health for platform and resource optimization, and automatic monitoring of your deployed applications.

Cloud Foundry cluster health monitoring

By deploying Dynatrace OneAgent to your Cloud Foundry VMs, you gain monitoring insights into all Cloud Foundry components, including Diego Cells, Cloud Controller, Gorouter, and more. With these capabilities, Dynatrace enables you to optimize your cluster component sizing, detect failing or under-provisioned components, and leverage AI-powered analytics throughout your entire stack.

Deploying OneAgent to your cluster components gives you health metrics for each VM, including CPU usage, disk I/O, and network I/O. It even provides insight into the quality of the network communication between the processes of your Cloud Foundry components.

Automatic monitoring of Cloud Foundry applications, down to the code and query level

Dynatrace full-stack monitoring for Cloud Foundry environments includes built-in auto-injection for Garden-runC containers. This means that Dynatrace OneAgent auto-detects each application that’s deployed to Cloud Foundry and automatically initiates deep application monitoring.

Not only does Dynatrace OneAgent provide metrics for the applications running in Garden containers, it also provides code-level visibility into your distributed application instances.

Deep monitoring provides your microservices teams with the insights required to optimize the performance of services while ensuring complete availability and functionality.

Automatic distributed service tracing

In microservices environments — especially those deployed to Cloud Foundry — automatic distributed service-tracing is a powerful means of continuously and seamlessly tracking the health of the entire microservices architecture.

Service tracing enables tracking of how requests to microservices and Cloud Foundry apps are propagated through a system. Service tracing also helps to identify performance bottlenecks and failed requests in the service-to-service communication chain. It’s never been easier to pinpoint the root cause of poor performance in heterogeneous microservices stacks. Since Dynatrace OneAgent automatically monitors all Cloud Foundry applications on Diego cells, these automated tracing capabilities are automatically applied to your Cloud Foundry applications.
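Distributed tracing of this kind works by propagating a trace context with every outbound call, so that spans recorded on different services can be stitched back into a single request. The sketch below illustrates the general idea in Python with hypothetical service names; it is not Dynatrace’s actual PurePath mechanism or wire format:

```python
import uuid

spans = []  # each monitored process would report these to the tracing backend

def record_span(service, trace_id, parent_id=None):
    """Record one hop of the request, linked to its parent span."""
    span = {"trace_id": trace_id, "span_id": uuid.uuid4().hex,
            "parent": parent_id, "service": service}
    spans.append(span)
    return span

# A request enters at the frontend; the trace id is minted once and then
# propagated (e.g., via HTTP headers) through every service-to-service hop.
trace_id = uuid.uuid4().hex
frontend = record_span("frontend-app", trace_id)
auth = record_span("auth-service", trace_id, frontend["span_id"])
db = record_span("mongodb", trace_id, auth["span_id"])

# All spans share one trace id, so the backend can reassemble the full path.
assert all(s["trace_id"] == trace_id for s in spans)
chain = [s["service"] for s in spans]
print(" -> ".join(chain))  # frontend-app -> auth-service -> mongodb
```

Because the trace id travels with the request, a slow or failed hop anywhere in the chain can be attributed to the exact upstream request that triggered it.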

Integrate with your existing BOSH deployments

Dynatrace full-stack monitoring for Cloud Foundry integrates seamlessly with BOSH deployments. Dynatrace provides a BOSH release that you can use as an add-on to deploy OneAgent to your cluster VMs, including Diego Cells and others. The BOSH release also covers deployment of OneAgent to Windows Diego Cells, thereby enabling automatic monitoring of .NET Framework based applications.

For full details on the Dynatrace BOSH add-on, please see How do I deploy OneAgent for full-stack Cloud Foundry monitoring?

We’ve worked with Pivotal to make the Dynatrace Full-Stack Add-on for Pivotal Cloud Foundry available. So, if you’re using Pivotal Cloud Foundry, go ahead and download the add-on from the Pivotal Network.


A Thousand Points of Light: Critical Performance Insights from Wire Data

Fri, 06/09/2017 - 21:11

Modernizing and optimizing. Transforming. If you’re in IT, you hear these terms frequently. They likely mean different things to different organizations, but there are a couple of recurring themes. Increasing agility to respond in real time to shifts in business demands. Managing the costs associated with increased complexity, through automation and intelligence as well as rationalization and consolidation. Cloud – private, hybrid, public – is not the result; rather, cloud is a means of reaching for these goals.

Inherent in this modernization shift is a transition – sometimes subtle, sometimes seismic – from static traditional architectures (got any 3-tier apps left?) and proprietary platforms to new paradigms of virtualization, microservices, and software-defined everything.

What does this shift mean for monitoring visibility? Specifically, for critical performance insights sourced from wire data? As network and application architectures change to support modernization goals, traditional approaches to monitoring must also adapt. Gone are the days when a SPAN on a core switch could provide comprehensive visibility into users accessing your entire application portfolio. Today, these core aggregation points have exploded into dozens or hundreds of smaller points of light. As a result, some vendors claim agents are the answer; some even suggest that NPM may be dead.

Long live wire data

This physical to virtual technology shift has many ramifications. From a network visibility perspective, it has disrupted the status quo, creating access challenges that are today being addressed by packet broker vendors such as Ixia. In fact, the approach remains consistent, mirroring the same physical to virtual shift. Modern and optimized visibility architectures incorporate virtual taps to complement (or supplant) physical taps, aggregating and pruning packets as appropriate to deliver clean traffic streams to monitoring tool destinations.

So don’t let your architecture dictate your level of performance monitoring. If it has been important to include wire data in your data center APM strategy, and you’re migrating these apps to the cloud (or your data center is becoming more cloud-like), won’t the same level of visibility still be important after the shift?

Listen to Dynatrace’s Jason Suss and Ixia’s Keith Bromley as they chat with the folks at LMTV. You’ll hear them discuss visibility architectures from the data center to the cloud, the importance of wire data APM, even compare apps to autos. You may also be interested in this free eBook, co-authored by Dynatrace and Ixia: Operational Visibility in the Software-Defined Data Center.

The post A Thousand Points of Light: Critical Performance Insights from Wire Data appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Comey Hearings: What digital experience management means to news media

Thu, 06/08/2017 - 23:30

Politics aside, today’s testimony by James Comey provides a fascinating look at how events can impact Digital Experience Management for news media organizations. I’m using the term Digital Experience Management (DEM) because the industry (including Gartner and Forrester Research) has identified that Digital Experience needs to be considered and managed in a unique way. DEM draws the relationship between performance, availability, and end-user/consumer behavior when users interact with digital properties like web sites, mobile applications, etc.

I’m looking at various news providers using Dynatrace technology to illustrate how complex web applications require a new methodology and approach for understanding DEM impact.  Performance metrics are key here and I’ll explain more on this in a moment.

To give you an example of what we are seeing, below is a performance comparison of 20 different news media organizations as observed from locations across the US.

As you can see there is a huge difference from a performance perspective between these different news media organizations. The performance of these sites can be impacted by a wide variety of variables. Some of these include object size (page weight), cached resources, third-party contributors, client side code (javascript), and even server-side responsiveness.

Below is an example of how Dynatrace analyzes an individual page load for a real user and identifies key performance items.

On which web performance metrics should digital experience management focus?

Below is an example of an Analysis Dashboard of the front page for a major news outlet.

Let’s go over what these “tiles” would tell a digital business owner at a news organization.

The Response Time & Success Rate tile (top left) provides a performance trend view which shows aberrations and events which could be impacting end users. It’s also useful to know when you have recovered from an event and the degree to which performance has been changed by it.

The Geographic Response Time tile (bottom left) shows response time by specific regions.  This is important especially if you are using a CDN (Content Delivery Network) as high regional response times can be associated with an oversubscribed PoP (CDN Point of Presence) or misrouted traffic. CDN services are expensive, and this is a way to help manage your technology investments.

The Contribution by Domain tile (second in from the top) highlights the impact that third parties like social media, ad networks, analytics/tracking tools are having on end-user performance.  This view helps you manage technology investment and risk associated with a third-party touching your customer.

The Key Delivery Indicators tile (second in from the bottom) shows observed byte count (how much data was delivered). This often gets overlooked by retailers, but it will surface issues related to content that is not optimized (what happens when the creative team releases a 10MB juggling-monkey image to the landing page), or malicious activity (what happens when a hacker re-routes your site to their page). Metrics like Object, Connection, and Host count also provide an indication of the complexity of the site and whether something unexpected is occurring.

Let’s switch to the right side of the screen.

The DNS performance tile (top second from the right) shows DNS resolution time. DNS can be thought of as a phone book, routing site names to server addresses. Again, this often goes overlooked. However, the DDoS attack on DNS provider Dyn on October 21, 2016 showed that DNS is critically important. Knowing when/if your DNS is being impacted allows you to make changes and recover faster. It also allows you to understand whether you are investing in the right partner for providing DNS.
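As a back-of-envelope illustration (this is not how Dynatrace measures it), DNS resolution time can be sampled with Python’s standard library; the hostname here is a placeholder:

```python
import socket
import time

def dns_resolution_ms(hostname):
    """Time a single DNS lookup via the system resolver; returns milliseconds."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, None)
    return (time.perf_counter() - start) * 1000.0

# Sample the lookup a few times and keep the fastest, which roughly
# approximates resolver/cache performance for that name.
samples = [dns_resolution_ms("localhost") for _ in range(3)]
print(f"fastest lookup: {min(samples):.2f} ms")
```

A real measurement would of course be taken from many end-user vantage points, which is exactly what a synthetic or RUM tile aggregates.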

The Network Latency tile (top right) is a measure of how healthy your network connections are. This data can be used to understand if you have peering issues with your network providers, or if your network infrastructure (load balancer) is under pressure.

The Server Response time tile (bottom, second from the right) is a measure of how fast the server can respond to a request. This allows you to understand, from the end user’s point of view, whether the server applications are causing a performance bottleneck; we will come back to this later.

The last tile on the bottom right shows Client Impression time. This allows you to understand how long it takes for the browser/mobile browser to display something for the end user. Understanding what is happening in the user’s browser is the final link in the chain.

Digital Experience Management and top-line revenue

News media organizations primarily generate revenue through subscriptions and through displaying advertisements to readers/viewers. When it comes to driving revenue from ad impressions, keeping the user on the site is key. This is what the industry calls “stickiness”. Performance is a key contributor to end-user behavior. One of the ways Dynatrace tracks this is by executing a Bounce Rate analysis. In the graph below you can see that as performance worsens (load time along the bottom), the bounce rate rises. These are readers/viewers navigating away (bouncing) from the site because the page takes too long to load.
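The relationship can be sketched in a few lines of Python; the page views below are invented sample data, not real Dynatrace output:

```python
from collections import defaultdict

# Hypothetical page views: (load time in seconds, did the visitor bounce?)
views = [
    (1.2, False), (1.8, False), (2.5, False), (3.1, True),
    (4.0, True), (4.4, False), (5.2, True), (6.0, True),
]

def bounce_rate_by_load_time(views, bucket_seconds=2):
    """Group views into load-time buckets and compute each bucket's bounce rate."""
    buckets = defaultdict(lambda: [0, 0])  # bucket start -> [bounces, total]
    for load_time, bounced in views:
        b = int(load_time // bucket_seconds) * bucket_seconds
        buckets[b][0] += int(bounced)
        buckets[b][1] += 1
    return {f"{b}-{b + bucket_seconds}s": bounces / total
            for b, (bounces, total) in sorted(buckets.items())}

print(bounce_rate_by_load_time(views))
# bounce rate climbs with load time: 0.0, 0.5, ~0.67, 1.0
```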

If performance is poor, users will not remain on the site and the number of ad impressions will drop. This directly impacts the top-line revenue generation for a news media website.

Also, code on the page can cause issues which prevent an ad from loading or being seen.  Below is an example of how Dynatrace discovers a Javascript Error on a page. You can see a screenshot showing a blank region of a page where an ad should be located.

When we look at the Javascript error we can see it is an issue with code coming from an ad provider which is failing and causing the ad to not display.

These JavaScript errors also impact top-line revenue for a news media outlet because there is no ad displaying for a reader/viewer when the error occurs.

Digital Experience Management needs insight into the back end

We mentioned that performance can impact top-line revenue when readers/viewers bounce off a news site. One of the contributing factors to poor performance comes from the “back end”. The “back end” in this case refers to the servers that respond to page requests, whether they are hosted by the news site or are cloud-based servers and services.

Below is a comparison of the ten news media companies that provide the fastest server-side “back end” response times and the ten that provide the slowest. The fastest sites respond in under 200 milliseconds from their servers, while the slower outlets can take over half a second.

While these response times might sound fast, the slower server-side response times can be expensive for the news outlet (and not just for the readers/viewers bouncing off slow pages). When you add up the processing required to service millions of visits, the sites providing the slower response times are paying more to service the same number of viewers as the sites providing the faster response times. This is all about computational capacity. The slower a transaction is on the server side, the more compute resources it consumes. Compute resources, whether you host them yourself or use them from the cloud, cost money.
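A back-of-envelope sketch of that cost argument, with illustrative numbers (the load and per-server concurrency figures are assumptions, not measurements):

```python
def servers_needed(requests_per_sec, response_time_ms, concurrency_per_server=50):
    """Rough capacity model: each server holds `concurrency_per_server` requests
    in flight, so slower responses mean fewer completed requests per second."""
    per_server_throughput = concurrency_per_server / (response_time_ms / 1000.0)
    return requests_per_sec / per_server_throughput

load = 10_000  # assumed peak requests per second
fast = servers_needed(load, response_time_ms=200)  # 40 servers
slow = servers_needed(load, response_time_ms=500)  # 100 servers
print(f"fast site: {fast:.0f} servers, slow site: {slow:.0f} servers")
```

Under these assumptions, the same traffic costs the slower site two and a half times the compute, which is the point: server-side latency is a capacity bill, not just a user-experience number.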

The applications which run these news websites and mobile apps are exceedingly complex. The complexity is so great that effective DEM data needs to be augmented with Artificial Intelligence based analyses to understand all of the dependencies which exist. Below is an example of a Dynatrace Smartscape automatically discovering all of the compute resources that would exist for any of the news media organizations we looked at today.

What’s going on behind the curtain?

While everyone is watching the political theater today, what interests me is happening behind the scenes. Events like this drive traffic to news outlets; however, depending on how that news site is being delivered, there can be a substantial impact on digital experience, which can lead to frustrated readers/viewers bouncing off the site. Poor digital experience impacts the ability to generate revenue from ad impressions for news sites. The news is a highly competitive market, and the technology driving it is increasingly complex. To remain competitive, news sites need to look for new ways to manage their digital experience.

OK, back to watching some political theater.

The post Comey Hearings: What digital experience management means to news media appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Dynatrace Managed feature update, version 120

Wed, 06/07/2017 - 20:31
Help us improve Dynatrace Managed

To better understand how you and your organization’s end users make use of Dynatrace Managed, we now provide you the option of sending Dynatrace usage data from your end-users’ browsers directly back to Dynatrace. We analyze this information to ensure that we focus our efforts on the aspects of Dynatrace that are most relevant to you and to identify areas where you may be having trouble understanding or using Dynatrace. Of course, privacy is a top concern. For complete details on the data we capture and how they are protected, see the Dynatrace privacy policy.

Easily switch license keys

For situations where your current Dynatrace license has expired and you’ve received a new license, it’s now possible to change license keys directly in the Dynatrace Managed UI. This is especially useful if you’ve been using a Dynatrace free trial license and have received a full license that you are to use going forward.

To update your Dynatrace Managed license key

  1. Select Licensing from the navigation menu.
  2. Paste your new license key into the License key field (see below) and click the check mark button to save the change.

Opt-out from managing firewall settings

In some situations (for example, when a system is under certified change control), the automatic management of IP tables (iptables) that the Dynatrace Managed installer performs during upgrades may be problematic from a compliance perspective. This is why you can now opt out of automatic iptables management by running the installer with the command-line option --firewall off.

If you do opt out of automated iptables management, ensure that all ports required by Dynatrace are open and available. For full details, see all required port settings.

The post Dynatrace Managed feature update, version 120 appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Support for PHP and Staticfile apps on Cloud Foundry PaaS

Wed, 06/07/2017 - 20:03

We’re happy to announce that, in addition to support for Java and Node.js applications, Dynatrace now also provides monitoring support for PHP and Staticfile applications that are deployed in Cloud Foundry PaaS environments.

Cloud Foundry is a Platform-as-a-Service that consists of a set of open source tools that help you run applications at scale. Applications deployed on Cloud Foundry are usually run through technology-specific buildpacks that provide framework and runtime support for applications running on the Cloud Foundry platform. For instance, the Staticfile buildpack provides runtime support for applications that require no backend code other than an Nginx web server.

Dynatrace OneAgent for Cloud Foundry PaaS is integrated with release v4.3.34 of Cloud Foundry’s PHP buildpack and also with release v1.4.6 of Cloud Foundry’s Staticfile buildpack.

Start monitoring Cloud Foundry PaaS applications

To set up Cloud Foundry monitoring you first need to link your Dynatrace account with your Cloud Foundry applications. To do this, you need to create a Dynatrace service in your Cloud Foundry environment. For complete details, please see the Cloud Foundry installation guidelines.

Once your Cloud Foundry applications are monitored with Dynatrace OneAgent, you’ll receive the full range of application and service monitoring visibility that Dynatrace provides (for example, Smartscape and service-level insights with Service flow). Properties that are specific to Cloud Foundry are also provided on the process-group instance level. Note in the example below that values are provided for Cloud Foundry space ID, Cloud Foundry application, and—because multiple application instances are running—Cloud Foundry instance index.

Your feedback

We’d love to hear from you. Tell us what you think about the new Dynatrace integrations into the PHP and Staticfile buildpacks. Please share your feedback at Dynatrace Answers.

The post Support for PHP and Staticfile apps on Cloud Foundry PaaS appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Why “Try another browser” is no longer acceptable

Wed, 06/07/2017 - 16:58

Users can be demanding, and you won’t find a more demanding group than social-savvy millennials. These punks have grown up believing they pretty much know everything, are never wrong, and demand immediate, accurate responses when interacting on social.

If websites are slow or apps crash, it’s not just lost revenue – but also damage to your brand.

Last year we ran a survey and found 51% of millennials would turn to social media to complain if a website or app performs badly. Reading between the lines, application and digital experiences are no longer just an issue for IT.

These millennials believe, and rightly so, they are not a transaction.

When digital experiences fail, it’s comical for the social observer, but it’s no laughing matter for the brand, or the social media team. Put simply: reply with accuracy; reply with speed; reply with intelligence…or else.

It’s no longer OK to reply with “try another browser”.

But we keep doing this:

I ran a simple search query and found that the phrase “try another browser” is mentioned quite frequently. And it’s not just small businesses, but brand and industry leaders, too. And, at this point, I should mention that I post this image with some trepidation because several of these companies are good customers of Dynatrace.

To me this suggests two things:

  1. There is an opportunity for some of these brands to take a competitive advantage
  2. Some of these social media and marketing teams need to head over to IT and have a chat about some of the crazy user-experience insights they can gain. Sorry Marketing, Google Analytics isn’t the answer.

Here is a snippet from a presentation I did last year that talks about why social media teams, and marketing teams, need access to digital experience data.

One of the unique capabilities of Dynatrace is the ability to see every user, every click, tap, and swipe, on every single device. This gives you a distinct advantage by being able to see a user who is struggling to check out, failing to log in, or having trouble on their digital journey. If user experience is the single biggest differentiator, and social media is one of the most critical marketing communication channels, then aligning Dynatrace data to your social arsenal is really a no-brainer.

The post Why “Try another browser” is no longer acceptable appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Request attributes: Simplify request searches & filtering

Fri, 06/02/2017 - 17:42

Dynatrace tracks all requests from end to end and automatically monitors the services that underlie each transaction. The performance and attributes of each request can be analyzed in detail. You can even create custom, multi-faceted filters that enable you to analyze call sequences from multiple angles. With such advanced request filtering, Dynatrace enables you to slice and dice your way through your requests to find the proverbial “needle in a haystack.” Until now such filtering was only possible on certain predefined attributes. With the latest Dynatrace release, you can now configure custom request attributes that you can use to improve filtering and analysis of problematic web requests.

What are request attributes?

Request attributes are essentially key/value pairs that are associated with a particular service request. For example, if you have a travel website that tracks the destinations of each of your customers’ bookings, you can set up a destination attribute for each service request. The specific value of the destination attribute of each request is populated for you automatically on all calls that include a destination attribute (see the easyTravel destination attribute example below).

Request attributes

If an attribute exists on multiple requests within a single PurePath then the attribute is applied to each request. In such instances, a single attribute may have varying values for each transaction request in the PurePath. You can even have multiple attributes on the service calls within a single PurePath. This makes request attributes a powerful and versatile feature when combined with Dynatrace advanced filtering.
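Conceptually, the model can be sketched like this (illustrative Python, not Dynatrace internals; the request names echo the easyTravel example):

```python
# Each monitored request carries a dict of attribute key/value pairs.
# A PurePath is an ordered list of such requests; the same key may hold
# different values on different requests within the path.
purepath = [
    {"request": "/services/AuthenticationService/authenticate",
     "attributes": {"easyTravel User": "maria", "destination": "Berlin"}},
    {"request": "JourneyService.find",
     "attributes": {"destination": "Zurich"}},
]

def filter_requests(requests, key, value=None):
    """Return requests carrying attribute `key` (optionally matching `value`)."""
    return [r for r in requests
            if key in r["attributes"]
            and (value is None or r["attributes"][key] == value)]

print([r["request"] for r in filter_requests(purepath, "destination", "Berlin")])
# ['/services/AuthenticationService/authenticate']
```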

In the image below you can see that the easyTravel User attribute exists on the triggering request (/services/AuthenticationService/authenticate) as well as on the authenticate request of the AuthenticationService that is being called. In this example the value is the same; in your application, however, the values might differ.

Request attributes

Create a request attribute

To configure a request attribute:
  1. Go to Settings > Server-side monitoring > Request attributes.
  2. Click the Create new request attribute button.
  3. Provide a unique Request attribute name. You can rename an attribute at any point in the future.
  4. Request attributes can have one or more rules. Rules define how attribute values are fetched.
Request attribute rules

Have a look at the example request attribute rule below. Note that the request attribute destination can obtain its value from two different sources: either an HTTP POST parameter (iceform:destination) or an HTTP GET parameter (destination). Rules are executed in order. If a request meets the criteria for both rules, its value will be taken from the first rule.

Each rule needs a source. In the example below, the request attribute source is a web request HTTP GET parameter (destination).

This GET parameter will be captured on all monitored processes that support code-level insight, and it will be reported on all requests that are monitored by Dynatrace.

While this is convenient, it’s not always what’s needed. This is why you can restrict rules to a subset of process groups and services. To do this, select process group and service names from the four drop-down lists above to reduce the number of process groups and services that the rule applies to.

You may not be interested in capturing every value. In other cases, a value may contain a prefix that you want to check against. To do this, specify that the designated parameter should only be used if its value matches a certain value. You can also opt not to use an entire value, but instead extract a portion of it. The example below is set up to only consider iceform:destination HTTP POST parameters that begin with the string Journey:. This approach extracts everything that follows the string Journey: and stores it in the request attribute.
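The rule evaluation described above can be sketched in Python (an illustration of the ordering and prefix behavior, not Dynatrace’s actual implementation):

```python
def extract_attribute(request, rules):
    """Apply rules in order; the first rule whose source parameter is present
    (and matches the optional prefix) supplies the attribute value."""
    for rule in rules:
        params = request.get(rule["source"], {})  # e.g. "GET" or "POST" params
        value = params.get(rule["parameter"])
        if value is None:
            continue
        prefix = rule.get("require_prefix")
        if prefix:
            if not value.startswith(prefix):
                continue
            value = value[len(prefix):].strip()  # keep only what follows the prefix
        return value
    return None

# Two ordered rules for a "destination" attribute, mirroring the example above:
rules = [
    {"source": "POST", "parameter": "iceform:destination", "require_prefix": "Journey:"},
    {"source": "GET", "parameter": "destination"},
]

request = {"POST": {"iceform:destination": "Journey: Berlin"}}
print(extract_attribute(request, rules))  # Berlin
```

A request satisfying both rules would take its value from the first (POST) rule, matching the ordering behavior described above.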

Requests can have as many attributes as you want.

Request attributes on service pages

Once you’ve defined your attributes, go to any service page where you expect to see your defined request attributes. Have a look at the Top requests section (see example below). The requests now feature attribute labels indicating that at least some of the respective requests contain the new request attribute. Click any request attribute to filter the entire page view down to only those requests that carry the selected attribute.

This includes both the chart at the top of the page and the request table further down the page. Any further analysis you do is likewise focused on these same requests.

Service flow only shows those requests that contain the easyTravel destination request attribute.

A new Request attributes tab has been added next to the Top requests tab. This tab lists the request attributes that occur on the requests shown on the page. The table reflects the current filter settings and shows the same metrics as the request table.

There are four request attributes included in the example below. The Median response time is the median response time of all requests that contain the request attribute. Total time consumption represents the sum of response times of all requests in the selected timeframe that have the selected request attribute.

You can also view the corresponding throughput metrics. In the example below, there were 2,400 requests that dealt with easyTravel JourneyId and the current throughput is 16/min.
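A sketch of how those three metrics fall out of raw request data (invented sample numbers, not Dynatrace internals):

```python
from statistics import median

# Hypothetical monitored requests: (attribute keys present, response time in ms)
requests = [
    ({"easyTravel JourneyId"}, 120),
    ({"easyTravel JourneyId", "destination"}, 300),
    ({"destination"}, 250),
    ({"easyTravel JourneyId"}, 180),
]
timeframe_minutes = 2

def attribute_metrics(requests, key, timeframe_minutes):
    """Median response time, total time consumption, and throughput
    for all requests carrying a given attribute."""
    times = [ms for keys, ms in requests if key in keys]
    return {
        "median_ms": median(times),
        "total_ms": sum(times),
        "throughput_per_min": len(times) / timeframe_minutes,
    }

print(attribute_metrics(requests, "easyTravel JourneyId", timeframe_minutes))
# {'median_ms': 180, 'total_ms': 600, 'throughput_per_min': 1.5}
```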

Request attributes do of course have values. You can see the values by expanding any attribute row. The table below shows the throughput numbers for all requests that contain the easyTravel destination attribute, broken out into the Top 18 values.

Here again, click any request attribute key/value pair to narrow down the results on the page to just those requests that include the selected attribute value. For example, the chart below only shows those requests that have the attribute key/value pair easyTravel destination = Zurich.

Request attributes in service analysis

Request attributes can be leveraged across all service analysis views. The service flow below shows the transaction flow of 52 Requests. 73% of the requests make about 10 calls to JourneyService. The service flow is filtered with the request attribute key/value pair destination = Berlin. This means that all 52 requests on the easyTravel Customer frontend service have a request attribute destination with the value Berlin!

We can add additional filters on JourneyService for other attributes that exist on these requests. The following service flow only shows requests that have the attribute destination = Berlin on the easyTravel Customer Frontend request and also make POST requests to the Journey Service.

request attribute

This filtering approach works across all levels of all service analysis views.

Protect confidential attribute values

Because request attributes can include confidential values, Dynatrace makes it possible to hide sensitive data from certain user groups and restrict who can define the data items that are captured within request attributes. To define or edit a request attribute, users must have the Configure capture of sensitive data permission.

If you select the Request attribute contains confidential data check box (see below), only users who have the View sensitive request data permission will be able to see the values of the attribute and use the attribute as a filter. The attribute values are hidden from all other users.

The request attribute table still indicates to unauthorized users that the attribute exists and provides overall request numbers, but the values are hidden (see example below).

Looking at the PurePath, you can see that the actual JourneyId is hidden because this user doesn’t have permission to view confidential data.

What’s next?

At the moment, we only allow the capture of web request headers and parameters. Soon we’ll extend the functionality to make it even more versatile. We also plan to further expand the use of request attributes across Dynatrace. So, please stay tuned.

The post Request attributes: Simplify request searches & filtering appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies

Hot from the lab: Latest releases center on unification, scalability and powerful analytics that tie back to business metrics

Fri, 06/02/2017 - 17:27

Innovation has always been at the core of the Dynatrace culture. We invest heavily in our product development so that our 500+ global R&D experts continue to break new ground in APM.

This is an exciting post for me, as I get to share highlights of the most recent advancements.

I even shot a quick video that hits some of the big changes.

Visually Complete

Visually complete puts you in the user’s seat and captures the precise visual experience of all your real users. Combined with Speed index, which shows how fast your page loads, and synthetic transaction data, you can now see exactly how digital performance is impacting revenue, bounce rates and conversions. Available July.

Enhanced visuals – make everyone a performance expert

The Dynatrace experience has never been more equipped to unite teams around the metrics that matter. Fresh dashboards and our AI-powered analytics give everyone in your business precise answers to complex problems – stay high level or dive into the detail – everyone can be a performance expert.

Unifying enterprise monitoring

Heterogeneous IT landscapes continue to surge in complexity and scale. On the flip side, our customers are simplifying; taking advantage of our enterprise-wide, full stack solution that does away with monitoring in silos. From microservices to APIs, mobile to mainframe, Dynatrace is the only one that can support the depth and scale of our customers’ digital business.

So now let’s look at some techs and specs.

Dynatrace 

  • Business impact reports with every problem discovered, so you can see precisely how your customers were affected and why. More here at our blog.
  • Map and position your custom network device within our Smartscape topology using AI, to capture important custom metrics in the broader topology context. Read more here.
  • Auto-discover all hosts, applications, and services—along with their relationships—and synchronize with your ServiceNow ITIL CMDB database. More to read here.

To stay up with the latest, head here.

AppMon

  • Extended time and deployment-based PurePath problem pattern detection that fully automates analysis of millions of PurePaths across multiple deployments so you get instant feedback on common issues, reducing the chance of quality degradation.
  • Deep insight into every visit and user action, including W3C metrics and JavaScript error diagnostics, that delivers insight into every browser and app from the customer perspective.
  • Full PurePath, method hotspots, exceptions and database diagnostics in the Web UI to open up the power of Dynatrace to everyone in your company and foster collaboration.

Heaps more to read about here.

Advanced Synthetics

  • Filter error analysis across time, location, and error type to quickly pinpoint availability issues.

  • Emulate any mobile connection across our global performance network to optimize the digital experience for mobile devices.
  • New interactive waterfall analysis enables automatic filtering by third party service category, analyzing W3C browser timing events and more to reveal the greatest impact on user experience.

Lots more updates to read up on here.

DC RUM

  • Auto-discovery of new – or recently inactive – services and servers informs you of important changes in your environment and the impact on user experience.
  • New explorer views for DNS, Network and Citrix deliver increased interactive analysis for speedier insights.
  • One-minute data collection intervals expedite alert triggering and let you view micro-trends with enhanced granularity.

 Read about the rest of the advancements here.

The post Hot from the lab: Latest releases center on unification, scalability and powerful analytics that tie back to business metrics appeared first on Dynatrace blog – monitoring redefined.

Categories: Companies