Async Call Path Profiling via Metering
A while back I wrote a blog entry, Call Path Metering & Runtime Governance, showing how to use OpenCore’s Probes Open API to determine the time to get from one (entry) point in the processing of a request to another (exit) point. This is somewhat different than response time measurement which is focused on the entry and exit at the same execution point. In this entry I am going to expand on the approach to enable asynchronous call path metering across threads and possibly processes.
Lets start with a graphic depicting the scenario the customer had outlined. A client browser sends a request to a starting point process which then sends a message asynchronously to mid point process which in turn sends another message to an end point process. Whilst the customer could profile (meter) each individual process and then aggregate all profiles (meterings) across the cluster in our management console they wanted to be able to connect just to a single process and see the latency up to that point in the process alongside the process’s own timing (metering).
To demonstrate a suitable approach to the customer in metering the complete workflow path I create a mock application using Akka, an Actor based framework, in which I mapped each processing point in the workflow to an Actor instance. I configured each Actor to be serviced by a separate thread dispatcher providing similar queued and asynchronous execution characteristics. Here is the StartPoint class.
In the MidPoint Actor class I added a small time delay before the next process is called.
Finally the EndPoint Actor class with a delay as well though it has no bearing on the call path time up to this point in the processing.
Here is a Test class I used to send 10 requests to the StartPoint Actor resulting in 30 message receipts.
Below is a metering model visualized in our console following completion of the request processing. Parent nodes in the tree view represent callers and child nodes represent callees. You will note that we cannot backtrack from any of the processing points. This is because each call is executed by a separate thread (and asynchronously) which is exactly how the metering model would look like in a distributed environment when clustered in the console. In addition there is no way to determine the maximum time for a complete execution of a request up to each processing point. With a console connected to any one of the processes all that would be seen is the metering profile for that particular step.
Now what we really want to do is attach metering information on outbound requests and extract such information on inbound requests. To do this we introduce a Path.State field into the Request class that tallies the number of processing hops and running time using AspectJ.
Note: In addition to working across threads the approach detailed below can very easily be applied to HTTP headers, JMS properties, CORBA service contexts.
Note: You might think an initial timestamp moved from one message to another would be much easier than metering but then you must ensure clock synchronization. What if you wanted to track more than just the value of a global counter? What about cpu.time, gc.time, waiting.time, blocking.time, in.bytes, out.bytes,…? This is not possible unless metering is used.
Before a Request is dispatched to another processing point via a ActorRef.sendXXXX call we update the request’s Path.State fields with the difference in metering since entering the current processing point.
The path.time and path.count counters are updated when a Request is received at a processing point. The Path.State fields are extracted and added to the counters for the current thread metering context. When the onReceive method completes and reads again its meters it will see a delta not just in its clock.time meter but also in path.time and path.count.
Here is the jxinsight.aspectj.filters.config used. I included the LocalActorRef to simulate the request entry point into a process point that was distinct from an exit point.
I enabled in the stack probes provider and mapped the path.* counters to meters with the following jxinsight.override.config file.
Running our Test class again we get the following metering model showing the total, avg, max and min path.time reading up to each processing step as well as the number of hops. Note how the path time metering increased substantially in the EndPoint due to the delay we added into the MidPoint class ** prior ** to the EndPoint message dispatch.
Following on from this we will be releasing an Akka probes specific extension which goes beyond what has been described here filling in some metering gaps and more importantly working with any type of message payload passed between actors. We will also be releasing updates to the OpenCore Open API to reduces some of the code above as well as offering explicit support in the model for such behavioral analysis.
Cloud Monitoring – What it’s not!
Cloud washing is in full swing this year with many traditional application performance management (APM) vendors claiming to have a solution that is specifically designed to meet the challenges in the cloud but yet is largely based on the same old approach used in previous incarnations that targeted data centers running in particular high latency applications deployed to standard application server runtimes.
So what exactly constitutes a cloud monitoring solution? That’s a pretty hard question to answer. We still do not have a clear definition for the cloud and any attempt would be out-of-date very quickly as the cloud is likely to undergo significant and rapid change over the next few years as we move beyond provisioning of bare metal in (vm) units at coarse grain periods (hours) to an ecosystem (grid) that supports mobility of code and data that is context (service, location, capacity,…) and cost (quality, financial,…) aware and which gives greater prominence to service interactions and service supply chains.
It is much easier to say what a monitoring solution for the cloud is not. Here are some examples.
1. Off-site (central) storing of monitoring data for single-point of failure viewing in a browser
This type of cloud washed APM vendor confuses the pushing of traditional response time monitoring data on a per request basis to a remote web site with cloud computing. The site might not be using any cloud infrastructure itself. More importantly none of the data collected pertains in anyway to important execution aspects of the cloud itself such as cost, collocation, and capacity (reserved or variable). Incredibly whilst you might have managed to scale your own applications you cannot be certain that the same applies to the monitoring solution. When it goes down you are completely blind because that important data that is secretly pushed out from your application runtimes is lost in the cloud – another cloud (if at all).
Our Vision: Monitoring data must travel first class with its the request/response payload. It must be accessible to multiple interception (metering) points up and down the technology stack and across execution boundaries. It should not be pushed to (or consumed by) only one (owning) destination point leading to a complete disconnect back through its chain of events & activities. It should be accessible in real-time within the context of the execution and flow supporting immediate problem determination, cost based optimization in call routing and fine grain capacity leasing. It should not measure execution behavior in one dimension.
2. Gimmicky usage of monitoring data for “shoot yourself in the foot” elasticity
This type of cloud washed APM vendor offers the ability to blindly execute a script that fires up one or more machines and servers on a publicly accessible IaaS platform such as Amazon’s EC2 based on some performance threshold being exceeded. This fire-and-forget-and-hope-for-the-best-without-a-care-in-the-world-for-cost approach is based on naive view that “more is better” whether it has any influence on the particular performance issue detected, positive or negative, is irrelevant. Who cares this [elasticity] is cool. Ops can just sit back and look at those large RED circles on the dashboard, through the hole in the donut their eating, and watch them turn GREEN as the monitoring software now becomes the master controller. There is no time for thinking, reasoning or policing. Aim and fire – AMI and EC2!
Our Vision: A monitoring solution should not be the controller especially when such solutions are focused on a single concern – performance. Monitoring data must feed into a model that unifies multiple management concerns including quality, cost, availability and value (profit) providing insight to both operations and business personnel some of which will eventually be used to introduce very specific automation but only after knowledge acquisition has taken place and derived models validated and verified. A monitoring solution should focus on ensuring data captured is sufficient, accessible, immediate, accurate and cost effective.
3. Meaningless publication of monitoring data to fake value and context
This type of cloud washed vendor has been around for countless iterations of computing. They report news for the sake of reporting. They lure a customer to a site offering flashy monitoring dashboards that present arbitrary performance & availability metrics pulled from one or more cloud platform. Whether such metrics have any relevance whatsoever to the performance and availability of other applications in such clouds is never communicated or related. The reporting seems to imply that the majority of applications in the cloud are much better engineered than the cloud itself. That might indeed be the case if you are pulling such data from a hello world web application deployed to such clouds that does nothing of significance and is rarely changed. Sadly in the real world the availability of the typical business application lies squarely with its own engineering. Such applications are routinely (daily) restarted in private data centers today around the world. It be unfortunate for a cloud platform vendor to have an outage when one of these applications was actually running reliably. So what’s the point? Marketing!
Our Vision: Metric monitoring is not suitable in effectively managing applications in the cloud. Monitoring needs to focus on the cause of costs (performance, expenditure, loss, capacity) in the cloud – users, activities, and resources. This requires a monitoring solution to be integrated at some point in the service supply (and resource consumption) chain and to be able to collect sufficient measurements. Monitoring data must reflect real life events, actions and workloads – not artificial requests. Monitoring must be performed at the point of interaction (and consumption) and then to be directly attributed to its causes (cost objects) for the purpose of cost management, charge backs, and resource metering & control.
Benchmarking OpenCore Probes – LoopTest in Ruby
In this entry I am going to go over our recent benchmarking of OpenCore’s overhead but instead of profiling a Java test class I am going to repeat some tests with a similar class in Ruby.
Lets first start with the baseline cost of the above execution.
ruby /looptest.rb
133.837908
133.837011
From the output we can see the wall clock time cost is approximately 134 nanoseconds (seconds / 1 billion iterations) per invocation of the call method.
Lets try this with the built-in ruby profiler. Well unfortunately we did and after waiting a hour and still no end in sight we decided to abandon the initial benchmark run and instead restart it with an adjusted loop count of 10,000,000 which means we must multiply the output by 100 to get back to nanoseconds.
…
ruby -rprofile /looptest.rb
205.743177 => 20,574.3177
205.789916 => 20,578.9916
Subtracting the baseline line cost from the above profiler times we get 20,442 nanoseconds (20+ microseconds) per invocation of the call method. Here is the data printed by the profiler on completion.
Lets now try our initial benchmark with the JRuby VM.
jruby /looptest.rb
86.854
84.906
Much faster – 86 nanoseconds. This is our baseline for further JRuby results presented below.
Now lets try out OpenCore integrated with JRuby as documented here.
jruby --profile /looptest.rb
84.808
85.148
Wow. No noticeable difference. In fact its slightly faster than the baseline. How come? Well OpenCore’s metering runtime is strategy based. After the first 1000 invocations the metering runtime dynamically disabled firing of all probes associated with the LoopTest.call() method based on its metering profile. Here is the probes output following completion of the benchmark.
Lets try another option added to our modified ruby script. This is the same setup as described in this recent entry.
jruby --sample /looptest.rb
148.755
149.024
Taking away the baseline cost of 86 nanoseconds the overhead in sampling the probe stack every 1 ms without any metering strategy enabled is 63 nanoseconds. Here is the metrics output.
Now before screaming foul over our usage of strategy based metering which is there to actually help you focus on what is important and to ensure an accurate metering profile is measured and collected lets re-run our the JRuby --profile test again this time with the hotspot strategy disabled ensuring all invocations are indeed metered.
jruby --profile /looptest.rb
286.529
284.593
Excluding the baseline we have an overhead of 200 nanoseconds per metered invocation which tallies somewhat with results in this recent entry. Here is the probes output.
Now what is really interesting is that in the worse case at a cost of 200 nanoseconds OpenCore’s overhead is equivalent to the cost of 2.3 empty method calls in Ruby which means OpenCore out-performs all other Ruby application performance management solutions (i.e. NewRelic), by a HUGE margin especially in light of how inefficient such agents are implemented in Ruby. They only saving grace for such solutions is that your Ruby-on-Rails apps are more than likely going to be database bound in terms of performance.
Metering in the Cloud: Visualizations Part 1
I have posted an entry on JINSPIRED’s OpenCore’s site that is the first in a series of articles graphically depicting our vision for activity based resource metering of software and services in the cloud (public or not).
Profiling JRuby/Ruby with OpenCore
Here are the Mac OS X specific steps to getting started with resource metering & metric monitoring of your Ruby applications running on the JRuby 1.5.1 VM with OpenCore.
Installation
1. Download the latest OpenCore 6.0 tar file distribution from here
2. Extract the file, rename the top directory to OpenCore and move it to /Applications/OpenCore/
3. Copy /Applications/OpenCore/lib/ext/probes/jruby/config/ to $JRUBY_HOME/config/ directory.
4. Download a modified jruby script and drop it into the $JRUBY_HOME/bin/ directory.
Profiling (Metering)
To profile a ruby script execute the following:
> ./bin/jruby --profile ${script}
Following completion of the script a probes-${timestamp-units}.xml dump file will be exported to the $JRUBY_HOME/config/profile/ruby/ directory.
To profile a ruby script in terms of the actual underlying org.jruby method calls execute the following:
> ./bin/jruby --profile-all ${script}
Following completion of the script a probes-${timestamp-units}.xml dump file will be exported to the $JRUBY_HOME/config/profile/jruby/ directory.
Sampling (Metrics)
OpenCore’s metering runtime treats all JVM languages equally so you can use the same low latency techniques I have blogged about recently such as stack sampling.
To sample a ruby script execute the following:
> ./bin/jruby --sample ${script}
Following completion of the script a metrics-${timestamp-units}.xml dump file will be exported to the $JRUBY_HOME/config/sample/ruby/ directory.
To sample a ruby script in terms of the actual underlying org.jruby method calls execute the following:
> ./bin/jruby --sample-all ${script}
Following completion of the script a metrics-${timestamp-units}.xml dump file will be exported to the $JRUBY_HOME/config/sample/jruby/ directory.
Open Data
Further information related to the XML format of both probes and metrics dumps can be found here.
Configuration
Inside each of the export directories under the $JRUBY_HOME/config/ directory you will find the jxinsight.override.config file used by each command option above. This file can be used to change settings (system properties) such as the threshold used by the hotspot metering strategy in determining whether to disable a probe dynamically.
Further information related to configuration of strategies and providers can be found here.
Low Latency Monitoring w/ Probes & Metrics
Over the last two blog entries (Benchmarking OpenCore Probes – LoopTest, OpenCore Metering: How Low Can You Go?) I have shown various ways the OpenCore metering runtime can be configured to minimize overhead in performing activity monitoring in very low latency (< 1 ms) Java request/transaction execution environments. We have achieved sub nanosecond (ns) overhead costs for dynamically determined non-hotspots and reduced the metering costs from 182 ns to 81 ns for metered hotspots whilst maintain activity counts.
In this article I am going to show two alternatively ways OpenCore can be configured that have even much lower overhead by integrating metering with metric monitoring. In addition with one alternative I will show how to collect much more meaningful measurement data than with any of our previous overhead reduction approaches used up to now creating the ultimate sampled based profiling solution.
Lets first revisit the code that will be instrumented dynamically at runtime with metering probes.

In the latest OpenCore beta release we have introduced a new probes plugin, stack.sampling, that samples the probe stack for all metered thread contexts and in turn publishes 3 metrics for each probe name that appears on a sampled probe stack. It is incredibly powerful compared to the heavy weight & largely inaccurate thread call stack sampling approach offered by other solutions because of a number of unique features including:
1. Stacks only contain methods that have been instrumented and metered.
2. No costly object allocation whatsoever in accessing a stack.
3. No pausing of thread execution in accessing one or more stacks.
4. High degree of accuracy due to presence of unique execution ids in stack frames.
Here are the system properties which should be added to a jxinsight.override.config file to install this plugin and enable dependent probes providers for low latency activity monitoring.
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.stack.enabled=true
jxinsight.server.probes.stack.forwarding.enabled=false
jxinsight.server.probes.plugin.enabled=true
jxinsight.server.probes.plugins=\
com.jinspired.jxinsight.probes.ext.plugin.stack.sampling.PluginFactory
The default stack sampling interval is 1 ms which is 10 times less than the default for all other solutions.
Note: We are currently researching various approaches to supporting sub millisecond sampling intervals.
Upon completion of the above benchmark the following metrics dump file is outputted and the overhead is calculated at 54 ns per method invocation. An impressive reduction of 28 ns!!! Even more impressive is that with the metrics data we have a way to determine relatively accurate timings for performance outliners ( > 1 ms).
For each named probe (i.e. method) 3 metrics are created by the stack.sampling plugin: sample.count, sample.top, and sample.cycle. The sample.count metric reports the number of times a probe has appeared on each of the threads sampled. The sample.top metric reports the number of times a probe has appeared exclusively on the top of stack sampled. And finally but most importantly the sample.cycle metric reports the number of times a particular probe execution frame has appeared across multiple samplings of a particular stack which is impossible today with normal thread call stack sampling. If you are still unsure of the importance of these metrics take a look at comparison of the metrics across probes. Now where was our hotspot again?
Where there is a Will there is a Way
Those who have worked with me in the past know that I rarely ever give up once I have targeted something which is probably why I have always had a passion for performance optimization and have yet to walk away from a critical and challenging customer engagement without victory.
You want to go even lower? Well here is another approach which uses the strategy probes call back to increment an associated metric counter for each firing probe which is in turn sampled by the metrics runtime.
To register our custom metering strategy which always vote no simply add the following to a jxinsight.override.config file. Nothing else.
jxinsight.server.probes.strategies=MetricsStrategyFactory
Here is what is outputted and at an extremely low overhead of 47 ns per instrumented method invocation. Metrics are sampled (default 10 ms) so we have missed a number of increments of the call() method in this dump.
In a follow-up I am going to test drive our stack.sampling probes plugin with a much more realistic simulated workload (lots or threads and deep stacks) and compare its performance with other offerings on the market.
Software
OpenCore 6.0.BETA.7b
Environment
Model Name: iMac
Model Identifier: iMac11,1
Processor Name: Intel Core i7
Processor Speed: 2.8 GHz
Number Of Processors: 1
Total Number Of Cores: 4
L2 Cache (per core): 256 KB
L3 Cache: 8 MB
Memory: 8 GB
Processor Interconnect Speed: 4.8 GT/s
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-10M3065)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)
OpenCore Metering: How Low Can You Go?
In a previous entry I showed the unit cost of an invocation of a method instrumented with calls into OpenCore’s metering can drop down to below 1 nanosecond (in the best case) for an unmetered non-hotspot probe firing and that the cost was approximately 182 nanoseconds for a metered firing with the clock.time meter measured and collected (for the particular environment tested). In this article I am going to use the same LoopTest class to explore ways we can further reduce the overhead of a metered probe firing by using alternative meters and probes providers that allow us to collect only the frequency of an metered activity (a named probe).
Note: In all examples the strategy probes provider has been disabled which is not advisable in a production unless one has previously used the output of the strategy probes provider to generate a refined instrumentation set.
Lets start with determining the lowest possible unit cost by disabling all meters using the following jxinsight.override.config file.
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.meter.clock.time.include=false
Running a timed version of the LoopTest class the overhead per method invocation is 43 nanoseconds with the following outputted on shutdown.
But we need the counts so lets enable a relatively cheap meter, probe.count, which measures the number of probes metered within the metering window of a probe for the same thread context.
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.meter.clock.time.include=false
jxinsight.server.probes.meter.probe.count.include=false
The following probes dump file is outputted and the overhead is calculated at 102 nanoseconds per method invocation.
But we might be able to do better by instead installing a single meter mapped to a counter that is never created or updated during the running of the benchmark.
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.meter.clock.time.include=false
jxinsight.server.probes.meter.counters=counter
The following probes dump file is outputted and the overhead is calculated at 98 nanoseconds per method invocation. We saved 4 nanoseconds.
Instead of installing an alternative meter to the default clock.time meter we can enable the queue probes provider which also tallies the number of metered probe firings.
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.meter.clock.time.include=false
jxinsight.server.probes.queue.enabled=true
jxinsight.server.probes.queue.forwarding.enabled=false
The overhead has now dropped down to 96 nanoseconds with the following outputted.
In my final test I have excluded the recording of queuing behavior for probe aggregates (see LoopTest group).
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.meter.clock.time.include=false
jxinsight.server.probes.queue.enabled=true
jxinsight.server.probes.queue.forwarding.enabled=false
jxinsight.server.probes.queue.aggregates.include=false
The overhead drops much more substantial with an overhead now of 81 nanoseconds.
In this article I have shown various ways to configure OpenCore metering runtime to minimize (by half) both measurement and collection costs for metered probe firings. One might ask what value does the data collected have. Well we have had a number of requests from customers to be able to simply count activities some of which are mapped to method invocations others more contextual in terms of the business transaction or service request. Reasons for such requests include having the ability to attribute some degree of risk to code blocks based on the frequency of their execution in production and to better understand what exactly goes on in production within the software irrespective of its performance characteristics.
In a follow-up to this article I am going to show other and cheaper ways of measuring similar activity using OpenCore’s Metrics runtime.
Software
OpenCore 6.0.BETA.6
Environment
Model Name: iMac
Model Identifier: iMac11,1
Processor Name: Intel Core i7
Processor Speed: 2.8 GHz
Number Of Processors: 1
Total Number Of Cores: 4
L2 Cache (per core): 256 KB
L3 Cache: 8 MB
Memory: 8 GB
Processor Interconnect Speed: 4.8 GT/s
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-10M3065)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)
Benchmarking OpenCore Probes – LoopTest
Last week I gave a Tech Talk at Google on low latency application performance monitoring using OpenCore’s activity based resource metering technology. During the presentation I used code similar to that which is listed below to discuss the importance of having a clean & dynamic separation between instrumentation and its actual measurement (metering) – the what (to collect) and when.
The code is used to assess the unit cost of instrumentation that is weaved into code at load-time by our enhanced for production AspectJ javaagent library – whether it is measured or not at runtime. Without any instrumentation the above code would execute immediately with the hotspot vm effectively removing the call to call() from within the loop() method and so on. But with instrumentation applied this changes everything allowing us to obtain a little bit of insight into the unit costs of a code level execution based performance monitoring solution.
Note: In benchmarking other solutions you might need to enable the monitoring of empty methods which some disable by default.
Running the above code instrumented dynamically with calls into OpenCore’s Probes Open API via our AspectJ probes extension library results in this probes xml dump file being generated on shutdown. Reviewing the file you will notice the count is not 10 billion but 1000 for the metering group LoopTest.call(). The reason for this is that by default the metering runtime is strategy based and configured to disable firing probes (instrumentation) which have a clock.time metering average of less than 1 ms on every 1000 metering count steps.
Note: The format of OpenCore’s metering and metrics monitoring dump files can be found here.
To assess the unit cost we could have inserted timing around the loop() calls but instead I am going to use the metering information for the LoopTest.loop() method. Multiplying the min value by 1000 (converting from microseconds to nanoseconds) and dividing by the number of iterations within the loop() calculates the best case (cpu cache,…) cost at 0.876 nanoseconds.
Pretty impressive considering that we do not offload work to other background threads or processes which a number of other solutions do to hide their high overhead.
Lets see what happens when we disable the strategy probes provider via the system property jxinsight.server.probes.strategy.enabled=false. First of all the count listed in the probes dump file is as expected – 10 billion.
Using the same calculation above we can determine the unit cost of an instrumented and metered method invocation which collects the clock.time – 182 nanoseconds. Most of this unit cost can be attributed to the cost in assessing the meter’s underlying measure.
You might be asking yourself whether such focus on extreme low latency overhead is justified in the marketplace today outside of trading platforms. Well yes very much so because it allows one to take a significant amount of guesswork out of any instrumentation & monitoring initiative a team may be undertaking because of the greater affordance of wider code coverage (due to the mitigation of risk). More importantly the data that is generated is much more accurate and relevant listing only actual metered hotspot activity.
Software
OpenCore 6.0.BETA.6
Environment
Model Name: iMac
Model Identifier: iMac11,1
Processor Name: Intel Core i7
Processor Speed: 2.8 GHz
Number Of Processors: 1
Total Number Of Cores: 4
L2 Cache (per core): 256 KB
L3 Cache: 8 MB
Memory: 8 GB
Processor Interconnect Speed: 4.8 GT/s
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-10M3065)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)
OpenCore 6.0 Beta Released
With our OpenCore 6.0 (resource metering & metric monitoring) public beta out the door this week you will now have the opportunity to try out for yourself many of the concepts and code examples I have created on this blog over the last year. In the future when posting articles with code snippets I will also make available the src and config files. Hopefully this will make it easier for everyone to follow this blog.
Note: OpenCore is the runtime and framework on which we are building our cloud metering as a service – Costicity.
Please visit the OpenCore site, download the beta, review the HelloWorld example, read up on the configuration guides, explore the Open API and try out the API samples.
Call Path Metering & Runtime Governance
Following on from an application performance management and resource metering workshop in India last week I was asked by the customer’s architects to show how one could use metering to record the roundtrip time of a message that passes through message queues (channels) processed by one or more threads (contexts) across one or more processes (runtimes).
The customer was already using our resource metering runtime to record the response time (clock.time) of HTTP requests but what they really wanted in addition was to determine at runtime and within the execution itself the time interval between the request entering and the resulting message being posted to queue. This time should not include the processing time elapsed as the request call stack was unwound and the request returned to the client.
Here is a timeline graphic depicting a hugely simplified call sequence along with the measurement intervals and the call stack as the request is processed and returned.
I am going to use the following mock Java application to demonstrate one or two of the many ways to determine the path time using our resource metering runtime Open API.
Enabling the optional billing probes provider and collecting the resulting metering information in our management console I can pretty quickly determine the path time via the “Value (E)” column – 2 .002 seconds.
Though billing history can be accessed from within a metered runtime via the Open API, billing entries are only published following completion of a fired and metered probe. I needed to see the incomplete metering history at the point of interaction with the Channel. A solution would be to enable the optional stack probes provider and access the last meter reading recorded by the probes at each end of the metered probe stack. The following shows how this can be done.
Note: It is not necessary to add Open API calls directly into the method. It can be injected via our AOP SDK or using our optional event probes provider and registering a listener. I used the Open API here to simplify & clarify things (I hope) by showing what would typically be injected or executed at runtime.
This same approach can be used to provide runtime resource governance of executing code, similar to how Salesforce.com manages Apex based applications using Governors, as meters can be mapped to any thread specific counters tracking resource consumption and cost in real-time and at the point of each interaction direct or indirectly.
Below I continuously compare the current meter measurement with the last meter reading for the stack root probe checking to see whether a defined request time threshold has elapsed.
In a follow-up I will show how we can tie up all the path times across threads and runtimes.
Business Transaction Management to Business Transaction Metering
Recently there has been a trend for Application Performance Management (APM) vendors to overnight rebrand their offering as a “Business Transaction Management” (BTM) solution.
One reason for such a move is that there is generally less instrumentation performed hence any excessive overhead of the original APM solution can be mitigated.
Whilst it is somewhat gratifying to see many of the innovations we pioneered in the early years of JXInsight, such as distributed contextual trace and transaction analysis for the Java EE and CORBA platforms, being promoted by competitors I think their time has come and passed.
The future of performance management whether it is from a technical or business perspective is best served by a solution that natively supports many of the concepts being promoted including the measurement of business value, the inclusion of business context in activity analysis and the demarcation of transactions & activities and their resource consumption.
Note: I am ignoring the distributed nature of business transaction processing here because that has been in many APM products for a very long time.
When the term “business value” is mentioned in related marketing literature I straight away see this as just another meter within our activity based costing & metering (ABC/M) system. And understandably so because ABC/M can server as the foundation for effective performance management of any type of service from an IT operations and business perspective.
To demonstrate this I have created a mock application in Java which I am going to instrument at runtime (class load-time) with our OpenCore Probes technology. The following code setups and executes the business transaction which I am going to use throughout this article.
Here is the implementation of the Shop.order(Order) method which represents our transaction point.
Running the mock application with our default application performance management centric instrumentation results in the following resource metering model display in JXInsight‘s management console. This model includes the metering of activities performed during preparation of the business transaction as well those performed in executing business transaction.
To capture the “business value” of transaction I am going to create and update a thread specific context counter, order.value, whenever Store.order(OrderItem) is called.
To have the above counter management code weaved into the runtime I have created the following META-INF/aop.xml file which I have packaged up into an extension jar along with the aspect class.
To map our counter to a meter within our resource metering runtime I have added the following system property to a jxinsight.override.config file.
Running the mock application again with our new extension jar and configuration change produces the metering model shown below. We can now measure the execution of the software from both a business and technical perspective.
Note: Each product ordered was priced at 100 monetary value units.
In our metering runtime activities can be mapped to code execution constructs or can take on the value of some aspect of the code execution context. Lets add some business context to Store.order(OrderItem) calls by firing a probe that includes the name of the shop along with the operation.
We can apply similar probe instrumentation to Shop.order(Order) calls.
Here is a revised META-INF/aop.xml that includes all three aspects.
With these changes the metering model now tracks tracks resource usage from both a technical perspective as well as a business one. Metering of order processing in terms of performance and business value is now represented at the individual shop and store instance levels.
The next thing to do is to demarcate our business transaction so we can see the metering changes within the execution period of Shop.order(Order) call.
The following tables show an individual execution instance of the Shop.order(Order) business transaction along with the metering changes that occurred during its execution within the context of the executing thread.
We can further simply things for business management by not deploying the code level metering extension instead using only the contextual extension created above.
Finally we can configure the management console to render our transactions and activities in a much more business friendly manner by simply dropping in images into the icons directory with the probe name (full or partial) prefix.
No Latency Application Performance Analysis: When wall clock time is simply too slow
During a workshop I gave this week to a team of developers working on a financial trading platform with very low latency requirements I was asked to provide a good example of when to use (thread specific) counter based meters instead of the more natural time based ones in analyzing the performance of systems. Here is example I quickly developed to show that wall clock time does not cut it when one is monitoring and optimizing at the nanosecond level.
The State class in my example is an abstraction of the runtime object state that is read and written during the execution of 5 algorithms. Likewise the Run classes are an abstraction of each algorithm which would be more than likely be implemented across multiple classes and methods.
Looking at each doWork() method implementation it is pretty easy to compare the efficiency of each method but in real life code this would be an impossible challenge especially with different execution paths.
Unfortunately to make a similar assessment with a benchmark using wall clock time requires measuring a very large number of calls to each algorithm due to the extremely short execution interval of a single call. In the example below I needed to execute each algorithm at least 50 million times.
Here is the output running the above a number of times. It seems to confirm what we expected in terms of cost.
Now if the actual doWork() methods were instrumented by an application performance management product we would not see the same performance ratios because even requesting the clock time twice will cost between 100 to 150 nanoseconds and thats without the product doing any actual processing and data collection. Comparing algorithm One with algorithm Two we would have 114 (100 + 14) versus 128 (100 + 28).
Why not instead focus on the key performance indicators? In our extreme low latency example there are two – field.read and field.write. Here is the configuration I used to map such counters to meters.
Note: OpenCore and JXInsight ship with a large number of counter instrumentation extension libraries that need only be added to the classpath of a managed & metered runtime.
Running the application again this time with our activity based resource metering runtime results in the following snapshot display in our management console. Importantly we only needed to execute each algorithm once.
To combine these into one meter to further simplify our analysis I enabled our unit.cost meter and added a cost to each counter based meter.
Here is a probes metering snapshot with both configurations merged. Simple and clear. Whats cool is that this will work no matter how dispersed the actual field read access and write access are across the code base analyzed. From micro-benchmarking to nano-benchmarking!!!
Architectural Enforcement with OpenCore Probes
A while back I promised Mattias Severson over at Jayway that I would show how to use our OpenCore activity based resource metering runtime to support the dynamic enforcement of certain architectural rules for example ensuring that execution within a particular layer is always within the execution scope of one or more stacked layers.
Lets start with a simple probes plugin factory which registers an event listener with the resource metering runtime.
Below is an implementation of a probes event listener I quickly created today to enforce the layered execution rule that Mattias had detailed in his blog entry which is that code within the com.acme.dao package must be called from within the execution scope of code within the com.acme.service package which itself must execute within the scope of code residing in the com.acme.gui package.
Here are the system properties to be added to a jxinsight.override.config file to enable event listening and to register the plugin listed above.
The finally step simply involves running up the application with our OpenCore agents and libraries as detailed here.
Note: We are looking at packaging up similar functionality as a built-in probes provider which would be driven solely by configuration and implemented much more efficiently within the metering runtime.
Note: With our JRuby integration it is possible to write such rules in Java which enforce similar constraints on Ruby codebases.
Costicity.com – A Blueprint for Cloud Service Metering & Monitoring
I have posted an entry on our company blog discussing our Costicity.com project, an activity based costing & metering service for the grid, cloud and enterprise.
Costicity is light years ahead of any recent initiatives in the application performance management (APM) space including associated areas such as business transaction management (BTM) and business activity management (BAM). Its an attempt to unify multiple management domains and to eliminate the need for separate models and tooling for areas that have overlapping management concerns such as those just listed.
Its ambitious and bold but would I have it any other way? No.
Custom Resource Metering Strategies
It has been nearly 2 1/2 years since we introduced the first production grade Java monitoring solution based on an adaptive resource metering engine driven by one or more chained strategies.
Note: Out of the box our default strategy, hotspot, acts very much Java JIT compilers using dynamic runtime metering information to determine whether to continue metering a fired probe which is typically a method though not it is not restricted to this.
Since our initial release which shipped with a number of built-in strategies we have continued to develop new strategies that restrict measurement and collection to particular events or runtime states. Today the list of strategies includes:
burst, busy, busythread, checkpoint, delay, dynamic, exclude, frequency, highmemory, hotspot, include, initial, interval, random, sample, warmup
As stated above the resource metering runtime supports the chaining of multiple strategies allowing our customers to create elaborate new composite strategies. But sometimes that is not enough to help filter down measurement to some peculiar runtime behavior within an application. In such cases you can simply create your very own implementation of ProbesStrategyFactory and ProbesStrategy interfaces.
Here is example of a custom strategy I have created that limits metering to only one at a time on a per probe name basis. The associated method will still execute but only a subset of the firings will be metered (measured). This strategy uses the queue probes provider we recently released and blogged about.
Note: We might use such a strategy in a relatively stable production environment electing to discard multiple executions of a method if we are currently metering one particular occurrence not yet completed.
Note: The strategy whilst implemented in Java can be used in metering JRuby/Ruby and Jython/Python applications with the Probes.Name argument representing an execution construct within these languages.
Here are the system properties which should be added to a jxinsight.override.config file to install the custom metering strategy within the runtime and enable the required queue probes provider.
Below is a sample of the output produced running the application listed in “Metered Software Service Queues” with this configuration. Because the strategy probes provider is layered on top of the queue probes provider when our custom metering strategy returns false for a particular probe firing no metering takes place and hence it does not appear in its associated queue. The maximum size of a queue is effectively limited to 1.
If you are interested in finding out more then please check out our strategy guides on our OpenCore website.
The Java Application Performance Management Vendor Showdown
It seems like every day sees a new software company enter the Java application performance management space claiming to have solved all the problems of legacy application performance management solutions with what is in essence the very same approach, a combination of call (path) tracing and call stack sampling, both of which don’t scale in production as highlighted here, here, here and here. Unfortunately by the time a customer truly understands this it is far too late to reverse a decision without losing face or what data of little value these solutions collect.
Such is the power of marketing and a good sales person.
Whilst it might appear a formidable task to attempt a feature wise comparison of all products even when they are essentially the same in terms of instrumentation, measurement, and collection there is one claim that most of these vendors make which can be validated and verified. And that claim is “low overhead”. So what does “low overhead” actually mean. Well for the most part this is the % of overhead that is added to average response time and/or removed from transaction throughput. Generally this is quoted as between 1% and 5%. Unfortunately it would be unlikely that anyone could hold a vendor to such fictitious claims because of a large number of “depend clauses” highlighted in any technical discussions with the vendors engineers.
There are a number of assumptions made by a vendor but the following are most common:
1. We assume your application request processing time is sufficiently high to dwarf our significant overhead.
We assume you have a slow performing database backend.
2. We assume that instrumentation is applied to a very limited section of the code source to lessen the impact of our significant overhead.
We assume that you already know your performance hotspots.
3. We assume that there is large amount of under utilized processing capacity to offload our significant overhead.
We assume that you will not notice tricks used to hide our overhead.
4. We assume that it is impossible to realistically and reliably measure our significant overhead.
We assume that you know little about performance engineering.
And my favorite is the restriction within a vendors software license on the publication of benchmark results.
We assume that you blindly accept our claims – unquestionably.
There are some valid reasons why a vendor might object to the use of standard benchmark as it might not be at all representative of a customers workload and software execution behavior. Which is why we have carefully designed a number of Java micro-benchmarks that target exclusively the actual overhead of the products deriving a unit (overhead) cost for various aspects of the runtime in terms of instrumentation, measurement, and data collection. These tests measure the unit cost in terms of the software execution model (fixed execution costs) and the system execution model (variable execution costs). With this information we can make a bold claim like the following.

Now I did say “claim” which is something I have not been entirely comfortable with as a complete official vendor unit cost comparison table would have much more meaning and weight. I had hoped following our announcement that we would have one or two vendors refute this and request an official benchmark shootout but unfortunately it appears most vendors already know the answer or are unwilling to enter into a competition in case they lose face (not just their customers).
At the end of last year I decided it was time to take the fight to them and challenge them to prove their “low overhead” claims using our unit cost approach. I contacted a number of companies repeatedly requesting either access to their software or for them to publish their unit costs per our micro-benchmarks. This list includes the following:
CA Wily (Introscope) aka Introscrap 1.0
Oracle (Oracle Enterprise Manager – Oracle Application Diagnostics for Java)
dynaTrace (dynaTrace) aka dynaCopy
Compuware (Vantage for Java)
NewRelic (RPM for Java) aka Introscrap 2.0
AppDynamics (AppDynamics Lite)
All rejected, refused or ignored my requests.
One new vendor (AppDynamics) even went as far as to change their software license agreement immediately after I had downloaded, tested and invalidated their “low overhead” claims. Then sent me emails trying to retrofit the changes to the original agreement. Fortunately I had saved the original agreement at the time of my download.
I doubt this poor standard practice amongst application performance management vendors is likely to change in the near future unless customers start demanding that the listing of a random percentage between 0 and 5 be replaced with actual unit costs which can be used as a guide in determining the suitability of a solution to a particular workload within certain response time limits and at a specified level of coverage (risk mgmt).
In future entries I will provide further information on the micro-benchmarks we use which show JXInsight and OpenCore to be hundreds if not thousands of times more efficient at a unit cost level in some very important cases.
Metered Software Service Queues
Following on from a capacity planning workshop I attended last week given by Dr. Neil J. Gunther we have released an update to JINSPIRED’s OpenCore resource metering & metrics runtime which includes a new optional probes provider extension, queue, that delivers runtime analysis of active and concurrent metered thread workload at particular service probe points and their associated hierarchical metering groups – both at the process and thread level.
Note: When I enrolled in Neils course I was hoping that by immersing myself in 5 days of discussions around capacity modeling and analysis I would have a moment of brilliance to help me see how best to integrate capacity into our metering model which already serves as a management model for performance and cost in the cloud. Well I had many such moments both during and after the class one of which has already come to fruition.
To best illustrate the benefits of this unique enhancement to our resource metering runtime I have created a small simple sample application. The application has a com.acme.Server class which creates a number of threads each of which uses a com.acme.Runner to dispatch work to a com.acme.Service with random time intervals between each work submission.
Here is the com.acme.Delay class used to simulate work and think time.
The com.acme.Service class uses the com.acme.Delay class to simulate variation in service times.
The com.acme.Runner class is the workload generator which uses the com.acme.Delay class to simulate variable think time between calls to the com.acme.Service class.
Lets explore the monitor() method in the com.acme.Server class which uses our OpenCore Open API to check on the status of Probes.Queues which in our application can represent a Java Package, Class or Method.
The first call creates a Probes.Name for the service probe point we wish to monitor in terms of metered workload queuing. The next call looks up the metering Probes.Group associated with the Probes.Name. Then later on a Probes.Queue associated with the Probes.Group is obtained and its state repeatedly read and printed out in a while loop.
Below is a sample of the output in running the above application with OpenCore’s enhanced for production dynamic load-time instrumentation agent and extension libraries. The queue.size represents the number of threads (con)currently executing the metered com.acme.Service.doWork() method. The queue.max represents the maximum degree of concurrent thread workload that has occurred in the com.acme.Service.doWork() method and the queue.count the cumulative number of com.acme.Service.doWork() calls that have been made some of which have not yet completed.
Now what is really cool about our innovative approach is that we can amalgamate queues based on the hierarchical namespace of the named probes (i.e. packages, class, method) inserted into the byte code at load-time. So instead of com.acme.Service.doWork lets lookup the metering Probes.Group for com.acme.
Here is a new sample of the output when running the application with the above change. You will notice that nothing changes across each monitor output and that 11 is listed for queue.size, queue.max, and queue.count.
So why 11? Well previously we only reported on thread activity flowing through com.acme.Service.doWork() method now with com.acme we also include the main thread which creates the threads and then loops forever in the monitor() method. By the time we get to print out for the first time in monitor all threads are performing work in either the Server, Runner, Service or Delay classes – all of which belong in the com.acme package (and its queue).
And why does the queue.count not change? The queue.count only counts non-reetrant queuing of work which means it will only count the non-terminating entry points methods, com.acme.Runner.run() and com.acme.Server.main() methods in our application for the com.acme queue. This is pretty darn cool because it allows us to observe potential workload throttling at Java Package(s) level within an application and then drill down into the detail at Class and Method level when required. We also have much more accurate statistical information to help us in sizing various resources pools consumed by service queue points within an application. Here is a diagram depicting how one might view these amalgamated queues.
This is just the beginning as we plan on providing new interfaces that help isolate software & metered resource behavioral patterns across multiple workload queues and at the various levels of sizing.
If you are interested in learning more about the interface to our OpenCore runtime then please check out our API Samples as well as our Open API docs. There are more samples on our site showing other usages of the Probes.Queue interface. Please check them out.



































































