
CloudBees' Blog - Continuous Integration in the Cloud

Dynamic Proxies (The DevOps 2.0 Toolkit)

Thu, 12/01/2016 - 22:15

The decline of hardware proxies started a long time ago. They were too expensive and inflexible even before cloud computing became mainstream. These days, almost all proxies are software based. The major difference is what we expect from them. Until recently, we could define all redirections in static configuration files; that has changed in favor of more dynamic solutions. Since our services are constantly deployed, redeployed, scaled, and, in general, moved around, the proxy needs to be capable of updating itself with these ever-changing endpoint locations.

We cannot wait for an operator to update configurations with every new service (or release) we deploy. We cannot expect operators to monitor the system 24/7 and react to a service being scaled as a result of increased traffic. We cannot hope that they will be fast enough to catch a node failure that results in all services being automatically rescheduled to a healthy node. Even if we could expect such tasks to be performed by humans, the cost would be too high, since an increase in the number of services and instances we're running would mean an increase in the workforce required for monitoring and reactive actions. Even if such a cost were not an issue, we are slow. We cannot react as fast as machines can, and that gap between a change in the system and the proxy's reconfiguration could, at best, result in performance issues.

Among software-based proxies, Apache ruled the scene for a long time. Today, its age shows: it is rarely the weapon of choice, due to its inability to perform well under stress and its relative inflexibility. Newer tools like nginx and HAProxy have taken over. They are capable of handling a vast number of concurrent requests without putting a severe strain on server resources.

Even nginx and HAProxy are not enough by themselves. They were designed with static configuration in mind and require us to add other tools to the mix. An example would be a templating tool like Consul Template, which can monitor changes in the service registry, modify proxy configurations and reload them.
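To make this concrete, here is a sketch of such a setup (the service name `web`, the file paths and the reload command are illustrative assumptions, not from the article). Consul Template watches the registry and regenerates an HAProxy backend whenever the instances of the service change:

```
# web.ctmpl (illustrative): one server line per registered "web" instance
backend web-backend
    balance roundrobin{{ range service "web" }}
    server {{ .Node }}_{{ .Port }} {{ .Address }}:{{ .Port }} check{{ end }}
```

It would typically run as `consul-template -template "web.ctmpl:/etc/haproxy/haproxy.cfg:systemctl reload haproxy"`, which rewrites the configuration file and reloads the proxy on every registry change, with no human in the loop.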

Today, we see another shift. Typically, we would use proxy services not only to redirect requests, but also to perform load balancing among all instances of a single service. With the emergence of the (new) Docker Swarm (shipped with the Docker Engine release v1.12), load balancing (LB) is moving towards the software-defined network (SDN). Instead of performing LB among all instances itself, a proxy redirects a request to an SDN endpoint which, in turn, performs the load balancing.

Service architectures are shifting towards microservices and, as a result, deployment and scheduling processes and tools are changing. Proxies, and the expectations we have of them, are following those changes.

The deployment frequency is becoming higher and higher, and that poses another question. How do we deploy often without any downtime?

The DevOps 2.0 Toolkit

If you liked this article, you might be interested in The DevOps 2.0 Toolkit: Automating the Continuous Deployment Pipeline with Containerized Microservices book.

The book is about different techniques that help us architect software in a better and more efficient way, with microservices packed as immutable containers, tested and deployed continuously to servers that are automatically provisioned with configuration management tools. It's about fast, reliable and continuous deployments with zero downtime and the ability to roll back. It's about scaling to any number of servers, the design of self-healing systems capable of recovering from both hardware and software failures, and about centralized logging and monitoring of the cluster.

In other words, this book envelops the full microservices development and deployment lifecycle using some of the latest and greatest practices and tools. We'll use Docker, Ansible, Ubuntu, Docker Swarm and Docker Compose, Consul, etcd, Registrator, confd, Jenkins, nginx, and so on. We'll go through many practices and, even more, tools.

The book is available from Amazon (and other worldwide sites) and LeanPub.

Categories: Companies

Usability and Stability Enhancements in CloudBees Jenkins Platform

Tue, 11/29/2016 - 15:29

We are excited to announce the availability of the CloudBees Jenkins Platform. This release delivers stability and usability improvements by bumping the Jenkins core to 2.19.x, and includes a key security fix. This is also the second “rolling release,” the output from a process we are using to provide the latest functionality to users on a more frequent release cadence. All enhancements and fixes are for the rolling release only. Fixed releases have diverged from rolling releases (locked to 2.7.x) and will follow a separate schedule.

Release Highlights

Jenkins Core Bumped to 2.19.x LTS Line

This is the first LTS upgrade on the rolling release and adds key fixes, such as improved dependency management for plugins. With improved dependency management, administrators are warned when dependent plugins are absent during install time. Thus administrators can catch and fix the problem before run time and provide a smooth experience to their users.

Security-360 Fix Incorporated

All customers were sent the fix for Security-360 on Nov 16, 2016. This vulnerability allowed attackers to transfer a serialized Java object to the Jenkins CLI, making Jenkins connect to an attacker-controlled LDAP server, which in turn could send a serialized payload leading to code execution, bypassing existing protection mechanisms. If you have not already installed the fix, we strongly urge you to upgrade and incorporate it in your production environment.

Support for CloudBees Assurance Program in Custom Update Centers

CloudBees Assurance Program (CAP) provides a Jenkins binary and plugins that have been verified for stability and interoperability. Jenkins administrators can easily promote this distribution to their teams by setting CAP as an upstream source in their custom update centers. This reduces the operational burden by allowing admins to use CloudBees-recommended plugins for all their masters, ensuring compliance and facilitating governance.

CloudBees Assurance Program Plugin (CAP) Updates

These CloudBees verified plugins have been updated for this release of the CloudBees Jenkins Platform:

  • Mailer version 1.18
  • LDAP version 1.13
  • JUnit version 1.19
  • Email-ext version 2.51
  • Token-macro version 2.0
  • GitHub version 1.22.3
CloudBees Jenkins Platform Improvements

This release features many reliability improvements for the CloudBees Jenkins Platform, including stability improvements to CloudBees Jenkins Operations Center connections to client masters.

Improvements & Fixes



Jenkins core upgraded to 2.19.3 LTS (release notes)

Improved dependency management - Flags admins when plugin dependencies are missing; Jenkins will not load plugins with missing dependencies, reducing errors when initializing. This creates a smoother startup through smarter scanning of plugins.

Jobs with lots of history no longer hang the UI - Improved UI performance for jobs with lots of build history. Lazy loading renders faster because build history is no longer loaded automatically on startup.

Reduce configuration errors caused by invalid form submissions - Browsers will no longer autocomplete forms in Jenkins, reducing configuration problems caused by invalid data in form submissions after using the browser back button. Only select form fields (e.g. job name) will offer autocompletion. For admins, this means users who use the browser back button will no longer corrupt the Jenkins configuration.

CloudBees Assurance Program (CAP)

Support for Custom Update Centers - CAP is now available as an upstream source in Custom Update Centers, enabling admins to use CloudBees-recommended plugins for all their masters.

Mailer has been upgraded to version 1.18; it includes a minor improvement to rendering page links and now supports the Blue Ocean project.

JUnit has been upgraded to version 1.19; it includes usability improvements around unsafe characters in the URI, and highlighted test results.

Email-ext has been upgraded to version 2.51, which adds Pipeline support for expanding the tokens FAILED_TESTS, TEST_COUNTS and TRIGGER_NAME in a pipeline email notification.

Token-macro has been upgraded to 2.0 and contains improved Pipeline support, allowing token macro to be used in a pipeline context, autocompletion when referencing a token name, support for variable expansion, and some performance improvements when scanning large Jenkins instances.

Pipeline usability improvements

Environment variables in Pipeline jobs are now available as global Groovy variables, simplifying the tracking of variable scope in a pipeline.

Build and job parameters are available as environment variables, and thus accessible as if they were global Groovy variables; parameters are injected directly into the Pipeline script and are no longer available only in ‘bindings.’

This makes job parameters, environment variables and Groovy variables much more interchangeable, simplifying pipeline creation and making variable references much more predictable.
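As a quick illustration of the new interchangeability (a sketch only; the parameter name DEPLOY_TARGET below is hypothetical, not from this release), a string job parameter can now be referenced in three equivalent ways:

```groovy
// Hypothetical Pipeline script for a job with a string parameter DEPLOY_TARGET.
node {
    echo "Target: ${DEPLOY_TARGET}"       // as a global Groovy variable
    echo "Target: ${env.DEPLOY_TARGET}"   // as an environment variable
    sh 'echo "Target: $DEPLOY_TARGET"'    // expanded by the shell at run time
}
```

All three forms resolve to the same value, so scripts no longer need to care whether a value arrived as a parameter or an environment variable.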

Skip Next Build plugin

Adds the capability to skip all the jobs of a folder and its sub-folders, or to skip all the jobs belonging to a “Skip Jobs Group.” A Skip Jobs Group is intended to group together jobs that should be skipped simultaneously but are located in different folders.

Support bundle

Adds the logs of the client master connectivity to the support bundle.

Fixes

CloudBees Jenkins Platform core
  • Possible livelock in CloudBees Jenkins Operations Center communication service.
  • Possible unbounded creation of threads in CloudBees Jenkins Operations Center communication service.
  • Fix NullPointerException in client master communication service when creating big CloudBees Jenkins Platform clusters.
  • Fix deadlock on client master when updating number of executors in CloudBees Jenkins Operations Center cloud.
  • Replace the term “slave” with “agent” in the CloudBees Jenkins Operations Center UI.
  • Unable to log into client master if a remember me cookie has been set during an authentication on the client master while CloudBees Jenkins Operations Center was unavailable.
  • “Check Now” on Manage Plugins doesn’t work when a client master is using a Custom Update Center.
  • Technical properties appear on the configuration screen of the CloudBees Jenkins Operations Center shared cloud when they should be hidden.
  • Move/copy fails in case client master is not connected to CloudBees Jenkins Operations Center.
  • Move/copy screen broken with an infinite loop when the browse.js `fetchFolders` function returns an error.
Analytics and monitoring
  • Under heavy load, multiple CloudBeesMetricsSubmitter runs obtain threadInfos and slow down the application.
  • The number of available nodes in a cloud should be exposed as metrics.

Role-Based Access Control plugin

The Role-based Access Control REST API no longer rejects GET requests where it previously required POST, thereby eliminating 404 HTTP errors when accessing groups from a nested client master folder.

GitHub Organization Folder plugin

Fixed a GitHub Organization Folder scanning issue when using custom marker files.

CloudBees Assurance Program

LDAP upgraded to version 1.13, includes a major configuration bug fix.

GitHub has been upgraded to version 1.22.3 and contains a major bug fix for an issue that could crash Jenkins instances using LDAP for authentication.

Frequently Asked Questions

What is the CloudBees Assurance Program (CAP)?

The CloudBees Assurance Program (CAP) eliminates the risk of Jenkins upgrades by ensuring that various plugins work well together. CAP brings an unprecedented level of testing to ensure upgrades are no-risk events. The program bundles an ever-growing number of plugins in an envelope that is tested and certified together. The envelope installation/upgrade is an atomic operation - all certified versions are upgraded in lockstep, reducing the cognitive load on administrators in managing plugins.

Who is the CloudBees Assurance Program designed for?

The program is designed for Jenkins administrators who manage Jenkins for their engineering organizations.

When was the CloudBees Assurance Program launched?

The program was launched in September 2016.

What is a rolling release?

CAP delivers the CloudBees Jenkins Platform on a regular cadence; this is called the “rolling” release model. A new release typically lands every 4-6 weeks.

Do I have to upgrade on every release?

You are encouraged to, but aren’t required. You can skip a release or two, and the assurance program ensures your upgrades will be smooth.

What release am I on?

You can tell which version you are running by checking the footer of your CloudBees Jenkins Enterprise (CJE) or CloudBees Jenkins Operations Center (CJOC) instance.


How to Upgrade

Review the CloudBees Jenkins Enterprise Installation Guide and the CloudBees Jenkins Operations Center User Guide for details about upgrading, but here are the basics:

  1. Identify which CloudBees Jenkins Enterprise release line (rolling vs. fixed) you are currently running.
  2. Download the latest release for your release line from the downloads site. (You must be logged in to see available downloads.)
  3. If you are running CloudBees Jenkins Operations Center, you must upgrade it first, because you cannot connect a new CloudBees Jenkins Enterprise instance to an older version of CloudBees Jenkins Operations Center.
  4. Install the CloudBees Jenkins Platform as appropriate for your environment, and start the CloudBees Jenkins Platform instance.
  5. If the instance needs additional input during the upgrade, the setup wizard prompts for it when you first access the instance.
Related Knowledgebase Articles

Release Notes and Related Documentation


Blog Categories: Jenkins, Developer Zone, Company News
Categories: Companies

Now Live on DevOps Radio: Picture-Perfect CD, Featuring Dean Yu, Director, Release Engineering, Shutterfly

Mon, 11/28/2016 - 15:37

Jenkins World 2016 was buzzing with the latest in DevOps, CI/CD, automation and more. DevOps Radio wanted to capture some of that energy so we enlisted the help of Sacha Labourey, CEO at CloudBees, to host a series of episodes live at the event. We’re excited to present a new three-part series, DevOps Radio: Live at Jenkins World. This is episode two in the series.

Dean Yu, director of release engineering at Shutterfly, has been with the Jenkins community since before Jenkins was called Jenkins. Today, he’s a member of the Jenkins governance board and an expert in all things Jenkins and CI. He attended Jenkins World 2016 to catch up with the community, check out some sessions and sit down with Sacha Labourey for a special episode of DevOps Radio.

Sacha had a lot of questions for Dean, but the very first question he asked was, “What is new at Shutterfly?” Dean revealed how his team is using Jenkins, working on CI/CD and keeping pace with business during Shutterfly’s busiest season, the holidays. If you’re interested in learning CI/CD best practices or hearing what one Jenkins leader thinks about the future of software development and delivery, then you need to tune in today!

You don’t have to stop making your holiday card or photo book; just plug in your headphones and tune in to DevOps Radio. The latest DevOps Radio episode is available now on the CloudBees website and on iTunes.

Join the conversation about the episode on Twitter by tweeting to @CloudBees and including #DevOpsRadio in your post. After you listen, we want to know your thoughts. What did you think of this episode? What do you want to hear on DevOps Radio next? And, what’s on your holiday DevOps wishlist?

Sacha Labourey and Dean Yu talk about CD at Shutterfly, during Jenkins World 2016 (below).
P.S. Check out Dean’s massive coffee cup. It displays several pictures of his daughter and was created - naturally - on the Shutterfly website. 

Categories: Companies

Browser-testing with Sauce OnDemand and Pipeline

Fri, 11/18/2016 - 21:38

Testing web applications across multiple browsers on different platforms can be challenging even for smaller applications. With Jenkins and the Sauce OnDemand Plugin, you can wrangle that complexity by defining your Pipeline as Code.

Pipeline ♥ UI Testing, Too

I recently started looking for a way to do browser UI testing for an open-source JavaScript project to which I contribute. The project is targeted primarily at Node.js but we're committed to maintaining browser-client compatibility as well. That means we should run tests on a matrix of browsers. Sauce Labs has an "open-sauce" program that provides free test instances to open-source projects. I decided to try using the Sauce OnDemand Plugin and Nightwatch.js to run Selenium tests on a sample project first, before trying a full-blown suite of tests.

Starting from Framework

I started off by following Sauce Labs' instructions on "Setting up Sauce Labs with Jenkins" as far as I could. I installed the JUnit and Sauce OnDemand plugins, created an account with Sauce Labs, and added my Sauce Labs credentials to Jenkins. From there I started to get a little lost. I'm new to Selenium and I had trouble understanding how to translate the instructions to my situation. I needed a working example that I could play with.

Happily, there's a whole range of sample projects in "saucelabs-sample-test-frameworks" on GitHub, which show how to integrate Sauce Labs with various test frameworks, including Nightwatch.js. I forked the Nightwatch.js sample to bitwiseman/JS-Nightwatch.js and set to writing my Jenkinsfile. Between the sample and the Sauce Labs instructions, I was able to write a pipeline that ran five tests on one browser via Sauce Connect:

node {
    stage "Build"
    checkout scm

    sh 'npm install' // <1>

    stage "Test"
    sauce('f0a6b8ad-ce30-4cba-bf9a-95afbc470a8a') { // <2>
        sauceconnect(options: '', useGeneratedTunnelIdentifier: false, verboseLogging: false) { // <3>
            sh './node_modules/.bin/nightwatch -e chrome --test tests/guineaPig.js || true' // <4>
            junit 'reports/**' // <5>
            step([$class: 'SauceOnDemandTestPublisher']) // <6>
        }
    }
}
  • 1: Install dependencies
  • 2: Use my previously added Sauce credentials
  • 3: Start up the Sauce Connect tunnel to Sauce Labs
  • 4: Run Nightwatch.js
  • 5: Use JUnit to track results and show a trend graph
  • 6: Link result details from Sauce Labs

NOTE: This pipeline expects to be run from a Jenkinsfile in SCM. To copy and paste it directly into a Jenkins Pipeline job, replace the checkout scm step with git url:'', branch: 'sauce-pipeline'.

I ran this job a few times to get the JUnit report to show a trend graph.

This sample app generates the SauceOnDemandSessionID for each test, enabling the Jenkins Sauce OnDemand Plugin's result publisher to link results to details Sauce Labs captured during the run.

Adding Platforms

Next I wanted to add a few more platforms to my matrix. This would require changing both the test framework configuration and the pipeline. I'd need to add new named combinations of platform, browser, and browser version (called "environments") to the Nightwatch.js configuration file, and modify the pipeline to run tests in those new environments.

This is a perfect example of the power of pipeline as code. If I were working with a separately configured pipeline, I'd have to make the change to the test framework, then change the pipeline manually. With my pipeline checked in as code, I could change both in one commit, preventing errors resulting from pipeline configurations going out of sync from the rest of the project.

I added three new environments to nightwatch.json:

"test_settings" : {
  "default": { /*---8<---8<---8<---*/ },
  "chrome": { /*---8<---8<---8<---*/ },

  "firefox": {
    "desiredCapabilities": {
      "platform": "linux",
      "browserName": "firefox",
      "version": "latest"
    }
  },

  "ie": {
    "desiredCapabilities": {
      "platform": "Windows 10",
      "browserName": "internet explorer",
      "version": "latest"
    }
  },

  "edge": {
    "desiredCapabilities": {
      "platform": "Windows 10",
      "browserName": "MicrosoftEdge",
      "version": "latest"
    }
  }
}

And I modified my Jenkinsfile to call them:

sauceconnect(options: '', useGeneratedTunnelIdentifier: false, verboseLogging: false) {
    def configs = [ // <1>
        'chrome',
        'firefox',
        'ie',
        'edge'
    ].join(',')

    // Run selenium tests using Nightwatch.js
    sh "./node_modules/.bin/nightwatch -e ${configs} --test tests/guineaPig.js" // <2>
} //---8<---8<---8<---8<---8<---8<---
  • 1: Using an array to improve readability and make it easy to add more platforms later.
  • 2: Changed from single-quoted string to double-quoted to support variable substitution.

WARNING: Test frameworks have bugs too. Nightwatch.js (v0.9.8) generates incomplete JUnit files, reporting results without enough information in them to distinguish between platforms. I implemented a fix for it and submitted a PR to Nightwatch.js. This blog shows output with that fix applied locally.

As expected, Jenkins picked up the new pipeline and ran Nightwatch.js on four platforms. Sauce Labs of course recorded the results and correctly linked them into this build. Nightwatch.js was already configured to use multiple worker threads to run tests against those platforms in parallel, and my Sauce Labs account supported running them all at the same time, letting me cover four configurations in less than twice the time; the added time was mostly due to individual new environments taking longer to complete. When I move to the actual project, this will let me run broad acceptance passes quickly.

Conclusion: To Awesome and Beyond

Considering the complexity of the system, I was impressed with how easy it was to integrate Jenkins with Sauce OnDemand to start testing on multiple browsers. The plugin worked flawlessly with Jenkins Pipeline. I went ahead and ran some additional tests to show that failure reporting also behaved as expected.

    sh "./node_modules/.bin/nightwatch -e ${configs}" // <1>
  • 1: Removed --test filter to run all tests

Epilogue: Pipeline vs. Freestyle

Just for comparison, here's the final state of this job in the Freestyle UI versus the fully commented pipeline code:

NOTE: This includes the AnsiColor Plugin to support Nightwatch.js' default ANSI color output.


node {
    stage "Build"
    checkout scm

    // Install dependencies
    sh 'npm install'

    stage "Test"

    // Add sauce credentials
    sauce('f0a6b8ad-ce30-4cba-bf9a-95afbc470a8a') {
        // Start sauce connect
        sauceconnect(options: '', useGeneratedTunnelIdentifier: false, verboseLogging: false) {

            // List of browser configs we'll be testing against.
            def platform_configs = [
                'chrome',
                'firefox',
                'ie',
                'edge'
            ].join(',')

            // Nightwatch.js supports color output, so wrap this step for ansi color
            wrap([$class: 'AnsiColorBuildWrapper', 'colorMapName': 'XTerm']) {

                // Run selenium tests using Nightwatch.js
                // Ignore error codes. The junit publisher will cover setting build status.
                sh "./node_modules/.bin/nightwatch -e ${platform_configs} || true"
            }

            junit 'reports/**'

            step([$class: 'SauceOnDemandTestPublisher'])
        }
    }
}

NOTE: This pipeline expects to be run from a Jenkinsfile in SCM. To copy and paste it directly into a Jenkins Pipeline job, replace the checkout scm step with git url:'', branch: 'sauce-pipeline'.

Not only is the pipeline as code more compact, it also allows for comments to further clarify what is being done. And as I noted earlier, changes to this pipeline code are committed the same as changes to the rest of the project, keeping everything synchronized, reviewable, and testable at any commit. In fact, you can view the full set of commits for this blog post in the sauce-pipeline branch of the bitwiseman/JS-Nightwatch.js repository.

Blog Categories: Jenkins, Developer Zone
Categories: Companies

Meet the Bees: Robert Sandell

Thu, 11/17/2016 - 19:39

In every Meet the Bees blog post, you’ll learn more about a different CloudBees Bee. This time we are in Sweden, visiting Robert Sandell.

Who are you? What is your role at CloudBees?
My name is Robert Sandell, but everyone except for family calls me Bobby. I am a software engineer working on features and bug/security fixes for Jenkins, which means that I get to push code to my favorite OSS project all day, every day.

What makes CloudBees different from other companies?
It is a very distributed company; many, especially in my team, are working from home. In fact, I am the only one in the company working in Scandinavia (although I’m hoping that’s not going to be for long) and my closest colleagues, last time I measured, are in Germany, Belgium and England. And that’s pretty cool; we get to meet up at various conferences all over the world.

What do you think the future holds for Jenkins?
The future has a tendency to not hold onto much, it’s more like a big ball of wibbly-wobbly, timey-whimey… stuff. ;)
But Jenkins has a good baseline to build on, so stuff is happening all the time, and we even have some big stuff planned in the short term, which is really exciting.

If you weren’t working with Jenkins, what would you be doing?
I would probably still be in the area of software tools development. I enjoy making stuff that makes people’s lives easier. Or maybe doing architecture with some big enterprisy stuff, stitching together big systems into even bigger systems to make them do magical things at scale is also quite exciting.

Something we all have in common these days is the constant use of technology. What’s your favorite gadget and why?
I’m a computer guy, so my first thought is my media center PC that I’ve built myself and configured just the way I want it. I don’t think I’ve watched broadcast television in years. I can choose whatever I want to watch now, instead of waiting for that scheduled time or zapping channels until I find something half-interesting. (Note: No pictures exist of this custom system, even though I know you would all LOVE to see it!)

Vanilla or chocolate or some other flavor, what’s your favorite ice cream flavor and brand?
Häagen-Dazs Strawberries and Cream, even though I’m lactose intolerant I do indulge myself with a tub of it once in a while. :)

But another desserty thing that I find utterly wonderful is an apple or lemon sorbet with a glass of calvados, a cup of coffee and a tiny piece of chocolate for dessert.



Blog Categories: Jenkins
Categories: Companies

Now Live: DevOps Radio Jumps into DevOps with JFrog Co-Founder and Chief Architect Fred Simon

Tue, 11/15/2016 - 17:05

Jenkins World 2016 was buzzing with the latest in DevOps, CI/CD, automation and more. DevOps Radio wanted to capture some of that energy, so we enlisted the help of Sacha Labourey, CEO at CloudBees, to host a series of episodes, live at the event. As a result of Sacha’s involvement, we’re excited to present a new three-part series, DevOps Radio: Live at Jenkins World.

What do you think happens when two long-time friends get together to talk DevOps, CI/CD and automation? At Jenkins World 2016, Sacha sat down with JFrog’s Fred Simon to catch up and talk about the DevOps movement, Jenkins and developer’s dirty bits (yes, that’s really in there!). There are also a lot of laughs and great knowledge. In-between, they discuss meeting in France, the beaches in Israel…and golf - or the lack thereof.

Fred and Sacha, both industry veterans, have wisdom to share and anecdotes from years of knowing each other and even working together. If you want to learn more about the development of DevOps, how to achieve DevOps success and what’s the real deal with Docker, tune in and listen!

The latest DevOps Radio episode is available now on the CloudBees website and on iTunes. Join the conversation about the episode on Twitter by tweeting out to @CloudBees and including #DevOpsRadio in your post.


Fred and Sacha laughing while recording live at Jenkins World 2016!

Blog Categories: Jenkins, Company News
Categories: Companies

Joining the Big Leagues: Tuning Jenkins GC For Responsiveness and Stability

Wed, 11/09/2016 - 22:47

Today I'm going to show you how easy it is to tune Jenkins Java settings to make your masters more responsive and stable, especially with large heap sizes.

The Magic Settings:
  • Basics: -server -XX:+AlwaysPreTouch
  • GC Logging: -Xloggc:$JENKINS_HOME/gc-%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=20m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy
  • G1 GC settings: -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1
  • Heap settings: set your minimum heap size (-Xms) to at least 1/2 of your maximum size (-Xmx).
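To apply these flags, they can be combined into the Jenkins service configuration. A sketch, assuming an RPM-style install with /etc/sysconfig/jenkins and an 8 GB master (both the path and the sizes are illustrative; adjust them to your environment, keeping -Xms at least half of -Xmx as noted above):

```
# /etc/sysconfig/jenkins (illustrative) -- combines the settings above
JENKINS_JAVA_OPTIONS="-server -XX:+AlwaysPreTouch \
  -Xms4g -Xmx8g \
  -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled \
  -XX:+UseStringDeduplication \
  -Xloggc:$JENKINS_HOME/gc-%t.log \
  -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=20m \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCCause"
```

After restarting the service, the gc-*.log files under $JENKINS_HOME give you the data used for the analyses below.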

Now, let's look at where those came from! We're going to focus on garbage collection (GC) here and dig fast and deep to strike gold; if you're not familiar with GC fundamentals, take a look at this source.

Because performance tuning is data driven, I'm going to use real-world data selected from three of our customers running large masters (all are Global 500 companies, 2 in the top 100).

What we're not going to do: cover Jenkins basics, or play with max heap (see the section "what should I do before tuning"). This is for cases where we really do need a big heap and can't easily split our Jenkins masters into smaller ones.

The Problem: Hangups

Symptom: Users report that the Jenkins instance periodically hangs, taking several seconds to handle normally fast requests. We may even see lockups or timeouts from systems communicating with the Jenkins master (build agents, etc). In long periods of heavy load, users may report Jenkins running slowly. Application monitoring shows that during lockups all or most of the CPU cores are fully loaded, but there's not enough activity to justify it. Process and JStack dumps will reveal that the most active Java threads are doing garbage collection.

At company A, they had this problem. Their Jenkins Java arguments are very close to the default, aside from sizing the heap:

  • 24 GB max heap, 4 GB initial, default GC settings (ParallelGC)
  • A few flags set (some coming in as defaults): -XX:-BytecodeVerificationLocal -XX:-BytecodeVerificationRemote -XX:+ReduceSignalUsage -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:-UseLargePagesIndividualAllocation

After enabling garbage collection (GC) logging we see the following rough stats:

(Chart: Company A heap stats and CPU use with ParallelGC.)

Diving deeper, we get this chart of GC pause durations:

(Chart: Company A GC pause durations with ParallelGC.)

Key stats:

  • Throughput: 99.64% (percent of time spent executing application code, not doing garbage collection)
  • Average GC time: 348 ms (ugh!)
  • GC cycles over 2 seconds: 36 (2.7%)
  • Minor/Full GC average time: 263 ms / 2.803 sec
  • Object creation & promotion rate: 42.4 MB/s & 1.99 MB/s


As you can see, young GC cycles very quickly clear away freshly-created garbage, but the deeper old-gen GC cycles run very slowly: 2-4 seconds. This is where our problems happen. The default Java garbage collection algorithm (ParallelGC) pauses everything when it has to collect garbage (often called a "stop the world pause"). During that period, Jenkins is fully halted: normally (with small heaps) these pauses are too brief to be an issue. With heaps of 4 GB or larger, the time required becomes long enough to be a problem: several seconds over short windows, and over a longer interval you occasionally see much longer pauses (tens of seconds, or minutes.)

This is where the user-visible hangs and lock-ups happen. It also adds significant latency to those build/deploy tasks. In periods of heavy load, the system was even experiencing hangs of 30+ seconds for a single full GC cycle. This was long enough to trigger network timeouts (or internal Jenkins thread timeouts) and cause even larger problems.

Fortunately there's a solution: the concurrent low-pause garbage collection algorithms, Concurrent Mark Sweep (CMS) and Garbage First (G1). These attempt to do much of the garbage collection concurrently with application threads, resulting in much shorter pauses (at a slight cost in extra CPU use). We're going to focus on G1, because it is slated to become the default in Java 9 and is the official recommendation for large heap sizes.

Let's see what happens when someone uses G1 on a similarly-sized Jenkins master at company B (16 GB heap):

Their settings:

  • 16 GB max heap, 0.5 GB initial size
  • Java flags (mostly defaults, except for G1): -XX:+UseG1GC -XX:+UseCompressedClassPointers -XX:+UseCompressedOops

And the GC log analysis:

Company B Jenkins G1 duration.

Key stats:

  • Throughput: 98.76% (not great, but still only slowing things down a bit)
  • Average GC time: 128 ms
  • GC cycles over 2 seconds: 11 (0.27%)
  • Minor/Full GC average time: 122 ms / 1 sec 232 ms
  • Object creation & promotion rate: 132.53 MB/s & 522 KB/s

Okay, much better: some improvement may be expected from a 30% smaller heap, but not as much as we've seen. Most of the GC pauses are well under 2 seconds, but we have 11 outliers - long Full GC pauses of 2-12 seconds. Those are troubling; we'll take a deeper dive into their causes in a moment. First, let's look at the big picture and at how Jenkins behaves with G1 GC at a second company.

G1 Garbage Collection at Company C (24 GB heap):

Their settings:

  • 24 GB max heap, 24 GB initial heap, 2 GB max metaspace
  • Some custom flags: -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+UseStringDeduplication -XX:+UseCompressedClassPointers -XX:+UseCompressedOops

Clearly they've done some garbage collection tuning and optimization. The AlwaysPreTouch flag pre-zeroes allocated heap pages at startup, rather than waiting until they're first used. This is suggested especially for large heap sizes, because it trades slightly slower startup time for improved runtime performance. Note also that they pre-allocated the whole heap (initial size equals max). This is a common optimization.

They also enabled StringDeduplication, a G1 option introduced in Java 8 Update 20 that transparently replaces identical character arrays with pointers to the original, reducing memory use (and improving cache performance). Think of it like String.intern() but it silently happens during garbage collection. This is a concurrent operation added on to normal GC cycles, so it doesn't pause the application. We'll look at its impacts later.

Looking at the basics:

Company C G1 duration

A similar picture to company B, but it's partly hidden by the sheer number of points (this chart covers a longer period, 1 month). Those same occasional Full GC outliers are present!

Key stats:

  • Throughput: 99.93%
  • Average GC time: 127 ms
  • GC cycles over 2 seconds: 235 (1.56%)
  • Minor/Full GC average time: 56 ms / 3.97 sec
  • Object creation & promotion rate: 34.06 MB/s & 286 kB/s

Overall fairly similar to company B: ~100 ms GC cycles, and all the minor GC cycles are very fast. Object promotion rates are similar too.

Remember those random long pauses?

Let's find out what caused them and how to get rid of them. Company B had 11 super-long pause outliers. Let's get some more detail by opening the GC logs in GCViewer. This tool gives a tremendous amount of information -- too much, in fact, so I prefer a web-based analyzer except where the extra detail is needed. Since GC logs do not contain compromising information (unlike heap dumps or some stack traces), web apps are a great tool for analysis.

Company B Jenkins G1 causes

What we care about are the Full GC times in the middle (highlighted). See how much longer they are vs. the young and concurrent GC cycles up top (2 seconds or less)?

Now, I lied a bit earlier - sorry! For concurrent garbage collectors, there are actually 3 modes: young GC, concurrent GC, and full GC. Concurrent GC replaces the Full GC mode of Parallel GC with a faster concurrent operation that runs alongside the application. But in a few cases, we are forced to fall back to a non-concurrent Full GC operation, which uses the serial (single-threaded) garbage collector. That means that even if we have 30+ CPU cores, only one is working. This is what is happening here, and on a large-heap, multicore system it is S L O W. How slow? 280 MB/s vs. 12487 MB/s for company B (for company C, the difference is also about 50:1).

What triggers a full GC instead of concurrent:

  • Explicit calls to System.gc() (the most common culprit, and often tricky to track down)
  • Metadata GC Threshold: Metaspace (used mostly for class data) has hit the size at which it must be garbage collected or expanded. Documentation is terrible for this; Stack Overflow will be your friend.
  • Concurrent mode failure: concurrent GC can't complete fast enough to keep up with the objects the application is creating (there are JVM arguments to trigger concurrent GC earlier)

How do we fix this?

For explicit GC:

  • -XX:+DisableExplicitGC will turn off Full GC triggered by System.gc(). Often set in production, but the option below is safer.
  • We can trigger a concurrent GC in place of a full one with -XX:+ExplicitGCInvokesConcurrent - this will take the explicit call as a hint to do deeper cleanup, but with less performance cost.
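For instance, the safer option can be combined with G1 like this sketch (the heap size and jenkins.war launch line are placeholders, not values from this post):

```shell
# Keep explicit System.gc() calls, but run them as concurrent cycles
# rather than disabling them outright with -XX:+DisableExplicitGC.
GC_OPTS="-XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent"
echo java -Xmx16g $GC_OPTS -jar jenkins.war   # illustrative launch line only
```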

Gotcha for people who've used CMS: if you have used CMS in the past, you may have used the option -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses -- which does what it says. This option silently fails in G1, meaning you still see the very long pauses from Full GC cycles as if it wasn't set (no warning is generated). I am in the process of trying to log a JVM bug for this issue.

For the Metadata GC threshold:

  • Increase your initial metaspace to the final amount to avoid resizing. For example: -XX:MetaspaceSize=500M

Company C also suffered the same problem with explicit GC calls, with almost all of their outliers (230 out of 235) accounted for by slow, nonconcurrent Full GC cycles -- all from explicit System.gc() calls, since they had already tuned metaspace:

Company C Jenkins G1 GC causes

Here's what GC pause durations look like if we remove the log entries for the explicit System.gc() calls, assuming that they'll blend in with the other concurrent GC pauses (not 100% accurate, but a good approximation):

Company B:

Company B Jenkins GC duration - G1 - no explicit pauses

The few long Full GC cycles at the start are from metaspace expansion -- they can be removed by increasing initial Metaspace size, as noted above. The spikes? That's when we're about to resize the Java heap, and memory pressure is high. You can avoid this by setting the minimum/initial heap to at least half of the maximum, to limit resizing.


  • Throughput: 98.93%
  • Average GC time: 111 ms
  • GC cycles over 2 seconds: 3
  • Minor & Full or concurrent GC average time: 122 ms / 25 ms (yes, faster than minor!)
  • Object creation & promotion rate: 132.07 MB/s & 522 kB/s

Company C:

Company C Jenkins G1 - no explicit pauses


  • Throughput: 99.97%
  • Average GC time: 56 ms
  • GC cycles over 2 seconds: 0 (!!!)
  • Minor & Full or concurrent GC average time: 56 ms & 10 ms (yes, faster than minor!)
  • Object creation & promotion rate: 33.31 MB/s & 286 kB/s
  • Side point: GCViewer is claiming GC performance of 128 GB/s (not unreasonable, we clear ~10 GB of young generation in under 100 ms usually)

Okay, so we've tamed the long worst-case pauses!

But What About Those Long Minor GC Pauses We Saw?

Okay, now we're in the home stretch! We've tamed the old-generation GC pauses with concurrent collection, but what about those longer young-generation pauses? Let's look at stats for the different phases and causes again in GCViewer.

Company C Jenkins G1 causes -no explicit pauses

Highlighted in yellow we see the culprit: the remark phase of G1 garbage collection. This stop-the-world phase ensures we've identified all live objects, and processes references (more info).

Let's look at a sample execution to get more info:

2016-09-07T15:28:33.104+0000: 26230.652: [GC remark 26230.652: [GC ref-proc, 1.7204585 secs], 1.7440552 secs]

[Times: user=1.78 sys=0.03, real=1.75 secs]

This turns out to be typical for the GC log: the longest pauses are spent in reference processing. This is not surprising, because Jenkins internally uses references heavily for caching, especially weak references, and the default reference processing algorithm is single-threaded. Note that user (CPU) time roughly matches real time -- it would be higher than real time if the work were spread across multiple cores.

So, we add the GC flag -XX:+ParallelRefProcEnabled, which lets reference processing use multiple cores effectively.

Tuning young-generation GC further based on Company C:

Back to GCViewer we go, to see what's time-consuming in GC for company C.

Company C Jenkins G1 causes -no explicit pauses

That's good, because most of the time is just sweeping out the trash (evacuation pause). But the 1.8 second pause looks odd. Let's look at the raw GC log for the longest pause:

2016-09-24T16:31:27.738-0700: 106414.347: [GC pause (G1 Evacuation Pause) (young), 1.8203527 secs]
   [Parallel Time: 1796.4 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 106414348.2, Avg: 106414348.3, Max: 106414348.6, Diff: 0.4]
      [Ext Root Scanning (ms): Min: 0.3, Avg: 1.7, Max: 5.7, Diff: 5.4, Sum: 14.0]
      [Update RS (ms): Min: 0.0, Avg: 7.0, Max: 19.6, Diff: 19.6, Sum: 55.9]
         [Processed Buffers: Min: 0, Avg: 45.1, Max: 146, Diff: 146, Sum: 361]
      [Scan RS (ms): Min: 0.2, Avg: 0.4, Max: 0.7, Diff: 0.6, Sum: 3.5]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [Object Copy (ms): Min: 1767.1, Avg: 1784.4, Max: 1792.6, Diff: 25.5, Sum: 14275.2]
      [Termination (ms): Min: 0.3, Avg: 2.4, Max: 3.5, Diff: 3.2, Sum: 19.3]
         [Termination Attempts: Min: 11, Avg: 142.5, Max: 294, Diff: 283, Sum: 1140]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.3, Sum: 0.8]
      [GC Worker Total (ms): Min: 1795.9, Avg: 1796.1, Max: 1796.2, Diff: 0.3, Sum: 14368.9]
      [GC Worker End (ms): Min: 106416144.4, Avg: 106416144.5, Max: 106416144.5, Diff: 0.1]

...oh, well dang. Almost the entire time (1.792 s out of 1.820) is walking through the live objects and copying them. And wait, what about this line, showing the summary statistics:

Eden: 13.0G(13.0G)->0.0B(288.0M) Survivors: 1000.0M->936.0M Heap: 20.6G(24.0G)->7965.2M(24.0G)]

Good grief, we flushed out 13 GB (!!!) of freshly-allocated garbage in one swoop and compacted the leftovers! No wonder it was so slow. I wonder how we accumulated so much...

Company C Jenkins G1-ExplictGC removed

Oh, right... we set the initial heap to 24 GB, and each minor GC clears most of the young generation. So we've set aside tons of space for trash to accumulate, which means longer collection pauses when it's finally swept. We can control GC pause time by setting the -XX:MaxGCPauseMillis flag, but if we look up, most pauses already land right under the default 200 ms pause target. For Jenkins on this hardware, this appears to work just fine. We could also consider reducing the heap size a bit if young-generation GC pauses were too long, but first we should let G1 try to tune itself for pause times.

A Few Final Settings

We mentioned StringDeduplication was on at Company C -- what is the impact? This only triggers on Strings that have survived a few GC generations (most of our garbage does not), has limits on the CPU time it can use, and replaces duplicate backing character arrays with references to a single copy. For more info, look here. So, we should be trading a little CPU time for improved memory efficiency (similar to string interning).

At the beginning, this has a huge impact:

[GC concurrent-string-deduplication, 375.3K->222.5K(152.8K), avg 63.0%, 0.0024966 secs]

[GC concurrent-string-deduplication, 4178.8K->965.5K(3213.2K), avg 65.3%, 0.0272168 secs]

[GC concurrent-string-deduplication, 36.1M->9702.6K(26.6M), avg 70.3%, 0.0965196 secs]

[GC concurrent-string-deduplication, 4895.2K->394.9K(4500.3K), avg 71.9%, 0.0114704 secs]

This peaks at an average of around 90%:

After running for a month, there is less of an impact - many of the strings that can be deduplicated already have been:

[GC concurrent-string-deduplication, 138.7K->39.3K(99.4K), avg 68.2%, 0.0007080 secs]

[GC concurrent-string-deduplication, 27.3M->21.5M(5945.1K), avg 68.1%, 0.0554714 secs]

[GC concurrent-string-deduplication, 304.0K->48.5K(255.5K), avg 68.1%, 0.0021169 secs]

[GC concurrent-string-deduplication, 748.9K->407.3K(341.7K), avg 68.1%, 0.0026401 secs]

[GC concurrent-string-deduplication, 3756.7K->663.1K(3093.6K), avg 68.1%, 0.0270676 secs]

[GC concurrent-string-deduplication, 974.3K->17.0K(957.3K), avg 68.1%, 0.0121952 secs]

However, it's cheap to use: on average, each dedup cycle takes 8.8 ms and removes 2.4 kB of duplicates; the median cycle takes 1.33 ms and removes 17.66 kB from the old generation. A small change per cycle, but in aggregate it adds up quickly -- in periods of heavy load, this can save hundreds of megabytes of data. That's still small relative to multi-GB heaps, though.

Conclusion: turn string deduplication on. It's fairly cheap to use and reduces the steady-state memory needed for Jenkins. That frees up more room for the young generation, and should reduce overall GC time by removing duplicate objects.

Soft reference flushing: Jenkins uses soft references for caching build records and in pipeline FlowNodes. The only guarantee for these is that they will be removed instead of causing an OutOfMemoryError... however Java applications can slow to a crawl from memory pressure long before that happens. There's an option that provides a hint to the JVM based on time & free memory, controlled by -XX:SoftRefLRUPolicyMSPerMB (default 1000). The SoftReferences become eligible for garbage collection after this many milliseconds have elapsed since last touch... per MB of unused heap (vs the maximum). The referenced objects don't count towards that target. So, with 10 GB of heap free and the default 1000 ms setting, soft references stick around for ~2.8 hours (!).

If the system is continuously allocating more soft references, it may trigger heavy GC activity, rather than clearing out soft references. See the open bug JDK-6912889 for more details.

If Jenkins consumes excessive old generation memory, it may help to make soft references easier to flush by reducing -XX:SoftRefLRUPolicyMSPerMB from its default (1000) to something smaller (say 10-200). The catch is that SoftReferences are often used for objects that are relatively expensive to load, such as lazy-loaded build records and pipeline FlowNode data.


G1 vs. CMS: G1 was available in later releases of JRE 7, but it was unstable and slow there. If you use G1 you absolutely must be on JRE 8, and the later the release the better (it has received a lot of patches). Googling around will turn up horrible G1 vs. CMS benchmarks from around 2014: these are probably best ignored, since the G1 implementation was still immature then. There's probably still a niche for CMS, especially on midsized heaps (1-3 GB) or where settings are already tuned. With appropriate tuning it can still perform well for Jenkins (which mostly generates short-lived garbage), but CMS eventually suffers from heap fragmentation and needs a slow, non-concurrent Full GC to clear it. It also needs considerably more tuning than G1.

General GC tuning caveats: No single setting is perfect for everybody. We avoid tweaking settings that we don't have strong evidence for here, but there are of course many additional settings one could tweak. One shouldn't change them without evidence, though, because that can cause unexpected side effects. The GC logs we enabled earlier will collect this evidence. The only setting that jumps out as a likely candidate for further tuning is G1 region size (too small and there are many humongous object allocations, which hurt performance). Running on smaller systems, I've seen evidence that regions shouldn't be smaller than 4 MB because 1-2 MB objects are allocated somewhat regularly -- but there's not enough data to make that solid guidance.

What Should I Do Before Tuning Jenkins GC:

If you've seen Stephen Connolly's excellent Jenkins World talk, you know that most Jenkins instances can and should get by with 4 GB or less of allocated heap, even up to very large installation sizes. You will want to turn on GC logging (suggested above) and look at stats over a few weeks. If you're not seeing periodic longer pause times, you're probably okay.
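For reference, on a JDK 8 HotSpot JVM the kind of GC logging used throughout this post can be enabled with flags along these lines (the log path and rotation sizes are placeholder choices, not values from this post):

```shell
# JDK 8 HotSpot GC logging flags; path and rotation values are examples only.
GC_LOG_OPTS="-Xloggc:/var/log/jenkins/gc-%t.log \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCCause \
  -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20m"
echo "$GC_LOG_OPTS"
```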

For this post we assume we've already done the basic performance work for Jenkins:

  1. Jenkins is running on fast, SSD-backed storage.
  2. We've set up build rotation for our jobs, to delete old builds so they don't pile up.
  3. The weather column is already disabled for folders.
  4. All builds/deploys are running on build agents (formerly slaves), not on the master. If the master has executors allocated, they are exclusively used for backup tasks.
  5. We've verified that Jenkins really does need the large heap size and can't easily be split into separate masters.

If not, we need to do that FIRST before looking at GC tuning, because those will have larger impacts.


We've gone from:

  • Average 350 ms pauses (bad user experience), including less frequent 2+ second full GC pauses
  • To an average pause of ~50 ms, with almost all under 250 ms
  • Reduced total memory footprint from String deduplication


  1. Use Garbage First (G1) garbage collection, which performs generally very well for Jenkins. Usually there's enough spare CPU time to enable concurrent running.
  2. Ensure explicit System.gc() calls and metaspace resizing do not trigger a Full GC, because this can cause a very long pause
  3. Turn on parallel reference processing for Jenkins to use all CPU cores fully.
  4. Use String deduplication, which generates a tidy win for Jenkins
  5. Enable GC logging, which can then be used for the next level of tuning and diagnostics, if needed.
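Pulling those five points together, a starting set of JVM options for a large-heap Jenkins master on JDK 8 might look like the sketch below. The heap, metaspace, and log-file values are illustrative, not prescriptions from this post; validate any change against your own GC logs:

```shell
# Illustrative starting point; sizes and paths are examples only.
JAVA_ARGS="-Xms8g -Xmx16g"                                # initial heap >= half of max, to limit resizing
JAVA_ARGS="$JAVA_ARGS -XX:MetaspaceSize=500M"             # avoid Full GCs from metaspace expansion
JAVA_ARGS="$JAVA_ARGS -XX:+UseG1GC"                       # concurrent low-pause collector
JAVA_ARGS="$JAVA_ARGS -XX:+ExplicitGCInvokesConcurrent"   # System.gc() becomes a concurrent cycle
JAVA_ARGS="$JAVA_ARGS -XX:+ParallelRefProcEnabled"        # use all cores for reference processing
JAVA_ARGS="$JAVA_ARGS -XX:+UseStringDeduplication"        # G1 only, JDK 8u20 and later
JAVA_ARGS="$JAVA_ARGS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc-%t.log"
echo "$JAVA_ARGS"
```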

There's still a little unpredictability, but using appropriate settings gives a much more stable, responsive CI/CD server... even up to 20 GB heap sizes!

Blog Categories: Jenkins
Categories: Companies

Ensuring Corporate Standards in Pipelines with Custom Marker Files

Mon, 11/07/2016 - 22:27

Pipeline-as-Code revolutionized how continuous delivery pipelines are defined in Jenkins by checking in the Pipeline as a  ‘Jenkinsfile’ in your repository instead of storing the definition locally in Jenkins. This becomes especially useful when leveraging the direct integrations Jenkins has with Github and Bitbucket. In this case, Jenkins will scan the entire Organization for repositories containing Jenkinsfiles and then create the associated pipelines in Jenkins automatically.

The Jenkinsfile approach is great for many use cases, but larger organizations often want to use Pipeline-as-Code while still setting certain standards to be used throughout the organization. Pipeline-as-Code gives developers free rein to create their own Jenkinsfiles without any regard to corporate standards or practices. This may be fine in a small implementation, but as the number of projects and repos increases, there is a higher chance of teams deviating from the best practices developed by the shared services team. Additionally, there are many instances where you may just want to ensure that certain commands run before or after the general Pipeline, such as requiring a cleanup to occur after any build runs in the Organization. The Custom Marker File feature in the CloudBees Jenkins Platform was created to address some of these concerns.

As blogged about earlier, Custom Marker Files allow you to associate repositories in your SCM that contain a given identifier with a generic Pipeline. For example, instead of having the same Jenkinsfile defined in every Java project, Custom Marker Files let you define one Pipeline to be used by all repositories that contain a ‘pom.xml’ file (with pom.xml being the identifier that tells you this is a Java project). This is an easy way to onboard new teams, because once they create a new Java project in Github/Bitbucket, Jenkins will start building the project without the need to create a new Jenkinsfile.

In many cases, fully templatizing the Jenkinsfile for the entire organization as described above may be too restrictive, which is why Custom Marker Files also allow you to set standards while still giving teams the flexibility to create their own Jenkinsfiles. Let’s go through an example of how this would work in a Github Organization (this works with Bitbucket Teams as well). Make sure you have an updated version of the CloudBees Pipeline: Templates plugin.

  • Click New Item -> Github Organization and give it a name to create a new Github Organization
  • In the Configuration page, enter in the Github Organization and Scan Credentials
  • In the Project Recognizers section
    • Delete the option “Pipeline Jenkinsfile”
    • Click “Add” and select “Custom script”
  • Enter the information as shown below
  • Click Save to start scanning your organization

Notice that in this case the Marker File is set to a Jenkinsfile. This means that any repository or branch in the Organization with a Jenkinsfile will be detected. The Pipeline section then defines the Pipeline that will be used for all projects containing the ‘Jenkinsfile’ Marker File. Let’s analyze this Pipeline script.

  • Lines 1-3 prepend a ‘preflight checks’ stage that allows custom actions, such as setting up the build environment, before the main Jenkinsfile Pipeline.
  • Line 5 uses the ‘readTrusted’ function to read the Jenkinsfile into a variable. readTrusted allows you to read files from the project being checked out without the requirement of being on a node.
  • Line 6 then actually runs the Jenkinsfile pipeline by using the ‘evaluate’ function. ‘evaluate’ is similar to ‘load’ but likewise does not require being run on a ‘node’.
  • Lines 8-10 append a ‘postflight cleanup’ stage where cleanup commands can be run after the pipeline has completed.
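In script form, the Pipeline described above might look roughly like this sketch (the stage names and echo steps are illustrative placeholders; the actual script is the one shown in the screenshot):

```groovy
stage 'preflight checks'
echo 'running corporate pre-build checks...'   // e.g. build environment setup

def pipeline = readTrusted 'Jenkinsfile'       // read the repo's Jenkinsfile; no node required
evaluate(pipeline)                             // run the team's own Pipeline definition

stage 'postflight cleanup'
echo 'running corporate cleanup...'            // e.g. required cleanup commands
```

Teams keep full control of their own Jenkinsfile, while the shared-services stages wrap every build in the Organization.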

Looking at the Stage View above, you can see that the main Pipeline ran with the ‘preflight checks’ stage prepended and the ‘postflight cleanup’ stage appended.

This method of reading in the Jenkinsfile for Custom Marker Files can be useful for requiring additional actions to be run across the organization. The example in this blog is very simple but this can also be a good starting place in the future to analyze the Jenkinsfile that developers create and ensure that it meets certain standards. An example of this would be to read in the Jenkinsfile into a variable (look at Line 5) and parse that variable to make sure that certain functions are used or that it adheres to a certain schema. With all of these different options, Custom Marker Files in the CloudBees Jenkins Platform help you strike a balance between standardizing and providing adequate freedom for your developers.

Isaac Cohen is a Solutions Architect at CloudBees, enabling customers to achieve their DevOps/Continuous Delivery goals and making the overall SDLC as efficient and transparent as possible. Isaac has extensive knowledge of Jenkins and its various tool integrations, as well as hands-on experience leading DevOps teams for large enterprises.


Blog Categories: Developer Zone, Jenkins

Level-Up Your Automation Game with the New DevOps Radio Episode Featuring Joshua Nixdorf, Technical Director at Electronic Arts

Tue, 11/01/2016 - 15:53

Imagine you spent all day playing with and working on video games. Seems like a dream, right? For Joshua Nixdorf, technical director at EA Games, working with video games all day is his reality. However, even a company that focuses on fun entertainment can be confronted with challenges in delivering that entertainment. In fact, when EA Games started to grow its development operations, that’s exactly what happened.

Josh Nixdorf started at EA Games during a time when automation and continuous integration (CI) were just starting to spread. Early on, EA Games had dozens of CI jobs; today, they have hundreds. Several years ago, the company had a handful of QA disk builds in total; they now have hundreds, for each of the world’s regions. In order to keep up with demand and growth, the company had to expand their CI/CD practices. The first question that comes to mind… How? Well, we’ll let Josh tell you that part! He would know, as he was the driver behind automating the software delivery process.

In episode 8 of DevOps Radio, Josh and DevOps Radio host Andre Pino talk about what it’s like working for EA Games and how Josh has grown the company’s automation practices. Josh provides insight into how developing video games is different from business software, and how EA Games has worked to automate testing and more. Finally, Josh lets slip some of the games he’s had a hand in…

Now, plug in your headphones, turn on your gaming console and get ready to listen to the latest episode of DevOps Radio. The episode is available immediately on the CloudBees website and on iTunes. Join the conversation about the episode on Twitter by tweeting out to @CloudBees and including #DevOpsRadio in your post.


Blog Categories: Developer Zone

Sending Notifications in Pipeline

Mon, 10/31/2016 - 19:46

Rather than sitting and watching Jenkins for job status, I want Jenkins to send notifications when events occur. There are Jenkins plugins for Slack, HipChat or even email among others.

Note: Something is Happening!

I think we can all agree getting notified when events occur is preferable to having to constantly monitor them just in case. I'm going to continue from where I left off in my previous post with the hermann project. I added a Jenkins Pipeline with an HTML publisher for code coverage. This week, I'd like to make Jenkins notify me when builds start and when they succeed or fail.

Setup and Configuration

First, I select targets for my notifications. For this blog post, I'll use sample targets that I control. I've created Slack and HipChat organizations called "bitwiseman", each with one member - me. And for email I'm running a Ruby SMTP server called mailcatcher, which is perfect for local testing such as this. Aside from these concessions, configuration would be much the same in a non-demo situation.

Next, I install and add server-wide configuration for the Slack, HipChat, and Email-ext plugins. Slack and HipChat use API tokens - both products have integration points on their side that generate tokens which I copy into my Jenkins configuration. Mailcatcher SMTP runs locally. I just point Jenkins at it.

Here's what the Jenkins configuration section for each of these looks like:

Slack Configuration

HipChat Configuration

Email Configuration

Original Pipeline

Now I can start adding notification steps. Just like in the previous post, I'll use the Jenkins Pipeline Snippet Generator to explore the step syntax for the notification plugins.

Here's the base pipeline before I start making changes:

stage 'Build'

node {
  // Checkout
  checkout scm

  // install required bundles
  sh 'bundle install'

  // build and run tests with coverage
  sh 'bundle exec rake build spec'

  // Archive the built artifacts
  archive (includes: 'pkg/*.gem')

  // publish html
  // snippet generator doesn't include "target:"
  publishHTML (target: [
      allowMissing: false,
      alwaysLinkToLastBuild: false,
      keepAll: true,
      reportDir: 'coverage',
      reportFiles: 'index.html',
      reportName: "RCov Report"
  ])
}

NOTE: This pipeline expects to be run from a Jenkinsfile in SCM. To copy and paste it directly into a Jenkins Pipeline job, replace the checkout scm step with git ''.

Job Started Notification

For the first change, I decide to add a "Job Started" notification. The snippet generator and then reformatting makes this straightforward:

node {
  notifyStarted()

  /* ... existing build steps ... */
}

def notifyStarted() {
  // send to Slack
  slackSend (color: '#FFFF00', message: "STARTED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.BUILD_URL})")

  // send to HipChat
  hipchatSend (color: 'YELLOW', notify: true,
      message: "STARTED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.BUILD_URL})"
  )

  // send to email
  emailext (
      subject: "STARTED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
      body: """

STARTED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':

Check console output at "${env.JOB_NAME} [${env.BUILD_NUMBER}]"

""", recipientProviders: [[$class: 'DevelopersRecipientProvider']] ) }

Since Pipeline is a Groovy-based DSL, I can use string interpolation and variables to add the exact details I want in my notification messages. When I run this I get the following notifications:

Started Notifications

Started Email Notification

Job Successful Notification

The next logical choice is to get notifications when a job succeeds. I'll copy and paste based on the notifyStarted method for now and do some refactoring later.

node {
  notifyStarted()

  /* ... existing build steps ... */

  notifySuccessful()
}

def notifyStarted() { /* .. */ }

def notifySuccessful() {
  slackSend (color: '#00FF00', message: "SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.BUILD_URL})")

  hipchatSend (color: 'GREEN', notify: true,
      message: "SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.BUILD_URL})"
  )

  emailext (
      subject: "SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
      body: """

SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':

Check console output at "${env.JOB_NAME} [${env.BUILD_NUMBER}]"

""", recipientProviders: [[$class: 'DevelopersRecipientProvider']] ) }

Again, I get notifications, as expected. This build is so fast that some of them are even on the screen at the same time:

Multiple Notifications

Job Failed Notification

Next I want to add failure notification. Here's where we really start to see the power and expressiveness of Jenkins Pipeline. A Pipeline is a Groovy script, so as we'd expect in any Groovy script, we can handle errors using try-catch blocks.

node {
  try {
    notifyStarted()

    /* ... existing build steps ... */

    notifySuccessful()
  } catch (e) {
    currentBuild.result = "FAILED"
    notifyFailed()
    throw e
  }
}

def notifyStarted() { /* .. */ }

def notifySuccessful() { /* .. */ }

def notifyFailed() {
  slackSend (color: '#FF0000', message: "FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.BUILD_URL})")

  hipchatSend (color: 'RED', notify: true,
      message: "FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.BUILD_URL})"
  )

  emailext (
      subject: "FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
      body: """

FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':

Check console output at "${env.JOB_NAME} [${env.BUILD_NUMBER}]"

""", recipientProviders: [[$class: 'DevelopersRecipientProvider']] ) }

Failed Notifications

Code Cleanup

Lastly, now that I have it all working, I'll do some refactoring. I'll unify all the notifications in one method and move the final success/failure notification into a finally block.

stage 'Build'

node {
  try {
    notifyBuild('STARTED')

    /* ... existing build steps ... */

  } catch (e) {
    // If there was an exception thrown, the build failed
    currentBuild.result = "FAILED"
    throw e
  } finally {
    // Success or failure, always send notifications
    notifyBuild(currentBuild.result)
  }
}

def notifyBuild(String buildStatus = 'STARTED') {
  // build status of null means successful
  buildStatus =  buildStatus ?: 'SUCCESSFUL'

  // Default values
  def colorName = 'RED'
  def colorCode = '#FF0000'
  def subject = "${buildStatus}: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'"
  def summary = "${subject} (${env.BUILD_URL})"
  def details = """

STARTED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':

Check console output at "${env.JOB_NAME} [${env.BUILD_NUMBER}]"

"""

  // Override default values based on build status
  if (buildStatus == 'STARTED') {
    colorName = 'YELLOW'
    colorCode = '#FFFF00'
  } else if (buildStatus == 'SUCCESSFUL') {
    colorName = 'GREEN'
    colorCode = '#00FF00'
  } else {
    colorName = 'RED'
    colorCode = '#FF0000'
  }

  // Send notifications
  slackSend (color: colorCode, message: summary)

  hipchatSend (color: colorName, notify: true, message: summary)

  emailext (
      subject: subject,
      body: details,
      recipientProviders: [[$class: 'DevelopersRecipientProvider']]
  )
}
You Have Been Notified!

I now get notified twice per build on three different channels. I'm not sure I need to get notified this much for such a short build. However, for a longer or complex CD pipeline, I might want exactly that. If needed, I could even improve this to handle other status strings and call it as needed throughout my pipeline.
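As a rough sketch of that idea — the 'Deploy' stage and the 'DEPLOYED' status string below are made-up examples, not part of the build above — the helper could be called at any interesting point in a longer pipeline:

```groovy
// Hypothetical extension: reusing notifyBuild() at several points in a
// longer pipeline. The 'Deploy' stage and 'DEPLOYED' status are invented
// for illustration; notifyBuild's if/else would need a matching branch
// to give 'DEPLOYED' its own color.
node {
  try {
    notifyBuild('STARTED')

    stage 'Build'
    /* ... build steps ... */

    stage 'Deploy'
    /* ... deploy steps ... */
    notifyBuild('DEPLOYED')
  } catch (e) {
    currentBuild.result = "FAILED"
    throw e
  } finally {
    // Success or failure, always send notifications
    notifyBuild(currentBuild.result)
  }
}
```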

Final View of Notifications

Blog Categories: Jenkins, Developer Zone
Categories: Companies

Level-Up Your Automation Game with the New DevOps Radio Episode, Featuring Joshua Nixdorf, Technical Director at Electronic Arts

Thu, 10/27/2016 - 16:54

Imagine you spent all day working on video games. Seems like a dream, right? For Joshua Nixdorf, Technical Director at EA Games, working with video games all day is his reality. However, even a company that focuses on fun entertainment can be confronted with challenges to deliver that entertainment. In fact, when EA Games started to grow development operations, that’s exactly what happened.

Josh Nixdorf started at EA Games during a time when automation and continuous integration (CI) were just starting to spread. In the past, EA Games had dozens of CI jobs; today, they have hundreds. Where the company once had a handful of QA disk builds, they now have hundreds for each of the world’s regions. To keep up with demand and growth, the company had to expand its CI/CD practices. The first question that comes to mind… How? Well, we’ll let Josh tell you that part.

In episode 8 of DevOps Radio, Josh and DevOps Radio host Andre Pino talk about what it’s like working for EA Games and how Josh has grown the company’s automation practices. Josh provides insight into how developing for video games is different from business software and how EA Games has worked to automate testing and more. Finally, Josh lets slip some of the games he’s had a hand in…

Now, plug in your headphones, turn on your gaming console, and get ready to listen to the latest episode of DevOps Radio. The episode is available now on the CloudBees website and on iTunes. Join the conversation about the episode on Twitter by tweeting out to @CloudBees and including #DevOpsRadio in your post.


Josh Nixdorf Presenting at Jenkins World 2016

Publishing HTML Reports in Pipeline

Tue, 10/25/2016 - 20:53

Most projects need more than just JUnit result reporting. Rather than writing a custom plugin for each type of report, we can use the HTML Publisher Plugin.

Let's Make this Quick

I've found a Ruby project, hermann, I'd like to build using Jenkins Pipeline. I'd also like to have the code coverage results published with each build job. I could write a plugin to publish this data, but I'm in a bit of a hurry and the build already creates an HTML report file using SimpleCov when the unit tests run.

Simple Build

I'm going to use the HTML Publisher Plugin to add the HTML-formatted code coverage report to my builds. Here's a simple pipeline for building the hermann project.

stage 'Build'

node {
  // Checkout
  checkout scm

  // install required bundles
  sh 'bundle install'

  // build and run tests with coverage
  sh 'bundle exec rake build spec'

  // Archive the built artifacts
  archive (includes: 'pkg/*.gem')
}

NOTE: This pipeline expects to be run from a Jenkinsfile in SCM. To copy and paste it directly into a Jenkins Pipeline job, replace the checkout scm step with git ''.

Simple enough, it builds, runs tests and archives the package.

Job Run Without Report Link

Now I just need to add the step to publish the code coverage report. I know that rake spec creates an index.html file in the coverage directory. I've already installed the HTML Publisher Plugin. How do I add the HTML publishing step to the pipeline? The plugin page doesn't say anything about it.

Snippet Generator to the Rescue

Documentation is hard to maintain and easy to miss, even more so in a system like Jenkins with hundreds of plugins, each potentially adding one or more Groovy steps to the Pipeline. The Pipeline Syntax "Snippet Generator" helps users navigate this jungle by generating a code snippet for any step from the inputs you provide.

It offers a dynamically generated list of steps, based on the installed plugins. From that list I select the publishHTML step:

Snippet Generator Menu

Then it shows me a UI similar to the one used in job configuration. I fill in the fields, click "Generate" and it shows me a snippet of Groovy generated from that input.

Snippet Generator Output

HTML Published

I can use that snippet directly or as a template for further customization. In this case, I'll just reformat it and copy it in at the end of my pipeline. (I ran into a minor bug in the snippet generated for this plugin step. Typing the error string into my search bar immediately found the bug and a workaround.)

  /* ...unchanged... */

  // Archive the built artifacts
  archive (includes: 'pkg/*.gem')

  // publish html
  // snippet generator doesn't include "target:"
  publishHTML (target: [
      allowMissing: false,
      alwaysLinkToLastBuild: false,
      keepAll: true,
      reportDir: 'coverage',
      reportFiles: 'index.html',
      reportName: "RCov Report"
  ])
}

When I run this new pipeline I am rewarded with an RCov Report link on the left side, which I can follow to see the HTML report.

Job Run With Report Link

RCov Report

I even added the keepAll setting to let me go back and look at reports on old jobs as more come in. As I said to begin with, this is not as slick as what I could do with a custom plugin, but it is much easier and works with any static HTML.


Web-scale Enterprise Jenkins using CloudBees Jenkins Platform - Private SaaS Edition

Wed, 10/19/2016 - 00:20

Jenkins World 2016 is now over and boy was it a blast! We had a huge turnout, great speakers, informative tech sessions, certifications, a great keynote by Kohsuke Kawaguchi and another by Sacha Labourey, CEO of CloudBees, Inc. For Sacha’s keynote demo, we wanted to shine the spotlight on the flagship product from CloudBees: CloudBees Jenkins Platform - Private SaaS Edition, and showcase the key capabilities of the product and how it enables enterprises to achieve CI/CD, even at extreme scale. Our goal was to set up the world’s largest enterprise Jenkins cluster using Private SaaS Edition and demonstrate how enterprises can “go from code check-in to production in under an hour.”

Setting the Stage

During the keynote demo, Sacha highlighted the following use-cases with Private SaaS Edition, showing how enterprises can go from “code check-in to production in under an hour” -

  • Set up an enterprise Jenkins cluster on AWS within minutes, using Private SaaS Edition

  • Onboard a new project team on the cluster with the click of a button (provision a new Jenkins master)

  • Provision CI jobs of an entire GitHub org and spin up agents on-demand using the power of the GitHub Organization Folder plugin

  • Showcase the auto-healing capability of Private SaaS Edition - in the event a Jenkins master or agent were to crash, the cluster should automatically provision a new master/agent without loss of data, and in a matter of minutes


We also showed a live “world’s largest enterprise Jenkins cluster” using Private SaaS Edition, running real CI/CD workloads.



In this blog we want to talk about how we got Private SaaS Edition to set up and manage the above cluster, and also share key lessons we learned along the way. Read on and enjoy!

“Go Big, or Go Home!”

We wanted to showcase the above use-cases on a Private SaaS Edition cluster running on AWS with 2000 Jenkins masters and 10,000 executors, at any given point in time.  



Let’s set some context here for our readers - why 2000 masters? Is that even important?

Here’s why it makes sense - with Private SaaS Edition, enterprises can essentially spin up a Jenkins master for every single active development project. So, each project could essentially get its own CI/CD workspace - with Jenkins masters and agents to handle the project’s CI/CD workloads. Isn’t that awesome!

“It’s a Marathon, not a Sprint!”

We wanted the audience to get a visual representation in real-time of the above use cases as Sacha walked through each step. So, we needed the following -

  • A dashboard (Blue Ocean-based) to show the size and health of the cluster in real-time

  • Additional health metrics from the cluster persisted in Elasticsearch (ES), that would be displayed on the dashboard

We built a prototype fulfilling the above requirements in less than four weeks. The next step was to scale this prototype to the world’s largest enterprise Jenkins cluster. We spent the next week determining the budget and appropriate EC2 instances to run the different components.

As we got the cluster to ~1400 Jenkins masters, a couple hundred “worker” VMs and a few thousand executors, we hit a dead end: we were unable to scale the cluster any higher. Marathon became unresponsive, preventing us from spinning up additional Jenkins masters or agents. After a good deal of troubleshooting we discovered (quick shout-out to Dario for confirming our suspicions on a crazy Saturday morning!) that Marathon was stuffing a lot of state data (our state data) into Zookeeper zNodes, and once that happens, Marathon gets very confused and becomes unresponsive. Fortunately, we could make a relatively minor tweak to Zookeeper’s configuration and increase the zNode limit significantly. This fix did the trick: the pressure valve was opened and we quickly scaled the cluster to 2000 Jenkins masters.

Side note: There is good news on this front. The very next release of Marathon (1.4.0) is supposed to address the limitation referenced above.

“Houston, We Have a Problem!”

Two days before Sacha’s keynote demo, disaster struck! The cluster we had so painstakingly set up was destroyed inadvertently. Here’s what happened - we had spun up a bunch of test clusters on AWS as part of this effort, and in our haste to clean up these test clusters, our primary demo cluster was destroyed instead! To destroy a Private SaaS Edition cluster, you would type the following command in the CLI -

bees-pse destroy

Except this time, someone accidentally also did this -

bees-pse destroy -f

The -f here is the same as the -f in the quintessential “rm -rf /”. In other words, if you have administrative privileges, you can delete the whole cluster without any confirmation.

This was a disaster scenario in every sense of the word. We had to get the entire cluster up and running for rehearsals the next day. Well, as it turns out, we had designed Private SaaS Edition with these types of situations in mind from the get-go -

  • In a Private SaaS Edition cluster, the $JENKINS_HOME data is continually backed up (snapshots stored in EBS)

  • Customers can recreate their cluster from a snapshot, in the event of a disaster

“Bright, Sunshiny Day!”

Two hours after the disaster struck, we had the cluster back up and running - 300+ VMs, 2000 Jenkins masters, 9000+ executors, Elasticsearch and our prototype monitoring system! A few hours later we were able to scale the cluster even more and hit our goal of running 10,000 executors. Here are some key stats from this cluster -

12 TB RAM (total cluster)

~320 EC2 instances

~2M jobs run in a given day

TBs of data in Elasticsearch



So in conclusion, if you are an enterprise looking for a turnkey solution to set up a Jenkins cluster at scale on your private cloud, do take Private SaaS Edition out for a spin and let us know your thoughts.

In the immortal words of Gordon Gekko - “Greed is good” - so maybe for next year’s CloudBees keynote demo at Jenkins World we can showcase the world’s largest enterprise Jenkins cluster spanning multiple regions and multiple cloud service providers, set up in a matter of minutes!

Finally, a quick plug for Stephen Connolly’s great blog post where he presents a blueprint to scale Jenkins to an even larger scale. We highly recommend it.

Kal Vissa, Senior Product Manager
John Pampuch, Engineering Manager


Blog Categories: Cloud Platform, Jenkins

First Rolling Release Improves Pipeline in CloudBees Jenkins Platform 2.7.20

Mon, 10/17/2016 - 18:31

We are excited to announce the availability of release 2.7.20, which includes significant improvements to pipeline functionality, as well as important bug fixes. This is also the first “rolling” release, as we transition to a more frequent release model to deliver our newest functionality to users as soon as possible. In conjunction with Jenkins 2.x, this release is the first available in two channels, rolling and fixed:

  • Rolling releases are issued monthly
  • Fixed releases are issued semiannually, although key fixes will also be applied throughout each six-month period as needed.
Release Highlights Shared Pipeline Libraries Make Scripts Easier to Reuse

When you have multiple Pipeline jobs, you often want to share parts of the Pipeline scripts between jobs to keep from repeating yourself. The new Pipeline Shared Libraries plugin adds reuse functionality by letting you keep “shared library scripts” in SCM repositories, so that Pipeline snippets can be written once and called from multiple projects. Centrally managed shared Pipeline libraries reduce coding and save time.
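As a rough sketch of the idea — the library name 'utils' and the sayHello step below are hypothetical examples, assuming a library registered in Jenkins that defines a vars/sayHello.groovy global variable:

```groovy
// Load a shared library by the name it was registered under in Jenkins.
// 'utils' and sayHello are made-up examples for illustration.
@Library('utils') _

node {
  // sayHello would be defined once in the library (vars/sayHello.groovy)
  // and reused from any number of Pipeline jobs.
  sayHello 'Jenkins World'
}
```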

Improved Pipeline Control with New Stage, Lock and Milestone Steps

The stage step has been used for all functionality related to the flow and visualization of builds through the Pipeline: grouping build steps into visualized stages, limiting concurrent builds, and discarding stale builds. To improve upon each of these areas, we decided to break this functionality into discrete steps rather than push more and more features into an already packed stage step.

  • Stage - the stage step controls segmentation of a Pipeline, grouping steps to provide clear, predictable boundaries for each section.
  • Lock - the lock step controls the number of concurrent builds a Pipeline can run within a defined section of the Pipeline.
  • Milestone - the milestone step controls the order that concurrent builds of a Pipeline will finish, discarding older builds that will not finish before more recent builds.

As before, Stage groups steps, so they can be visually displayed in the Pipeline stage view. Lock and milestone increase control of builds, so that there is only one build in a Pipeline section and pipeline integrity is protected because older builds are automatically stopped when superseded by a newer build.
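A rough sketch of how these steps might combine; the resource name 'deploy-target' is a made-up example:

```groovy
node {
  // stage groups steps for display in the Pipeline stage view
  stage('Build') {
    /* ... build steps ... */
  }

  // lock allows only one build at a time into this section
  lock(resource: 'deploy-target') {
    // milestone discards older builds that a newer build has passed
    milestone()
    stage('Deploy') {
      /* ... deploy steps ... */
    }
  }
}
```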

Improvements & Fixes

Features and improvements (rolling release):

  • Jenkins core upgraded to 2.7.20 (release note)
  • Improved management of Pipeline shared libraries, including the @Library annotation and hosting shared libraries in external SCMs
  • New block syntax for the stage step
  • Fixes and performance improvements in the Pipeline Stage View, including support for the new stage step syntax
  • More deterministic rules for the use of @NonCPS, plus additional features available in Pipeline Groovy scripts for the use of environment variables and build parameters
  • Improvements in the handling of resumed jobs
  • Better handling of parallel branches
  • New milestone step
  • Checkpoint integration with the new shared library features
  • Inclusion of the Operations Center Analytics Reporter in the Setup Wizard selection of plugins

Fixes (rolling and fixed releases):

  • Fixes to RBAC group membership propagation from CJOC to client masters
  • Fixes to the Beekeeper plugin, including:
      • No filtering is performed in the Plugin Manager when Beekeeper is enabled but the Update Center configuration is enforced by CJOC
      • When the Beekeeper configuration changes, the update center is automatically refreshed
  • Additional fixes related to Move / Copy / Promote operations
  • A fix to the issue that caused the Manage Jenkins page to show an error when a core upgrade was available. So that current users of CJE and CJOC are not affected by this issue, no core upgrades will be offered for that specific version through CloudBees Update Centers
  • Improved documentation about the CloudBees Assurance Program (CAP) and the Beekeeper plugin (link)

Upgrading to the New Release

What Release Am I On?

You can tell which release line you are running by checking the footer of your Jenkins instance:

  • Rolling releases have a version scheme with 4 numeric parts (example:
  • Fixed releases have a version scheme with 5 numeric parts (example:
  • Legacy releases have a version scheme with 2 numeric parts (example: 15.11 or 16.06)

Here are some example footers:

  • Rolling Release - > Jenkins ver. (CloudBees Jenkins XXX
  • Fixed Release -> Jenkins ver. (CloudBees Jenkins XXX
  • Legacy Release -> Jenkins ver. 1.651 (CloudBees Jenkins XXX 16.06)
  • Legacy Release -> Jenkins ver. 1.625 (CloudBees Jenkins XXX 15.11)
How to Upgrade

Review the CloudBees Jenkins Enterprise Installation guide and the CloudBees Jenkins Operations Center (CJOC) User Guide for details about upgrading, but here are the basics:

  1. Identify which CloudBees Jenkins Enterprise (CJE) release line (rolling vs. fixed) you are currently running.
  2. Visit to download the latest release for your release line (WARNING: You must be logged in to see available downloads).
  3. If you are running CloudBees Jenkins Operations Center (CJOC), you must upgrade CJOC first, because you cannot connect a new CJE instance to an older version of CJOC.
  4. Install CJP as appropriate for your environment, and start the CJP instance.
  5. If the instance needs additional input during upgrade, the setup wizard prompts for additional input when you first access the instance.
Blog Categories: Jenkins, Company News, Developer Zone