
I have spent the last five years working as a DevOps engineer/operations manager, and I have often been frustrated by Java applications that give operators no easy way to plug them into their monitoring infrastructure.

I mean, have you tried to monitor the number of requests/sec that an application running in a JBoss/Jetty/Tomcat container handles? On the one hand, you have simple tools that can easily parse JSON documents (for instance) or send HTTP requests; on the other, you enter the realm of JMX, which is as “simple” as the “Simple” Network Management Protocol is…

There is simply no way you can collect statistics/healthchecks from a Java application without installing at least a Java VM and deciphering the very complex settings of the JMX console.

There are very nice solutions to this (yes, I’m talking about you, New Relic’s RPM) but they aren’t cheap. I know there are good command-line alternatives, but they all require you to have the JVM installed, and they are all based on that damn JMX protocol.

The consequences are pretty damning. I have yet to meet a Java developer who will proactively build that kind of monitoring tooling into their application, or even plan it for the future.

When I’m looking at an application as an operator, my ultimate goal is usually to integrate it into Chef recipes or some other deployment tool (Capistrano, etc.), on a complex platform with dozens of servers, load balancers, and so on.

So I would like this application to behave properly, and in an ideal world, what I would want to find is:

  1. Operation manuals – how to set up and configure the application
  2. Start and stop scripts
  3. Monitoring entry points
  4. Healthchecks for load balancers and monitoring tools
  5. Latches for maintenance: an easy way to make a web app return 503 errors on purpose (maintenance mode), for instance to remove it from a load balancer and update its configuration while the rest of the servers are happily serving requests.

You often find 1 and 2, but I’ve rarely found any of the last three items in that list. Have you ever seen all of that in your standard, out-of-the-box Java application?

To my knowledge, only one Java framework has most of that embedded right into its core: Dropwizard. Check it out. The fat jar idea (bundling everything the application needs, servlet container included, into one big jar file, and reading an external YAML configuration file) is a dream come true for an operator.

Enter metrics

From the same nice guys who wrote Dropwizard comes a very fine library, metrics, which plugs easily into any Java application and provides annotations to add meters, gauges, histograms, etc. to any method in your application. You can also easily write simple health checks.

The library also comes with a servlet that exposes the recorded metrics as simple JSON documents to anyone who asks (so you have to control access to that servlet in the container), and a simple page that returns an HTTP 500 status code if one of the registered health checks doesn’t pass.

Grails + metrics

When I set out to add metrics to the prototype of the “perfect Java application” I am currently working on, I found out that there is a plugin for it (yes! Grails is the iPhone of web frameworks: “There’s an app for that” :-)).

Install the yammer-metrics plugin

Add the yammer-metrics plugin to the project’s BuildConfig.groovy:

    plugins {
        //...
        compile ":yammer-metrics:3.0.1-2"
        //...
    }

After this, you can add @Metered and @Timed annotations to any method in your project, and metrics will start collecting profiling information about the methods you instrument…

//...
    @Metered
    @Timed
    def index(Integer max) {
      params.max = Math.min(max ?: 15, 100)
      respond DataSet.list(params), model: [dataSetInstanceCount: DataSet.count()]
    }

    def show(DataSet dataSetInstance, Integer max) {
      params.max = Math.min(max ?: 15, 100)
//...
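To give an idea of what a meter records, here is a deliberately simplified sketch in plain Java (standard library only). It is not the library's implementation: `SimpleMeter` and its injected clock are hypothetical names made up for this illustration, and it only tracks a count and a mean rate (events divided by elapsed time), not the moving averages a real meter also reports.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical, simplified meter: counts events and derives a mean rate.
// This is an illustration only, not the metrics library's own Meter class.
final class SimpleMeter {
    private final AtomicLong count = new AtomicLong();
    private final LongSupplier nanoClock; // injected so the sketch can be tested
    private final long startNanos;

    SimpleMeter(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
    }

    // Record one event, e.g. one call of an instrumented method.
    void mark() {
        count.incrementAndGet();
    }

    long getCount() {
        return count.get();
    }

    // Mean rate in events/second since the meter was created.
    double getMeanRate() {
        long elapsedNanos = nanoClock.getAsLong() - startNanos;
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return count.get() * 1_000_000_000.0 / elapsedNanos;
    }
}
```

In real use the clock would simply be `System::nanoTime`; with the plugin you get all of this from the annotation, without writing any of it yourself.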

Monitoring the JVM

Monitoring the VM (particularly the garbage collection process) can help you understand performance issues. I’ve seen applications get a 10x performance boost just by reducing the number of new String instances created, using StringBuffer instances instead.

The metrics plugin doesn’t provide this by default, but it’s quite easy to add one of the existing metrics modules or write your own…

Add a dependency in BuildConfig.groovy:

    dependencies {
      //...
      compile 'com.codahale.metrics:metrics-jvm:3.0.1'
      //...
    }

Then you just need to update BootStrap.groovy:

      // Imports needed at the top of BootStrap.groovy
      import java.lang.management.ManagementFactory
      import com.codahale.metrics.jvm.*

      // Instrument the JVM (inside the init closure)
      Metrics.getRegistry().register("jvm.buffers", new BufferPoolMetricSet(ManagementFactory.getPlatformMBeanServer()))
      Metrics.getRegistry().register("jvm.gc", new GarbageCollectorMetricSet())
      Metrics.getRegistry().register("jvm.memory", new MemoryUsageGaugeSet())
      Metrics.getRegistry().register("jvm.threads", new ThreadStatesGaugeSet())

Adding healthchecks

Say you’d like to set up an alert if the application’s free storage space drops below a certain limit (I know there are many different ways to do that, but I really like this one because it’s integrated right into the application that needs that storage!)

    import com.codahale.metrics.health.HealthCheck

    class StorageHealthCheck extends HealthCheck {

      private final File storagePath
      private final long minimumSpace // minimum free space in bytes, default is 1 GB

      // 1 GB expressed in bytes (the original expression 10e8 / 2 evaluated to 500 MB)
      public static final long DEFAULT_FREE_SPACE = 1000000000L

      StorageHealthCheck(String storagePath, long minimumSpace = DEFAULT_FREE_SPACE) {
        this.storagePath = new File(storagePath)
        this.minimumSpace = minimumSpace
      }

      @Override
      public HealthCheck.Result check() throws Exception {
        // Healthy as long as the partition holding storagePath has enough free bytes
        if (storagePath.getFreeSpace() > minimumSpace) {
          return HealthCheck.Result.healthy()
        } else {
          return HealthCheck.Result.unhealthy("Free space below ${minimumSpace / 1e9} GB!")
        }
      }
    }

Then you just need to register that new health check into metrics’ registry. Update BootStrap.groovy (or somewhere down the chain of your application initialization code):

    log.info("Setting up StorageService healthcheck")
    def minimumFreeSpace = grailsApplication.config.storage?.minimumSpace ?: StorageHealthCheck.DEFAULT_FREE_SPACE
    HealthChecks.register(StorageService.name, new StorageHealthCheck(storageBase.getPath(), minimumFreeSpace))

You’ll want to add health checks for every service your application depends on to work properly: database, work queue server, and so on (what about regularly checking that a message is properly received after having been sent to a specific work queue?).

Maintenance latches

One health check that I find especially important is a very simple one.

It just checks for the presence of a file somewhere on the file system and starts to fail if the file is present.

The application is put in maintenance mode the second an operator runs touch /tmp/myApplicationMaintenance! Very handy.
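A minimal sketch of such a latch, written in plain Java with only the standard library so it can be tried on its own. The class name `MaintenanceLatchCheck` and its methods are hypothetical; in a real application the same logic would presumably live in a class extending the metrics `HealthCheck`, like the storage check above.

```java
import java.io.File;

// Hypothetical maintenance latch: unhealthy as long as the latch file exists.
// In a real application this logic would sit inside a metrics HealthCheck subclass.
final class MaintenanceLatchCheck {
    private final File latchFile;

    MaintenanceLatchCheck(String latchPath) {
        this.latchFile = new File(latchPath);
    }

    // Healthy only while no operator has created the latch file.
    boolean isHealthy() {
        return !latchFile.exists();
    }

    // Human-readable explanation, suitable for a healthcheck page.
    String message() {
        return isHealthy()
            ? "OK"
            : "Maintenance mode: " + latchFile.getPath() + " is present";
    }
}
```

Creating the file with touch flips the check to unhealthy, and removing it with rm puts the application back into service, with no restart and no configuration change.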

Self test

We can imagine that before performing anything, an application would check its own health. But this has implications… You don’t want to run a full-blown list of healthchecks with every request your web application receives, and this could lead to re-entrance issues.
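One possible way around both problems is to cache the result for a short time and to short-circuit nested calls. The sketch below is plain Java with only the standard library; `CachedHealth`, its time-to-live parameter, and the injected clock are all assumptions made for illustration, not something the metrics library or the plugin provides.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical self-test wrapper: runs the real check at most once per TTL
// and answers from the cache in between.
final class CachedHealth {
    private final BooleanSupplier check;   // the expensive health check
    private final long ttlMillis;          // how long a result stays valid
    private final LongSupplier clock;      // millisecond clock, injected for testing
    private boolean hasRun = false;
    private boolean running = false;
    private boolean lastResult = true;     // optimistic until the first run
    private long lastRun;

    CachedHealth(BooleanSupplier check, long ttlMillis, LongSupplier clock) {
        this.check = check;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized boolean isHealthy() {
        long now = clock.getAsLong();
        boolean fresh = hasRun && now - lastRun < ttlMillis;
        // Serve the cached answer if it is still fresh, or if the check itself
        // called back into isHealthy() (re-entrant call on the same thread).
        if (fresh || running) {
            return lastResult;
        }
        running = true;
        try {
            lastResult = check.getAsBoolean();
            lastRun = now;
            hasRun = true;
        } finally {
            running = false;
        }
        return lastResult;
    }
}
```

With a time-to-live of a few seconds, a request filter could call `isHealthy()` on every request while the underlying checks only run a handful of times per minute.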

Metrics servlet

The plugin sets up a /metrics endpoint which displays a very simple webpage with a few interesting points:

  • JSON dump of every meter and timer collected
  • thread dump of your application (kill -QUIT without having to access the server)
  • healthchecks: calls every registered healthcheck and returns 200 if everything’s OK, and 500 if not.
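The healthcheck behaviour in the last bullet can be pictured with a small sketch in plain Java (standard library only). `HealthEndpointSketch` is a hypothetical stand-in for the servlet, not its actual code: it only shows the "run everything, answer 200 or 500" logic, and treats a check that throws as a failed check.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the healthcheck page: runs every registered
// check and maps the overall outcome to an HTTP status code.
final class HealthEndpointSketch {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    void register(String name, BooleanSupplier check) {
        checks.put(name, check);
    }

    // 200 when every check passes, 500 as soon as one fails or throws.
    int statusCode() {
        boolean allHealthy = true;
        for (BooleanSupplier check : checks.values()) {
            boolean ok;
            try {
                ok = check.getAsBoolean();
            } catch (RuntimeException e) {
                ok = false; // a crashing check counts as unhealthy
            }
            allHealthy &= ok; // keep going so that every check is run
        }
        return allHealthy ? 200 : 500;
    }
}
```

This is what makes the endpoint so convenient for load balancers and monitoring tools: they only need to look at the status code, never at the response body.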

Results, at last!

Now I can just monitor servers with curl or any script-based monitoring tool (Nagios, Icinga…) without having to mess with Java on my monitoring infrastructure!

curl -v http://localhost:8090/app/metrics/metrics?pretty=true
* Adding handle: conn: 0x7fc059803a00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fc059803a00) send_pipe: 1, recv_pipe: 0
* About to connect() to localhost port 8090 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 8090 (#0)
> GET /app/metrics/metrics?pretty=true HTTP/1.1
> User-Agent: curl/7.30.0
> Host: localhost:8090
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: application/json
< Transfer-Encoding: chunked
< Date: Sun, 28 Dec 2014 17:18:48 GMT
<
...
"meters" : {
  "RabbitConsumer.handleMessageMeter" : {
    "count" : 85,
    "m15_rate" : 0.7732454500737989,
    "m1_rate" : 0.08930519529809891,
    "m5_rate" : 0.24577069821886505,
    "mean_rate" : 0.6213198221855765,
    "units" : "events/second"
  },
  ...
"timers" : {
  "RabbitConsumer.handleMessageTimer" : {
    "count" : 84,
    "max" : 5.312139,
    "mean" : 0.9895226904761905,
    "min" : 0.156007,
    "p50" : 0.731279,
    "p75" : 1.250077,
    "p95" : 2.82703675,
    "p98" : 5.224355500000001,
    "p99" : 5.312139,
    "p999" : 5.312139,
    "stddev" : 0.9116050657129663,
    "m15_rate" : 0.7690302373777833,
    "m1_rate" : 0.08829140595516632,
    "m5_rate" : 0.2432387618368667,
    "mean_rate" : 0.6140014208723833,
    "duration_units" : "seconds",
    "rate_units" : "calls/second"
  },
}
...

And most importantly: I can easily monitor the business-critical aspects of my application and detect when a change or deployment causes a problem in those figures.

Clearer information leads to better, informed decisions…

I think this plugin should be part of the Grails distribution. Maybe there would be licensing issues between SpringSource and Codahale, but having that right in the framework would be just plain awesome!

(JVM) Metrics + Grails = Awesomeness
