By Erwan Arzur

What do I like the most about Groovy?

  • It’s tightly connected to the JVM, so you can use any existing Java class in your project
  • It offers dynamic metaprogramming

Why? Let me show you a few examples of why Groovy doesn't suck…

For my current project, I need to use the Weka toolkit to perform some text analysis. While this toolkit is geared toward its own Swing user interface (yuck!), it also provides an API. But even if the API isn't that tightly coupled to their tools, many classes are clearly built to be driven from a main() method by users happy to type switches on the command line (and I am one of these happy users…). It's not very programmer-friendly, to say the least.

Proxying instances

The Weka toolkit provides a simple class called NGramTokenizer. Its only job is to find multigrams (n-grams) in a String.

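import weka.core.tokenizers.NGramTokenizer

NGramTokenizer tokenizer = new NGramTokenizer()
tokenizer.setNGramMinSize(ngramMinSize)   // shortest multigram, in words
tokenizer.setNGramMaxSize(ngramMaxSize)   // longest multigram, in words
tokenizer.setDelimiters("(\\p{Blank}+|\\p{Punct})")
tokenizer.tokenize(words)                 // words: the text to analyse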

You can then iterate over each multigram found in the text you're working on. The tokenizer provides two methods for this, hasMoreElements() and nextElement(): the classic java.util.Enumeration way of iterating, which dates back to JDK 1.0.
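To make that protocol concrete without depending on Weka, here is a minimal sketch that uses java.util.StringTokenizer as a stand-in. This is an assumption for illustration: StringTokenizer is a JDK class that implements java.util.Enumeration, but it splits on whitespace into single words rather than multigrams.

```groovy
// Stand-in for Weka's NGramTokenizer (assumption): java.util.StringTokenizer
// implements java.util.Enumeration, i.e. hasMoreElements()/nextElement().
def tokenizer = new StringTokenizer("the quick fox and the lazy dog")

List<String> tokens = []
while (tokenizer.hasMoreElements()) {
  String token = tokenizer.nextElement()  // nextElement() is typed as Object
  tokens << token
}

println tokens  // prints [the, quick, fox, and, the, lazy, dog]
```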

Now, suppose you want to extract a dictionary from this text (each distinct word with its count). The naive approach would look like this:

Map dict = [:]

while (tokenizer.hasMoreElements()) {
  String token = tokenizer.nextElement()
  def count = dict[token]          // null if we haven't seen this token yet
  if (count != null) {
    dict[token] = count + 1
  } else {
    dict[token] = 1
  }
}

Nice… but not thrilling 🙂

Now, the Groovy Development Kit (GDK) adds a nice method to every Collection instance, countBy(), which does just what it says: exactly what the piece of code above does. Could it be more efficient, say by using a faster tree-based algorithm? That's something I still need to check…
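A minimal sketch of countBy() on an ordinary List (the word list is made up for illustration): the closure computes a key for each element, and countBy() returns a Map from each key to the number of elements that produced it, so { it } counts the elements themselves.

```groovy
def words = ['groovy', 'java', 'groovy', 'weka', 'groovy', 'java']

// Identity key: count how many times each word occurs
Map dict = words.countBy { it }
assert dict == [groovy: 3, java: 2, weka: 1]

// Any other key works too, e.g. counting words by their length
Map byLength = words.countBy { it.length() }
assert byLength == [6: 3, 4: 3]
```

The second call shows why countBy() takes a closure at all: the dictionary is just the special case where the key is the element itself.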

I can't use countBy() directly on my NGramTokenizer instance, because NGramTokenizer doesn't implement Collection. In the Java world, I would be stuck with the loop above. In Groovy, it's a different story:

  // closures, so the tokenizer is called on demand, not once at map creation
  def tokenizerProxy = [next: { tokenizer.nextElement() },
                        hasNext: { tokenizer.hasMoreElements() }] as Iterator

  Map dict = tokenizerProxy.countBy { it }

That's all. I define a Map containing the two closures (next() and hasNext()) that countBy() relies on to build the dictionary, and use it as a proxy that mimics an Iterator instance.
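Here is a self-contained sketch of that proxy, again assuming java.util.StringTokenizer as a stand-in for NGramTokenizer, since both expose hasMoreElements() and nextElement(). Each map value is a closure, so the tokenizer is only called when countBy() asks for the next element, and the `as Iterator` coercion turns the Map into an object implementing java.util.Iterator, which the GDK's countBy(Iterator, Closure) variant consumes.

```groovy
// Stand-in for the Weka tokenizer (assumption): same Enumeration-style methods
def tokenizer = new StringTokenizer("to be or not to be")

// Map coercion: each key names an Iterator method, each value implements it.
Iterator tokenizerProxy = [next   : { tokenizer.nextElement() },
                           hasNext: { tokenizer.hasMoreElements() }] as Iterator

// countBy() keeps calling next() until hasNext() returns false
Map dict = tokenizerProxy.countBy { it }
assert dict == [to: 2, be: 2, or: 1, not: 1]
```

As an alternative, the GDK also adds an iterator() method to java.util.Enumeration, so something like tokenizer.iterator().countBy { it } should work as well when the tokenizer really implements Enumeration; the explicit Map simply keeps the proxying visible.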

All of this happens at the instance level: I don't have to change the NGramTokenizer class in any way…

The benefits aren't dramatic, but the code is shorter and more readable (your mileage may vary, of course). More importantly, it relies on well-tested, proven code (countBy()) instead of my half-assed, trivial while loop…

Short-circuiting methods in unit tests

Sometimes your code relies on services such as an HTTP server, a database server, RabbitMQ, etc., which aren't available on the test server. You really want to write a unit test, but you have no time to tinker with the infrastructure on the build/test server (and there is a non-negligible chance of pissing off your Operations team in the process, too ;-)). So what are your options?

Imagine a system that collects data from URLs and builds a dictionary out of it… You have developed a service class that relies on an HTTP server being reachable to retrieve the content at a URL, and that returns the dictionary as a Map.

The service would look like this:

  class URLService {

    String retrieveContent(URL u) throws MalformedURLException, IOException {
      // connect and collect only the text from the document
    }

    Map buildDictionnary(URL u) {
      String content
      try {
        content = retrieveContent(u)
      } catch (IOException e) {
        // ...
      }

      // there goes the code from the previous example
    }
  }

A test for this might look like this:

  void testAcmeDictionnary() {
    def service = new URLService()

    Map dict = service.buildDictionnary(new URL("http://www.acme.com"))

    assert dict.keySet().size() == // expected number of keywords
    assert dict['ACME'] == // expected number of instances of the word 'ACME'
  }

There are many problems here:

  • You may not be able to connect to http://www.acme.com
  • They might not be too happy with your tests hitting their servers regularly
  • Their public page might change, making your assertions fail, which means updating your test every time they edit their home page.

So what if you did this instead?

void testAcmeDictionnary() {
  def service = new URLService()

  // make retrieveContent() always return fixed content, for this instance only
  service.metaClass.retrieveContent = { URL u -> return "<HTML><BODY>....</BODY></HTML>" }

  Map dict = service.buildDictionnary(new URL("http://www.acme.com"))

  assert dict.size() == // expected number of keywords
  assert dict['ACME'] == // expected number of instances of the word 'ACME'
}

And there you go: your test no longer relies on an external service you have no control over.
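A runnable sketch of the same trick, using a hypothetical, cut-down URLService because the real content retrieval and tokenizing are elided above. Here retrieveContent() simply throws, so the assertions can only pass if the per-instance override is actually used. The HTML string, the tag-stripping regex and the word splitting are all assumptions for illustration, not Weka.

```groovy
// Hypothetical stand-in for URLService; only the method names come from the article.
class URLService {
  String retrieveContent(URL u) throws IOException {
    // The real method would fetch the page; this one refuses to touch the network.
    throw new IOException("no network access in unit tests: " + u)
  }

  Map buildDictionnary(URL u) {
    // Dynamically dispatched, so a per-instance metaClass override is picked up
    String content = retrieveContent(u)
    // Crude tag stripping and word splitting (assumption)
    String text = content.replaceAll(/<[^>]+>/, ' ')
    text.findAll(/\w+/).countBy { it }
  }
}

def service = new URLService()

// Override retrieveContent() on this one instance; the class itself is untouched.
service.metaClass.retrieveContent = { URL u ->
  "<HTML><BODY>ACME sells rockets. ACME sells anvils.</BODY></HTML>"
}

Map dict = service.buildDictionnary(new URL("http://www.acme.com"))
assert dict.size() == 4
assert dict['ACME'] == 2
```

This relies on buildDictionnary() being compiled by Groovy, so its call to retrieveContent() goes through the instance's metaClass. If the service were written in Java, or annotated with @CompileStatic, the internal call would bypass the metaClass and the stub would be ignored. Other URLService instances keep the original method.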

Nice, eh! 😉

Some Groovy Magic
