In this article, we will discuss:

  1. File system monitoring using Java NIO.2
  2. Common pitfalls of the default Java library
  3. How to design a simple thread-based file system monitor
  4. How to use the above to design a reactive file system monitor using the actor model.

Note: Although all the code samples here are in Scala, they can be rewritten in plain Java too. To quickly familiarize yourself with Scala syntax, here is a very short and nice Scala cheat sheet. For a more comprehensive guide to Scala for Java programmers, consult this (not needed to follow this article).

For the absolute shortest shortcut, the following Java code…

public void foo(int x, int y) {
  int z = x + y;
  if (z == 1) {
    System.out.println(x);
  } else {
    System.out.println(y);
  }
}

…is equivalent to the following Scala code:

def foo(x: Int, y: Int): Unit = {
  val z: Int = x + y
  z match {
   case 1 => println(x)
   case _ => println(y)
  }
}

All the code presented here is available under MIT license as part of the better-files library on GitHub.


Let’s say you are tasked with building a cross-platform desktop file-search engine. You quickly realize that after the initial indexing of all the files, you also need to quickly reindex any files (or directories) that get created or updated. A naive way would be to simply rescan the entire file system every few minutes, but that would be incredibly inefficient, since most operating systems expose file system notification APIs that allow the application programmer to register callbacks for changes, e.g. inotify on Linux, FSEvents on macOS and FindFirstChangeNotification on Windows.
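
To make the cost of the naive approach concrete, here is a minimal sketch of what one rescan-and-diff step could look like. The names NaiveRescan, snapshot and diff are made up for illustration, and the sketch assumes that the last-modified time is a good enough change signal, which is not always the case. Note that every call to snapshot walks the whole tree, however little has changed.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.FileTime
import scala.collection.JavaConverters._

// Hypothetical helper, not part of any library
object NaiveRescan {
  type Snapshot = Map[Path, FileTime]
  case class Changes(created: Set[Path], modified: Set[Path], deleted: Set[Path])

  /** Walks the whole tree under `root` and records each path's last-modified time. */
  def snapshot(root: Path): Snapshot = {
    val stream = Files.walk(root)
    try stream.iterator().asScala.map(p => p -> Files.getLastModifiedTime(p)).toMap
    finally stream.close() // Files.walk holds open directory handles until closed
  }

  /** Compares two snapshots taken at different times. */
  def diff(before: Snapshot, after: Snapshot): Changes = Changes(
    created = after.keySet -- before.keySet,
    modified = (after.keySet & before.keySet).filter(p => after(p) != before(p)),
    deleted = before.keySet -- after.keySet
  )
}
```

A scheduler would call snapshot every few minutes and feed consecutive snapshots to diff; changes made and reverted between two scans go unnoticed.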

But now you are stuck dealing with OS-specific APIs! Thankfully, beginning with Java SE 7, we have a platform-independent abstraction for watching file system changes via the WatchService API. The WatchService API was developed as part of Java NIO.2 under JSR-203, and here is a “hello world” example of using it to watch a given Path:


import java.nio.file._
import java.nio.file.StandardWatchEventKinds._
import scala.collection.JavaConversions._

def watch(directory: Path): Unit = {
  // First create the service
  val service: WatchService = directory.getFileSystem.newWatchService()

  // Register the service to the path and also specify which events we want to be notified about
  directory.register(service,  ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)

  while (true) {
    val key: WatchKey = service.take()  // Wait for this key to be signalled
    for {event <- key.pollEvents()} {
      // event.context() is the path of the changed file, relative to the watched directory
      event.kind() match {
        case ENTRY_CREATE => println(s"${event.context()} got created")
        case ENTRY_MODIFY => println(s"${event.context()} got modified")
        case ENTRY_DELETE => println(s"${event.context()} got deleted")        
        case _ => 
          // This can happen when OS discards or loses an event. 
          // See: http://docs.oracle.com/javase/8/docs/api/java/nio/file/StandardWatchEventKinds.html#OVERFLOW
          println(s"Unknown event $event happened at ${event.context()}")
      }
    }
    key.reset()  // Do not forget to do this!! See: http://stackoverflow.com/questions/20180547/
  }
}
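
The loop above blocks forever in take() and ignores the result of key.reset(). As a rough sketch of a variant that can give up after a timeout, the following uses poll instead. The names WatchSketch, open and drain are my own and purely illustrative; the sketch also resolves event.context() against the watched directory, since the context is only a relative path.

```scala
import java.nio.file.{Path, WatchService}
import java.nio.file.StandardWatchEventKinds._
import java.util.concurrent.TimeUnit
import scala.collection.JavaConverters._
import scala.collection.mutable.ListBuffer

// Hypothetical helper, not part of the WatchService API
object WatchSketch {
  /** Creates a WatchService and registers `directory` for create/modify/delete events. */
  def open(directory: Path): WatchService = {
    val service = directory.getFileSystem.newWatchService()
    directory.register(service, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)
    service
  }

  /** Waits up to `timeoutMs` for the first key, then drains whatever else is already queued. */
  def drain(service: WatchService, directory: Path, timeoutMs: Long): List[(String, Path)] = {
    val seen = ListBuffer.empty[(String, Path)]
    var key = service.poll(timeoutMs, TimeUnit.MILLISECONDS) // null on timeout, unlike take()
    while (key != null) {
      key.pollEvents().asScala.foreach { event =>
        event.context() match {
          case relative: Path => seen += (event.kind().name() -> directory.resolve(relative))
          case _ => seen += (event.kind().name() -> directory) // OVERFLOW carries no path
        }
      }
      // reset() returns false once the key is no longer valid, e.g. the directory is gone
      key = if (key.reset()) service.poll() else null
    }
    seen.toList
  }
}
```

The caller owns the returned WatchService and should close() it when done.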

Although the above is a good first attempt, it falls short in several respects:

  1. Bad Design: The above code looks unnatural and you probably had to look it up on Stack Overflow to get it right. Can we do better?
  2. Bad Design: The code does not do a very good job of handling errors. What happens when we encounter a file we could not open?
  3. Gotcha: The Java API only allows us to watch the directory for changes to its direct children; it does not recursively watch a directory for you.
  4. Gotcha: The Java API does not allow us to watch a single file, only a directory.
  5. Gotcha: Even if we resolve the aforementioned issues, the Java API does not automatically start watching a new child file or directory created under the root.
  6. Bad Design: The code as implemented above exposes a blocking/polling, thread-based model. Can we use a better concurrency abstraction?
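
The single-file gotcha in the list above is easy to check for yourself. Here is a small sketch, with a made-up helper named RegisterCheck.tryRegister, that attempts a registration and reports whether it worked. On the default file system providers I have tried, registering a regular file fails with a NotDirectoryException, although the Javadoc lists that specific exception as optional, so the sketch only relies on the registration failing.

```scala
import java.nio.file.Path
import java.nio.file.StandardWatchEventKinds._
import scala.util.Try

// Hypothetical helper for experimenting, not part of any library
object RegisterCheck {
  /** Attempts to register `path` with a throwaway WatchService and reports the outcome. */
  def tryRegister(path: Path): Try[Unit] = {
    val service = path.getFileSystem.newWatchService()
    try Try { path.register(service, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY); () }
    finally service.close()
  }
}
```

Calling tryRegister on a directory should give a Success, while calling it on a regular file should give a Failure wrapping the exception.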

Let’s address each of the above concerns in turn.

  • A better interface: Here is what my ideal interface would look like:
abstract class FileMonitor(root: Path) {
  def start(): Unit
  def onCreate(path: Path): Unit
  def onModify(path: Path): Unit
  def onDelete(path: Path): Unit
  def stop(): Unit
}

That way, I can simply write the example code as:

val watcher = new FileMonitor(myFile) {
  override def onCreate(path: Path) = println(s"$path got created")
  override def onModify(path: Path) = println(s"$path got modified")
  override def onDelete(path: Path) = println(s"$path got deleted")
}
watcher.start()

Ok, let’s try to adapt the first example using a Java Thread so that we can expose “my ideal interface”:

trait FileMonitor {                               // My ideal interface
  val root: Path                                  // starting file
  def start(): Unit                               // start the monitor
  def onCreate(path: Path) = {}                   // on-create callback
  def onModify(path: Path) = {}                   // on-modify callback
  def onDelete(path: Path) = {}                   // on-delete callback
  def onUnknownEvent(event: WatchEvent[_]) = {}   // handle lost/discarded events
  def onException(e: Throwable) = {}              // handle errors e.g. a read error
  def stop(): Unit                                // stop the monitor
}

And here is a very basic thread-based implementation:

class ThreadFileMonitor(val root: Path) extends Thread with FileMonitor {
  setDaemon(true)        // daemonize this thread
  setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
    override def uncaughtException(thread: Thread, exception: Throwable) = onException(exception)
  })

  val service = root.getFileSystem.newWatchService()

  override def run() = Iterator.continually(service.take()).foreach(process)

  override def interrupt() = {
    service.close()
    super.interrupt()
  }

  override def start() = {
    watch(root)
    super.start()
  }

  protected[this] def watch(file: Path): Unit = {
    file.register(service, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)
  }

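  protected[this] def process(key: WatchKey) = {
    key.pollEvents() foreach {
      // the type argument is erased at runtime, so OVERFLOW has to be excluded explicitly
      case event: WatchEvent[Path] if event.kind() != OVERFLOW => dispatch(event.kind(), event.context())
      case event => onUnknownEvent(event)
    }
    key.reset()
  }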

  def dispatch(eventType: WatchEvent.Kind[Path], file: Path): Unit = {
    eventType match {
      case ENTRY_CREATE => onCreate(file)
      case ENTRY_MODIFY => onModify(file)
      case ENTRY_DELETE => onDelete(file)
    }
  }
}

The above looks much cleaner! Now we can watch files to our heart’s content without poring over the Javadocs, simply by implementing onCreate(path), onModify(path), onDelete(path), etc.

  • Exception handling: This is already done above. onException gets called whenever we encounter an exception and the invoker can decide what to do next by implementing it.
  • Recursive watching: The Java API does not allow recursive watching of directories. We need to modify the watch(file) to recursively attach the watcher:
def watch(file: Path, recursive: Boolean = true): Unit = {
  if (Files.isDirectory(file)) {
    file.register(service, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)
    // recursively call watch on children of this file
    if (recursive) {
      Files.list(file).iterator() foreach {f => watch(f, recursive)}
    }
  }
}
  • Watching regular files: As mentioned before, the Java API can only watch directories. One hack to watch a single file is to set a watcher on its parent directory and only react if the event was triggered on the file itself.

override def start() = {
  if (Files.isDirectory(root)) {
    watch(root, recursive = true)
  } else {
    watch(root.getParent, recursive = false)
  }
  super.start()
}

And, now in process(key), we make sure we react to either a directory or that file only:

def reactTo(target: Path) = Files.isDirectory(root) || (root == target)

And, we check before dispatch now:

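case event: WatchEvent[Path] if event.kind() != OVERFLOW =>
  // event.context() is relative to the directory this key was registered on
  val target = key.watchable().asInstanceOf[Path].resolve(event.context())
  if (reactTo(target)) {
    dispatch(event.kind(), target)
  }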
  • Auto-watching new items: The Java API does not auto-watch any new sub-files. We can address this by attaching the watcher ourselves in process(key) when an ENTRY_CREATE event is fired:
// event.context() is relative to the directory this key was registered on
val target = key.watchable().asInstanceOf[Path].resolve(event.context())
if (reactTo(target)) {
  if (Files.isDirectory(root) && event.kind() == ENTRY_CREATE) {
    watch(target)
  }
  dispatch(event.kind(), target)
}
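
For the recursive-watching step above, here is an alternative sketch based on Files.walkFileTree, which avoids leaving the stream returned by Files.list unclosed. The object name RecursiveRegister and its registerAll method are made up for this illustration. It returns a map from each WatchKey to the directory it was registered on, which is one way to turn the relative event.context() back into a full path later.

```scala
import java.nio.file.{FileVisitResult, Files, Path, SimpleFileVisitor, WatchKey, WatchService}
import java.nio.file.StandardWatchEventKinds._
import java.nio.file.attribute.BasicFileAttributes
import scala.collection.mutable

// Hypothetical helper, not part of the code in this article
object RecursiveRegister {
  /** Registers `root` and every directory below it with `service`. */
  def registerAll(root: Path, service: WatchService): Map[WatchKey, Path] = {
    val keys = mutable.Map.empty[WatchKey, Path]
    Files.walkFileTree(root, new SimpleFileVisitor[Path] {
      // called once per directory, before its entries are visited
      override def preVisitDirectory(dir: Path, attrs: BasicFileAttributes): FileVisitResult = {
        keys += (dir.register(service, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY) -> dir)
        FileVisitResult.CONTINUE
      }
    })
    keys.toMap
  }
}
```

Regular files are skipped automatically because only preVisitDirectory is overridden.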

Putting it all together, we have our final FileMonitor.scala:


class ThreadFileMonitor(val root: Path) extends Thread with FileMonitor {
  setDaemon(true) // daemonize this thread
  setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
    override def uncaughtException(thread: Thread, exception: Throwable) = onException(exception)    
  })

  val service = root.getFileSystem.newWatchService()

  override def run() = Iterator.continually(service.take()).foreach(process)

  override def interrupt() = {
    service.close()
    super.interrupt()
  }

  override def start() = {
    if (Files.isDirectory(root)) {
      watch(root, recursive = true) 
    } else {
      watch(root.getParent, recursive = false)
    }
    super.start()
  }

  protected[this] def watch(file: Path, recursive: Boolean = true): Unit = {
    if (Files.isDirectory(file)) {
      file.register(service, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)
      if (recursive) {
        Files.list(file).iterator() foreach {f => watch(f, recursive)}
      }  
    }
  }

  private[this] def reactTo(target: Path) = Files.isDirectory(root) || (root == target)

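  protected[this] def process(key: WatchKey) = {
    // event.context() is relative to the directory this key was registered on
    val dir = key.watchable().asInstanceOf[Path]
    key.pollEvents() foreach {
      // the type argument is erased at runtime, so OVERFLOW has to be excluded explicitly
      case event: WatchEvent[Path] if event.kind() != OVERFLOW =>
        val target = dir.resolve(event.context())
        if (reactTo(target)) {
          if (Files.isDirectory(root) && event.kind() == ENTRY_CREATE) {
            watch(target)
          }
          dispatch(event.kind(), target)
        }
      case event => onUnknownEvent(event)
    }
    key.reset()
  }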

  def dispatch(eventType: WatchEvent.Kind[Path], file: Path): Unit = {
    eventType match {
      case ENTRY_CREATE => onCreate(file)
      case ENTRY_MODIFY => onModify(file)
      case ENTRY_DELETE => onDelete(file)
    }
  }
}

We have now addressed all the gotchas and distanced ourselves from the intricacies of the WatchService API, but we are still tightly coupled to the thread-based API.

We will use the above class to expose a different concurrency model, namely the actor model, and design a reactive, dynamic and resilient file system watcher using Akka. Although the construction of Akka actors is beyond the scope of this article, we will present a very simple actor that uses the ThreadFileMonitor:

import java.nio.file.{Path, WatchEvent}

import akka.actor._

class FileWatcher(file: Path) extends ThreadFileMonitor(file) with Actor {
  import FileWatcher._

  // MultiMap from Events to registered callbacks
  protected[this] val callbacks = newMultiMap[Event, Callback]  

  // Override the dispatcher from ThreadFileMonitor to inform the actor of a new event
  override def dispatch(event: Event, file: Path) = self ! Message.NewEvent(event, file)  

  // Override the onException from the ThreadFileMonitor
  override def onException(exception: Throwable) = self ! Status.Failure(exception)

  // when actor starts, start the ThreadFileMonitor
  override def preStart() = super.start()   
  
  // before actor stops, stop the ThreadFileMonitor
  override def postStop() = super.interrupt()

  override def receive = {
    case Message.NewEvent(event, target) if callbacks contains event => 
       callbacks(event) foreach {f => f(event -> target)}

    case Message.RegisterCallback(events, callback) => 
       events foreach {event => callbacks.addBinding(event, callback)}

    case Message.RemoveCallback(event, callback) => 
       callbacks.removeBinding(event, callback)
  }
}

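object FileWatcher {
  type Event = WatchEvent.Kind[Path]
  type Callback = PartialFunction[(Event, Path), Unit]

  // Helper behind `callbacks` in the class above; assumed here to be a HashMap-backed mutable MultiMap
  def newMultiMap[A, B]: scala.collection.mutable.MultiMap[A, B] = {
    import scala.collection.mutable
    new mutable.HashMap[A, mutable.Set[B]] with mutable.MultiMap[A, B]
  }

  sealed trait Message
  object Message {
    case class NewEvent(event: Event, file: Path) extends Message
    case class RegisterCallback(events: Seq[Event], callback: Callback) extends Message
    case class RemoveCallback(event: Event, callback: Callback) extends Message
  }
}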

This allows us to dynamically register and remove callbacks to react to file system events:

// initialize the actor instance
val system = ActorSystem("mySystem") 
val watcher: ActorRef = system.actorOf(Props(new FileWatcher(Paths.get("/home/pathikrit"))))

// util to create a RegisterCallback message for the actor
def when(events: Event*)(callback: Callback): Message = {
  Message.RegisterCallback(events.distinct, callback)
}

// send the register callback message for create/modify events
watcher ! when(events = ENTRY_CREATE, ENTRY_MODIFY) {   
  case (ENTRY_CREATE, file) => println(s"$file got created")
  case (ENTRY_MODIFY, file) => println(s"$file got modified")
}
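
To see how the callback bookkeeping behaves without pulling in Akka, here is a stripped-down sketch of the same MultiMap idea. Everything in it (CallbackSketch, Registry, and the use of plain strings for events and files) is a simplification of my own, not the actor's API. It also checks isDefinedAt before invoking a callback, since a PartialFunction registered for an event it has no case for would otherwise throw a MatchError.

```scala
import scala.collection.mutable

// Hypothetical, Akka-free stand-in for the actor's callback registry
object CallbackSketch {
  type Callback = PartialFunction[(String, String), Unit]

  class Registry {
    // event name -> set of callbacks registered for it
    private val callbacks =
      new mutable.HashMap[String, mutable.Set[Callback]] with mutable.MultiMap[String, Callback]

    def register(events: Seq[String], callback: Callback): Unit =
      events.foreach(event => callbacks.addBinding(event, callback))

    def remove(event: String, callback: Callback): Unit =
      callbacks.removeBinding(event, callback)

    def fire(event: String, file: String): Unit =
      callbacks.get(event).foreach { registered =>
        // iterate over a copy so callbacks may register or remove others while running
        registered.toList.foreach { f =>
          if (f.isDefinedAt(event -> file)) f(event -> file)
        }
      }
  }
}
```

In the actor, register, remove and fire correspond roughly to the RegisterCallback, RemoveCallback and NewEvent messages, with the actor's mailbox serializing access to the map.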

Full source: FileWatcher.scala

This post is part of the Java Advent Calendar, and is licensed under the Creative Commons 3.0 Attribution license.

 

 

Reactive File System Monitoring Using Akka Actors


3 Comments

  • Martin Ring

    There is an issue with your code. There is no way to turn a blocking api into a non blocking API. If that was possible you would be a magician:

    `Iterator.continually(service.take()).foreach(process)` is virtually equivalent to
    `while (true) process(service.take())`. You don’t win anything and it is still blocking a thread.

  • Florian Leitner

    While all Java file watching APIs can watch a few directories, they are also all entirely unsuited to watch a whole machine, due to the way this is implemented. Java maintains an open file handle per directory. With thousands or potentially millions of open file handles, your machine will get pretty upset after the first few thousand. You need to fall back to the OS-specific C or C++ APIs for that.
