ClusterHQ’s Kai Davenport chats to Voxxed about the company’s signature offering – storage orchestration tool Flocker, which was one of the first independent vendor contributions to the Docker ecosystem. We also discuss what led the company to the (then) fledgling Docker technology, the growing complexity of the Docker ecosystem, and why you really need to avoid that big vat of Kool-Aid at the centre of the container explosion. Recorded at Voxxed Days Bristol 2016 (read the transcript, or scroll down for the video interview).
Voxxed: Can you explain, for the uninitiated, what your key offerings are at ClusterHQ?
Davenport: We make a tool called Flocker, which is basically a tool to manage state around a cluster of machines. This whole container explosion – which is obviously taking the world by storm – the problem is, it doesn’t really address what happens if a container writes state to disk. And obviously at the kind of scale we’re talking about running, failures are very likely to happen.
And so, if you run a stateful process that writes to disk, and for some reason that process needs to move off that machine to another machine, something needs to know that the data was left behind. One of the mottos is, “We leave no data behind”. The container and the disk are a single atomic unit, and if the container moves, Flocker will kick in and move the data to where the container is moving to.
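In practice, that “container and disk as a single atomic unit” idea surfaced through Flocker’s Docker volume plugin. A rough sketch of what that looks like – the volume name and image here are illustrative, not taken from the interview:

```shell
# Hypothetical sketch: running a stateful container whose volume is
# managed by Flocker's Docker volume plugin. "mysql-data" is made up.
docker run -d --name mysql \
    --volume-driver flocker \
    -v mysql-data:/var/lib/mysql \
    mysql:5.7

# If the container is later rescheduled and started on another host
# with the same volume name, Flocker is the piece that moves or
# re-attaches the "mysql-data" dataset to that host -- so, as the
# motto goes, no data is left behind.
```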
That’s quite a unique niche in the container space you’re occupying there.
Very much. We call ourselves the container data people for that reason. There’s the “sexy” problems of orchestration and networking – and nobody wants to think about storage until the end. So that’s what we’re taking on. We’re trying to address that problem, and have been building a solution for it for the last year and a half.
ClusterHQ was one of the first people to market with a Docker offering. How did that begin? It’s interesting that you started here in Bristol.
That’s right, yeah, we started in Bristol – Luke [Marsden], our CTO, and Rob [Haswell], our VP of product, had been running a hosting company for a number of years, and they were doing some very interesting technology with ZFS and VST-ENGs. And Luke basically realised that Docker was the way forward, and he was right because hey, look at the explosion that’s happened now… Huge, huge growth curve.
Luke and Rob both decided we really needed to pivot and get into the space with Docker – but how could we do so while capitalising on all the work we’d done before? And so they created Flocker as a ZFS orchestration tool. Why ZFS? Because at the time they’d done a lot of ZFS work with their previous hosting company. And so Flocker kind of grew out of a recognition that you need to move data around a cluster in this new world of orchestration frameworks and dynamic scheduling.
Flocker started as a ZFS orchestration tool, and by June last year, we had Rackspace, OpenStack, and AWS support – and since then, we’ve added support for more than ten backend providers. Now we’re kind of like an abstraction between the container and whatever storage that container chooses to use. It’s storage orchestration rather than container orchestration.
Since you launched, Docker has become a platform – there’s a continual rollout of new things from them, and that whole ecosystem is becoming a bit more complicated.
It’s getting very complicated. Lots of people doing lots of different tools. We have discovered some competition in the last year in the form of things like Portworx’s libopenstorage, and there are some other tools growing in the same space. But we are recognised, I think, as the de facto leaders in state management for Docker clusters.
What can we expect down the line from you guys?
Luke demoed today a tool called “dvol”, which is a very interesting take on the whole state management thing.
Imagine you’re a developer and you’ve run your test suite and you notice a problem – and the problem is to do with the particular data you have, and you need to share that data set with your colleague. It’s been well documented and fed back to us that people will still be sticking USB sticks in their machines and walking across the office – but then what happens if the office isn’t in the same building, or indeed the same city?
dvol is a tool to manage local snapshots of your local development data, and it’s very useful because you can snapshot your local data at the point where the problem occurred – so you can say, hey, here is the customer database, but with the bug I’m talking about. And then we’re building an online SaaS product where a developer will be able to push that snapshot into what we’re calling the Volume Hub. Anybody else around the world can pull that snapshot and reproduce the problem.
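dvol presented this as a git-like workflow over data volumes. A hypothetical session might look something like the following – the exact command names, flags, and the volume name are illustrative and may differ from the released tool:

```shell
# Hypothetical dvol session, modelled on its git-style CLI.
dvol init customer-db                 # create a versioned local volume
docker run -d --volume-driver dvol \
    -v customer-db:/var/lib/postgresql/data \
    postgres                          # run a database on that volume

# ...run the test suite, hit the data-dependent bug...

# Snapshot the data exactly as it was when the bug appeared:
dvol commit -m "customer DB state that reproduces the bug"

# Pushing that snapshot up to the hosted Volume Hub, so a colleague
# anywhere can pull it and reproduce the problem, would then be a
# push-style command along these lines:
dvol push
```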
With so many enterprises picking up Docker, there must be a huge market for this.
We see opportunities in enterprise, but also in small developer shops…I think everything’s changing. The metaphor I like is, “It’s from steam power to internal combustion”. It takes a long time to shift, but the moment that technology is discovered and people discover the gains that it brings, they leap on it.
I think we’re in the phase now where 2016 is the year of Docker being considered in production. The last year has been lots of CI and CD pipeline work – definitely lots of developers on their laptops massively simplifying their lives – but now the next phase is really big enterprises thinking, well, the cost savings come when we run in production at the kind of density that containers bring. I hope that 2016 and 2017 are where the real shift happens. I think that’s what’s happening now.
Can you see any issues for Docker in the year ahead? It’s grown so quickly – is there anything in the pipeline we should be wary of?
That’s a good question. You know, they are one company with one opinion about how things should be done – but there are other people, Kubernetes for example, who have a very different opinion about networking, for instance, than Docker do, as you can see on the GitHub threads. So one danger might be that they are creating themselves a walled garden of how everything in the data centre should run. Whilst they’ve created a tool to run a process really efficiently – are they the one to create the entire ecosystem of tools around it? They obviously would want to be that.
And they’ve got the critical mass that they can essentially push the market…
Right – they’ve got the marketing leverage. One worry would be to not go full Docker Kool-Aid mode: just use the tool that’s very useful, which is the containeriser, and then look at the other parts of your stack and choose the tools that are correct for those areas.
Are you looking at things like Kubernetes at ClusterHQ too? Obviously you guys are very Docker-centric primarily – but do you have your eye on the other containers as well?
We do indeed. We actually wrote a Kubernetes plugin for Flocker. We got Jetstack, the Bristol Kubernetes guys, to write that for us. We’ve also got a Mesos framework, which Container Solutions wrote for us. We’re very much trying to acknowledge that people may well use a whole range of different tools.
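For Kubernetes, that plugin showed up as a flocker volume type in the pod spec. A minimal sketch of how such a pod might be declared – the pod, container, and dataset names here are made up for illustration:

```shell
# Hypothetical pod using the Kubernetes Flocker volume plugin;
# "my-flocker-dataset" must already exist as a Flocker dataset.
cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: flocker-web
spec:
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: www-root
      mountPath: /usr/share/nginx/html
  volumes:
  - name: www-root
    flocker:
      datasetName: my-flocker-dataset
EOF
```

The point of the abstraction is that the pod spec names a dataset, and Flocker takes care of attaching the right storage on whichever node the pod lands.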
And Flocker itself isn’t a container-centric tool. You can use Flocker even if you’re not using containers. You could say, “I want to move this data set from here to there”, and it doesn’t have to be a container that consumes it. We’re in a good position to get Flocker into as many different frameworks and tools and setups as possible; it’s not necessarily a container-centric, or indeed Docker-centric, thing.
I suppose it’s like the early days of NoSQL – in the beginning, it was MongoDB and everything else, and actually you now have the scenario where one company is using multiple NoSQL solutions to solve different problems.
It’s like trying to use a chisel for everything, when actually, hey, you might want to use the hammer for the nails and the chisel for the wood. There’s no one tool for every job…there’s lots of software patterns where you break the problem down into smaller problems, and pick the right tool for each problem.
Docker is a containeriser – it’s a fantastic way to remove dependency hell, get processes onto machines, and run them in an isolated and reproducible manner, but maybe there are better networking tools. Maybe there are better orchestration tools – Kubernetes, Mesosphere, maybe Swarm – who knows?
The real point is to make a choice based on your actual use case, and not to have a sense of, because everybody is using one thing, maybe I should…
“Let’s all drink the Kool-Aid”…
Indeed. That Kool-Aid really isn’t a good thing.