By Apurva Dave
Like it or not, ‘FrankenOps’ is a reality
“Hey, customer X is having performance problems with the application.”
“OK, let’s take a look at some of the metrics in Graphite.”
“Hmmm, nothing obvious appears but our metrics aren’t set up to dig into a single customer. Let’s hop into Splunk and search for log errors. Well, there’s a lot, hard to know.”
“What about GitHub? Any obvious commits that would be related?”
“Lots of commits. Hard to correlate them to any errors.”
“OK, maybe I could export some of the log data, filter down our server metrics to the ones relevant to this user, and then try to relate them. If anything pops up I can do the work to crawl back through our code pushes and find a smoking gun.”
“OK, cool, I’ll check back after lunch to see if you need any help.”
Does this sound familiar? For organisations tasked with running complex distributed software and services, it probably does. Building software is only part of the challenge: understanding what it's doing in the wild is a make-or-break capability for teams that want to deliver a consistent experience to their users. Today, developers naturally spend a lot of time architecting the core product, but when it comes to instrumenting the application and the underlying infrastructure, the result is usually a patchwork of open source tools, custom projects, and highly specific vendor products that, stitched together, give visibility across the entire deployment.
Why Should You Care?
At the 10,000-foot level, a monumental shift is under way in today's businesses. Regardless of your industry, the software you create is becoming the core of the business. Your code (frequently delivered as SaaS to external or internal customers) is the competitive differentiation for your business, and by extension, the Dev and Ops teams who architect, build, and manage that software are now architecting and running the business.
As data output from these software systems becomes more important, it’s only natural that developers are thinking about more scalable, rational, and unified approaches to dealing with the deluge of data generated by machines and user activity. The data itself, it turns out, can be used for many different purposes:
- troubleshooting operational issues
- understanding user activity and correlation with performance
- data-driven development
- smarter system testing
Realistically, there are two paths to solving this problem, so let’s address both.
- Build Your Way Out of the Problem
One benefit of the open source big data movement is that there are plenty of ways to attack the data problem. Projects like Hadoop, Spark, and Elastic(search) present generic data frameworks that enable analytics across almost any type of data. In speaking with companies of all sizes, I’ve seen organisations that have built “data lakes” to store as much information as they possibly can.
Storing the data isn’t really the challenge anymore; the challenge is how your organisation puts the data to use. Building your own internal analytics solution will therefore also mean building:
- A reliable, scalable ingest pathway. How can you stream logs and metrics in consistently as your infrastructure grows and changes? Will every developer in the organisation be able to instrument their code and point the results to your system?
- A data analysis pathway. With most big data projects, you’re still required to program in Java or Scala in order to process the data. How scalable is that for your company? How do non-developers get at the data? Are you stuck building and maintaining non-core applications because you created a data platform?
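The ingest pathway above is easier to reason about with a concrete shape for the data. Here is a minimal sketch in Python of what a structured metric event might look like before it enters the stream; the field names (`ts`, `name`, `value`, `tags`) and the metric name are illustrative assumptions, not any particular product's schema.

```python
import json
import time

def metric_event(name, value, tags=None):
    """Build one JSON-serialisable metric event for the ingest stream.

    The schema here is hypothetical; a real pipeline would follow
    whatever record shape your backend expects.
    """
    return {
        "ts": time.time(),   # event timestamp, seconds since epoch
        "name": name,        # metric name, e.g. "api.request.latency_ms"
        "value": value,      # numeric measurement
        "tags": tags or {},  # dimensions for filtering, e.g. {"customer": "x"}
    }

def serialise(event):
    """Encode an event as one newline-delimited JSON record."""
    return json.dumps(event, sort_keys=True) + "\n"

# Emit a latency sample tagged by customer, so it can later be filtered
# per customer -- exactly the capability missing in the opening dialogue.
record = serialise(metric_event("api.request.latency_ms", 187.5,
                                {"customer": "x", "region": "eu-west"}))
```

Tagging every sample with dimensions at ingest time is what makes the later "dig into a single customer" question answerable at all.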
So, while building a centralised analytics platform isn’t out of the question, it’s going to take a lot of work. Everything I’ve described so far covers only the basics of standing it up; also consider your appetite for ongoing maintenance and improvement.
- Deploy a Data Hub
A second option for the FrankenOps issue involves deploying a data hub. This is commercial technology that itself leverages open source, but makes it simpler and faster to get to the analytics you seek. A data hub is not an “application” per se, because that would be too limiting for the broad array of questions and users you’re trying to solve for, especially across your data and your data structures.
A data hub is a place where any and all data (metrics, logs, and events) is aggregated and organised in one place, in one big data backend, so that different people in the organisation can ask any question about the data simultaneously. A data hub is designed to abstract away the complexity of ingesting and storing different data types, so organisations can deploy faster and carry less data management overhead. Being a newer generation of technology, most data hubs can handle streaming data, which is ideal for real-time monitoring and troubleshooting.
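To make the "any and all data in one backend" idea concrete, here is a hedged sketch of how metrics, logs, and events might be normalised into one common record shape before landing in a shared store. The envelope fields (`ts`, `kind`, `source`, `payload`) are assumptions for illustration, not any real product's format.

```python
import json
import time

def envelope(kind, payload, source):
    """Wrap a metric, log, or event in one common record shape so a
    single backend can store and query all three. Hypothetical schema."""
    return {"ts": time.time(), "kind": kind, "source": source, "payload": payload}

metric = envelope("metric", {"name": "cpu.pct", "value": 72.1}, "host-01")
log    = envelope("log",    {"level": "ERROR", "msg": "timeout"}, "api-7")
event  = envelope("event",  {"type": "deploy", "sha": "abc123"}, "ci")

# All three kinds serialise to the same newline-delimited JSON stream,
# which is what lets one query layer cut across metrics, logs, and events.
stream = "".join(json.dumps(r, sort_keys=True) + "\n" for r in (metric, log, event))
```

The payoff of a shared envelope is that a single query can join a deploy event with the error logs and metric spikes that followed it.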
Data hubs do more than just collect your data; they typically include a built-in analytics and visualisation layer on top of the big data backend. The interface to this layer can take two forms:
- An API that lets you interact programmatically with raw data and analysis. This allows you to integrate the data hub into your operations, whether for ingesting data or embedding analytics elsewhere.
- A domain-specific language (DSL) that allows you to do dataflow analytics in an agile, ad hoc fashion without programming in a general-purpose language like Java or Scala. This helps your developers iterate quickly on analytics, but it also broadens accessibility: DevOps teams, BI teams, and database administrators should all be able to use this layer to manipulate data as they need to.
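As a rough illustration of the kind of ad hoc dataflow analysis a DSL makes accessible, here is the filter-project-aggregate pattern written out in plain Python; the event fields and the sample data are invented for the example, and a real DSL would express the same pipeline declaratively.

```python
from collections import Counter

# Hypothetical log events as a data hub's API might expose them; the
# field names are illustrative, not from any specific product.
events = [
    {"level": "ERROR", "customer": "x", "msg": "timeout"},
    {"level": "INFO",  "customer": "y", "msg": "ok"},
    {"level": "ERROR", "customer": "x", "msg": "timeout"},
    {"level": "ERROR", "customer": "z", "msg": "500"},
]

# Dataflow-style pipeline: filter (errors only) -> project (customer)
# -> aggregate (count per customer).
errors_by_customer = Counter(
    e["customer"] for e in events if e["level"] == "ERROR"
)
# errors_by_customer == Counter({"x": 2, "z": 1})
```

This is the per-customer question the opening dialogue couldn't answer, expressed in three lines once the data lives in one queryable place.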
This approach lets developers and ops teams “program” analytics and visualisations into their software infrastructure so they can ask any question about their products and business. Architected the right way, this up-levels the strategy from just “monitoring your software” to enabling a broad range of users in the organisation to interact with the data as they each see fit, whether that’s a developer, an operations person, a QA engineer, a product manager, a data scientist, or a business analyst. It’s all the same data, viewed by different people in different, customised ways!
So Which Approach is Right for You?
It’s always different strokes for different folks, but the core question comes down to how much of your development resource will go toward analytics on an ongoing basis. Here are a few questions that should help you find the path forward:
- Are analytics at the centre of your business? If so, you’re likely going to need to build your own infrastructure.
- Are you planning to continually invest developer resources into a custom platform? If it’s just a short-term project to get something going, a data hub will likely provide a higher return on investment.
- Are you interested in making your data accessible to all business units within your organisation? If so, and you prefer the “build it” route, that implies that you’ve got the resources to build front-end analytics applications. For some organisations those resources are better spent on their core product itself, and offloading the ‘data accessibility’ issue to a data hub makes the most business sense.