One of the biggest challenges of the modern data center is that it is both distributed and fast-changing. With each passing year, a smaller percentage of computing takes place in the on-premises data center as more and more applications move to SaaS providers or are created to run on public clouds.
Cloud computing and virtualization also mean that the pace at which any application changes has increased. Fleets of servers may be created and destroyed hourly to keep up with demand. Code may be released daily or even hourly.
As I’ve pointed out in two recent stories (“How To Manage Your New Data Center: The Entire Internet” and “Debugging The Entire Public And Private Network With ThousandEyes”), CIOs need a host of new capabilities to meet their responsibilities in this new world. Today we will look at how to monitor the health and performance of complex applications that are distributed between on-premises data centers and public clouds.
New Challenges, Old Responsibilities
Here’s something a CEO has never said to a CIO: “You know, this is a new world. SaaS, cloud computing, virtualization all mean things are moving faster than ever. I realize that the tools you have used in the past can’t keep up. So, you know what, if you can’t monitor and troubleshoot our applications for a while, I’ll understand. Just let me know when you’ve figured it all out.”
Despite the rapid migration to a world in which the entire Internet is the data center, CIOs have all the old responsibilities for keeping everything running in a reliable fashion and for fixing any problems immediately.
But for CIOs to do their jobs, they must be able to track and monitor a fast-changing set of services, many of which are out of their control. Traditional data center and IT service management products are not up to the job because much of the time they require explicit modeling of services.
Boundary, a startup founded in 2011 and backed by Lightspeed Venture Partners and Scale Venture Partners, was created to meet the challenge of knowing what’s working and what’s not in an environment that never stops evolving. The traditional way to do that is to put agents on machines, which works but requires installation and configuration of software. In addition, most of the established players in IT Operations Management require the computing environment to be modeled. That approach works well as long as change doesn’t happen too often. When it does, the models get out of date, and large chunks of important computing take place in an unmanaged and unmonitored fashion.
The new school way of monitoring is to use what Boundary calls a “meter” on each machine or virtual machine that looks at network traffic. Each meter tracks communications between all of the application services that are on the machine and all those that are talking to it from the outside. Each second, the meter reports to a cloud-based repository about volume of traffic for various services, speed of round trips, lost packets, and so forth. If a new service starts talking, it is added to the model. The idea is that everything works by sending packets back and forth over the Internet. If you look at those packets, you can see what is going on.
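The per-second reporting described above can be pictured as a small aggregator that accumulates per-flow counters and flushes them once a second. This is a minimal sketch, not Boundary’s actual meter; the flow key, the counter names, and the report format are all illustrative assumptions.

```python
import time
from collections import defaultdict

class Meter:
    """Toy traffic meter: aggregates observed packets per flow,
    then emits one report per interval. Purely illustrative --
    Boundary's real meter and wire protocol are not public."""

    def __init__(self):
        # (src, dst, port) -> accumulated counters for the current second
        self.flows = defaultdict(lambda: {"bytes": 0, "packets": 0})

    def observe(self, src, dst, port, nbytes):
        """Record one observed packet for a flow."""
        stats = self.flows[(src, dst, port)]
        stats["bytes"] += nbytes
        stats["packets"] += 1

    def flush(self):
        """Emit the current interval's aggregates and reset.
        A real meter would send this report to a cloud collector;
        here we simply return it."""
        report = {"ts": int(time.time()), "flows": dict(self.flows)}
        self.flows.clear()
        return report

m = Meter()
m.observe("10.0.0.5", "10.0.0.9", 5432, 1500)  # e.g. app server -> database
m.observe("10.0.0.5", "10.0.0.9", 5432, 300)
report = m.flush()
```

Because the report carries source and destination for every flow, a collector receiving these reports can rebuild the application topology continuously: a service that starts talking simply shows up as a new flow key.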
“There is 25 to 50 times more change in IT than 5 years ago,” said Boundary’s CEO Gary Read. “For CIOs to have any chance of meeting expectations about uptime and reliability they must have a brain keeping track of what is happening and looking out for what’s going wrong. Boundary is that brain.”
In “How To Choose The Right Eyes And Ears For Cybersecurity,” I pointed out that modern security is based on the ability to monitor what is normal and then analyze aberrations to see whether they represent security threats. To make this work you need eyes and ears to gather data, and a brain to determine what is abnormal. You then need arms to take action and legs to handle mobile devices.
Boundary applies this idea to IT Operations. It uses the network as the eyes and ears and has a brain that can determine when things are going wrong. Like other advanced monitoring systems, such as SmartSignal, Boundary can detect when things are trending in the wrong direction so that corrective action can be taken before there is an impact.
Read said that such early warnings allowed several of Boundary’s customers to shift away from an Amazon Web Services data center that was being affected by the weather to another one before service was disrupted.
Boundary uses statistical approaches to define what’s normal and also allows explicit rules to be set. With any such system, a balance must be struck between having too many alarms that are overwhelming and too few, meaning important events are missed.
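A common statistical approach to defining "normal" is to compare each new reading against a rolling baseline and alarm only when it strays several standard deviations from the mean. The sketch below is one generic way to do this, not Boundary’s actual model; the `sigma` knob illustrates the balance between too many alarms and too few.

```python
import statistics

def is_anomalous(history, value, sigma=3.0):
    """Flag `value` if it lies more than `sigma` standard deviations
    from the mean of recent history. Raising sigma produces fewer
    alarms (risking missed events); lowering it produces more
    (risking alert fatigue). Illustrative only -- Boundary's actual
    detection methods are not public."""
    if len(history) < 2:
        return False  # not enough data to define "normal"
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) > sigma * stdev

# Hypothetical round-trip times in milliseconds for one service
baseline = [100, 104, 98, 101, 99, 103, 97, 102]
is_anomalous(baseline, 101)  # within normal variation
is_anomalous(baseline, 160)  # a clear latency spike
```

Explicit rules (for example, "alarm if lost packets exceed 1%") would sit alongside a statistical check like this one, catching conditions the operator already knows are bad regardless of history.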
Boundary has three core components.
- Cloud-based Analytics: Boundary has built a real-time, massively scalable processing engine in the cloud that receives data from the meters and provides the ability to track, monitor, and analyze what is happening in a computing environment. Just as with other cloud offerings, Boundary customers don’t have to spend any time scoping, procuring, deploying, and managing hardware.
- Meters for Monitoring Traffic: The lightweight meters are software that can be deployed with little effort on tens to tens of thousands of machines, either on-premises or in the cloud. They send flow data back to Boundary’s servers, which analyze it and compute performance metrics that are pushed to customer dashboards and alerts, updated every second. Since this flow data includes source and destination, Boundary can always dynamically render and update the current application topology, regardless of changes. Customers routinely use common deployment tools like Puppet and Chef to automate and scale meter deployment. Since Boundary has an open API, customers can use Boundary data to inform systems that automate a variety of activities (moving compute power, restarts, pushing alerts into customer tools, and so on).
- Integrations: Boundary has built a host of integrations to ingest, correlate, and alert on events from a wide variety of IT tools, including Splunk, New Relic, AppDynamics, Amazon EC2, Nagios, Zendesk, and many others. Using the time-stamped data from Boundary, a tool or application can rewind the detailed flow data to the moment a problem appeared, find the source of an issue, and make a diagnosis.
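The "rewind" idea in the last bullet amounts to filtering time-stamped flow records down to the window just before an incident. Here is a minimal sketch under assumed record and field names (`ts`, `src`, `dst`, `rtt_ms` are hypothetical, not Boundary’s schema):

```python
def rewind(records, incident_ts, window=60):
    """Return flow records from the `window` seconds leading up to
    incident_ts, newest first, so an operator can see what changed
    just before the problem. Record layout is a hypothetical
    simplification of time-stamped flow data."""
    return sorted(
        (r for r in records if incident_ts - window <= r["ts"] <= incident_ts),
        key=lambda r: r["ts"],
        reverse=True,
    )

records = [
    {"ts": 1000, "src": "web-1", "dst": "db-1", "rtt_ms": 4},
    {"ts": 1055, "src": "web-1", "dst": "db-1", "rtt_ms": 210},  # latency spike
    {"ts": 1058, "src": "web-1", "dst": "cache-1", "rtt_ms": 2},
]
suspects = rewind(records, incident_ts=1060)
# suspects now holds the records nearest the incident, most recent first
```

In practice an integration would pull these records over the API and correlate them with events from tools like Splunk or Nagios, but the core operation is this kind of time-window lookup.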
Boundary is one way to build the brain the CIOs need for IT operations to meet the old responsibilities that will never go away. No doubt there will be others, but no matter how it is constructed, IT needs a new type of brain to do its job.
Follow Dan Woods on Twitter: