Network observability tools can help enterprises more quickly identify, troubleshoot, and resolve performance issues across complex environments before they impact end-user productivity.
Today’s enterprise networks span on-premises and cloud environments, and it has become much harder for IT teams to maintain performance, reliability, and security when parts of the network are unknown or off-limits to traditional performance monitoring tools.
“If you cannot get visibility into all the components comprising the digital experience, everything that is between the end user clicking the mouse to the deepest part of a cloud or data center network, then you are flying blind, you are incurring a lot of risk, and you could be overspending, too,” says Mark Leary, research director for network analytics and automation at research firm IDC.
Network observability tools aim to fill those gaps. They represent an evolution from performance monitoring technologies that have long provided IT with data on how various components of their networks are performing. The difference is greater visibility and enhanced analytics that go beyond internal networks and extend to everywhere between end-user devices and service provider environments.
Observability tools take performance management tools to the next level by aggregating data from a wider range of sources—such as queue statistics, I/O devices, and error counters—analyzing the information, and providing actionable guidance on how to resolve issues before they impact end users. In some scenarios, the tools can automatically escalate events to streamline problem resolution.
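To make the aggregation concrete, here is a minimal sketch in Python that normalizes counters from two different kinds of sources into one view and flags unhealthy error counters. The source formats, device names, and threshold are hypothetical, invented purely for illustration.

```python
# Minimal sketch: normalize metrics from heterogeneous sources into one view.
# Source field layouts, device names, and the threshold are hypothetical.
from dataclasses import dataclass

@dataclass
class Metric:
    device: str
    name: str      # e.g., "queue_depth", "crc_errors"
    value: float

def from_snmp(row: dict) -> Metric:
    # SNMP pollers typically return keyed rows; the shape here is assumed.
    return Metric(device=row["sysName"], name=row["metric"], value=float(row["val"]))

def from_flow(record: dict) -> Metric:
    # Flow exporters report per-interface counters; the shape here is assumed.
    return Metric(device=record["exporter"], name=record["counter"], value=record["count"])

def aggregate(snmp_rows, flow_records, error_threshold=100.0):
    """Merge both feeds and flag devices whose error counters look unhealthy."""
    metrics = [from_snmp(r) for r in snmp_rows] + [from_flow(r) for r in flow_records]
    flagged = [m for m in metrics if "error" in m.name and m.value > error_threshold]
    return metrics, flagged

metrics, flagged = aggregate(
    [{"sysName": "core-sw1", "metric": "crc_errors", "val": "250"}],
    [{"exporter": "edge-rtr2", "counter": "if_out_errors", "count": 12}],
)
for m in flagged:
    print(f"escalate: {m.device} {m.name}={m.value}")
```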
The evolution from performance monitoring to network observability happened in part because enterprise companies embraced multi-cloud environments, and the pandemic pushed businesses to support hybrid work environments for employees and customers.
“Eighty-four percent of [network operations] teams tell me that they expect to be multi-cloud by 2024. Even with a single cloud, you are adding a lot of complexity to your environment. They are not satisfied with what they can see in the cloud with their network tools,” says Shamus McGillicuddy, vice president of research at Enterprise Management Associates (EMA). “Add to that people working from home, and enterprise IT needing to troubleshoot problems outside of the network: Is it the Wi-Fi in the house or is it the ISP connection? IT needs tools that can answer these questions.”
The market for network observability includes some longtime network and performance monitoring players as well as startups that are building their business around filling visibility gaps. Example vendors include Catchpoint, Elastic, Gigamon, Honeycomb, Kentik, Keysight Technologies, New Relic, Observe, Riverbed, SolarWinds, Splunk, and ThousandEyes (Cisco).
For IT buyers who are evaluating network observability tools, here are some key features and capabilities to expect.
Data aggregation for multiple systems and roles
Network observability tools must collect data from many sources and make sense of the data for stakeholders across multiple teams.
Traditional SNMP polling tools use a pull model, in which a network monitoring station reaches out to devices and retrieves specific data to calculate performance metrics. Newer observability technologies built on streaming telemetry use a push model, as with syslog, sFlow, or flow records, in which devices push data in real time to collectors or integrated systems. Each system receiving the pushed data can react according to its own policies, without affecting how other systems interpret the same stream.
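To illustrate the difference, the sketch below implements a bare-bones push-model collector: devices send messages over UDP, syslog-style, and the collector applies its own policy to each one. The port and keyword policy are illustrative, not drawn from any particular product.

```python
# Minimal sketch of a push-model collector: devices send messages over UDP
# (as with syslog), and this collector applies its own policy to each one.
# The port and the "policy" keywords are illustrative assumptions.
import socket

POLICY_KEYWORDS = ("LINK-DOWN", "BGP")  # events this collector cares about

def run_collector(host="0.0.0.0", port=5514):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    print(f"listening for pushed telemetry on udp/{port}")
    while True:
        data, addr = sock.recvfrom(4096)      # devices push; we never poll
        msg = data.decode(errors="replace")
        # Each collector reacts per its own policy; other consumers of the
        # same stream are free to interpret it differently.
        if any(k in msg for k in POLICY_KEYWORDS):
            print(f"ALERT from {addr[0]}: {msg.strip()}")

if __name__ == "__main__":
    run_collector()
```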
“Everyone wants high-performing, reliable, and secure digital products or services, whether you’re in security, DevOps, site reliability, or the network. That drive for a common outcome among IT domains is pushing teams to invest in tools that can find the theme across all these data sources and provide more collaboration across teams,” says Stephen Elliot, group vice president of I&O, cloud operations, and DevOps at IDC. “Your digital services and products depend upon your systems being reliable and available; it is a direct correlation to great customer service.”
Data visualization that puts metrics in context
Network observability vendors need to go beyond just collecting data; their tools must also provide data visualization capabilities for multiple stakeholders. Data from various sources might not mean much when viewed in isolation, but when correlated across components it can tell a different story.
“Part of the problem is that when data is scattered across multiple tools, the experience and context is lost. Network observability tools should stitch a lot together. Yes, you could see the same data across five different tools, but how observable is it if John must mentally piece it together?” says Carlos Casanova, principal analyst, Forrester Research. “Sometimes network metrics will reveal a bad actor on the network that the security team would recognize but would not necessarily be picked up by network teams.”
Industry watchers say network observability tools not only need to present the data collected from sources such as SD-WAN gateways and IoT endpoints, but they must also explain how the data will impact services when correlated with other events or incidents occurring across the network. Traditionally, network operators would be expected to troubleshoot and triage the various data points collected across tools to understand what they might mean for overall performance. Now network observability vendors promise to bubble up the potential performance impact when events from multiple sources happen simultaneously.
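A toy version of that correlation logic might look like the following sketch, which groups events from different sources into short time windows on the assumption that simultaneous events may share a root cause. The window size and event format are invented for illustration.

```python
# Toy correlation: group events from different sources that occur within a
# short window, on the theory that simultaneous events from an SD-WAN gateway
# and an endpoint may share a root cause. Window size, event shape, and
# device names are assumptions for illustration.
from collections import defaultdict

WINDOW_SECONDS = 30

def correlate(events):
    """events: list of (timestamp, source, description) tuples."""
    buckets = defaultdict(list)
    for ts, source, desc in sorted(events):
        buckets[ts // WINDOW_SECONDS].append((source, desc))
    # Only multi-source buckets are interesting: they suggest a shared incident.
    return {k: v for k, v in buckets.items()
            if len({src for src, _ in v}) > 1}

incidents = correlate([
    (1000, "sdwan-gw1", "tunnel latency spike"),
    (1012, "iot-sensor-7", "telemetry gap"),
    (1900, "core-sw1", "routine config save"),
])
for window, evts in incidents.items():
    print(f"possible shared incident in window {window}: {evts}")
```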
“You want a tool with good data visualization that provides intuitive views into what the data actually means. A lot of people have a hard time managing all the data that their apps and network stack are generating,” says Tim Yocum, director of site reliability engineering, InfluxData. “How does this data from network providers and cloud providers tell a story of general uptime to our customers? You must understand what data matters and what data doesn’t matter.”
When selecting a tool, be certain the network observability vendor can present data for various stakeholders across IT domains. Dashboards that provide meaningful information gleaned from the data to network, security, DevOps, site reliability, and other teams will be much more valuable than a tool that performs just raw data collection.
Noise management to prioritize events
The network and systems components making up a digital service also generate a lot of alerts. For IT managers, those alerts can sound like noise unless a tool can identify which among the notifications is impacting a critical or customer-facing service. Network observability tools must not only collect and visualize volumes of data, but they must also manage the noise for IT operators—shining a spotlight on the alerts that matter.
“For instance, IT managers could have 1,000 tickets open, but many of those might not be impacting what really matters to the business. Most IT managers don’t have time to figure out which alert is the real issue,” EMA’s McGillicuddy says. “Tools that can tell you when something changes – such as a BGP routing table and how that is causing a problem – will become important.”
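A simplified version of that noise management might look like the sketch below, which collapses duplicate alerts and ranks what remains by whether it touches a business-critical service. The service list and alerts are hypothetical.

```python
# Sketch: collapse duplicate alerts and surface the ones tied to
# customer-facing services first. The criticality map is hypothetical.
from collections import Counter

CRITICAL_SERVICES = {"checkout-api", "payments-db"}  # assumed business-critical

def triage(alerts):
    """alerts: list of (service, message) tuples, possibly repetitive."""
    counts = Counter(alerts)  # dedupe: identical alerts collapse to one entry
    ranked = sorted(
        counts.items(),
        key=lambda item: (item[0][0] not in CRITICAL_SERVICES, -item[1]),
    )
    return [(svc, msg, n) for (svc, msg), n in ranked]

for svc, msg, n in triage([
    ("wiki", "disk 80% full"),
    ("checkout-api", "p99 latency > 2s"),
    ("checkout-api", "p99 latency > 2s"),
]):
    print(f"[{svc}] {msg} (x{n})")
```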
Traffic analysis that sees inside provider networks
Traffic analysis becomes a critical capability when considering the path traffic takes as it leaves the internal network and traverses the internet, which can include multiple service providers and cloud providers. Network observability tools can see the full network path and identify the points where performance degrades.
For example, leading tools can analyze how traffic is routed across the internet and myriad service providers, not only detecting performance issues but also suggesting areas for optimization. Based on what enterprise IT controls, the tools can recommend changes within the network or the SD-WAN to improve performance. They should also be able to show enterprises where their traffic is going and analyze the data in flight, not only at rest. For instance, BGP routing issues can stem from service provider configuration errors, and network observability tools give enterprises the data to attribute such performance issues to their provider.
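Under the hood, path visibility starts with hop-by-hop probing of the kind traceroute performs. The rough sketch below, using the Scapy packet library (it requires root privileges and `pip install scapy`), records each hop’s address and round-trip time; a real observability platform runs such tests continuously from many vantage points, and the destination here is arbitrary.

```python
# Rough per-hop path probe in the spirit of traceroute, using Scapy.
# A one-shot sketch, not a continuous measurement; destination is arbitrary.
import time
from scapy.all import IP, ICMP, sr1

def probe_path(dest="8.8.8.8", max_hops=15):
    for ttl in range(1, max_hops + 1):
        start = time.time()
        reply = sr1(IP(dst=dest, ttl=ttl) / ICMP(), timeout=2, verbose=0)
        rtt_ms = (time.time() - start) * 1000
        if reply is None:
            print(f"{ttl:2d}  *  (no reply)")
        else:
            print(f"{ttl:2d}  {reply.src:15s}  {rtt_ms:6.1f} ms")
            if reply.src == dest:   # reached the destination
                break

if __name__ == "__main__":
    probe_path()
```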
“We are used as the first pane of glass to understand if it is an internal or external problem. If it is an ISP issue, it will need to be escalated to the service provider,” says Angelique Medina, head of internet intelligence at ThousandEyes, a Cisco company. “Service providers might not have the motivation to act unless they have the evidence that the performance issue is stemming from a problem on their network. Network observability can provide that attribution of issues to the right party and prevent IT from spending a lot of time trying to find who is responsible.”
Automated escalation and actionable insights
With volumes of data and alerts being generated, IT managers often cannot keep up with the influx of information. Network observability tools should be able to automatically escalate the events that will most impact performance.
“Most IT managers don’t have the time to figure out if they have a real issue on their hands. They need that deduplication of alerts and automated escalation to identify and resolve the real issues,” EMA’s McGillicuddy says.
Artificial intelligence and machine learning will be critical to the level of automation enterprise IT leaders can apply with network observability tools. The tools should be able to understand the data, correlate it against known behaviors for the environment, and take agreed-upon actions to get the right information in front of the right people. AI-driven capabilities are still emerging, but IT buyers should understand how AI and machine learning will use pattern matching to improve root-cause analysis and support predictive analysis. For instance, a tool working in a specific environment should be able to learn normal behavior patterns, identify when incidents represent anomalous activity, and alert the appropriate IT stakeholders to the potential impact on performance.
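As a minimal illustration of that baseline learning, the sketch below keeps a rolling window of samples and flags values that deviate sharply from the learned mean. The window size and threshold are arbitrary choices for demonstration, not tuned recommendations.

```python
# Minimal anomaly detection: learn a baseline from recent samples and flag
# values that deviate sharply. Window size and threshold are illustrative.
from collections import deque
from statistics import mean, stdev

class BaselineDetector:
    def __init__(self, window=60, z_threshold=3.0):
        self.samples = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to the rolling baseline."""
        anomalous = False
        if len(self.samples) >= 10:          # need some history first
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                anomalous = True
        self.samples.append(value)
        return anomalous

detector = BaselineDetector()
latencies = [20.0 + (i % 5) for i in range(30)] + [250.0]  # spike at the end
for ms in latencies:
    if detector.observe(ms):
        print(f"anomaly: latency {ms} ms deviates from learned baseline")
```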
“Network observability tools should automate problem identification and escalation until the point that IT can look at it. It can do the detection, the correlation, and then the presentation of the historical data and provide suggested actions. A tremendous amount of automation can be done, and the IT manager is still in control,” says Forrester’s Casanova. “IT can automate as much as possible and leave the execution to the human.”
In addition, network observability tools should be able to provide guidance on how to fix an issue based on what the correlated data means to their business.
“When I talk to people in IT, they tell me they want a tool that allows them to take proactive actions on the network as opposed to just presenting data. Not all operators will be able to glean meaning from the data. A tool that also presents what all that data means will be more valuable,” EMA’s McGillicuddy says.
The tools should also go beyond automating troubleshooting tasks and enable automation of certain workflows, McGillicuddy says. For example, if multiple alerts point to the symptoms of a router flapping (repeatedly cycling between up and down states), the network observability tool can be configured to run a workflow or execute a script that restarts the router, depending on the environment’s known fixes.
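A stripped-down version of that workflow might look like the following: if enough flap alerts for one device arrive within a window, the tool runs the environment’s known fix. The threshold, window, and restart hook are placeholders.

```python
# Sketch of a flap-remediation workflow: if enough flap alerts for one
# device arrive inside a sliding window, run that environment's known fix.
# Threshold, window, and the restart hook are hypothetical placeholders.
from collections import defaultdict

FLAP_THRESHOLD = 5          # alerts within the window that trigger action
WINDOW_SECONDS = 300

alert_log = defaultdict(list)  # device -> list of alert timestamps

def restart_router(device: str):
    # Placeholder for the environment's approved remediation script.
    print(f"executing known fix: restarting {device}")

def handle_flap_alert(device: str, now: float) -> None:
    # keep only alerts inside the sliding window, then record this one
    log = [t for t in alert_log[device] if now - t <= WINDOW_SECONDS]
    log.append(now)
    alert_log[device] = log
    if len(log) >= FLAP_THRESHOLD:
        restart_router(device)
        alert_log[device] = []   # reset after remediation

for i in range(5):
    handle_flap_alert("edge-rtr2", now=1000.0 + i * 10)
```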
Integration among existing tools
Network observability tools should work with the other network, systems, and performance monitoring tools that enterprise IT leaders have already invested in for their environments. When evaluating tools, industry watchers recommend reviewing a vendor’s list of partners, supported technologies, and APIs, in part because of the volume of data that must be collected and correlated across systems.
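At the API level, such integration often comes down to forwarding structured events between systems. The sketch below posts a correlated alert to a hypothetical webhook endpoint using the `requests` library; the URL and payload schema are placeholders, and real integrations follow each vendor’s documented API.

```python
# Sketch of API-level integration: forward a correlated alert from an
# observability tool into another system via a webhook. Endpoint URL and
# payload schema are placeholders. Requires `pip install requests`.
import requests

WEBHOOK_URL = "https://itsm.example.com/api/incidents"  # hypothetical endpoint

def forward_alert(device: str, summary: str, severity: str = "major") -> bool:
    payload = {"device": device, "summary": summary, "severity": severity}
    try:
        resp = requests.post(WEBHOOK_URL, json=payload, timeout=5)
        return resp.ok
    except requests.RequestException as exc:
        print(f"integration failed, keeping alert local: {exc}")
        return False

forward_alert("core-sw1", "crc_errors correlated with p99 latency spike")
```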
“It is important for network observability vendors to ease integration with partner work; they should bring more of that capability to their customers,” IDC’s Leary says. “As an IT manager, you are likely short-staffed, and you won’t know the product as well as the vendor does. Vendors should provide that integration intelligence and best practices that can be put in place to help break down the silos across IT domains.”
Multi-cloud support and contributions from cloud providers
While network observability providers can work toward integrating with more technologies to help their customers, cloud service providers must also be willing to share information that helps paint a complete picture of applications and services as they traverse cloud networks. To deliver that full picture, network observability tools must be able to see into cloud networks.
“To really get that full picture of a digital service, you are going to have to tie into certain aspects of AWS, Azure, and Google,” Forrester’s Casanova says. “IT needs to be able to stitch together the data in a way that can be quickly and easily digested, especially if there are security incidents, that incorporates the Azure cloud private data or Amazon services. IT needs near-real-time processing of data from these external sources to get that end-to-end visibility.”
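In practice, tying into a cloud provider usually means pulling its native metrics through its API. The sketch below, for example, fetches an EC2 instance’s NetworkOut metric from AWS CloudWatch using the boto3 library (`pip install boto3`, with credentials from the standard AWS configuration); the instance ID is a placeholder.

```python
# One way to pull a cloud provider's own network metrics into the same
# pipeline: AWS CloudWatch via boto3. Instance ID is a placeholder.
from datetime import datetime, timedelta, timezone
import boto3

def fetch_network_out(instance_id: str, minutes: int = 60):
    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    stats = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="NetworkOut",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=300,               # 5-minute buckets
        Statistics=["Average"],
    )
    return sorted(stats["Datapoints"], key=lambda d: d["Timestamp"])

for point in fetch_network_out("i-0123456789abcdef0"):  # hypothetical instance
    print(point["Timestamp"], point["Average"], "bytes")
```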
Cloud providers are also realizing they cannot be a roadblock when it comes to how their services support businesses. According to IDC’s Leary, cloud vendors are providing more insights into their environments to help companies understand how they could be impacting performance.
“The cloud is driving more and more of the digital experience, and if you cannot see into those clouds or have any blindness in relation to those cloud services, you will hit that wall and the finger pointing will start,” Leary says. “Contributions by the suppliers are a much more important part of the game as we look at larger observability and IT automation projects to mitigate threats, solve problems, avoid problems, and watch out for customers.”
Shift toward open-source telemetry
Part of the integration requirement for network observability can be addressed with a standard known as OpenTelemetry, an open-source framework for collecting telemetry data (traces, metrics, and logs) from cloud-native software. Support for the standard is becoming a must-have feature in network observability tools, because sharing data across tools is what lets IT get the complete picture and identify anomalies that might impact performance.
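For a sense of what instrumenting with the standard looks like, here is a minimal OpenTelemetry example in Python (`pip install opentelemetry-sdk`) that emits a single trace span through the console exporter. The span and attribute names are illustrative; a production deployment would export to an OTLP-compatible backend instead.

```python
# Minimal OpenTelemetry trace example: configure a tracer provider with a
# console exporter and emit one span. Span/attribute names are illustrative.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("netops.example")
with tracer.start_as_current_span("probe-wan-link") as span:
    span.set_attribute("net.peer.name", "isp-gateway-1")  # illustrative attribute
    span.set_attribute("probe.rtt_ms", 42.5)
```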
“Increasingly customers want OpenTelemetry support in their network observability product. It is becoming a de facto standard,” IDC’s Elliot says. “OpenTelemetry understands how systems are exchanging data so that one system can look at another’s and communicate what that means to IT. This is a very hot area in which a lot of vendors are competing.”
In the big picture, there is no one-size-fits-all network observability tool for every environment. Understanding the capabilities available from providers will help IT buyers determine which tool might be the best fit for their specific environment. Ultimately, IT leaders need to be confident that they know how their applications and services are performing before a problem occurs.
“Wouldn’t it be better if you could have the information needed to prevent the problem? Avoiding problems is a much more successful strategy,” IDC’s Leary says. “Network observability tools should provide that visibility and those insights into your infrastructure so you can head off performance problems or security threats before they impact customers directly or make workers less productive.”