aaronwoland
Contributor

Troubleshooting Cisco’s ISE without TAC

Opinion
Jun 07, 2016 | 18 mins
Cisco Systems, Network Security, Security

Here's a look at the top troubleshooting and serviceability features in Cisco's Identity Services Engine (ISE)

One thing I have been very passionate about is making secure network access deployments easier, which includes what we like to call serviceability. Serviceability is all about making a product easier to troubleshoot, easier to deploy and easier to use. Ultimately the goal is always customer success.

There is a distinct correlation between visibility and the success of any NAC project. If you are blind to what’s happening, and if you can’t easily get to the information that helps figure out what’s wrong, it can be very frustrating, and it also gives the appearance of a poor deployment.

My goal in this post is to highlight many of the serviceability items Cisco has put into ISE that you may not be aware of. I’ll do my best not only to call out the feature or function that was added, but also to explain why it matters and what version it was added in.

Per Endpoint Debug (ISE 1.3+)

This is one of my favorite serviceability features that was added, and arguably one of the most usable. ISE is not just a single product; it is a solution with many moving parts, and each of those parts may have different logs that you or TAC may have to sift through. The Per Endpoint Debug feature was added in ISE 1.3, and it provides a single debug file for all components (RADIUS, Guest, Profiling, etc.) for a specific endpoint across its entire session, and across the entire deployment!

So, if an endpoint is getting profiled in the East-Coast DC and the West-Coast DC at the same time, all of that will still show up in the single, consolidated debug file. It prevents you from having to enable debug on the components themselves for all endpoints, and it focuses the debug instead. This is incredibly elegant, and it helps advanced admins and TAC engineers greatly reduce time to resolution when experiencing an issue. 

Figure 1 – Debug Endpoint Tool

De-duplication and anomalous endpoint suppression (1.2+)

Many of you have also heard me rant about endpoint supplicants and how they behave. You may have read my post on why to use Wildcard/WildSAN certificates to alleviate the painful symptom of bad endpoint behavior. We’ve even added functionality to TEAP (RFC 7170) to help with that behavior by delivering the list of server certificates to trust down to the supplicant. I won’t rehash all that pain here; instead I will show you one of the things we did on the RADIUS server (ISE) side to avoid wasting log storage and scale on poorly behaving endpoints.

Prior to ISE 1.2, every authentication request would create a 12KB log record that needed to be stored. When bad endpoint behavior is causing millions of failed authentications a day, that is storing a LOT of log data.

Beginning in ISE 1.2, ISE suppresses anomalous clients by default, only storing a single record and then logging each time that same exact record was received. This saved a tremendous amount of processing and log storage, and it provides for higher scale. 

Figure 2 – Suppression

Examining the screen shot above:

  • Detection Interval will flag misbehaving supplicants when they fail authentication more than once per interval.
  • Reporting Interval sends the alarm from the PSN to the MnT every X minutes.
  • Request Rejection Interval stops sending logs for repeat authentication failures for the same endpoint during the rejection interval (suppresses the logs). Note: A successful authentication will clear all flags.
  • Reject Requests After Detection. Once the endpoint is in the reject interval, any request with the same Calling-Station-ID (MAC address), NAD (NAS-IP-Address) and failure reason will be sent an Access-Reject, and the counter will be incremented by one and the timestamp updated. That log is sent at the “Reporting Interval” listed above.

Below the horizontal line, you will notice the ability to de-duplicate successful authentications.  

  • Suppress Repeated Successful. Applies the de-duplication and suppresses the logs from MnT.
  • Accounting Suppression Interval. Stops sending accounting logs for the same session during this configured interval.
  • Long Processing Step Threshold Interval. Detects and logs NAS retransmission timeouts for authentication steps that exceed this threshold. This relates to the step latency that is visible in the Authentication Detail report. 
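
To make the failure-suppression settings above a little more concrete, here is a minimal Python sketch of the idea. It is a toy model under my own assumptions, not how ISE implements it: the class name FailureSuppressor, its method names, and the default interval values are all hypothetical. It keys on the same three attributes listed above (Calling-Station-ID, NAS-IP-Address and failure reason), flags an endpoint that fails twice within the detection interval, and then only counts repeats until the rejection interval expires or the endpoint authenticates successfully.

```python
class FailureSuppressor:
    """Toy model of anomalous-client suppression (hypothetical, not ISE code)."""

    def __init__(self, detection_interval=60.0, rejection_interval=3600.0):
        self.detection_interval = detection_interval    # seconds
        self.rejection_interval = rejection_interval    # seconds
        self._state = {}  # (mac, nas_ip, reason) -> per-key bookkeeping

    def record_failure(self, mac, nas_ip, reason, now):
        """Return 'logged', 'flagged' or 'suppressed' for one failed request."""
        key = (mac, nas_ip, reason)
        st = self._state.setdefault(
            key, {"last_failure": None, "flagged_until": None, "repeats": 0}
        )
        previous = st["last_failure"]
        st["last_failure"] = now

        # Inside the rejection interval: only bump the counter, store no new record.
        if st["flagged_until"] is not None and now < st["flagged_until"]:
            st["repeats"] += 1
            return "suppressed"

        # Any earlier rejection interval has expired by this point.
        st["flagged_until"] = None

        # More than one failure within the detection interval: flag the endpoint.
        if previous is not None and now - previous <= self.detection_interval:
            st["flagged_until"] = now + self.rejection_interval
            return "flagged"

        return "logged"

    def repeat_count(self, mac, nas_ip, reason):
        """Number of suppressed repeats recorded for this key."""
        st = self._state.get((mac, nas_ip, reason))
        return st["repeats"] if st else 0

    def record_success(self, mac):
        """A successful authentication clears all flags for that endpoint."""
        for key in [k for k in self._state if k[0] == mac]:
            del self._state[key]


if __name__ == "__main__":
    s = FailureSuppressor()
    endpoint = ("00:11:22:33:44:55", "10.1.1.1", "wrong password")
    for t in (0, 10, 20, 30):
        print(t, s.record_failure(*endpoint, now=t))
```

The point of the sketch is the storage trade-off: after the endpoint is flagged, each repeat costs one counter increment instead of a full log record.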

Dashlet counters above Live Log (1.2+)

The de-duplication is a very nice and welcome change, but it did leave a few gaps to be addressed. Live Log is the first screen that one would use when troubleshooting a login problem. However, if the entries are not showing up in Live Log because they are being suppressed, it leaves the admin in a very bad position with no visibility into what’s going on.

So, we added key counters at the top of the Live Log screen to help provide visibility. You can see those counters in Figure 3 below.

Figure 3 – Key Counters in Live Log

The admin would see the Repeat Counter, Misconfigured Supplicant and RADIUS drops counters continue to go up. Click on one of the counters, and you’re brought to the list of items that are making the counters increment. 

Key actions from Live-Log (1.3+)

Now you can see which endpoints are causing the counters to increment, i.e. which ones are being suppressed. When troubleshooting, you may need to bypass the suppression to ensure all logs come to the Live Log no matter what, but only for that endpoint. That way you aren’t disabling the de-duplication for the entire deployment and opening those floodgates. Instead, the bypass applies only to the single endpoint.

Live Log was enhanced to include the ability to bypass suppression for one hour with a right click (ISE 1.3 – 2.0) and with the Actions target icon in ISE 2.1, as seen in Figure 4.

Figure 4 – Bypass Suppression Filtering for 1 hour

The ability to bypass the event suppression is not limited only to the context menu within Live Log. It also exists in the collection filters located at Administration > System > Logging > Collection Filters, as seen in Figure 5. 

Figure 5 – Collection Filters (1.2+)

Live Log RegEx (1.3+)

In ISE 1.3 the ability to use negative filtering in the quick filter boxes was added.  Beyond just negative filtering, it was actually a full RegEx capability, making it much easier to find what you really need within the Live Log. Figure 6 shows an example in version 2.0 and below. Figure 7 shows the new filtering in ISE 2.1, which provides a graphical way to leverage the advanced filters.
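
If negative filtering with a regular expression is new to you, the short Python sketch below shows the general idea. It uses Python's re module purely as an illustration; the exact syntax the Live Log filter boxes accept is not covered here, and the identity values and the apply_filter helper are made up for the example.

```python
import re

# Hypothetical Live Log identity values; not taken from a real deployment.
identities = [
    "employee1",
    "host/laptop-01.example.com",
    "guest-4821",
    "00:11:22:33:44:55",
    "employee2",
]

# Positive filter: identities that start with "employee".
starts_with_employee = re.compile(r"^employee")

# Negative filter using a negative lookahead: keep everything that does
# NOT start with "guest-" or "host/".
not_guest_or_host = re.compile(r"^(?!guest-|host/)")


def apply_filter(pattern, values):
    """Return the values that the compiled pattern matches."""
    return [v for v in values if pattern.search(v)]


if __name__ == "__main__":
    print(apply_filter(starts_with_employee, identities))
    print(apply_filter(not_guest_or_host, identities))
```

The negative form is usually the more useful one when troubleshooting, since it lets you hide the noisy entries and look at whatever is left.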

Figure 6 – RegEx in Live Log 2.0 and Below

Figure 7 – Filtering in Live Log 2.1 and Above

Tree View for Policy Match (1.3+)

When a policy is multi-tiered, it can be somewhat complex to quickly recognize the “path” that an authentication session takes through that policy. Tree View was added to Live Log and to the reports to show the Policy Set > Authentication Protocol Rule > ID Store Rule and the Policy Set > Authorization Rule that the session followed.  This is illustrated in Figure 8.

Figure 8 – Tree View

Active Directory diagnostics (1.3+)

In ISE 1.3, the Active Directory connector was replaced with one that could support Multi-Forest, Multi-Join, domain white lists, and much more. One of the fantastic enhancements that doesn’t get enough credit is the Diagnostic Tool.  

The built-in tool was designed to provide the ISE admin with every bit of information possible to help them diagnose problems. You may want to translate that to “provide enough detail to give the Active Directory team at your company irrefutable evidence if something may be AD’s fault and not ISE’s fault”:  

Figure 9 – Active Directory Diagnostics

TCPDump from Central GUI (1.0+)

Since the first release of ISE, it was known that packet captures are tremendously important for effective troubleshooting. Instead of just including the TCPDump utility on each of the ISE Nodes, the call was made to centrally control it through the GUI.  From that centralized location, you can configure a TCPDump to happen on any interface on any node in the entire deployment and download the result to your local machine—all through the GUI, as seen in Figure 10.

Figure 10 – TCPDump

Detailed authentication report (1.0+)

Since version 1.0, ISE has had an incredible troubleshooting tool that has single-handedly been responsible for solving the vast majority of cases. Is it some magical portal? No. It’s just the detailed authentication report. Simply click the magnifying glass in Live Log. It provides an overview of the authentication, every detail available, and even includes every step that has occurred within ISE—from receiving the RADIUS Access-Request to the RADIUS response.

Additionally, when any step takes longer than normal, the report lists out the step latency. A senior member of TAC has been quoted as saying the inclusion of latency was the “best feature ever.” 🙂 Figure 11a shows a snippet of a detailed authentication report, while Figure 11b shows the step latency in action.

Figure 11a – Detailed Authentication Report

Figure 11b – Latency

Download logs from GUI (1.0+)

ISE nodes have very detailed log files in the underlying operating system. You have the ability to download those logs for any node in the deployment from the centralized GUI since 1.0. It’s been enhanced over time, but has always been there. If only it were in alphabetical order. 🙂 

Figure 12 – Download Logs

Portal Preview (1.3+)

When creating portals with ISE 1.3 or above, there’s a WYSIWYG portal customization page with an automatic preview of what the portal will look like in mobile size. 

Figure 13 – Portal Preview

Portal test button (1.3+)

In addition to the portal preview, a test link opens an example of your saved portal configuration, which allows you to test functionality without needing to actually connect to the portal. Figure 13 shows the Portal Test Link, while Figure 14 shows the portal test itself.

Figure 14 – Portal Test

Along with all the new portal enhancements that came with ISE 1.3, there is an option on most portals (Guest, Sponsor, BYOD, MDM, My Devices) to offer “support information.” If configured, it will include information that aids a help desk if someone has issues. The information is something that end users may not know how to obtain otherwise (MAC Address, IP Address, Browser User Agent, Policy Server and Failure Codes), as seen in Figure 15a and Figure 15b.

Figure 15a – Support Information

Figure 15b – Support Information

Time range support bundles (1.3+)

Support bundles are another lifesaver for TAC cases. They create a single encrypted bundle of files, DB exports and configurations: basically everything TAC should need to help find the root cause of an issue. In ISE version 1.3, the ability to bind that bundle to a specified time range was added, as seen in Figure 16. This helps keep the file size down and ensures that only relevant logs are captured as part of the bundle.

Figure 16 – Time Based Support Bundle

Pre-defined smart-defaults and policies (2.0+)

Even back in 1.3, some smart-defaults were added, such as pre-built Identity Source Sequences that include all Active Directory join-points and pre-defining the MAB-continue for the MAB rules. Those were a very nice addition. 

In 2.0, those pre-defined smart configurations continued. They include pre-built guest rules, pre-built defaults for BYOD registration and on-boarding, even pre-installed Native Supplicant Profiles and Certificate templates. I have personally gone from first login to ISE to fully functioning with BYOD on-boarding using TLS and certificates in fewer than 25 minutes start to finish.

Figure 17 illustrates the pre-built authorization rules for on-boarding and for accepting the EAP-TLS after the device was on-boarded.

Figure 17 – Pre-built BYOD Rules

Figure 18 shows the pre-built authorization result. Notice it uses an ACL named “ACL_WEBAUTH_REDIRECT”. If your WLC uses a different ACL for redirection, change this value to match. 

Figure 18 – Prebuilt Authorization Profile for NSP

Figure 19 shows the pre-configured Native Supplicant Profile. It is pre-configured to use an SSID named “ISE.” It’s also pre-configured to use a pre-built certificate template and the built-in certificate authority.

Figure 19 – Pre-built Native Supplicant Profile

Before ISE 2.0, you would have to connect to the Internet and download the Network Setup Assistants (NSAs) for Mac and Windows. In version 2.0+, they are included in the install. Figure 20 shows the pre-installed NSA wizards.

Figure 20 – Pre-installed Network Setup Assistants

ISE 1.4 and below required you to create the Client Provisioning Policies, one for each OS type. Beginning in ISE 2.0, they are pre-configured for all OSes using the pre-installed NSAs and the pre-configured NSPs. 🙂 Figure 21 shows these pre-built policies.

Figure 21 – Pre-built Client Provisioning Policies

Overall, the time savings for BYOD on-boarding alone is over two hours. 

Offline examination of configuration (1.3+)

I have spent most of my 20-plus year career working with routers, switches, firewalls, IDS/IPSs, email and web security appliances, and much more. One of the common attributes of all those devices was the ability to export the configuration and send it to someone else to review, to TAC engineers to analyze, etc.

ISE was one of the first and only times where it seemed we must access the GUI in order to see how it was configured, and there was no way to do it offline/out of band.

ISE 1.3 added the ability to export the configuration to a human-readable XML. Before you ask: no, there is no import function as of ISE 2.1. Figure 22 shows the exported policy. 

Figure 22 – Export Configuration

What certificates are in use with which portals (1.4+)

There are a lot of portals hosted in an ISE environment. Portals for WebAuth or sponsorship, certificate provisioning, BYOD and more. What was missing prior to ISE 1.4 was the ability to see from a single location what portals were using a given certificate. You used to have to go into each portal one at a time to see which certificate was being used. It made troubleshooting a bit of a pain but also complicated the operationalizing of certificates in ISE.  

ISE 1.4 added the capability to easily see which portals are associated with a certificate, as seen in Figure 23.

Figure 23 – Which Portals Using Certificates

Test buttons for external connections (1.4+)

ISE uses storage repositories for numerous things. They are added to the configuration in the GUI, but there wasn’t any mechanism in the GUI to see if the configuration worked. Sometimes admins would start an ISE backup, pointing it to the configured repository for storage of the backup. The files would be collected, tarred and gzipped, and then ISE would try to place that tarball onto the repository and bam: failure. Many of us got into the habit of creating the repository in the GUI and then, in a separate CLI window, issuing the “show repository” command. So in ISE 1.4, a test repository button was added to the GUI, as seen in Figure 24.

Figure 24 – Validate Repository

In addition to the external repositories, ISE also connects to external services, such as the Profiling Feed Service. ISE 1.3 also adds a test button for that external feed service that ensures it is reachable and it is functioning correctly, as shown in Figure 25.

Figure 25 – Validate Feed

dACL Validator (1.2+)

Downloadable ACLs (dACLs) are configured centrally in ISE and then downloaded to the Cisco IOS network device through the RADIUS control plane. ISE 1.2 adds the dACL syntax validator, shown in Figure 26.

Figure 26 – dACL Syntax Check

Exposed all the logs from CLI (1.2+)

All logs, including those from Tail and others, have been exposed in the CLI without needing a root patch, along with the ability to tail the files, etc. The commands used to show the files are “show logging application” and “show logging system”, as shown in Figure 27.

Figure 27 – Show Logging

Here’s an example of using tail to view the profiler.log file.

ISE20-1ek/admin# sh logging application profiler.log tail
2016-05-30 01:07:09,879 INFO   [ReProfilingEventHandler-18-thread-1][] profiler.infrastructure.probemgr.event.ReProfilingEventHandler -:::- Resuming reprofiling.
2016-05-30 01:07:09,915 INFO   [FEEDAUTODOWNLOAD][] cisco.profiler.infrastructure.notifications.FeedServiceConfigNotificationHandler -:::- Enable feed re-profiling after feed download.
2016-05-30 03:00:00,000 INFO   [Timer-5][] profiler.infrastructure.probemgr.event.EPPurgeEventHandler -:::- Send Endpoint purge event.
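
If you copy output like this off the box and want to sift through it with a script, a small parser helps. The Python sketch below is based only on the three sample lines above, so treat the field layout as an assumption rather than a documented log format. The names LINE_RE and parse_line are mine, and the “-:::-” token is treated as an opaque separator.

```python
import re
from datetime import datetime

# Pattern inferred from the sample profiler.log lines above (an assumption,
# not a documented format): timestamp, level, [thread][...], logger, message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]*)\]\[[^\]]*\]\s+"
    r"(?P<logger>\S+)\s+"
    r"-\S*-\s+"              # the "-:::-" token, treated as an opaque separator
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return fields


# One of the sample lines shown above.
SAMPLE = (
    "2016-05-30 03:00:00,000 INFO   [Timer-5][] "
    "profiler.infrastructure.probemgr.event.EPPurgeEventHandler "
    "-:::- Send Endpoint purge event."
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Lines that do not fit the pattern (stack traces, for example) simply come back as None, so the caller can decide whether to skip them or attach them to the previous record.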

Support Tunnels overview (2.0+)

Root access to the operating system is something that secure applications will try to prevent. Any escape from the limited shell and into the underlying OS is something that vendors always try to avoid.

Back in 2007, Cisco acquired Ironport for its world-class content security and centralized online reputation capabilities. As a former Ironport customer, I can tell you that one of my favorite attributes of Ironport was how the company manages its appliances. To provide the Ironport (now Cisco) TAC access to the appliance remotely, they had this wicked cool tunneling technology.  

Cisco learned a lot from the Ironport acquisition, including how great Ironport’s tunneling technology (known as “Support Tunnels”) is. They brought that technology forward into the ASA-CX NGFW, and now also into ISE starting at version 2.0. No more root patches!

Here’s how it works:

  1. The ISE Admin must enable the tunnel and set the tunnel key (passphrase).
  2. The ISE Admin will need to update the TAC case/engineer with the set passphrase or all is for naught.  
  3. The ISE node will SSH to a bastion host in the Cisco data center and wait.
  4. The TAC engineer will SSH into the Cisco internal tunnels server (no access to the tunnels server from outside of Cisco). 
  5. The TAC engineer must tell the tunnels server which customer and what their temporary passphrase is.
  6. The tunnels server will now sew-up the two SSH sessions, forming a single SSH tunnel between the TAC engineer and the Root OS on the ISE appliance. 
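
The six steps above boil down to a rendezvous: both sides dial in, and the server in the middle joins them only when the customer and passphrase match. The Python sketch below is a toy model of that matching logic and nothing more. It is not Cisco's implementation, there is no SSH involved, and the TunnelBroker class and its method names are hypothetical.

```python
import hmac


class TunnelBroker:
    """Toy rendezvous broker modeling the flow above (hypothetical, no real SSH)."""

    def __init__(self):
        # customer -> passphrase chosen by the ISE admin (steps 1 and 2)
        self._waiting = {}

    def node_connect(self, customer, passphrase):
        """Step 3: the ISE node dials out to the broker and waits."""
        self._waiting[customer] = passphrase

    def engineer_connect(self, customer, passphrase):
        """Steps 4 to 6: join the sessions only if customer and passphrase match."""
        expected = self._waiting.get(customer)
        if expected is None:
            return False  # no node from this customer is waiting
        # Constant-time comparison, so timing does not leak passphrase contents.
        if not hmac.compare_digest(expected.encode(), passphrase.encode()):
            return False
        del self._waiting[customer]  # the waiting session is consumed once joined
        return True


if __name__ == "__main__":
    broker = TunnelBroker()
    broker.node_connect("example-customer", "temporary-passphrase")
    print(broker.engineer_connect("example-customer", "wrong-guess"))
    print(broker.engineer_connect("example-customer", "temporary-passphrase"))
```

Notice that both connections are outbound from their respective sides, which is the property that matters: nothing has to be opened inbound to the ISE node.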

There is no longer any need for the “root patches” to be installed on customer appliances. On top of that, beginning with ISE 2.0, all installed software must now be signed by a Cisco key. So, for those special customers whose ISE nodes are not permitted to reach Cisco.com, there is a rarely used, short-lived, signed root-patch application. Figure 28 illustrates the Support Tunnels flow.

Figure 28 – Support Tunnel Flow

Upgrade tooling (ISE 2.0+)

One common complaint is the upgrade process. It gets better with pretty much every release. ISE 2.0 made a large step forward towards even better upgrades.

The new upgrade process automates a lot of the manual processes of the past: the downloading and validating of the upgrade files, setting the order of upgrade and more. Figure 29 shows the upgrade UI that was added in 2.0.

Figure 29 – Upgrade UI

Certificate details view, including full chain (2.0+)

Figure 30 – Certificate Details

If something is wrong, the detailed view will display it for you. Figure 31 shows a certificate where the trust chain is incomplete, meaning that ISE does not have the Root and all intermediates in the chain installed in the Trusted Certificate list.
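
For anyone wondering what “incomplete chain” means in practice, here is a minimal Python sketch of the check. It is a simplification under my own assumptions: certificates are plain dictionaries with made-up names, the function find_missing_issuer is hypothetical, and it only follows subject and issuer names. Real validation also verifies signatures, validity dates and more.

```python
def find_missing_issuer(leaf, trusted):
    """Follow issuer names upward from `leaf`.

    `trusted` maps subject name -> certificate dict (the trusted list).
    Returns the first issuer name that is not in `trusted`, or None when
    the chain reaches a self-signed root.
    """
    current = leaf
    seen = set()
    while current["subject"] != current["issuer"]:  # self-signed means root
        if current["subject"] in seen:
            raise ValueError("issuer loop detected")
        seen.add(current["subject"])
        issuer = trusted.get(current["issuer"])
        if issuer is None:
            return current["issuer"]  # the chain breaks here
        current = issuer
    return None


# Hypothetical names, for illustration only.
root = {"subject": "Example Root CA", "issuer": "Example Root CA"}
intermediate = {"subject": "Example Issuing CA", "issuer": "Example Root CA"}
server = {"subject": "ise.example.com", "issuer": "Example Issuing CA"}

complete_store = {c["subject"]: c for c in (root, intermediate)}
incomplete_store = {root["subject"]: root}  # the intermediate is missing

if __name__ == "__main__":
    print(find_missing_issuer(server, complete_store))
    print(find_missing_issuer(server, incomplete_store))
```

With the intermediate missing from the trusted list, the walk stops at the first issuer it cannot find, which is the kind of gap the detailed view calls out.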

Figure 31 – Incomplete Chain

Figure 32 – Expiring Soon

Set logging levels to default (2.0+)

A TAC engineer may sometimes ask you to modify a component logging level. Very often you can’t remember what the original level was supposed to be. You certainly don’t want to leave a component running at debug or trace level when not actively troubleshooting, right? Figure 33 shows resetting the entire node back to defaults.  Figure 34 shows resetting an individual component. 

Figure 33 – Reset Logs Entire Node

Figure 34 – Reset Logs Component Level

New in ISE Version 2.1:

Support Bundle Encryption with Cisco PKI

Cisco TAC has automation tools to help quickly pull apart the support bundles, analyze them, call out things that have been to blame for other resolved TAC cases, and more. It’s truly phenomenal to see in action, trust me. One of the most common delays in TAC case time-to-resolution is that the customer forgets the passphrase they used to encrypt the support bundle, or forgets to tell the TAC engineer what it was.

Starting in ISE 2.1, you can select to encrypt the bundle using Cisco PKI so that only Cisco may decrypt it. This will allow the automation tools to decrypt the bundle automatically and start analysis immediately, speeding up the time to resolution. Figure 35 illustrates the process, while Figure 36 illustrates the setting to use the PKI or a passphrase.

Figure 35 – Support Bundle Encryption with PKI

Figure 36 – Support Bundle Encryption Setting

Secure Boot and Integrity Check

Even before public attack vectors like “Synful Knock,” Cisco was already working on ensuring that all products leverage secure-boot technologies. ISE is no exception. Beginning with ISE 2.0, no code may be used that isn’t signed. The new SNS35xx series appliances will only boot with specific signed code. ISE 2.0.1 and 2.1 are both signed with acceptable keys, and therefore can be booted on those appliances.

Per-Process/Thread Visibility

ISE 2.1 enables visibility into the Java virtual machine(s) to expose how much processor is being utilized by the individual components, as shown in Figure 37.

Figure 37 – Per Process Visibility

Well, that does it for this blog entry. Whew! Thanks to Eugene Korneychuk and Jesse Dubois for their help. Also, this post would not be complete without thanking the entire ISE Serviceability Team.  

Aaron Woland, CCIE No. 20113, is a Principal Engineer at Cisco Systems, Inc., and works with Cisco’s largest customers all over the world. His primary job responsibilities include Secure Access and Identity deployments with ISE, solution enhancements, standards development, and futures. Aaron joined Cisco in 2005 and is currently a member of numerous security advisory boards and standards body working groups.

Prior to joining Cisco, Aaron spent 12 years as a Consultant and Technical Trainer. His areas of expertise include network and host security architecture and implementation, regulatory compliance, as well as route-switch and wireless. Aaron is the author of the book Cisco ISE for BYOD and Secure Unified Access (Cisco Press), and of many published white papers and design guides. Aaron is a member of the Hall of Fame for Distinguished Speakers at Cisco Live, and is a security columnist for Network World where he blogs on all things related to Identity. His other certifications include: GHIC, GSEC, Certified Ethical Hacker, MCSE, VCP, CCSP, CCNP, CCDP and many other industry certifications.

The opinions expressed in this blog are those of Aaron Woland and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies, including Cisco Systems.
