
How to reset a Cisco Catalyst Express 500 switch

Ever find yourself needing to reset a Cisco Catalyst Express 500 switch to factory defaults? Even though they’re pretty bad devices and have no CLI? Yeah, me too. I needed one for its PoE ports.

Cisco has a support doc here walking you through the process: http://www.cisco.com/c/en/us/support/docs/switches/catalyst-express-500-series-switches/70874-ce500-factory.html
If you have access to the CE 500’s Device Manager, then you’re golden. Just use the software reset via the web interface.

If you DON’T have access to the Device Manager, read on.

As of this writing, that doc was last updated in June of 2008, and in the years since, Cisco hasn’t updated it with one important detail: if you don’t already have access to the Device Manager, you must use a Windows XP machine.

Apparently, this is due to the DHCP client implementation on Windows 7. On a Windows 7 machine, you’ll never get past the DHCP Discover phase.

Capturing packets in Wireshark on the Windows 7 client showed that after the client sends a DHCP Discover, the Catalyst never responds with a DHCP Offer. It appears that the CE 500 doesn’t like Windows 7’s DHCP Discover packet, and so it never responds.
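If you want to see this for yourself, a capture filtered down to DHCP traffic makes it obvious. The interface name below is just a placeholder for whichever NIC is plugged into the switch:

tshark -i "Local Area Connection" -f "udp port 67 or udp port 68"

In the Wireshark GUI, the display filter bootp (called dhcp in newer releases) does the same job: Discover after Discover from the client, and never an Offer back from the switch.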

On XP, though, it works just fine. Take care when noting your IP address: I got a 169.254.0.2 address via DHCP from the switch, not the 10.0.0.2 shown in the Cisco doc. At first, this made me think the IP was APIPA from Windows XP, and I started all over again. On the second go-round, I noticed the default gateway of 169.254.0.1, which of course you don’t get with APIPA. After that, it was all textbook.
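The quick way to tell a DHCP-assigned 169.254.x.x address from an APIPA one is ipconfig /all. A real lease shows a default gateway, a DHCP server, and lease times, while APIPA shows none of those (XP even labels it “Autoconfiguration IP Address”). The output below is illustrative and trimmed, not a paste from my session:

C:\> ipconfig /all

   Dhcp Enabled. . . . . . . . . . . : Yes
   IP Address. . . . . . . . . . . . : 169.254.0.2
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 169.254.0.1
   DHCP Server . . . . . . . . . . . : 169.254.0.1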

Setting up a SANless Windows 2012 Failover Cluster for SQL 2012 AlwaysOn cluster

Recently, our SQL environment, running on Windows Server 2012, needed to move to Windows Server 2012 R2 and carry the SQL cluster along with it. Unfortunately, there was no upgrade path for this kind of move, so we decided to back up the databases, blow everything away, and start from scratch.

Additionally, this was a SANless setup: in place of shared storage, we were using SIOS SteelEye DataKeeper to mirror a volume from our primary SQL server to the secondary. On top of this, we were running SQL Server 2012 as an AlwaysOn availability group.

First, we backed up the databases. Next, we made a careful note of the SQL server network name (the name of the AlwaysOn cluster), so we could reuse it and the SQL connection strings wouldn’t break.
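Nothing fancy is needed for the backup step; a plain full backup of each database to a file share or local disk will do. The server, database, and path names below are placeholders, not from our environment:

sqlcmd -S SQLNODE1 -E -Q "BACKUP DATABASE [AppDB] TO DISK = N'D:\Backup\AppDB.bak' WITH INIT, COPY_ONLY"

If you want to double-check the network name before tearing anything down, Get-ClusterResource from PowerShell on one of the nodes will list the cluster’s Network Name resources along with everything else.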

At this point, we’re ready to re-install the OS. Once that’s done, we go into Active Directory Users & Computers and delete the old Failover Cluster and SQL Cluster computer objects. If you don’t, the new clusters will try to use the old objects, the GUIDs won’t match, and you’ll end up with cluster setup problems. Be sure to delete the old DNS records for these as well.
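The cleanup can be done in the GUI as described, or scripted from PowerShell. The object, host, and zone names below are made-up placeholders; substitute your own cluster name objects and AD-integrated zone:

# Delete the stale cluster computer objects (names are placeholders)
Remove-ADComputer -Identity "WINCLUSTER01" -Confirm:$false
Remove-ADComputer -Identity "SQLNETNAME01" -Confirm:$false

# Delete the matching A records from the DNS zone
Remove-DnsServerResourceRecord -ZoneName "corp.example.com" -RRType "A" -Name "WINCLUSTER01" -Force
Remove-DnsServerResourceRecord -ZoneName "corp.example.com" -RRType "A" -Name "SQLNETNAME01" -Force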

For the next step, follow these two walkthroughs; they’re excellent articles that take you through the rest of the process. One thing to note before beginning: SIOS DataKeeper introduced support for Windows Server 2012 R2 in version 8.0, while version 7.x and below only supports Windows Server 2003 through 2012. Make sure you’ve got the right version.

Clustering SQL Server 2012 on Windows Server 2012 Step-by-Step, Part I

Clustering SQL Server 2012 on Windows Server 2012 Step-by-Step, Part II

When you’re all done, hand it over to the DBAs and the web guys and go drink a well-earned beer (or two).

Tracing a Layer 2 Path on Nexus Switches

Ever been stuck trying to figure out the exact switching path that packets take through your network? Me too. Here’s how I solved the problem without fancy Layer 2 traceroute tools.

I’ve recently been working in a data center environment with Nexus 7K and 5K switches in the core. The core is almost completely Layer 2, with most routing pushed to the distribution layer. During the first week, we ran into a problem forwarding jumbo frames. Some VLANs that used jumbo frames worked fine, but one VLAN simply wouldn’t. The network team went to some effort to prove our innocence, and there was the usual veiled finger-pointing from everyone else: “I’m not saying it’s a network problem, but…”. Yeah, yeah, we know: the network is assumed guilty until proven innocent.

In my experience, troubleshooting jumbo frames begins simply. Either every interface in the forwarding path allows jumbo frames, or it doesn’t, and it only takes one interface that doesn’t to break the conversation. So the crucial first question is “what is the path?”

In a routed environment, finding the path would be a no-brainer: traceroute would have your answer. But this is a Layer 2 environment, so what I needed was a Layer 2 traceroute tool. It turns out Cisco does offer a Layer 2 traceroute utility in IOS, on both the Cisco 7600 series routers and Catalyst 3560 series switches. It’s been around since 12.2(18), and you can run the trace by MAC address or by IP address.
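For reference, on the IOS platforms that do support it, the syntax looks like this (the MAC and IP addresses below are placeholders, and both ends have to be in the same VLAN):

traceroute mac 0000.aaaa.aaaa 0000.bbbb.bbbb
traceroute mac ip 10.1.1.1 10.1.1.2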

However, it didn’t work in NX-OS. And I did try. Several times. Just to be sure.

So, what was left was a manual Layer 2 trace, which means manually searching through the MAC address tables. Kind of cumbersome, but still doable. It was going to be tricky, though, since the core switches were using both port-channels and virtual port-channels (vPCs). The show mac address-table command alone was not going to cut it: sometimes a switch would see the MAC address over a port-channel, and I’d need to know which member interface in that port-channel had actually forwarded the packet.

However, before jumping in, I needed the source and destination MAC addresses, as well as the source and destination IP addresses (more on this later). Once the server team provided all of these, I began by finding the exact source switch and interface:

sh mac address-table | inc AAAA.AAAA.AAAA

SWITCH-A# sh mac address-table | inc AAAA.AAAA.AAAA
   VLAN     MAC Address      Type      age  Secure NTFY   Ports
---------+-----------------+--------+------+------+----+------------
* 200      AAAA.AAAA.AAAA    dynamic   10      F    F     Eth101/1/2

Once the originating switch and port were identified, I could begin looking for the path to the destination. On the source switch, I ran:

sh mac address-table | inc BBBB.BBBB.BBBB

SWITCH-A# sh mac address-table | inc BBBB.BBBB.BBBB
   VLAN     MAC Address      Type      age  Secure NTFY   Ports
---------+-----------------+--------+------+------+----+-----------
* 200      BBBB.BBBB.BBBB    dynamic   10      F    F     Po1

Guess what? The MAC was found on a port-channel. So, to find the physical interfaces included in that port-channel, I ran:

show port-channel summary

SWITCH-A# sh port-channel sum
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        S - Switched    R - Routed
        U - Up (port-channel)
        M - Not in use. Min-links not met
-------------------------------------------------------------------
Group Port-Channel  Type     Protocol  Member Ports
-------------------------------------------------------------------
1     Po1(SU)       Eth      LACP       Eth1/1(P)    Eth1/2(P)

This showed which physical interfaces the port-channel contained. With this information, I looked in the CDP neighbor table to see which neighbors those interfaces connected to.

show cdp neighbor

SWITCH-A# sh cdp ne
Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
 S - Switch, H - Host, I - IGMP, r - Repeater,
 V - VoIP-Phone, D - Remotely-Managed-Device,
 s - Supports-STP-Dispute
Device-ID Local Intrfce Hldtme Capability Platform     Port ID
SWITCH-B
          Eth1/1        125    S I s      N5K-C5548    Eth1/1
SWITCH-C
          Eth1/2        128    S I s      N5K-C5548    Eth1/2

Turns out the two physical interfaces connect to two different neighbors. Why? Virtual Port-Channel. This is where things get a little tricky.

I needed to figure out which physical interface was actually forwarding the packets, since the two member interfaces led to different switches. I had no clue which command would accomplish this. Fortunately, Cisco TAC did.

sh port-channel load-balance forwarding-path int port-channel 1 vlan 200 src-ip 1.1.1.1 dst-ip 2.2.2.2

This command is full of options, and if you question-mark your way through it, you can tweak it in a variety of ways. Remember the source and destination IPs? This is where you’ll use them. The output shows which physical interface the packets are taking, as well as which load-balancing algorithm the port-channel is using. In this case, it was using source and destination IP only.

SWITCH-A# sh port-channel load-balance forwarding-path int port-channel 1 vlan 200 src-ip 1.1.1.1 dst-ip 2.2.2.2
Missing params will be substituted by 0's.
Load-balance Algorithm on switch: source-dest-ip
crc8_hash: 11 Outgoing port id: Ethernet1/2
Param(s) used to calculate load-balance:
dst-ip: 2.2.2.2
src-ip: 1.1.1.1
dst-mac: 0000.0000.0000
src-mac: 0000.0000.0000

With the physical interface info, I could correlate with the CDP neighbors table, and find which neighbor to check next.

I moved to the next switch, repeated the whole process, moved to the third, ran the procedure again, moved on yet again … sigh. Eventually, the MAC address-table entry didn’t point to a CDP neighbor, but instead pointed to a single physical interface with only one MAC address in the MAC address-table.
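The end of the trail looks something like this: the destination MAC shows up on an ordinary edge interface, that interface is the only place the MAC is learned, and a CDP check on the port comes back empty. The switch name and interface below are stand-ins, and the output is trimmed:

SWITCH-D# sh mac address-table address BBBB.BBBB.BBBB
* 200      BBBB.BBBB.BBBB    dynamic   10      F    F     Eth1/15

SWITCH-D# sh mac address-table interface ethernet 1/15
* 200      BBBB.BBBB.BBBB    dynamic   10      F    F     Eth1/15

A quick sh cdp neighbors interface ethernet 1/15 coming back with nothing confirms there’s no switch hanging off that port – just the destination host.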

At last. I’d found the full, one-way Layer 2 path.

At this point, it would be easy to assume that the return path is symmetrical and call it a day. But in this case, given how much everyone else had already worked on it (with no success), my hunch said the traffic followed an asymmetrical return path. So, once again into the CLI, I repeated everything until I got back to the source. Sure enough, one device on the return path was not configured for jumbo frames – problem found. One maintenance window later, problem solved.
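For what it’s worth, the per-hop check itself is quick once you know which boxes are in the path. On NX-OS the interface MTU shows up right in show interface (the interface below is just an example, and the output is trimmed); a 1500 on a link that’s supposed to carry jumbo frames is the smoking gun. On the 5Ks, where jumbo frames are typically configured through the network-qos policy rather than per interface, show queuing interface is the better place to check:

SWITCH-A# sh int eth1/1 | inc MTU
  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec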

All told, this procedure took about an hour, but with a Layer 2 traceroute tool it would have taken about 5 minutes. This is a great opportunity for Cisco to extend Layer 2 traceroute to NX-OS, especially since the Nexus line sits in the core of so many large networks. Maybe some enterprising startup with mad programming skills could even develop an app against a Cisco API that would spider through all these tables and display the path. No doubt the big monitoring packages like WhatsUp Gold or HP OpenView or Cisco Prime already do it, but how about a scaled-down version for the rest of us?

Beyond mere engineering

Finding the layout of a network, the What and How, is straightforward, as networks (usually) lean toward order. The hard work is uncovering the historical reasons behind the design (the Why).

The History of Why doesn’t get written down. Instead, it’s common knowledge, living in company memory. But the Why reveals all: why the network is the way it is, how scalable it is, whether it’s flexible, and whether it’s responsive to evolving business needs.

The network is no longer just an expense; it’s an enabler of business, a competitive advantage, and, done right, a bridge across and between organizations. Network engineering alone is not enough. You want to be a Network Bridgebuilder.

Turn your assumptions into questions

All of us have assumptions. About everything. Sometimes they help us, and get us to good decisions faster. Other times they hurt us, because they turn out to be wrong, and we start out on the wrong foot.

In new environments where you don’t have a good grasp of the history of the situation (a new job, for example), check your assumptions. How do you do this? Turn them into questions. And use the word assume, even if some wise guy cracks the old ass-of-u-and-me joke.

“I assumed that the situation is X because of this and this and this. Is this accurate? If not, why not?”

Questions like these, asked with respect and transparency, will generate conversations about the history of the environment, and those conversations are crucial. Questions about the ‘Why’ of a design can shed light that the ‘What’ and ‘How’ won’t.