Category Archives: IP SLA

HSRP and EIGRP: Part 2

So, back to HSRP and EIGRP. From Part 1, connectivity in the lab setup was working fine during my tests until I disconnected the Data Center's HSRP interfaces. Then we lost ping completely to our remote site. It took a while to figure out, but here’s what was happening.

On the EIGRP/VPN side, we had two VPN tunnels set up on the Remote Router – one tunnel to each of our Corporate Routers. All three routers are running EIGRP, which gave the Remote Router two routes to the Corporate network. When I disconnected the HSRP interface on Data Center Router 1 (1.1.1.2), the primary VPN tunnel went down, taking the primary EIGRP route with it, and the Remote Router switched over to the secondary VPN tunnel and EIGRP route number two. So far, so good.

However, back in the Data Center, our PC Host is still trying to reach the Remote Router through the primary VPN tunnel, because Corporate Router 1 is still the active HSRP router on our LAN. Corporate Router 1 keeps forwarding traffic along EIGRP route number one, having no knowledge of EIGRP route number two (on Corporate Router 2). Packets from the PC Host are forwarded toward the Remote Router on the primary VPN tunnel (which is down), but the only available return path from the Remote Router is the secondary VPN tunnel. This is a problem. But what is the solution?

As usual in Cisco-land, there are multiple solutions. After having a lengthy chat with my good buddy Google, I chose this one: use IP SLA to track the availability of both the internal and external interfaces of the Data Center Routers. Then, make the HSRP failover on our Corporate Routers occur whenever any of those interfaces goes down. This way, whenever the Data Center HSRP failover occurs, our HSRP failover also occurs, thus ensuring our Hosts and Remote Routers are always communicating over the same VPN tunnels and the same routes.

Below is a sample config snippet for the statements to make this happen on our Corporate Router 1.

NOTE: This IOS version is 15.0, but for the HSRP, IP SLA and tracking commands, it’s functionally the same as 12.4.

!
!
track 1 ip sla 1 reachability
! Establishes the object tracking of IP SLA 1, which will be used by HSRP
!
track 2 ip sla 2 reachability
! Establishes the object tracking of IP SLA 2, which will be used by HSRP
!
!
<<<Output Omitted>>>
!
!
interface GigabitEthernet0/0
ip address 172.150.1.10 255.255.255.128
duplex auto
speed auto
standby version 2
standby 1 ip 172.150.1.1
standby 1 priority 110
standby 1 preempt
standby 1 track 1 decrement 20
! Tracks reachability of 1.1.1.2, and decreases the HSRP router priority by 20 points if 1.1.1.2 goes down, triggering an HSRP failover
standby 1 track 2 decrement 20
! Tracks reachability of 2.2.2.2, and decreases the HSRP router priority by 20 points if 2.2.2.2 goes down, triggering an HSRP failover
!
!
<<<Output Omitted>>>
!
!
ip sla 1
icmp-echo 1.1.1.2 source-ip 1.1.1.10
frequency 5
ip sla schedule 1 life forever start-time now
! Sets up pings every 5 seconds to 1.1.1.2, using the local router's external IP as the source-ip
ip sla 2
icmp-echo 2.2.2.2 source-ip 1.1.1.10
frequency 5
ip sla schedule 2 life forever start-time now
! Sets up pings every 5 seconds to 2.2.2.2, using the local router's external IP as the source-ip
!
!
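For comparison, here is a rough sketch of what the matching HSRP statements on Corporate Router 2 might look like. Treat it as an assumption rather than the actual lab config: the 172.150.1.11 address is hypothetical, and the router is simply left at the HSRP default priority of 100. With Corporate Router 1 at priority 110, a single tracked failure drops it to 90, which is below 100, so Corporate Router 2 takes over as long as it has preempt enabled. When the tracked object comes back, Corporate Router 1 returns to 110 and preempts again.

```
interface GigabitEthernet0/0
 ! Hypothetical address for Corporate Router 2, in the same /25 as Router 1
 ip address 172.150.1.11 255.255.255.128
 standby version 2
 ! Same HSRP group and virtual IP as Corporate Router 1
 standby 1 ip 172.150.1.1
 ! No "standby 1 priority" statement, so the default priority of 100 applies
 ! Preempt lets this router take over when Router 1 drops to 90 or lower
 standby 1 preempt
```

Once both routers are configured, show track brief, show ip sla statistics and show standby brief should show the state of the tracked objects, the probe results, and which router is currently HSRP active.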
No doubt this wasn’t the solution running through your mind, or you have a more efficient config. Either way, I’d love to hear your solutions. After all, I’m just a CCNA thrashing around in a CCNP world. However, if this is as new to you as it is to me, here are some pointers to what Cisco has to say about HSRP, IP SLA and object tracking.

HSRP and EIGRP: Part 1

Been tasked with a new project: design and set up a fully redundant network for a new data center. The old data center just doesn’t have enough uptime. It wasn’t designed right, and we suffered a major production outage a couple of months ago due to a car crash. A couple of cars lost control, and one of them hit the telephone pole that carries electricity to our facility. Bingo-Bongo, power outage for 12 hours. Finally (!) the powers that be agreed to move the servers to a REAL facility (thank you thank you thank you).

The Systems Administrator is working on the fully redundant systems pieces (VMware ESX 4, a SAN, tape backups and autoloader and all that goes with it), so fortunately I don’t have to sweat any of that. However, this new center will be one of the hubs in our dual-hub, hub-and-spoke network. We’ve got something like 35 remote sites that will each need two VPN tunnels to this location, and will be running EIGRP over each tunnel.

This will give the remote sites the redundancy they need should one of our data center routers go down. Of course, we have dual routers and dual switches, and will run HSRP on our routers' LAN interfaces. Additionally, the Data Center will run HSRP on their routers' LAN interfaces, to protect us against a circuit/equipment failure on their side. So far, so good (in theory).

So I set about to create a simulation of this whole setup (since the routers and switches came in early), and test the failure scenarios to make sure the redundancy operated like we expected it to (lab network diagram below). And boy, how it did NOT work!

In the setup (see diagram below), I had one host (a PC) connected to a pair of switches, which in turn are connected to a pair of routers running HSRP on their LAN interfaces. These routers are connected to the dual handoff from the Data Center switches, which of course are connected to dual Data Center routers. These lead to the Internet, and ultimately, our remote router. The remote router has two VPNs, one to each of our pair of routers.

The failure test itself is pretty simple: establish the VPNs, start a continuous ping, and start disconnecting interfaces. Theoretically, we should have a few dropped pings with every disconnect (and reconnect), but no more. And at no point should we lose ping altogether.

It all worked fine, until I disconnected the Data Center HSRP interfaces. Then we lost ping altogether, even though the Data Center routers successfully failed over. Took me a while to figure it out, but in the end, it was pretty simple.

Any guesses? I’ll post what I found and the resolution in Part 2. No doubt some of you have found better solutions than me!