Monday, November 16, 2015

Let's talk about ACI - Cisco's SDN vision.


So everyone keeps saying "SDN is the future!! Network engineers beware!!" It's been a few years, and my reply remains the same: "Meh." I'll save my general SDN opinions for another post. This post is about Cisco ACI, and all the fantastic fail that it is. In fairness, I'll start with the things I like about ACI... then I'll drag out my soapbox.

The Good!

(1) VXLAN... oh how I love thee. ACI gives you VXLAN right out of the box, the spine/leaf links run IS-IS, and there are literally NO L2 links between fabric nodes. This is pretty awesome; admittedly it's not remotely unique to ACI, but it's very cool that this is the default behavior. (2) A Bash shell on your network devices. This one is pretty cool too, but I'll get to the downsides soon. It was unsettling to log into a very NX-OS-looking shell but have full use of my favorite Linux commands. I want you (yes you, all 5 of the people who will read this) to know that when I started writing this post, my intention was to make this section longer. I swear.


So, that all sounds kind of cool, right? ACI must not be so bad after all!! Nope.



The Bad, and the Ugly.

I'm going to try my best to keep this section short, but I get the strong feeling I'm only going to become more enraged as I type. Any Windows users out there? It's OK, this is a safe place; I'm typing this post from the comfort of my Windows 10 workstation. Remember when Microsoft decided to take the Start button away in Windows 8? That's how MOST of ACI feels. It feels like Cisco took my [slew of obscenities] Start button away. Before ranting about what an awful mess it is to use REST to configure anything, let's talk about the things you just don't have in ACI right now. I'll start with the biggest one: traceroute. You heard me. You. Can not. Traceroute.

[Stopping to allow that to sink in]

That's right, no traceroute. Sure, the command is there, but it does nothing. Cisco's documentation will point you to "itraceroute", but that only shows the path within the fabric... which makes it pretty much useless. I also hate that when you convert a Nexus 9500 chassis to ACI and migrate one SUP to ACI mode, the change isn't replicated to the standby SUP, which can lead to this just... weird flapping scenario until you figure out what happened. I know, because it [slew of obscenities] happened to me. Even better, if Cisco didn't ship your SUPs with the latest SSL certificates, your fabric will not come up. At all. Oh, and you have to contact TAC to get it resolved. Again, I know... because it happened to me.

Automation, pretty much the only reason anyone thinks "Yeah, we should implement an SDN solution," was just... painfully slow. At other jobs I've used Python scripts to push mass changes to non-SDN architectures, and that was faster than pushing changes to ACI. It was just slow. Speaking of slow, the APIC GUI is, as GUIs are, slow. Shutting/no shutting a single interface can take 2 minutes. Log into the APIC, wait. Go to Fabric, wait. Expand Inventory, wait. You get it: it's a web UI... it's freaking slow.

Which brings me to REST, because surely you're reading this thinking "C'mon, there has to be a faster way to make those sorts of one-off changes?!" Yes, there is. REST, sort of the replacement for the ultra-fast CLI we all loved. There are a number of different ways to POST changes with REST; I decided to use a REST plugin for Firefox. Finding documentation on using REST with ACI turns out to be a nightmare. After digging through documentation for a while, I finally figured out how to get an auth token so the APIC would let me POST said changes. Then I found a doc on shutting/no shutting ports via REST, which turned out to be wrong. Cisco's documentation on using REST with their own product was wrong. Of course I didn't find this out immediately, because REST is just HTTP, so you don't get handy CLI-like errors telling you exactly where the syntax is invalid... you get HTTP error codes lol. Which is just awful. So I found some useful information on the Cisco support forums that showed the correct formatting of a REST call for that purpose, modified it to fit my environment AAAANNNNNNDDDD... a new and different HTTP failure code. I kept at it for another 15 minutes or so before throwing in the towel and just using the slow GUI to make my changes. One port at a time, I finally resolved the issue I was after.

Naturally, the next day I had to sit through a sales call with the ACI guys talking about how ACI was the best thing since sliced bread, all while the environment was proving to be the least stable one we have, and had taken the most time to deploy.
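For anyone who'd rather script this than fight a browser plugin, here's a minimal Python sketch of the two calls described above, using only the standard library. The `/api/aaaLogin.json` login and the `APIC-cookie` session cookie are the APIC's token mechanism; the shut/no-shut payload (`fabricRsOosPath` under `uni/fabric/outofsvc`, with `lc="blacklist"`) is an ASSUMPTION based on examples floating around the Cisco community forums, so check it against the API reference for your APIC release before trusting it. The hostname `apic.example.com` and all the helper names are hypothetical. The helpers only build requests and parse replies; `send()` is the only function that touches the network, and it tries to pull the error text out of the reply body instead of leaving you with a bare HTTP status.

```python
import json
import urllib.error
import urllib.request

# Hypothetical controller address; replace with your APIC's hostname or IP.
APIC = "https://apic.example.com"


def login_request(user: str, password: str) -> urllib.request.Request:
    """Build (but don't send) the aaaLogin POST that returns a session token."""
    body = {"aaaUser": {"attributes": {"name": user, "pwd": password}}}
    return urllib.request.Request(
        f"{APIC}/api/aaaLogin.json",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(reply_body: str) -> str:
    """Pull the session token out of a successful aaaLogin reply."""
    reply = json.loads(reply_body)
    return reply["imdata"][0]["aaaLogin"]["attributes"]["token"]


def extract_error(reply_body: str) -> str:
    """APIC error replies carry a code and text in the JSON body, not just the HTTP status."""
    attrs = json.loads(reply_body)["imdata"][0]["error"]["attributes"]
    return f"{attrs['code']}: {attrs['text']}"


def port_request(token: str, pod: int, node: int, port: str, shut: bool) -> urllib.request.Request:
    """ASSUMED payload: blacklist (shut) or un-blacklist (no shut) one fabric port."""
    path = f"topology/pod-{pod}/paths-{node}/pathep-[{port}]"
    if shut:
        attrs = {"tDn": path, "lc": "blacklist"}
    else:
        # Deleting the out-of-service relation is assumed to bring the port back up.
        attrs = {"dn": f"uni/fabric/outofsvc/rsoosPath-[{path}]", "status": "deleted"}
    body = {"fabricRsOosPath": {"attributes": attrs}}
    return urllib.request.Request(
        f"{APIC}/api/node/mo/uni/fabric/outofsvc.json",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Cookie": f"APIC-cookie={token}"},
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """POST a request; on failure, raise with the APIC's error text included."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        raise RuntimeError(f"HTTP {err.code}: {extract_error(err.read().decode())}") from err
```

Usage would look like `token = extract_token(send(login_request("admin", "...")))` followed by `send(port_request(token, 1, 101, "eth1/1", shut=True))`, where pod 1, leaf 101, and eth1/1 are placeholders. An APIC with a self-signed certificate will fail TLS verification under `urlopen`'s defaults, so you'd need to pass an `ssl` context that trusts it.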




Honorable mentions for things I also hate about ACI!

- Default behavior doesn't allow for GARP tracking or ARP flooding, so VRRP failovers were a nightmare. (This can be resolved; it's just annoying in the heat of troubleshooting.)

- VLANs are local to each switch, which is fine... but ACI also maps them, seemingly arbitrarily, to other VLANs. So if you're looking for where VLAN 10 is, you first need to see what THAT particular switch mapped VLAN 10 to, and only then can you see which ports are in the mapped VLAN. And the mapping changes from switch to switch.

- ACI's NX-OS is just NX-OS-ish enough to make you feel at home, until you actually need to use a show command. No tab completion, no context-sensitive ANYTHING, and if you want the old NX-OS back you can drop into vshell, but the commands will often return empty or incomplete information.

- You cannot see the "running config" of anything. That gets SO old. Sometimes you just want to look at how BGP is configured, or how an interface is configured. You can't, so you're stuck using a variety of other show commands to eventually get the information you were after in the first place.

- Contracts. Do a little reading on your own about endpoint groups (EPGs) and contracts. The implementation is sloppy and doesn't make much sense. Having a whitelist-only LAN sounds cool, but even Cisco ends up recommending that you permit any any inside a given group.
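If you haven't done that reading yet, the whitelist idea boils down to something like the toy model below: endpoints in the same EPG can talk, and traffic between EPGs is dropped unless a contract lets the consumer EPG reach the provider EPG on that port. This is a deliberately simplified Python sketch, not how the APIC actually renders policy. It ignores return traffic, filter entries beyond a single TCP port, vzAny, intra-EPG isolation, and plenty more. The `Contract` class, the `permitted` function, and the "web"/"db" EPGs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Toy contract: the consumer EPG may open connections to the provider EPG."""
    consumer: str
    provider: str
    ports: frozenset  # allowed destination TCP ports (stands in for a filter)


def permitted(src_epg: str, dst_epg: str, dst_port: int, contracts: list) -> bool:
    """Whitelist check: allow intra-EPG traffic, otherwise require a matching contract."""
    if src_epg == dst_epg:
        return True  # endpoints in the same EPG talk freely by default
    return any(
        c.consumer == src_epg and c.provider == dst_epg and dst_port in c.ports
        for c in contracts
    )


# Hypothetical policy: the web tier may reach the database tier on MySQL only.
CONTRACTS = [Contract(consumer="web", provider="db", ports=frozenset({3306}))]

print(permitted("web", "db", 3306, CONTRACTS))  # True: covered by the contract
print(permitted("web", "db", 22, CONTRACTS))    # False: no filter for SSH
print(permitted("db", "web", 3306, CONTRACTS))  # False: wrong direction for a new flow
print(permitted("web", "web", 22, CONTRACTS))   # True: same EPG
```

Even in this toy version, every new pair of tiers that needs to talk means another contract entry.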


That's all I have for now; this post is already too long, so I might just do a part 2 later.
