Monday, June 20, 2016

Using VXLAN to extend my home lab

So first off, I recently had a change of heart. While I still love Service Provider, I'm continually being pulled in for Security work, so I've decided to stop resisting and follow the current. That said, the CCIE Security labs (both v4 and v5) have some heavy demands. While my home server has plenty of RAM (96GB) and CPU, I ran out of disk space really fast. Naturally, I ordered some new 1TB drives to help out, but does that mean I should stop studying until the drives arrive? No.




My workstation upstairs (the server is in the garage) has 32GB of RAM and plenty of spare disk space... but how do I get its VMs to appear on the same VLANs as the VMs running downstairs in the garage? Let's look at the problem real quick.


So, essentially I need to get VM traffic from my host machine, over a wireless bridge, through my 5506, and past my lab switch (c4948). There are a couple of different ways to do this, but I kept gravitating towards VXLAN. My initial thought was to use a vswitch like vEOS to get the job done. After some light testing, I found that while it did pass traffic, the performance was just absolutely awful. I need these VMs to be able to access web services, pull up GUIs from WSA, etc. So vEOS was out the window. Next up was CSR1000v. This guy would probably work... but I was out of demo licenses, and the unlicensed CSR1Kv is limited to 100kb/s. So that's a hard nope. Then it dawned on me... what about Open vSwitch? Granted, it's an OpenFlow switch and without a controller it would need some love, BUT the flows I'd need to get my lab working wouldn't be too complicated. So I gave it a go. Here's the high-level topology.



Essentially, on the firewall I just needed to add a rule permitting the eth0 interface of ovs-vtep1 to send udp/4789 to ovs-vtep2's eth0 interface. Additionally, I only needed (2) segments, so instead of working with VLANs I just allocated 2 physical interfaces to each OVS server (Ubuntu 16.04 w/ 1GB of RAM). The end result was very impressive. I won't go in depth on installing Open vSwitch in Ubuntu, since it's in the repositories. However, let's look at the config to get my OVS setup up and running. I have (2) VLANs I'm concerned about, VL37 and VL36. Again, OVS isn't doing any tagging (I'm leaving that to vSphere), but I decided to use the VLAN numbers as my VXLAN VNIs to keep things simple. So, I'll share the config below, then we can break it down.

ovs-vtep1

jonmajor@ovs-vtep1:~$ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0c:29:1d:cc:c3  
          inet addr:172.17.0.51  Bcast:172.17.0.255  Mask:255.255.255.0
{...}
!
sudo ifconfig eth1 up
sudo ifconfig eth2 up
!

sudo ovs-vsctl add-br BR1
sudo ovs-vsctl add-port BR1 eth1 -- set interface eth1 ofport_request=1
sudo ovs-vsctl add-port BR1 eth2 -- set interface eth2 ofport_request=2
sudo ovs-vsctl add-port BR1 vtep -- set interface vtep type=vxlan option:remote_ip=172.16.255.51 option:key=flow ofport_request=10
!
!

jonmajor@ovs-vtep1:~$ sudo ovs-vsctl show
e208bb8b-0adc-4d4e-b883-7819a8b63b35
    Bridge "BR1"
        Port "eth1"
            Interface "eth1"
        Port vtep
            Interface vtep
                type: vxlan
                options: {key=flow, remote_ip="172.16.255.51"}
        Port "eth2"
            Interface "eth2"
        Port "BR1"
            Interface "BR1"
                type: internal
    ovs_version: "2.5.0"



ovs-vtep2


jonmajor@ovs-vtep2:~$ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0c:29:a9:fb:41  
          inet addr:172.16.255.51  Bcast:172.16.255.255  Mask:255.255.255.0
{...}
!
sudo ifconfig eth1 up
sudo ifconfig eth2 up
!
sudo ovs-vsctl add-br BR1
sudo ovs-vsctl add-port BR1 eth1 -- set interface eth1 ofport_request=1
sudo ovs-vsctl add-port BR1 eth2 -- set interface eth2 ofport_request=2
sudo ovs-vsctl add-port BR1 vtep -- set interface vtep type=vxlan option:remote_ip=172.17.0.51 option:key=flow ofport_request=10
!
!

jonmajor@ovs-vtep2:~$ sudo ovs-vsctl show
c6a89e34-e0e8-41c7-b20d-fd38a262ecab
    Bridge "BR1"
        Port "eth1"
            Interface "eth1"
        Port "eth2"
            Interface "eth2"
        Port vtep
            Interface vtep
                type: vxlan
                options: {key=flow, remote_ip="172.17.0.51"}
        Port "BR1"
            Interface "BR1"
                type: internal
    ovs_version: "2.5.0"


Alright, so what just happened?! It's fairly straightforward, actually. (1)st, I brought up the eth1 and eth2 interfaces. On vtep2 these are the ports facing my VMs, and on vtep1 they connect to vSphere on two different port groups, which is where they get their VLAN tags. (2)nd, I created my OVS bridge "BR1" with "ovs-vsctl add-br BR1". (3)rd, I added my Ethernet interfaces to the bridge and created/added a VXLAN interface called "vtep". With ofport_request I'm specifying the OpenFlow port number for each interface, which will be handy in the next step when I define my flows. So I mapped eth1 to ofport 1, eth2 to ofport 2, and the VXLAN interface to ofport 10. The "key=flow" option on the VXLAN interface means the VNI isn't fixed on the tunnel; it gets set per packet by the flows. Easy enough, right?
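If you have more than a couple of VTEPs to set up, the three steps above are easy to script. Here's a minimal Python sketch that just builds the same ovs-vsctl command strings used in this post. The function name vtep_commands and its parameters are made up for illustration, and it only prints the commands; it doesn't run anything against OVS.

```python
def vtep_commands(bridge, access_ports, remote_ip,
                  tunnel_name="vtep", tunnel_ofport=10):
    """Return the ovs-vsctl commands for one VTEP as a list of strings.

    access_ports are numbered 1, 2, ... in the order given, matching the
    ofport_request values used in this post (eth1 -> 1, eth2 -> 2).
    """
    cmds = [f"ovs-vsctl add-br {bridge}"]
    for ofport, iface in enumerate(access_ports, start=1):
        cmds.append(
            f"ovs-vsctl add-port {bridge} {iface} "
            f"-- set interface {iface} ofport_request={ofport}"
        )
    # key=flow leaves the VNI to be set per packet by the flow table.
    cmds.append(
        f"ovs-vsctl add-port {bridge} {tunnel_name} "
        f"-- set interface {tunnel_name} type=vxlan "
        f"option:remote_ip={remote_ip} option:key=flow "
        f"ofport_request={tunnel_ofport}"
    )
    return cmds


# ovs-vtep1 points its tunnel at ovs-vtep2's eth0 address.
for cmd in vtep_commands("BR1", ["eth1", "eth2"], "172.16.255.51"):
    print("sudo " + cmd)
```

Swap the remote IP for 172.17.0.51 and you get the ovs-vtep2 side.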

So here's where you can get into trouble with OVS if you're not familiar with it. VXLAN doesn't define a control plane protocol, so each vendor does things a little differently. Cisco, for example, has largely been leveraging multicast for unknown unicast and broadcast traffic between VTEPs (there is support for BGP EVPN to exchange this information now). Open vSwitch, being a switch designed to work with OpenFlow, doesn't have this functionality. You have to specifically tell it how to forward unicast frames and ARP traffic. If you just let it forward anything, you'll quickly have a bridging loop, especially if you try to add more than (2) VTEPs. So that said, we need a couple of flows defined on each OVS. We need to tell OVS how to forward unicast traffic for local and remote devices, AND we need to tell it how to handle ARP traffic. I have my flows defined in a text file on each server, and can load them in with a single command. Let's take a quick look, then break it down.

ovs-vtep1

jonmajor@ovs-vtep1:~$ sudo su
!
root@ovs-vtep1:/home/jonmajor# cd ~
!
root@ovs-vtep1:~# cat wsa_lab.txt 
table=0,in_port=1,actions=set_field:37->tun_id,resubmit(,1)
table=0,in_port=2,actions=set_field:36->tun_id,resubmit(,1)
table=0,actions=resubmit(,1)
table=1,tun_id=37,dl_dst=00:0c:29:d9:5f:c4,actions=output:10
table=1,tun_id=37,dl_dst=00:1b:0c:0c:9b:bf,actions=output:1
table=1,tun_id=37,arp,nw_dst=136.1.37.100,actions=output:10
table=1,tun_id=37,arp,nw_dst=136.1.37.1,actions=output:1
table=1,tun_id=36,dl_dst=00:0c:29:db:bf:b5,actions=output:10
table=1,tun_id=36,dl_dst=00:1b:0c:0c:9b:bf,actions=output:2
table=1,tun_id=36,arp,nw_dst=172.16.20.100,actions=output:10
table=1,tun_id=36,arp,nw_dst=172.16.20.1,actions=output:2
table=1,priority=100,actions=drop
!
!

root@ovs-vtep1:~# ovs-ofctl dump-flows BR1
NXST_FLOW reply (xid=0x4):
!
root@ovs-vtep1:~# ovs-ofctl add-flows BR1 wsa_lab.txt 



ovs-vtep2

jonmajor@ovs-vtep2:~$ sudo su
!
root@ovs-vtep2:/home/jonmajor# cd ~
!
root@ovs-vtep2:~# cat wsa_lab.txt 
table=0,in_port=1,actions=set_field:37->tun_id,resubmit(,1)
table=0,in_port=2,actions=set_field:36->tun_id,resubmit(,1)
table=0,actions=resubmit(,1)
table=1,tun_id=37,dl_dst=00:0c:29:d9:5f:c4,actions=output:1
table=1,tun_id=37,dl_dst=00:1b:0c:0c:9b:bf,actions=output:10
table=1,tun_id=37,arp,nw_dst=136.1.37.100,actions=output:1
table=1,tun_id=37,arp,nw_dst=136.1.37.1,actions=output:10
table=1,tun_id=36,dl_dst=00:0c:29:db:bf:b5,actions=output:2
table=1,tun_id=36,dl_dst=00:1b:0c:0c:9b:bf,actions=output:10
table=1,tun_id=36,arp,nw_dst=172.16.20.100,actions=output:2
table=1,tun_id=36,arp,nw_dst=172.16.20.1,actions=output:10
table=1,priority=100,actions=drop
!
!
root@ovs-vtep2:~# ovs-ofctl dump-flows BR1
NXST_FLOW reply (xid=0x4):
!
root@ovs-vtep2:~# ovs-ofctl add-flows BR1 wsa_lab.txt 


The first time I saw flow entries, I was hot in the face (angry). The above can look a little daunting at first, but if we take a minute to break it down, it's not that bad. Notice the first three lines of "wsa_lab.txt". Remember above where I specified which OpenFlow ports eth1 and eth2 were mapped to? This is where that starts becoming important. The first line says "traffic coming in ofport 1 needs to get a tunnel ID (VXLAN VNI) of 37. Then, after adding that ID, please continue processing in table 1." The same logic applies to the second line: when traffic comes in ofport 2 (eth2), add tunnel ID 36 and then continue processing in table 1. The third rule is just a catch-all for table 0, resubmitting to table 1; that's what picks up traffic arriving from the tunnel on ofport 10, which already carries its tun_id. Strictly speaking, that catch-all overlaps the two rules above at the same default priority. It worked for me as written, but giving the catch-all a lower priority would be the safer way to write it.

Now, in table 1 we're doing some interesting stuff. First, I'm mapping where to forward traffic based on destination MAC address. When the output is :1 or :2, those are my local ports eth1 and eth2. If the output is :10, remember from above that's the VXLAN interface we named "vtep". In addition to mapping unicast traffic based on destination MAC address, we also need to map ARP traffic, which you'll see done above. If the ARP request is for a node connected locally, forward it out either port 1 or 2 depending on the destination IP. If it's for a remote node, forward it out port 10. Notice also that, for better segmentation, these flow entries are matching on tun_id (VXLAN VNI) as well.

Lastly, if the traffic doesn't match any of my flow rules, drop it: "table=1,priority=100,actions=drop". Flows without an explicit priority default to 32768, so priority=100 makes this the lowest-priority entry in table 1, and it only catches what nothing else matched.
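Since the two flow files are mirror images of each other, they can be generated from one small inventory instead of being typed twice. The following is a rough Python sketch of that idea, not something I actually used: the Host tuple, the build_flows function, and the "vtep1"/"vtep2" labels are all illustrative names. It assumes each host entry records which VTEP it sits behind and which local OpenFlow port it's on, and it emits flows in the same order as wsa_lab.txt.

```python
from collections import namedtuple

# vtep/ofport say where the host is physically attached.
Host = namedtuple("Host", "vni mac ip vtep ofport")

TUNNEL_OFPORT = 10  # the VXLAN interface "vtep"

HOSTS = [
    Host(37, "00:0c:29:d9:5f:c4", "136.1.37.100", "vtep2", 1),
    Host(37, "00:1b:0c:0c:9b:bf", "136.1.37.1", "vtep1", 1),
    Host(36, "00:0c:29:db:bf:b5", "172.16.20.100", "vtep2", 2),
    Host(36, "00:1b:0c:0c:9b:bf", "172.16.20.1", "vtep1", 2),
]


def egress_port(host, local_vtep):
    """Local hosts go out their access port, remote hosts go out the tunnel."""
    return host.ofport if host.vtep == local_vtep else TUNNEL_OFPORT


def build_flows(hosts, local_vtep, access_ports):
    """access_ports maps local OpenFlow port number -> VNI, e.g. {1: 37, 2: 36}."""
    lines = []
    # Table 0: stamp the VNI on traffic from access ports, then go to table 1.
    for ofport, vni in access_ports.items():
        lines.append(
            f"table=0,in_port={ofport},actions=set_field:{vni}->tun_id,resubmit(,1)"
        )
    lines.append("table=0,actions=resubmit(,1)")
    # Table 1: per-VNI unicast (dl_dst) and ARP (nw_dst) forwarding.
    for vni in access_ports.values():
        segment = [h for h in hosts if h.vni == vni]
        for h in segment:
            port = egress_port(h, local_vtep)
            lines.append(f"table=1,tun_id={vni},dl_dst={h.mac},actions=output:{port}")
        for h in segment:
            port = egress_port(h, local_vtep)
            lines.append(f"table=1,tun_id={vni},arp,nw_dst={h.ip},actions=output:{port}")
    lines.append("table=1,priority=100,actions=drop")
    return lines


print("\n".join(build_flows(HOSTS, "vtep1", {1: 37, 2: 36})))
```

Calling build_flows with "vtep2" instead flips every output:10 and local port, which is exactly the difference between the two files above.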

Finally, we actually load these flows into the vswitch with "ovs-ofctl add-flows BR1 wsa_lab.txt". I can verify that with "ovs-ofctl dump-flows BR1".

ovs-vtep1

root@ovs-vtep1:~# ovs-ofctl dump-flows BR1            
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1.456s, table=0, n_packets=0, n_bytes=0, idle_age=1, in_port=1 actions=load:0x25->NXM_NX_TUN_ID[],resubmit(,1)
 cookie=0x0, duration=1.456s, table=0, n_packets=0, n_bytes=0, idle_age=1, in_port=2 actions=load:0x24->NXM_NX_TUN_ID[],resubmit(,1)
 cookie=0x0, duration=1.456s, table=0, n_packets=0, n_bytes=0, idle_age=1, actions=resubmit(,1)
 cookie=0x0, duration=1.456s, table=1, n_packets=0, n_bytes=0, idle_age=1, tun_id=0x25,dl_dst=00:0c:29:d9:5f:c4 actions=output:10
 cookie=0x0, duration=1.455s, table=1, n_packets=0, n_bytes=0, idle_age=1, tun_id=0x25,dl_dst=00:1b:0c:0c:9b:bf actions=output:1
 cookie=0x0, duration=1.454s, table=1, n_packets=0, n_bytes=0, idle_age=1, tun_id=0x24,dl_dst=00:0c:29:db:bf:b5 actions=output:10
 cookie=0x0, duration=1.454s, table=1, n_packets=0, n_bytes=0, idle_age=1, tun_id=0x24,dl_dst=00:1b:0c:0c:9b:bf actions=output:2
 cookie=0x0, duration=1.455s, table=1, n_packets=0, n_bytes=0, idle_age=1, arp,tun_id=0x25,arp_tpa=136.1.37.100 actions=output:10
 cookie=0x0, duration=1.455s, table=1, n_packets=0, n_bytes=0, idle_age=1, arp,tun_id=0x25,arp_tpa=136.1.37.1 actions=output:1
 cookie=0x0, duration=1.454s, table=1, n_packets=0, n_bytes=0, idle_age=1, arp,tun_id=0x24,arp_tpa=172.16.20.100 actions=output:10
 cookie=0x0, duration=1.454s, table=1, n_packets=0, n_bytes=0, idle_age=1, arp,tun_id=0x24,arp_tpa=172.16.20.1 actions=output:2
 cookie=0x0, duration=1.453s, table=1, n_packets=0, n_bytes=0, idle_age=1, priority=100 actions=drop

ovs-vtep2

root@ovs-vtep2:~# ovs-ofctl dump-flows BR1             
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=2.056s, table=0, n_packets=3, n_bytes=180, idle_age=0, in_port=1 actions=load:0x25->NXM_NX_TUN_ID[],resubmit(,1)
 cookie=0x0, duration=2.056s, table=0, n_packets=1, n_bytes=161, idle_age=0, in_port=2 actions=load:0x24->NXM_NX_TUN_ID[],resubmit(,1)
 cookie=0x0, duration=2.056s, table=0, n_packets=0, n_bytes=0, idle_age=2, actions=resubmit(,1)
 cookie=0x0, duration=2.055s, table=1, n_packets=0, n_bytes=0, idle_age=2, tun_id=0x25,dl_dst=00:0c:29:d9:5f:c4 actions=output:1
 cookie=0x0, duration=2.055s, table=1, n_packets=0, n_bytes=0, idle_age=2, tun_id=0x25,dl_dst=00:1b:0c:0c:9b:bf actions=output:10
 cookie=0x0, duration=2.055s, table=1, n_packets=0, n_bytes=0, idle_age=2, tun_id=0x24,dl_dst=00:0c:29:db:bf:b5 actions=output:2
 cookie=0x0, duration=2.055s, table=1, n_packets=0, n_bytes=0, idle_age=2, tun_id=0x24,dl_dst=00:1b:0c:0c:9b:bf actions=output:10
 cookie=0x0, duration=2.055s, table=1, n_packets=0, n_bytes=0, idle_age=2, arp,tun_id=0x25,arp_tpa=136.1.37.100 actions=output:1
 cookie=0x0, duration=2.055s, table=1, n_packets=0, n_bytes=0, idle_age=2, arp,tun_id=0x25,arp_tpa=136.1.37.1 actions=output:10
 cookie=0x0, duration=2.055s, table=1, n_packets=0, n_bytes=0, idle_age=2, arp,tun_id=0x24,arp_tpa=172.16.20.100 actions=output:2
 cookie=0x0, duration=2.054s, table=1, n_packets=0, n_bytes=0, idle_age=2, arp,tun_id=0x24,arp_tpa=172.16.20.1 actions=output:10
 cookie=0x0, duration=2.054s, table=1, n_packets=4, n_bytes=341, idle_age=0, priority=100 actions=drop
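The dumps above are a bit dense to eyeball, so here's a small, hypothetical Python helper (summarize_flow is my own name, not an OVS tool) that pulls the table number, packet counter, and VNI out of one dump-flows line. It assumes the line format shown above, where tun_id is printed in hex, and converts it back to decimal.

```python
import re


def summarize_flow(line):
    """Return (table, n_packets, vni) for one dump-flows line.

    vni is None when the flow doesn't match on tun_id.
    """
    table = int(re.search(r"table=(\d+)", line).group(1))
    n_packets = int(re.search(r"n_packets=(\d+)", line).group(1))
    match = re.search(r"tun_id=(0x[0-9a-fA-F]+)", line)
    vni = int(match.group(1), 16) if match else None
    return table, n_packets, vni


sample = (
    " cookie=0x0, duration=2.055s, table=1, n_packets=0, n_bytes=0, "
    "idle_age=2, tun_id=0x25,dl_dst=00:0c:29:d9:5f:c4 actions=output:1"
)
print(summarize_flow(sample))
```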



From the above output, you can see how many packets each flow has matched (n_packets). Also note that dump-flows shows the tun_id in hex, so VNI 37 shows up as 0x25 and VNI 36 as 0x24. A final note: MTU can be a killer if your network (like mine) can't accommodate the VXLAN encapsulation overhead. So, I lowered the MTU in my Windows VM to 1200 bytes. I could probably get away with up to 1420 bytes, but I'm getting fantastic performance with 1200, so I'll probably just leave it as is. So what's the final result?






Works like a charm!
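On the MTU note above: for anyone wanting to work out their own number, here's a back-of-the-envelope Python sketch of the VXLAN overhead math. It assumes an IPv4 underlay with no IP options, and the function name and the extra_overhead parameter are just for illustration (use it for things like an inner VLAN tag or whatever else your path adds).

```python
# Bytes added in front of the inner IP packet, as seen by the underlay IP MTU.
OUTER_IPV4 = 20      # outer IPv4 header, no options
OUTER_UDP = 8        # outer UDP header (dst port 4789)
VXLAN_HEADER = 8     # VXLAN header carrying the VNI
INNER_ETHERNET = 14  # the VM's own Ethernet header rides inside the tunnel


def max_inner_mtu(underlay_mtu, extra_overhead=0):
    """Largest VM-side IP MTU that fits in one underlay packet unfragmented."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    return underlay_mtu - overhead - extra_overhead


print(max_inner_mtu(1500))
```

With a clean 1500-byte path that works out to 1450, so both 1200 and 1420 leave headroom for anything else in between.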

***Update***

So, as these things go, I found a much better way to do this. Basically, if I enable spanning tree on the bridge, I can drastically decrease the amount of config needed to get this working.

ovs-vsctl set bridge BR1 stp_enable=true
!
root@ovs-vtep1:~# cat wsa_lab.txt 
table=0,in_port=1,actions=set_field:37->tun_id,resubmit(,1)
table=0,in_port=2,actions=set_field:36->tun_id,resubmit(,1)
table=0,actions=resubmit(,1)
table=1,priority=100,actions=NORMAL

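So much simpler, right? Now I'm running 802.1D over VXLAN, which prevents loops, and the NORMAL action lets OVS do regular MAC learning instead of me hard-coding every MAC and ARP entry. You can confirm the STP state with the following (this output is from a different test box with two VXLAN ports, vtep12 and vtep23, one of which STP has blocked):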

root@openvswitch:~# ovs-ofctl show BR1
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000080027d2f207
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(eth1): addr:08:00:27:d2:f2:07
     config:     0
     state:      STP_FORWARD
     current:    1GB-FD COPPER AUTO_NEG
     advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG
     supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG
     speed: 1000 Mbps now, 1000 Mbps max
 12(vtep12): addr:0a:2d:a5:f8:a9:12
     config:     0
     state:      STP_BLOCK
     speed: 0 Mbps now, 0 Mbps max
 23(vtep23): addr:7e:2a:b3:f5:21:c2
     config:     0
     state:      STP_FORWARD
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(BR1): addr:08:00:27:d2:f2:07
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
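
If you want to check those port states from a script rather than by eye, something like the following Python sketch would do it. It only parses text in the "ovs-ofctl show" format pasted above; stp_states is a made-up helper name, and how you capture the command output is up to you.

```python
import re


def stp_states(show_output):
    """Map port name -> state string from 'ovs-ofctl show' text output."""
    states = {}
    port = None
    for line in show_output.splitlines():
        # Port header lines look like: " 12(vtep12): addr:0a:2d:a5:f8:a9:12"
        header = re.match(r"\s*(\w+)\(([^)]+)\):\s+addr:", line)
        if header:
            port = header.group(2)
            continue
        state = re.match(r"\s*state:\s+(\S+)", line)
        if state and port is not None:
            states[port] = state.group(1)
    return states


sample = """\
 1(eth1): addr:08:00:27:d2:f2:07
     config:     0
     state:      STP_FORWARD
 12(vtep12): addr:0a:2d:a5:f8:a9:12
     config:     0
     state:      STP_BLOCK
 23(vtep23): addr:7e:2a:b3:f5:21:c2
     config:     0
     state:      STP_FORWARD
"""
blocked = [p for p, s in stp_states(sample).items() if s == "STP_BLOCK"]
print(blocked)
```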