
Setting up LAG between VMware ESXi 5 and Cisco SG300-20

In setting up my VMware vSphere 5 home lab I want to make it as close as possible to a real-world VMware production environment. Granted, I don’t have all the big boy toys, but who does on a limited budget? With that said, the first thing I wanted to get set up and tested was the LAG (Link Aggregation Group) on the Cisco SG300-20. I can confirm that the LAG comes up successfully and that when any one port fails, traffic really does move over to the other available port.

VMware Home Lab Network Diagram

To get a better understanding of how I currently have things set up, please take a look at the following VMware Home Lab Network Diagram. I do acknowledge that my storage traffic and management traffic are not separated. This is where I skimped because I ran out of money. It goes against the fundamentals of setting up a true VMware environment, but as long as I know that and can explain why, it’s not a big deal, even if it’s not 100% real world. I am considering making a LAG for the QNAP TurboNAS TS-459 PRO II and accepting the fact that I won’t be able to use the iTunes server, but over the next several months I’ll get a better understanding of how my lab works over just a single 1 Gb connection.

Network Diagram of my VMware Home Lab and various other network setups.

Setting up Link Aggregation Groups (LAG) on the Cisco SG300-20

Keep in mind that I have my VLANs created already. You will want to get the VLANs set up first before setting up the LAGs. I’m writing this post first because the settings are still fresh in my head along with all of the caveats.

Warning: If you have your ESXi server up and running already and then join it to a LAG, you will most likely lose connectivity. There are two main reasons behind this. The first is that you may not have the VLANs assigned to the LAG yet. The second is the default NIC teaming policy on your vSwitch, which is “Route based on the originating virtual port ID.”

Warning: When you are setting up a LAG, make sure that none of the ports involved in the LAG are assigned to any VLANs except for the default one (VLAN 1). If they are, go through each port you are going to use, switch it back to “Excluded” on every other VLAN, and then set it to Untagged on VLAN 1.
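The pre-flight check in that second warning can be sketched in a few lines of Python. This is only an illustration: the `port_vlans` table is a hypothetical hand-typed copy of what the switch's Port to VLAN page shows (it is not read from the switch), and the port names and VLAN numbers are examples.

```python
# Hypothetical snapshot of VLAN membership per switch port.
# "Excluded" memberships are simply left out of each port's dict.
port_vlans = {
    "GE1": {1: "Untagged"},
    "GE2": {1: "Untagged", 13: "Tagged"},  # still a member of VLAN 13
}

def ports_not_ready_for_lag(port_vlans, lag_ports, default_vlan=1):
    """Return the ports that are anything other than Untagged on the default VLAN only."""
    expected = {default_vlan: "Untagged"}
    return [port for port in lag_ports if port_vlans.get(port) != expected]

# GE2 has to be cleaned up before it can join the LAG.
print(ports_not_ready_for_lag(port_vlans, ["GE1", "GE2"]))
```

A port that is missing from the table is reported too, which errs on the side of checking it by hand.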

Default vSwitch Setting could make you lose your connection to the ESXi 5 server when you enable LAG on the switch

On the switch you will want to log into the web interface. Click on “Port Management” and then “Link Aggregation.” Under the Link Aggregation drop down click on “LAG Management.”

Once on the LAG Management page you will first want to set the Load Balance Algorithm to “IP/MAC Address.”

Load Balance Algorithm on Cisco SG300-20 switch

With this done you can now start making your LAGs.

Understanding how the Server physically connects to the switch

What I’ve done for my physical connections is to span my LAGs across multiple NICs. In the Network Diagram (bottom right) you will see the physical NIC setup, which represents both VMware ESXi 5 servers. There are two dual-port NICs per server (Intel PRO/1000 PT). On the NICs, Intel has labeled the ports A and B. For my LAG setup I used NIC 1 Port A and NIC 2 Port A as my first LAG. These first two Ethernet connections plug into Gigabit interfaces 1 (GE1) and 2 (GE2):
NIC 1 Port A –> GE1
NIC 2 Port A –> GE2

This way an entire NIC can die and the network will stay up. Good practice in any situation.
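To make the reasoning concrete, here is a small Python sketch that checks whether a LAG is fed by more than one physical NIC. The `CABLING` table is a hand-written copy of the mapping above and `LAGS` is a hand-written copy of the switch's LAG membership; both are assumptions for illustration, not something queried from the hardware.

```python
# (NIC, port) -> switch interface, copied from the cabling list above.
CABLING = {
    ("NIC 1", "A"): "GE1",
    ("NIC 2", "A"): "GE2",
}

# LAG membership as configured on the switch.
LAGS = {"LAG 1": ["GE1", "GE2"]}

def nics_behind_lag(lag_members, cabling):
    """Return the set of physical NICs that feed the given switch ports."""
    switch_port_to_nic = {switch_port: nic for (nic, _port), switch_port in cabling.items()}
    return {switch_port_to_nic[p] for p in lag_members if p in switch_port_to_nic}

def survives_nic_failure(lag_members, cabling):
    """A LAG keeps a link after losing one whole NIC only if two or more NICs feed it."""
    return len(nics_behind_lag(lag_members, cabling)) >= 2

print(survives_nic_failure(LAGS["LAG 1"], CABLING))
```

If both LAG members were plugged into ports A and B of the same card, the check would come back False, which is exactly the layout this cabling avoids.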
Creating the first Link Aggregation Group (LAG) on the Cisco SG300-20

You should still be on the LAG Management page. Here you can set up 8 different LAGs. Select LAG 1 and click the Edit button. In the page that pops up you can give the LAG a name; then, under the Port List column, select the two ports being added to the LAG and move them over to the LAG Members column. When you move the ports over to LAG Members you will notice LACP becomes an available option; do not check this box. ESXi 5 does not use LACP on a standard vSwitch, so the LAG has to be static.

LAG Group Setup on the Cisco SG300-20 switch

With LACP NOT checked and the correct ports assigned as LAG members, click Apply. This will take you back to the LAG Management page, and after a few seconds you should see your two ports become Active Members of the LAG with a Link State of Up.

Note: Once this is done you may lose connectivity to your ESXi server(s). If you need connectivity back right away, pull one of the cables from either the switch or the NIC. If you get lucky you’ll get it on the first try; if not, swap the connections around. One thing you can do is start with all but one of the connections in the LAG unplugged. If you still do not get connectivity back, the next step is to double check or assign the VLANs on your LAG.

Assigning VLANs to your LAG on the Cisco SG300-20

The next thing to do is to assign the appropriate VLANs to your LAG. There are two setups in use here, and yours may vary depending on how you have your vSwitches set up. In this example LAG 1 hosts traffic for all VMs. If you look at the Network Diagram you’ll see that VM traffic is on VLANs 2, 13, 14, and 15. LAG 2 is different: it only hosts my ESX management and vMotion / storage traffic. The VLANs on each LAG are set up differently.

Click on VLAN Management and then “Port to VLAN.” At the top of the Port to VLAN page set the VLAN ID drop-down to “2” and Interface Type to “LAG,” then click Go.

The Interface column represents the LAG number, and each one has several membership options. We want to assign “Tagged” to interfaces 1 and 3. Click Apply when finished. This means LAGs 1 and 3 will accept traffic tagged with VLAN 2. Repeat this step for each VLAN that will exist on LAGs 1 and 3. In my example you will see I have VLANs 2, 13, 14, and 15 assigned.

LAGs 2 and 4 are different and only accept VLAN 12 traffic. If you look inside the ESX server you will also see that I am not assigning any VLAN ID to the VMkernel ports on the vSwitch. Because ESX sends that traffic untagged, you must set VLAN 12 to Untagged on these LAGs instead of Tagged.

LAG 1 & 3 (Repeat steps for all VLANs assigned)

Assign VLANs to LAG on the Cisco SG300-20

LAG 2 & 4

Shows Untagged traffic setup for VLAN 12 on LAG 2 and 4.

Contrasting view from ESX perspective

To understand why these selections were made you must see how the vSwitches are set up in my VMware vSphere home lab. In the next screenshot you’ll see that I have two vSwitches (vSwitch0 and vSwitch1). vSwitch0 is where my vMotion and management traffic resides. Because I am not tagging that traffic inside VMware ESX, the LAG must not expect tags either; I set it to Untagged instead.

vSwitch1 is where all of my VM traffic resides, and here you see that I have multiple Virtual Machine Port Groups set up, each with a different VLAN ID. With multiple VLANs going across LAGs 1 and 3, we want to use the Tagged option.

vSwitch Setup for VMware vSphere Home Lab
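The Tagged versus Untagged rule can be written down as a small Python sketch. The port group names below are made up for illustration (only the VLAN numbers come from this post), and `lag_vlan_plan` is a hypothetical helper of my own, not part of any VMware or Cisco tool. The rule it encodes: a port group with a VLAN ID means ESX tags the frames, so the switch LAG needs Tagged membership in that VLAN; a port group with no VLAN ID sends untagged frames, so the LAG needs Untagged membership in whichever VLAN you want that traffic to land in.

```python
def lag_vlan_plan(port_groups, untagged_vlan=None):
    """Map each VLAN to the membership the switch LAG needs for this vSwitch.

    port_groups: {port group name: VLAN ID, or None when ESX does not tag}
    untagged_vlan: VLAN that untagged frames should belong to on the switch
    """
    plan = {}
    for name, vlan_id in port_groups.items():
        if vlan_id:  # ESX adds the tag, so the switch must accept it
            plan[vlan_id] = "Tagged"
        else:  # ESX sends plain frames, so the switch assigns the VLAN
            if untagged_vlan is None:
                raise ValueError(f"{name} is untagged; choose an untagged VLAN for the LAG")
            plan[untagged_vlan] = "Untagged"
    return plan

# vSwitch1 (VM traffic, LAGs 1 and 3); port group names are hypothetical.
vswitch1 = {"VM VLAN 2": 2, "VM VLAN 13": 13, "VM VLAN 14": 14, "VM VLAN 15": 15}
# vSwitch0 (management and vMotion, LAGs 2 and 4); no VLAN ID set in ESX.
vswitch0 = {"Management Network": None, "vMotion": None}

print(lag_vlan_plan(vswitch1))
print(lag_vlan_plan(vswitch0, untagged_vlan=12))
```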

Most likely you have your vSwitches set up already, but in order to get connectivity back to the ESXi 5 server you will need to find which port is responding. The easiest way is to unplug your connections and see which one starts responding to ping. Once you’re able to log back into the ESXi 5 box using the VMware vSphere Client, modify the NIC teaming policy on each vSwitch. Keep in mind that each VMkernel port and Virtual Machine Port Group can override the vSwitch NIC teaming policy. So set the policy on the vSwitch and also review each VMkernel port and Virtual Machine Port Group for overrides. Under NIC Teaming use the following settings:
Load Balancing: Route based on IP Hash
Network Failover Detection: Link status only
Notify Switches: Yes
Failback: Yes

Also take this time to move your other NICs up into the “Active Adapters” section. If the vSwitch only lists one NIC, add the others under the “Network Adapters” tab of the vSwitch.

Use Route based on IP Hash for Load Balancing on VMware when using a LAG
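For a feel of what “Route based on IP hash” does, here is a simplified Python sketch. As I understand VMware's description of the policy, the uplink is picked by XOR-ing the source and destination IP addresses and taking the result modulo the number of uplinks; treat the details as an approximation rather than the exact ESXi implementation. The addresses are made up, and the switch's own “IP/MAC Address” algorithm for the return direction is not modeled here.

```python
import ipaddress

def ip_hash_uplink(src_ip, dst_ip, uplink_count):
    """Pick an uplink index from the XOR of both addresses, modulo the uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count

# Hypothetical addresses on the VLAN 12 network, two uplinks in the team.
host = "192.168.12.10"
for peer in ("192.168.12.20", "192.168.12.21"):
    print(peer, "-> uplink", ip_hash_uplink(host, peer, 2))
# 192.168.12.20 -> uplink 0
# 192.168.12.21 -> uplink 1
```

Two things fall out of this. A single source/destination pair always lands on the same uplink, so one conversation never gets more than one link's worth of bandwidth. And the same source address can show up on either physical port depending on who it is talking to, which is why the switch has to treat both ports as one logical link (a static LAG) for this policy to work.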

Note: If you will be using iSCSI you will need to override the switch failover order and only have 1 active adapter. The other adapter in this vSwitch must be moved to “Unused Adapters.” What I did was create a StorageA and a StorageB VMkernel port and toggle which adapter was active on each. I will cover this in more detail when I go over the iSCSI QNAP TurboNAS TS-459 PRO II setup for VMware.

Ran into an Issue with vMotion connectivity between ESXi servers

When testing vMotion between the two ESXi servers, the migration failed right at 9% with the message “The VMotion failed because the ESX hosts were not able to connect over the VMotion network. Please check your VMotion network settings and physical network configuration.” I first thought it had something to do with the switch settings or even the cable, and after a while of double checking I found that I hadn’t set the second server to use “Route based on IP hash” for NIC teaming. I thought that would fix it right off the bat, but I quickly found out that I still didn’t have consistent ping responses between the ESXi servers. In the end I needed to make sure my load balancing was set up correctly and then reboot the server. After the server came back online I started to get responses from all of my various IPs. Without a doubt, temporarily turning on SSH on each of the ESXi servers and using vmkping comes in very handy.
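When chasing this kind of problem it helps to test every VMkernel address from the other host instead of pinging at random. The Python sketch below only prints a checklist of `vmkping` commands to type into each host's SSH session; it does not run anything on the hosts. The host names and IP addresses are hypothetical placeholders.

```python
from itertools import permutations

# Hypothetical VMkernel addresses; substitute the ones from your own hosts.
VMKERNEL_IPS = {
    "esxi-1": ["192.168.12.11", "192.168.12.12"],
    "esxi-2": ["192.168.12.21", "192.168.12.22"],
}

def vmkping_plan(vmkernel_ips):
    """Return (host to run on, command) pairs covering both directions between hosts."""
    plan = []
    for source, target in permutations(vmkernel_ips, 2):
        for ip in vmkernel_ips[target]:
            plan.append((source, f"vmkping {ip}"))
    return plan

for host, command in vmkping_plan(VMKERNEL_IPS):
    print(f"[{host}] {command}")
```

If any line in the checklist fails while the others answer, that points at the teaming policy or VLAN membership for that particular VMkernel port rather than at the cabling as a whole.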

So, a word to the wise: once you get your LAG set up and your load balancing set to Route based on IP hash, do yourself a favor and just reboot.

That is pretty much it for setting up a LAG in conjunction with VMware ESXi 5.