Nexenta High Availability Cluster on Cisco UCS — Failover Demonstration
One of the lessons I learned while operating a nuclear power plant in the middle of the Pacific Ocean, underwater, was that it is important to have systems that don’t have a single point of failure. I’m sure many people in Information Technology feel the same way, and given their choice, want their systems to be highly available.
The NexentaStor HA Cluster plugin can be a bit of work to set up (see previous post), but it is definitely worth the effort. I had the good fortune to go through some validation testing with some talented Nexenta engineers using the system that I set up in the Adcap lab, so I got to learn the ins and outs of the High Availability setup. The clustered high availability demo system is circled in green in the picture below.
The controllers are Cisco C240 servers (on loan from Intel) and the JBODs are from DataOn Storage. I discussed the build and initial setup in my previous post Nexenta High Availability ZFS Storage Systems Using Cisco and DataOn. The Cisco UCS B series blade server system is circled in blue, and the Cisco Nexus 5548, 5596, and 2000 switches are circled in orange; they are used in the IOMeter testing of failover.
I made some improvements since the first build of the system. The 10GbE NICs are Intel X520s and are operating at full speed with jumbo frames, all the hard drive bays are filled, and the sTec SSDs and ZeusRAM devices are in full use.
This is one configuration of the Adcap SwiftStor product. The validation testing involved stress tests, equipment tests, and failover tests for any number of different hardware and software faults. After the validation was done, I rebuilt the cluster then took a few videos and screenshots demonstrating failover of CIFS, NFS, and iSCSI.
I’m going to skip over all the basic setup. This was covered in a previous post on Cisco Nexenta ZFS Storage System Configuration and Benchmarking.
The HA Cluster shares volumes between two controllers, but the volumes have to be created first. I found it easiest to create all the volumes on one of the controllers first, then share them out from there. If the cluster is removed, the volumes revert to the controller on which they were active at the time, and are exported. You would just have to go to Import Volume to get them back up and running with no loss of data.
I set up two volumes. The first, called Large_Volumes, uses twenty 3 TB drives arranged into four RAIDZ1 groups. The second, called Mirrors, is a set of ten mirrored pairs, with two 400 GB SSDs for L2ARC (read cache) and two 8 GB ZeusRAM SSDs for the synchronous log.
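In plain ZFS terms, the two pool layouts look roughly like this. The volumes were actually built through the NexentaStor interface, and the device names below are illustrative placeholders, not the real ones from the lab system:

```shell
# Large_Volumes: twenty 3 TB drives as four RAIDZ1 groups of five
zpool create Large_Volumes \
  raidz1 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 \
  raidz1 c0t5d0 c0t6d0 c0t7d0 c0t8d0 c0t9d0 \
  raidz1 c0t10d0 c0t11d0 c0t12d0 c0t13d0 c0t14d0 \
  raidz1 c0t15d0 c0t16d0 c0t17d0 c0t18d0 c0t19d0

# Mirrors: ten mirrored pairs, two 400 GB SSDs as L2ARC read cache,
# and the two ZeusRAM devices as a mirrored log for synchronous writes
zpool create Mirrors \
  mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
  mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0 \
  mirror c1t8d0 c1t9d0 mirror c1t10d0 c1t11d0 \
  mirror c1t12d0 c1t13d0 mirror c1t14d0 c1t15d0 \
  mirror c1t16d0 c1t17d0 mirror c1t18d0 c1t19d0 \
  cache c2t0d0 c2t1d0 \
  log mirror c3t0d0 c3t1d0
```

The log mirror is what lets the ZeusRAM devices absorb synchronous writes safely; the cache devices are never mirrored because L2ARC contents are disposable.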
Next we establish the cluster. Even though it can all be done with the GUI, I had better success establishing the cluster using the command line. The command line provides a few more helpful hints and seems to have a better flow. I found that when I used the GUI I would go back and forth between the two controllers, which causes issues. I recommend using the command line to create the cluster, if only because it forces you to set up everything from one controller.
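For reference, the NMC side of cluster creation looks roughly like this. The exact prompts vary by NexentaStor version, so treat this as a sketch rather than a verbatim transcript:

```shell
# On the controller that owns the volumes, start the cluster wizard
nmc:/$ create group rsf-cluster
#   interactive prompts follow: cluster name, the second appliance,
#   the shared volumes, and a virtual IP (VIP) for each volume

# Afterwards, verify cluster state and which node owns each volume
nmc:/$ show group rsf-cluster
```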
As shown below, the Mirrors volume is shared out on a Virtual IP address (VIP) of 10.124.12.204 and is currently managed by the C240-HA-A controller. The Large_Volumes volume is shared on VIP 10.124.12.214 and is currently managed by the C240-HA-B controller. Heartbeats run over both the network and the shared drives.
This is an Active-Active configuration where some of the volumes are shared out by one controller, and some by the other. This permits full use of the processors, memory, SAS channels, and network of both controllers, while providing High Availability in case of a hardware or software issue.
After the volumes are created and shared out through the HA Cluster feature, each with its own IP address, everything else that is created becomes part of the HA system, which means configuration changes made on one controller are also updated on the other.
Each volume is then shared out using both CIFS and NFS. A CIFS password was set, and the NFS version was changed to version 3 from the default of 4. In an enterprise setup, CIFS would be tied into Active Directory, with permissions configured properly on both CIFS and NFS. For the demo setup, I left it wide open.
By clicking on the individual volumes, the mount points for the CIFS and NFS shares are shown. CIFS is easy to mount from a Windows and Mac client. NFS is a pain in the butt with Windows and Mac, and I did not feel like messing around with it too much, so I set up an Ubuntu Linux Virtual Machine to test NFS.
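Mounting the NFS share from the Ubuntu VM is a one-liner. The /volumes/Mirrors export path follows the usual NexentaStor convention, but check it against the mount point shown in the GUI for your own folders:

```shell
sudo mkdir -p /mnt/mirrors
# Force NFSv3 on the client to match the server-side setting
sudo mount -t nfs -o vers=3 10.124.12.204:/volumes/Mirrors /mnt/mirrors
df -h /mnt/mirrors
```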
Accessing the CIFS share from Windows was easiest by just typing in the IP address in the Windows-preferred \\10.124.12.204 format, entering the username of SMB and the password set earlier. Both shares were available on both IP addresses, as shown below.
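From a Windows command prompt, the same share can be mapped to a drive letter. The share name Mirrors here is an assumption; use whichever folder name shows up under the volume:

```shell
:: Map the CIFS share to drive Z:; the * makes Windows prompt for the SMB password
net use Z: \\10.124.12.204\Mirrors /user:smb *
```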
So, on to the failover testing. This video shows the failover of the cluster and how CIFS stays up and running while the volume transitions from one controller to the other. During the video I use the manual failover feature of the Nexenta HA control; this has the same effect as a failover triggered when the heartbeat detects a failure.
There is a finite amount of time for the transfer, as can be seen by the network pings in the video. The Windows machine maintains the mapping and access to the shares during the transition, but the in-progress file transfer has to be restarted.
So I did the same thing with a Linux box using NFS for a file transfer. The problem was that I set the Linux box up on the Cisco UCS and connected it to the storage network at 10 Gbps, so no matter how big a file I transferred, it would take less than 10 seconds. At some point maybe I can find a 30 GB file and transfer that, or set up a Linux laptop and transfer over wireless so it slows things down. Until then, take my word for it that NFS is resilient.
For a more practical demonstration of the resilience of NFS, I set up three IOMeter virtual test servers on the Cisco UCS. I set up their storage as 200GB virtual hard drives using the NFS share on the Nexenta HA cluster. Then I ran a 120 second performance test, and failed the volume over from one controller to the other in the middle of the test.
This is a picture of the results. Obviously the IOPS in the graph are an average, because there was no storage activity while the VIP was unreachable. It is nice to see that things come back up instead of the Virtual Machine puking out.
This is a video demonstrating how NFS shares stay up during the failover. It shows the Nexenta HA interface, the IOMeter interface, and a continuous ping during the testing.
Then I did the same thing with iSCSI. This was a little trickier, because when I did the scan of the iSCSI from the VMware server, it found both iSCSI targets on the Nexenta box, and the iSCSI target for the Mirrors was found on the IP address for the Large_Volumes, which is not what I wanted. The solution is to set up specific mappings of iSCSI targets to volumes on the Nexenta box.
I had first set up a LUN (or ZVol, as it is known) on the performance Mirrors volume.
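Creating the ZVol amounts to carving a block device out of the pool. The size and dataset name below are illustrative, not the exact ones from the lab setup:

```shell
# A sparse 200 GB block device backed by the Mirrors pool
zfs create -s -V 200g Mirrors/iscsi-lun0
```

The -s flag makes it sparse, so space is only consumed as the initiator actually writes data.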
The iSCSI targets had been created when the Virtual IP addresses were created during the Cluster setup, so they were already there.
The Cluster setup had also created Target Portal Groups, providing a way to separate out the volumes and IP addresses within the iSCSI realm.
However, it was necessary for me to create a Target group that put the iSCSI target into a specific group.
At that point I could map a specific LUN to a specific iSCSI initiator.
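Under the hood NexentaStor uses COMSTAR, so the target group and LUN mapping steps done in the GUI correspond roughly to this command sequence. The target group name, IQN, and ZVol path are placeholders:

```shell
# Register the ZVol as a SCSI logical unit
sbdadm create-lu /dev/zvol/rdsk/Mirrors/iscsi-lun0

# Create a target group and add the Mirrors-side iSCSI target to it
# (a target must be offline while it is being added to a group)
stmfadm create-tg tg-mirrors
stmfadm offline-target iqn.1986-03.com.sun:02:mirrors-target
stmfadm add-tg-member -g tg-mirrors iqn.1986-03.com.sun:02:mirrors-target
stmfadm online-target iqn.1986-03.com.sun:02:mirrors-target

# Map the LUN so it is only exposed through that target group,
# using the GUID reported by "sbdadm list-lu"
stmfadm add-view -t tg-mirrors -n 0 <lu-guid>
```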
After this, when I rescanned the iSCSI targets from the VMware virtual machine, the correct iSCSI name was matched to the correct LUN and IP address. Even though this seems a bit complex, this system is necessary when there are multiple LUNs, initiators, and targets.
I set up three new IOMeter virtual machines with their storage defined on the LUNs through the iSCSI connection. Then I ran the exact same test that I had done with NFS, this time over iSCSI.
This is a picture of the results. Just like on NFS, it shows that there is a loss of connectivity in the middle of the test, and then it recovers.
This is a video. It is a little longer, because I also show a little bit of the iSCSI setup on the VMware setup.
This is a really solid implementation of High Availability Clustering. The setup of the system is straightforward, the tools to use it are powerful, and the failover works well. By having each controller be the primary manager for a set of volumes, a true Active-Active configuration is enabled. Both controllers are able to use the capabilities of their processors, memory, SAS controllers, and network connections to full effect.
I have four more failover videos for those of you who would like to see the actual time it takes to failover in real world cases where equipment fails or is disconnected.
In the first video I use the Cisco Integrated Management Controller to do a hard reset of the active controller.
The power cycle is done while running a continuous ping, monitoring the Nexenta High Availability Management GUI, and running both an iSCSI and an NFS test using IOMeter.
In the second video I go to the test lab and pull one of the SAS cables from the active controller. The backup controller for the volume figures out that there is a problem and takes control of the cluster. The previously active controller does a controlled reboot.
The SAS cable pull is done while running a continuous ping, monitoring the Nexenta High Availability Management GUI, and running both an iSCSI and an NFS test using IOMeter.
In the third video I pull first one then the other of the 10GbE network cables from the active controller. One of the cables is jammed in tight, so I have a hard time pulling it out. But after I do, the backup controller for the volume figures out that there is a problem and takes control of the cluster. The previously active controller just hangs out until the network is restored.
The network cable pull is done while running a continuous ping, monitoring the Nexenta High Availability Management GUI, and running both an iSCSI and an NFS test using IOMeter.
In the fourth video I pull both of the power cables from the active controller. The backup controller for the volume figures out that there is a problem and takes control of the cluster. The previously active controller does a controlled reboot.
The power cable pull is done while running a continuous ping, monitoring the Nexenta High Availability Management GUI, and running both an iSCSI and an NFS test using IOMeter. Unfortunately I was a little quick pulling the power, so the results just show what happens after failover.