Nexenta High Availability Cluster on Cisco UCS — Failover Demonstration

One of the lessons I learned while operating a nuclear power plant in the middle of the Pacific Ocean, underwater, was that it is important to have systems that don’t have a single point of failure. I’m sure many people in Information Technology feel the same way, and given their choice, want their systems to be highly available.

The NexentaStor HA Cluster plugin can be a bit of work to set up (see previous post), but it is definitely worth the effort. I had the good fortune to go through some validation testing with some talented Nexenta engineers using the system that I set up in the Adcap lab, so I got to learn the ins and outs of the High Availability setup. The clustered high availability demo system is circled in green in the picture below.

Adcap Network Systems Test and Development Lab

The controllers are Cisco C240 servers (on loan from Intel) and the JBODs are from DataOn Storage. I discussed the build and initial setup in my previous post Nexenta High Availability ZFS Storage Systems Using Cisco and DataOn. The Cisco UCS B-Series blade server system is circled in blue, and the Cisco Nexus 5548, 5596, and 2000 are circled in orange; they are used in the IOMeter testing of failover.

I made some improvements since the first build of the system. The 10GbE NICs are Intel X520s and are operating at full speed with jumbo frames, all the hard drive bays are filled, and the sTec SSDs and ZeusRAM are in full use.
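A quick way to confirm jumbo frames are actually passing end to end is a do-not-fragment ping from a Linux client at just under the 9000-byte MTU. The target IP here is one of the cluster VIPs set up later in this post; substitute whatever data-path address you are testing.

    # 8972 bytes = 9000-byte MTU minus 20-byte IP header and 8-byte ICMP header
    ping -M do -s 8972 -c 4 10.124.12.204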

This is one configuration of the Adcap SwiftStor product. The validation testing involved stress tests, equipment tests, and failover tests for any number of different hardware and software faults. After the validation was done, I rebuilt the cluster, then took a few videos and screenshots demonstrating failover of CIFS, NFS, and iSCSI.

I’m going to skip over all the basic setup. This was covered in a previous post on Cisco Nexenta ZFS Storage System Configuration and Benchmarking.

The HA Cluster shares volumes between two controllers, but the volumes have to be created first. I found it easiest to create all the volumes on one controller first, then share them out from there. If the cluster is removed, the volumes revert to the controller on which they were active at the time and are exported. You would just have to go to Import Volume to get them back up and running with no loss of data.

I set up two volumes. The first, called Large_Volumes, uses twenty 3 TB drives arranged into four RAIDZ1 groups. The second, called Mirrors, is a set of ten mirrored pairs with two 400 GB SSDs for L2ARC (read cache) and two 8 GB ZeusRAM SSDs for the synchronous write log.

1 - volume creation
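For anyone more comfortable thinking in raw ZFS terms, the two layouts correspond roughly to the zpool commands below. This is only an illustration: NexentaStor builds the pools through its own interface, and the device names are placeholders, not the actual lab device IDs.

    # Large_Volumes: twenty 3 TB drives as four 5-disk RAIDZ1 groups (placeholder device names)
    zpool create Large_Volumes \
      raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
      raidz1 c1t5d0 c1t6d0 c1t7d0 c1t8d0 c1t9d0 \
      raidz1 c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 \
      raidz1 c1t15d0 c1t16d0 c1t17d0 c1t18d0 c1t19d0

    # Mirrors: ten two-way mirrors, two 400 GB SSDs as L2ARC (cache),
    # two ZeusRAM devices for the log (mirroring the log pair is an assumption here)
    zpool create Mirrors \
      mirror c2t0d0 c2t1d0  mirror c2t2d0 c2t3d0  mirror c2t4d0 c2t5d0 \
      mirror c2t6d0 c2t7d0  mirror c2t8d0 c2t9d0  mirror c2t10d0 c2t11d0 \
      mirror c2t12d0 c2t13d0  mirror c2t14d0 c2t15d0  mirror c2t16d0 c2t17d0 \
      mirror c2t18d0 c2t19d0 \
      cache c3t0d0 c3t1d0 \
      log mirror c3t2d0 c3t3d0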

Next we establish the cluster. Even though it can all be done with the GUI, I had better success establishing the cluster using the command line. The command line provides a few more helpful hints and seems to have a better flow. I found that when I used the GUI I would go back and forth between the two controllers, which caused issues. I recommend using the command line to create the cluster, if only because it forces you to set up everything from one controller.
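The rough shape of the command-line flow in NMC is below. Treat the command names as approximate, since the exact NMC syntax and prompts vary between NexentaStor releases; the built-in help and tab completion will walk you through the real options.

    # On the controller that currently owns the volumes, from the NMC shell:
    nmc:/$ setup network ssh-bind        # bind the two appliances together (if not already bound)
    nmc:/$ setup group rsf-cluster       # create the HA cluster, then add shared volumes and their VIPs
    nmc:/$ show group rsf-cluster        # verify members, heartbeats, and which controller owns what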

As shown below, the Mirrors volume is shared out on a Virtual IP address (VIP) of 10.124.12.204 and is currently managed by the C240-HA-A controller. The Large_Volumes volume is shared on VIP 10.124.12.214 and is currently managed by the C240-HA-B controller. Heartbeats run over both the network and the shared drives.

2 - Cluster established
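A simple sanity check from any client is to ping both VIPs; each should answer no matter which physical controller happens to own the volume behind it at the moment.

    ping -c 3 10.124.12.204   # Mirrors VIP, currently served by C240-HA-A
    ping -c 3 10.124.12.214   # Large_Volumes VIP, currently served by C240-HA-B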

This is an Active-Active configuration where some of the volumes are shared out by one controller, and some by the other. This permits full use of the processors, memory, SAS channels, and network of both controllers, while providing High Availability in case of a hardware or software issue.

Once the volumes are created and shared out through the HA Cluster feature, each on its own IP address, everything created after that point is part of the HA system, so configuration changes made on one controller are automatically updated on the other.

Each volume is then shared out using both CIFS and NFS. A CIFS password was set, and the NFS version was changed to 3 from the default of 4. In an enterprise setup, CIFS would be tied into Active Directory, and permissions would be set up properly on both CIFS and NFS. For the demo setup, I left it wide open.

3a - Shares Created
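Under the hood these shares are just ZFS properties plus an NFS server setting. The rough equivalent at the shell level is shown below; the folder name is an example, and in this setup it was all done through the NexentaStor interface rather than by hand.

    # Share an example folder over both NFS and CIFS
    zfs set sharenfs=on Mirrors/shared
    zfs set sharesmb=on Mirrors/shared

    # Cap the NFS server at protocol version 3 instead of the default of 4
    sharectl set -p server_versmax=3 nfs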

By clicking on the individual volumes, the mount points for the CIFS and NFS shares are shown. CIFS is easy to mount from Windows and Mac clients. NFS is a pain in the butt with Windows and Mac, and I did not feel like messing around with it too much, so I set up an Ubuntu Linux virtual machine to test NFS.

4e - Share volume names
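On the Ubuntu VM, mounting the NFS share is a one-liner once nfs-common is installed. The export path below follows NexentaStor's usual /volumes/<volume>/<folder> layout, so it is an example; use whatever path the share screen actually shows.

    sudo apt-get install -y nfs-common
    sudo mkdir -p /mnt/mirrors
    # Mount over the VIP, forcing NFSv3 to match the server setting
    sudo mount -t nfs -o vers=3 10.124.12.204:/volumes/Mirrors/shared /mnt/mirrors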

Accessing the CIFS shares from Windows was easiest by typing the IP address in the Windows-preferred \\10.124.12.204 format, then entering the username SMB and the password set earlier. Both shares were available on both IP addresses, as shown below.

5a - CIFS Share Access
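The same mappings can be scripted from a Windows command prompt with net use; the share names here are examples and will be whatever names appear in each folder's share settings.

    :: Map both shares by VIP using the SMB user and password set earlier
    net use M: \\10.124.12.204\Mirrors /user:SMB
    net use L: \\10.124.12.214\Large_Volumes /user:SMB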

So, on to the failover testing. This video shows the failover of the cluster and how CIFS stays up and running while the volume transitions from one controller to the other. During the video I use the manual failover feature of the Nexenta HA Cluster. This has the same effect as a failure detected by the heartbeats triggering a transition of control.

The failover takes a finite amount of time, as can be seen from the network pings in the video. The Windows machine maintains the mapping and access to the shares during the transition, but an in-progress file transfer has to be restarted.
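To put a rough number on that window, a timestamped ping against the VIP during a manual failover shows how long the address goes quiet; the gap in replies is approximately the time the volume takes to move between controllers (on Windows, a plain ping -t 10.124.12.204 does the same job less precisely).

    # Timestamp each reply from the Mirrors VIP while the failover runs
    ping 10.124.12.204 | while read line; do echo "$(date +%T)  $line"; done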