Tuesday, September 30, 2014

VMware Virtual SAN (vSAN) - replacing a failed disk connected to the LSI 9271 controller.

    VMware vSAN has its own HCL of disks and controllers. This is a subset of the vSphere HCL, and the only controller card Cisco sells that is on the list is the LSI MegaRAID 9271. This is a high-performance controller, but LSI does not support running the 9271 in JBOD mode. As a result, a virtual disk needs to be created for each physical disk, SSDs need to be marked as such, etc. I've discussed this briefly before.

    RAID 0 makes troubleshooting failed disks problematic. The disks vSAN sees are virtual, not physical. As a result, simply replacing the physical disk may not be enough; the virtual disk needs to know you replaced the physical disk. This means direct interaction with the controller.

   Many customers know that the 9271 can be configured via WebBIOS, but that is only available at boot time. Once the server is running, one must reboot to access this tool. Fortunately, Cisco and LSI have planned for this challenge.
   LSI makes a utility called StorCLI. It is available at the LSI website and also comes on the Utilities ISO for UCS, found at Cisco Support.

   Once you get this ISO, you need to find the StorCLI .vib file. You could try mounting the ISO to the ESXi server, but I wouldn't recommend it: it is too much trouble getting ESXi to see the attached CD drive. If you can mount it anywhere else, I recommend that.

Once you get the iso mounted, go to the directory ucs-cxxx-utils-vmware.2.0.3 (1).iso\Storage\LSI\9xxx\StorCLI. There you will find the StorCLI vib file. 

   Copy this vib file to /var/log/vmware. I don't know why, but every time I try to install that vib from anywhere else, it fails.
   Execute the esxcli install command from within the ESXi shell. (NOTE: this may well work using the esxcli install tools in the vSphere PowerCLI. I haven't tried it.)

~ # esxcli software vib install -v /var/log/vmware/vmware-esx-storcli-1.12.13.vib --no-sig-check
You need the --no-sig-check part, or else you will get an error about signing.
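
A quick way to confirm the install took is to look for the VIB in the installed list. The helper below is a minimal sketch: the function name has_storcli_vib is made up for this example, and it assumes the installed VIB's name contains the string "storcli", as the file name above suggests.

```shell
#!/bin/sh
# Sketch: succeed if a StorCLI VIB appears in the text read from stdin.
# ASSUMPTION: the installed VIB name contains "storcli" (any case).
# Usage on the host:
#   esxcli software vib list | has_storcli_vib && echo "StorCLI VIB installed"
has_storcli_vib() {
    grep -qi 'storcli'
}
```

If the function reports nothing found, rerun the install command and read its output before going further.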

   To run any StorCLI commands, you must cd to the StorCLI directory. Installing the StorCLI binaries does not modify your path to include them or their linked library.

~ # cd /opt/lsi/storcli/
/opt/lsi/storcli #
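
If typing the cd every session gets old, a small shell function can do it for you. This is only a sketch: the STORCLI_DIR variable is a name invented here, defaulting to the install path shown above, and the function is not something the VIB provides.

```shell
#!/bin/sh
# Sketch: run storcli from its install directory without a manual cd.
# STORCLI_DIR is a made-up variable; it defaults to the path used in this post.
STORCLI_DIR="${STORCLI_DIR:-/opt/lsi/storcli}"

storcli() {
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$STORCLI_DIR" && ./storcli "$@" )
}
```

With this defined in the current shell session, storcli /c0/vall show behaves like the cd-then-./storcli form used in the rest of this post.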

Now we can issue commands. Here are some of my favorites: 

To create a RAID 0 virtual disk for every physical disk in one shot: 

./storcli /c0 add vd each type=raid0 pdcache=off 

/c0 represents controller 0, probably the only one you have. The pdcache=off option turns off disk caching, which VMware vSAN requests.

To delete all the RAID 0 virtual disks at once:

 ./storcli /c0/vall del

The /vall means all virtual disks. 

To delete one virtual disk for a particular slot: 

   This requires knowing which virtual disk is assigned to which physical disk and slot. Most likely we'll know the drive to be replaced by its slot number. The 9271 uses the concept of "Enclosures", which belong to the controller and contain the slots (drive bays). Issue the command:

./storcli /c0/eall/sall show

which yields a chart that tells us which drive group is attached to which drive.
Let's say we need to replace the disk in slot 7. This slot and its disk are assigned to Drive Group 5. Now let's find the virtual disk for Drive Group 5.

./storcli /c0/vall show 

gives us a list of virtual drives to drive groups. Drive group 5 happens to hold virtual disk 5. Don't assume these numbers will always be the same. 
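
The two lookups above can be scripted rather than read by eye. The functions below are a rough sketch with made-up names (dg_for_slot, vd_for_dg). They assume the drive table rows start with an EID:Slt column and carry the drive group in the fourth column, and that virtual disk rows start with a DG/VD column. Check those positions against your own output first, since the layout may differ between StorCLI versions.

```shell
#!/bin/sh
# Sketch: slot -> drive group -> virtual disk, by parsing "show" output.
# ASSUMPTION: drive rows look like "EID:Slt DID State DG ..." and
# virtual disk rows look like "DG/VD TYPE State ...". Verify on your system.

# Usage: ./storcli /c0/eall/sall show | dg_for_slot 7
dg_for_slot() {
    awk -v slot="$1" '
        $1 ~ /^[0-9]+:[0-9]+$/ {
            split($1, id, ":")            # id[1] = enclosure, id[2] = slot
            if (id[2] == slot) print $4   # fourth column = drive group
        }'
}

# Usage: ./storcli /c0/vall show | vd_for_dg 5
vd_for_dg() {
    awk -v dg="$1" '
        $1 ~ /^[0-9]+\/[0-9]+$/ {
            split($1, id, "/")            # id[1] = drive group, id[2] = VD
            if (id[1] == dg) print id[2]
        }'
}
```

Both functions only read text, so they are safe to try; nothing is changed on the controller.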

 Now we can delete virtual disk 5:

/opt/lsi/storcli # ./storcli /c0/v5 del
Controller = 0
Status = Success
Description = None

We can now replace the physical disk. Once that's done, we can create a new virtual disk for the new drive. 

/opt/lsi/storcli # ./storcli /c0 add vd type=RAID0 name=vd5 drives=22:7
Controller = 0
Status = Success
Description = Add VD Succeeded

Notice that the drives argument takes the form enclosure:slot (22:7 here), so the slot is 7, not 5.
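
To keep the virtual disk number, enclosure and slot straight, it can help to print the commands before running anything. The function below is a sketch with a made-up name (replace_cmds). It only echoes the delete and add commands from this walkthrough for the values you pass in, and assumes controller 0.

```shell
#!/bin/sh
# Sketch: print (not run) the storcli commands for swapping one disk.
# Arguments: virtual disk number, enclosure ID, slot number.
# ASSUMPTION: controller 0, and the VD is named vd<number> as in this post.
replace_cmds() {
    vd="$1"; enc="$2"; slot="$3"
    echo "./storcli /c0/v${vd} del"
    echo "# physically replace the disk in slot ${slot}, then:"
    echo "./storcli /c0 add vd type=RAID0 name=vd${vd} drives=${enc}:${slot}"
}
```

For the example above, replace_cmds 5 22 7 prints the delete for virtual disk 5 and the add for drives=22:7, which you can review and then paste in one at a time.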

That should be all there is to it. This procedure was tested using known good disks, and some steps may be missing because I did not have an actual bad drive. StorCLI has commands for that, too, like marking a slot good. The docs for StorCLI can be found here:

StorCLI Reference Manual

Monday, September 29, 2014

VMware RVC client - Installing on a Mac without the extra baggage

     With the introduction of VMware vSAN, my attention was drawn to a new tool for operating on vCenter and performing configuration tasks: RVC, the Ruby vSphere Console. It can be found by logging in to your vCenter Server (v 5.5 and up) and executing the command `rvc`.

VMware recommends having a separate vCenter appliance just to use rvc. Naturally, I don't need yet another VM taking up valuable space on my MacBook Air. I just want rvc, running in a terminal window. This should not be a problem; rvc started out life as a 'Fling' at the VMware Labs website. It's open source. The instructions even say that all you have to do is run the command gem install rvc.

Oh how I wish that were true...

As it turns out my Mac, running ruby version 2.0.0p247 (I don't know what that means, I'm not into ruby), doesn't respond well to that command.

Johns-Mac:~ johnkennedy$ gem install rvc
Fetching: rvc-1.8.0.gem (100%)
ERROR:  While executing gem ... (Gem::FilePermissionError)

    You don't have write permissions for the /Library/Ruby/Gems/2.0.0 directory.

OK, easy fix: sudo it! But first I needed the Xcode command line tools, or else nokogiri won't compile properly.

Johns-Mac:~ johnkennedy$ xcode-select --install

Johns-Mac:~ johnkennedy$ sudo gem install rvc
Successfully installed rvc-1.8.0
Parsing documentation for rvc-1.8.0
1 gem installed

Awesome! Now I can manage my vCenter, ESXi servers, everything from the Mac without cumbersome clients. 

So I try to get to work doing just that, and this happens: 

Johns-Mac:~ johnkennedy$ rvc 
Install the "ffi" gem for better tab completion.
Host to connect to (user@host): root@mgmt-vc.flexpodlab.com
0 /
1 mgmt-vc.flexpodlab.com/
> cd mgmt-vc.flexpodlab.com/
/mgmt-vc.flexpodlab.com> ls
RuntimeError: unknown VMODL type AnyType

I've dealt with enough cryptic error messages in my time not to try to understand them right away, if at all. Just google em, and find a solution. But nothing worked, until...

I noticed that there was a brand new beta version of rbvmomi, the guts of rvc. If you are planning on doing anything serious with ruby and the vSphere APIs, rbvmomi is the tool you need.
It looked to be version 1.8.2.pre. So I installed it.

JOHNKEN-M-N085:~ johnken$ sudo gem install rbvmomi -v 1.8.2.pre
Fetching: rbvmomi-1.8.2.pre.gem (100%)
Successfully installed rbvmomi-1.8.2.pre
Parsing documentation for rbvmomi-1.8.2.pre
Installing ri documentation for rbvmomi-1.8.2.pre
1 gem installed

Then I uninstalled version 1.8.1:

JOHNKEN-M-N085:~ johnken$ sudo gem uninstall rbvmomi -v 1.8.1
Successfully uninstalled rbvmomi-1.8.1

Now rvc works! 

JOHNKEN-M-N085:~ johnken$ rvc root@mgmt-vc.flexpodlab.com
Install the "ffi" gem for better tab completion.
VMRC is not installed. You will be unable to view virtual machine consoles. Use the vmrc.install command to install it.
0 /
1 mgmt-vc.flexpodlab.com/
> cd mgmt-vc.flexpodlab.com/
/mgmt-vc.flexpodlab.com> cd Datacenter/
/mgmt-vc.flexpodlab.com/Datacenter> ls
0 storage/
1 computers [host]/
2 networks [network]/
3 datastores [datastore]/
4 vms [vm]/

P.S. The prompt to install VMRC is a red herring: there doesn't seem to be a version for Mac.

Wednesday, August 6, 2014

Configuring the LSI MegaRAID 9271 for VMware vSAN

Those configuring the LSI 9271 can take heart – you don't have to configure each disk by hand for VMware vSAN. 

The problem – The only controller that Cisco sells that is also on the VMware HCL for vSAN is the LSI MegaRAID 9271. VMware vSAN recommends pass-through mode (JBOD). LSI does not support this. In this case, VMware supports creating a RAID 0 virtual disk for each physical disk.
The solution I originally tried was to use the StorCLI software, made by LSI and found on the UCS Utilities disk for C series. It installs directly in the ESXi shell as a VIB.
BUT, creating individual disks required a separate command for each disk, e.g.

/opt/lsi/storcli/storcli /c0 add vd type=RAID0 name=vd1 drives=32:1
/opt/lsi/storcli/storcli /c0 add vd type=RAID0 name=vd2 drives=32:2
Even worse, the enclosure number was required for each command (shown in the above example as '32'). Since enclosure numbers can differ from server to server, this made scripting difficult, as finding the enclosure number added complication.
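
For anyone who still wants the per-disk form (say, to build virtual disks on only some drives), the enclosure number can be read from the drive listing instead of being hard-coded. The function below is a sketch under assumptions: per_disk_cmds is a made-up name, and it assumes each drive row of ./storcli /c0/eall/sall show begins with an EID:Slt column such as 32:1. It prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: emit one "add vd" command per drive found on stdin.
# ASSUMPTION: each drive row starts with an EID:Slt value such as 32:1.
# Usage: /opt/lsi/storcli/storcli /c0/eall/sall show | per_disk_cmds
per_disk_cmds() {
    awk '$1 ~ /^[0-9]+:[0-9]+$/ {
        split($1, id, ":")    # id[1] = enclosure, id[2] = slot
        printf "/opt/lsi/storcli/storcli /c0 add vd type=RAID0 name=vd%s drives=%s\n", id[2], $1
    }'
}
```

Review the printed lines, delete the ones you don't want, and run the rest by hand or pipe them to sh once you trust them.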

The solution – As it turns out, someone at LSI was thinking about this problem. They must have faced this particular use case before (where customers want each physical disk to have its own RAID 0 virtual disk). So they put in a one-line command to handle this situation:

cd /opt/lsi/storcli
./storcli /c0 add vd each type=raid0 pdcache=off  

UPDATE: some commands removed as they did not work in latest testing. 

This command quickly builds all the virtual disks on the physical disks. No muss. No fuss. The extra option (pdcache=off) turns off disk caching. vSAN likes to take care of this itself. I haven't tested it for performance yet, so this may change in the future.