Tuesday, September 30, 2014

VMware Virtual SAN (vSAN) - replacing a failed disk connected to the LSI 9271 controller.

VMware vSAN has its own HCL of disks and controllers. It is a subset of the vSphere HCL, and the only Cisco controller card on the list is the LSI MegaRAID 9271. This is a high-performance controller, but LSI does not support running the 9271 in JBOD mode. As a result, each physical disk has to be wrapped in its own RAID 0 virtual disk, SSDs have to be manually marked as SSDs, and so on. I've discussed this briefly before.

RAID 0 makes troubleshooting failed disks problematic. ESXi sees the virtual disks, not the physical ones, so simply swapping a failed drive may not be enough: the controller also has to be told that the physical disk behind the virtual disk was replaced. That means interacting with the controller directly.

Many customers know that the 9271 can be managed through WebBIOS, but that is only available at boot time, so using it on a running server means a reboot. Fortunately, Cisco and LSI have planned for this.
 
LSI makes a utility called StorCLI. It is available from the LSI website and is also included on the UCS Utilities ISO, which you can download from Cisco Support.

Once you have the ISO, you need to find the StorCLI .vib file. You could try mounting the ISO on the ESXi server, but I wouldn't recommend it: getting ESXi to see the attached CD drive is too much trouble. If you can mount it anywhere else, do that instead.
 





Once the ISO is mounted, go to the directory ucs-cxxx-utils-vmware.2.0.3 (1).iso\Storage\LSI\9xxx\StorCLI. There you will find the StorCLI VIB file.


Copy this VIB file to /var/log/vmware on the ESXi host. I don't know why, but every time I try to install the VIB from anywhere else, the install fails.
Then run the esxcli install command from the ESXi shell. (This may well also work through the esxcli interface in vSphere PowerCLI, but I haven't tried it.)

~ # esxcli software vib install -v /var/log/vmware/vmware-esx-storcli-1.12.13.vib --no-sig-check
 
You need the --no-sig-check option; otherwise the install fails with a signature verification error.

To run any StorCLI commands, you must first cd to the StorCLI directory. Installing StorCLI does not add its binaries or their linked library to your path.

~ # cd /opt/lsi/storcli/
/opt/lsi/storcli #
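If typing the cd every time gets old, a small shell function can do it for you. This is a minimal sketch, assuming the ESXi shell (BusyBox ash) accepts a POSIX function whose body is a subshell; the function name storcli and the STORCLI_DIR variable are my own choices, not something the VIB provides.

```shell
# Directory the StorCLI VIB installed into (the path used in this post).
# Override STORCLI_DIR if your install landed somewhere else.
STORCLI_DIR=${STORCLI_DIR:-/opt/lsi/storcli}

# Hypothetical wrapper: the parentheses run the body in a subshell, so the
# "cd" needed for StorCLI's linked library does not change the caller's
# working directory. Arguments are passed through to ./storcli unchanged.
storcli() (
    cd "$STORCLI_DIR" || exit 1
    ./storcli "$@"
)
```

After defining it, storcli /c0/eall/sall show should work from any directory. The definition only lasts for the current shell session.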

Now we can issue commands. Here are some of my favorites: 

To create a RAID 0 virtual disk for every physical disk in one shot: 


./storcli /c0 add vd each type=raid0 pdcache=off 

/c0 refers to controller 0, probably the only one you have. The pdcache=off option turns off the physical drives' own on-board cache, which VMware vSAN requests.

To delete all the RAID 0 virtual disks at once:


 ./storcli /c0/vall del

The /vall means all virtual disks. 


To delete one virtual disk for a particular slot: 

This requires knowing which virtual disk is assigned to which physical disk and slot. Most likely we'll know the drive to be replaced by its slot number. The 9271 uses the concept of "enclosures," which belong to the controller and contain the slots (drive bays). Issue the command:

./storcli /c0/eall/sall show

which yields a table listing each drive by enclosure ID and slot, along with the drive group it belongs to.
Let's say we need to replace the disk in slot 7. This slot and disk are assigned to Drive Group 5. Now let's find the virtual disk for Drive Group 5.

./storcli /c0/vall show 

gives us a list mapping drive groups to virtual disks. Drive group 5 happens to hold virtual disk 5, but don't assume these numbers will always match.
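The slot to drive group to virtual disk lookup can also be scripted. This is a sketch that I have not run against a live 9271: it assumes StorCLI's table layout, where data rows of /c0/eall/sall show begin with EID:Slt and have DG in the fourth column, and data rows of /c0/vall show begin with DG/VD. The function names dg_for_slot and vd_for_dg are hypothetical. Check the columns against your own output before trusting the result.

```shell
# ASSUMPTION: "/c0/eall/sall show" rows look like "22:7 13 Onln 5 ..."
# (EID:Slt, DID, State, DG, ...). Prints the DG for the given enclosure:slot.
dg_for_slot() {
    awk -v want="$1" '$1 == want { print $4; exit }'
}

# ASSUMPTION: "/c0/vall show" rows start with "DG/VD", e.g. "5/5".
# Prints the virtual disk number that belongs to the given drive group.
vd_for_dg() {
    awk -v dg="$1" '$1 ~ /^[0-9]+\/[0-9]+$/ {
        split($1, part, "/")
        if (part[1] == dg) { print part[2]; exit }
    }'
}

# Example, run from /opt/lsi/storcli:
#   dg=$(./storcli /c0/eall/sall show | dg_for_slot 22:7)
#   vd=$(./storcli /c0/vall show | vd_for_dg "$dg")
#   echo "slot 22:7 -> drive group $dg -> virtual disk $vd"
```

Both functions print nothing if no row matches, so an empty result means the slot or drive group wasn't found.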


Now we can delete virtual disk 5:

/opt/lsi/storcli # ./storcli /c0/v5 del
Controller = 0
Status = Success
Description = None

We can now replace the physical disk. Once that's done, we can create a new virtual disk for the new drive. 

/opt/lsi/storcli # ./storcli /c0 add vd type=RAID0 name=vd5 drives=22:7
Controller = 0
Status = Success
Description = Add VD Succeeded

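Notice that drives= takes the enclosure ID and slot (22:7 here, as reported by /c0/eall/sall show), so the slot is 7, not the virtual disk number 5.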
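Putting the single-disk steps together, the hypothetical helper below (replace_plan is my own name, not a StorCLI command) prints the delete and re-create commands for one enclosure, slot and virtual disk instead of running them, so you can check the numbers before anything destructive happens. One assumption to flag: it adds pdcache=off to the re-create step so the new virtual disk matches the ones built with add vd each; the command I actually ran above did not include it.

```shell
# Hypothetical helper: print (do NOT run) the StorCLI commands for swapping
# the drive in one slot.
#   $1 = enclosure ID, $2 = slot number, $3 = virtual disk number
replace_plan() {
    rp_enc=$1 rp_slot=$2 rp_vd=$3
    echo "./storcli /c0/v${rp_vd} del"
    echo "# physically replace the drive in slot ${rp_slot} now"
    # pdcache=off is an assumption added to match "add vd each ... pdcache=off".
    echo "./storcli /c0 add vd type=raid0 name=vd${rp_vd} drives=${rp_enc}:${rp_slot} pdcache=off"
}
```

For the example in this post, replace_plan 22 7 5 prints the delete for /c0/v5 and the add for drives=22:7. Printing rather than executing is deliberate: a typo in the virtual disk number would delete the wrong disk.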

That should be all there is to it. Note that I tested this procedure with known-good disks rather than an actual failed drive, so some steps may be missing. StorCLI has commands for that situation too, such as marking a drive as good. The StorCLI documentation can be found here:

StorCLI Reference Manual
