The Great VSAN Recovery of 2017

Hey Hey,

Now, VSAN is not something I would generally blog about, but I had a small win today which I thought I would share.
I have just moved house... yay!! Everything is great, the kids are happy, the wife is happy (most important part), but unfortunately the lab was not so happy.

Previously my lab was parked under my desk at home, taking up much of my leg space while providing room heating during winter. Now it has its own area, free from little fingers pushing buttons, but one of the hosts did not make the trip so well.

Host number 2 came up with the ever so awesome invalid boot device. I tried everything, but it was cactus.

I have been running VSAN since the beginning and it's been a love-hate relationship. Even though I have it set up to tolerate a host failure, there always seem to be a couple of VMs that end up in an invalid state because the data they needed was on the host that was unavailable. I had my fingers crossed that all my machines were fine.

I logged in and, with hosts 1 and 3 available, I was down a couple of VMs. Unfortunately these included the NSX Manager, my main vRA cafe node and my main vROps box, which I've done all the dashboards and other stuff on. Why oh why does this happen? This was a problem because I had stopped doing backups some time ago to fix things but never re-enabled them... my bad.

Being upfront here: I know very little about VSAN, I don't claim to be an expert, and my only experience with it is in this lab. Give me vRA or vROps any day.

Making a really long story short, I rebuilt host 2 and reconfigured everything. The VSAN disks looked fine, but under VSAN management host 2 showed as connected while the Virtual SAN Health Status for that host was Unknown and the Network Partition Group was blank.
Under General it was only saying 2 of 2 hosts connected.

From vSphere I could not find a way to reconnect the disks without removing the disk group, which gives a nice warning about all data being erased if I do this, something I was trying to avoid.

So I turned to my friend esxcli.
On host 2 I got the below using esxcli vsan cluster get:
vsan01

After some fumbling around I could see that I can join a cluster using esxcli vsan cluster join:
vsan02

Now all I needed was the UUID of the cluster. Using the previous get command on host 1 I got this:
vsan03
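For anyone who would rather not copy the UUID by eye, here is a rough sketch of pulling it out of the get output. It assumes the output has a line of the form "Sub-Cluster UUID: <value>" (the field name used in this post) and that awk is available in the shell you run it from. The function name sub_cluster_uuid is made up for this example.

```shell
# Read `esxcli vsan cluster get` output on stdin and print only the
# Sub-Cluster UUID value.
# Assumption: the field appears as "Sub-Cluster UUID: <value>" on its own line.
# The pattern requires "UUID:" directly after "Sub-Cluster ", so other fields
# such as "Sub-Cluster Master UUID" are not matched.
sub_cluster_uuid() {
    awk -F': *' '/^[[:space:]]*Sub-Cluster UUID:/ { print $2; exit }'
}

# Usage on a healthy host (host 1 in this story):
#   esxcli vsan cluster get | sub_cluster_uuid
```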

Taking a stab in the dark, I assumed Sub-Cluster UUID was what I needed, so I ran this on host 2: esxcli vsan cluster join -u 52fef163-6062-db58-d753-c6f2868ac4d4
vsan04
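Since a typo in the UUID is easy to make, below is a small sketch of a sanity check before joining. It only prints the join command rather than running it, and it assumes the UUID is the lowercase 8-4-4-4-12 hex form seen above. The helper name vsan_join_cmd is hypothetical, not part of esxcli.

```shell
# Print the join command for a given cluster UUID, refusing anything that is
# not shaped like a lowercase 8-4-4-4-12 hex UUID.
# This does NOT run the join; it only echoes the command so you can review it.
vsan_join_cmd() {
    uuid="$1"
    if printf '%s\n' "$uuid" | grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'; then
        echo "esxcli vsan cluster join -u $uuid"
    else
        echo "not a valid UUID: $uuid" >&2
        return 1
    fi
}

# Usage on the rebuilt host (host 2 in this story):
#   vsan_join_cmd 52fef163-6062-db58-d753-c6f2868ac4d4
# then run the printed command, and check the result with:
#   esxcli vsan cluster get
```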

Instantly my inaccessible VMs were available, and looking in vSphere everything was back to normal. I was able to rebuild the host and keep the disk group and data intact.
While VSAN and I have had some issues over the years, everything generally works out well in the end.

Cheers
