
VMFS File System Reconstruction (Part 1)

Recently DTI Data received a challenge in the form of a VMware ESXi 5.1 server containing sixteen 2 TB Seagate ST2000DM001 hard drives. The drives were configured as a RAID 10 with eight drives on each side of the mirror, each side striped as a RAID 0 (a mirror of stripes, sometimes called RAID 0+1). The array had dropped what looked like three drives, which degraded it to the point where it no longer functioned. In this layout, all it takes to bring down the entire array is the loss of one drive from each side of the mirror.

In addition to the failed drives, there were some apparent and not-so-apparent problems with the file system. VMware, and therefore VMFS, is becoming increasingly popular as virtual machines become the backbone of the cloud industry. However, both the documentation and the recovery software for VMFS are sparse. This makes it difficult to recover from a corrupted file system, and it led DTI to develop a more bare-bones method. Each set of problems had to be handled independently, and in the final analysis they fell into one of the following three categories.

VMFS Partition Missing

First, the partition table itself was gone. Partition parameters are normally stored in a table on the disk, and there are two basic formats in use. The first is the Master Boot Record (MBR), which holds a maximum of four partition entries. Each entry defines either a volume or an extended partition, which links to further partition tables that define additional volumes. This linked list of partitions is the method Microsoft used for disks with more than four partitions, and it has been around almost from the beginning. The limitation of the MBR scheme is that the largest volume it can address is 2 terabytes, because sector addresses and sector counts are stored as 32-bit values (with 512-byte sectors). With drives growing ever larger and 3 terabyte drives becoming the norm, a partitioning scheme is needed that is more robust and flexible in addressing larger storage needs.
To manage the growth in storage requirements of the modern server, a partition table format was needed that could handle a virtually limitless configuration. The technology developed to tackle this is the GUID Partition Table (GPT), which uses 64-bit sector addresses. GPT is defined as part of the Extensible Firmware Interface (EFI) specification, Intel’s answer to a virtual and configurable BIOS, which was intended to replace the complexities of the on-board BIOS offered by vendors such as AMI (American Megatrends). With all of this going for it, you would think there would be a better recovery method if the table is lost than a backup copy stored in the last few sectors of the drive. However, that is the method used, and it still causes a great deal of consternation when trying to rebuild the GPT.
With all of that said, the GPT on this particular server was completely absent, and restoring it was the first problem that had to be solved before the data on the RAID 10 could be addressed. The GPT had been erased, and the backup copy was of limited use because it did not match the current configuration. The DTI technical staff’s task was to rebuild the GPT so that it pointed to the existing VMFS partition and, ideally, to mount that partition without any degradation of its data. This meant that standard partitioning tools could not be used; a more “by-hand” method was needed to ensure the integrity of the data being restored.
To do this, two questions had to be answered: where did the VMFS volume start, and where did it end? If these could be answered, the rest would hopefully fall into place. Let’s look at how to find the beginning of the VMFS volume. For this we need the help of two tools. The first is a capable hex editor with flexible options for full-drive searches on either hexadecimal or ASCII strings. The tool we used was WinHex by X-Ways Software, and among its many forensic functions, drive scanning is one of its most powerful. The second, and even more important, is knowing what to search for: how would we know where the VMFS volume started if the GPT had been destroyed? The answer is not as complex as one would think. The on-disk VMFS volume data structure contains an embedded ‘magic’ number. This number is checked when the volume is opened, to verify that the on-disk data encountered is correct before it is used to finalize the volume parameters.
The source for this information is vmfs-tools, an API and toolset developed by Mike Hommey. The complete source code is available at http://glandium.org/projects/vmfs-tools/. We recommend downloading it, as it will help immeasurably when trying to recover a corrupted VMFS volume. Familiarity with the ‘C’ programming language is also a real asset: you should understand the difference between a header file and a source file, what a structure is, and how a structure relates to the on-disk format of a file system. Finally, it helps to understand single-byte alignment within a defined data type; in other words, how to force a structure definition to match the on-disk layout exactly, without the compiler inserting alignment padding.
To help expedite matters, the ‘magic’ number for a VMFS volume is 0xC001D00D. I am not sure of the significance of that particular number; all I know is that it is the ‘magic’ for the VMFS volume. To use this number in a hex search, it is important to understand ‘Big Endian’ and ‘Little Endian’ byte order. In ‘Big Endian’, the most significant byte of a number is stored first and the least significant byte last, so the VMFS magic number would appear on disk as the byte sequence C0 01 D0 0D, exactly as it is written. In ‘Little Endian’, the least significant byte is stored first and the most significant byte last, so the VMFS ‘magic’ number is stored as the byte sequence 0D D0 01 C0. This matters because WinHex searches for byte strings as they are stored on disk, not as the number is written. Intel x86 processors use ‘Little Endian’ byte order, while SPARC, Motorola, and others use ‘Big Endian’. VMFS stores the magic in little-endian order, so 0D D0 01 C0 is the string to search for.
DTI scanned the drive using WinHex and found the VMFS volume magic number at sector 10231808, about 5 gigabytes into the drive. However, this is not the beginning of the partition. At the top of the ‘C’ header file vmfs_volume.h are two important values. The first is the magic number, which we have already discussed; the second is the offset, in bytes, of the volume information block from the beginning of the VMFS partition. Both values are shown below as they appear in the header file.
#define VMFS_VOLINFO_BASE 0x100000
#define VMFS_VOLINFO_MAGIC 0xc001d00d

The VMFS_VOLINFO_BASE value is defined in bytes: 0x100000 bytes is 1,048,576 bytes (1 MiB), which at 512 bytes per sector is 2048 sectors (decimal, not hex). To find the beginning of the volume, take the sector in which the volume magic was found and subtract the volume base expressed in sectors: 10231808 (sector where the magic was found) - 2048 (VMFS_VOLINFO_BASE in sectors) = 10229760 (volume start). Interestingly, when we created an ESXi 5.1 server and initialized the file system, it used the same offsets as our corrupted server. Later on, this will come in handy when rebuilding the RAID 10 we received for recovery.

VMFS Volume Finding Software

Finally, at DTI we are always trying to make our clients’ and site visitors’ lives easier. With this in mind, DTI has developed a small software tool that lets a technician scan a drive for the VMFS volume magic number. The software looks for the magic number, then calculates and displays a line item that can be used to help rebuild the GPT. That software can be downloaded here: VMFS Volume Finder.
The next step will be to take the data we have found and build a GPT. Hopefully, if all goes well, we will have a piece of software to help you with that too.