#!/bin/bash # vmdk_integrity_check.sh # Checks for early signs of VMDK corruption DATASTORE_PATH="/vmfs/volumes/VM_Storage" LOG_FILE="/var/log/vmdk_health.log"

# Disk DescriptorFile version=1 CID=fffffffe parentCID=fffffffe createType="vmfs" RW 419430400 VMFS "server01-flat.vmdk"

[SUCCESS] Grain table rebuilt. 1,892,471 sectors recovered. [WARNING] 204 sectors unrecoverable (log files, temporary data). The SQL database performs a recovery on mount. I lose 15 minutes of transaction log data. The business accepts the loss. To prevent this, implement a VMDK health check cron job on the ESXi host:

Upon attempting to power on the VM, vSphere returned the dreaded: "Failed to power on VM. The specified virtual disk needs repair (Corrupt disk)." Attempting to browse the datastore revealed the file server01_2.vmdk (the 200GB delta disk) present, but the system could not read its metadata. Part 1: The Post-Mortem Analysis What is a VMDK? A VMDK is not a single file, but a logical container. It consists of a small descriptor file (text) and a large extent file (raw data). Corruption often strikes the descriptor or the filesystem journal within the extent.

The CID (Content ID) was reset to fffffffe — a flag meaning "unknown or corrupted parent." The flat extent is intact, but the mapping is blind.

Scenario: VMware ESXi environment, version 7.0.3. VM: Legacy SQL Server.

if [ ! -z "$FLAT_SIZE" ]; then CALC_MB=$(($FLAT_SIZE / 1024 / 1024)) echo "Health: $vmdk is $CALC_MB MB nominal." >> $LOG_FILE fi done VMDK corruption is rarely a hardware failure. It is a timing failure —where the hypervisor assumes a write completed before the storage array confirms it. The fix involves bypassing the broken descriptor and forcing a rebuild from the raw extent. Always keep a backup of the .vmdk descriptor file separately from the flat file.