Data Sanitization in the Virtual Realm and Cloud

In virtual realm data storage, there are several solutions for sanitizing entire hard drives but only limited ways to properly sanitize the files of an individual virtual machine. When you take a virtual machine out of service, it makes no sense to wipe the entire storage array just to erase that machine's data in a compliant manner.

While you can use VMware command line tools to overwrite a particular vmdk file, this approach most often fails to meet NIST 800-88 requirements, fails to address the data within snapshots, and fails to meet the verification and reporting requirements found in today's regulations.

The penalties today for failing to properly sanitize data are high:

Sarbanes Oxley - SOX

Since hard disk drives magnetically store data, there isn't always an easy or clear-cut way to erase the data if the organization hasn't taken the time to vet different techniques and options.

But according to a recent blog post, ignoring the need to completely wipe a computer of all records before recycling it exposes the organization, and C-suite executives in particular, to the following penalties:

  • Directors and officers: $1,000,000
  • Institution: $5,000,000
  • Jail time: 20 years

Since that blog post was published, the fines and penalties under SOX Section 1106 have increased:

The maximum fine for an individual, as defined by the SEC, has increased from $1,000,000 to $5,000,000, and the maximum prison sentence has increased from 10 to 20 years. The maximum fine for someone other than an individual (i.e. a corporation or legal entity) has increased from $2,500,000 to $25,000,000.

Under the European Union General Data Protection Regulation (EU GDPR), which took effect in 2018, penalties for failing to properly sanitize data can reach 4% of total revenue.

One solution to this issue is vErase from Virtual Data Erasure Software (Figure 1). It provides data sanitization that is compliant with PCI, DoD, HIPAA and NIST. In this paper we validate that vErase, with its three-pass process (random, random, zero), does in fact render data non-recoverable. We also show the inadequacy of a standard delete and the danger of not addressing data within snapshot files.

Figure 1
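The three-pass pattern named above (random, random, zero) can be illustrated with a short Python sketch. This is not vErase's implementation; the helper names overwrite_pass and three_pass_overwrite are hypothetical. It also assumes that in-place writes actually reach the same physical blocks, which is not guaranteed on copy-on-write, thin-provisioned or flash-backed storage.

```python
import os

CHUNK = 1024 * 1024  # write in 1 MiB chunks to bound memory use


def overwrite_pass(path, make_chunk):
    """Overwrite every byte of an existing file in place, then flush to disk."""
    remaining = os.path.getsize(path)
    with open(path, "r+b") as f:  # r+b: no truncation, start at offset 0
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(make_chunk(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the pass to the device


def three_pass_overwrite(path):
    """Hypothetical helper: two random passes followed by a zero pass."""
    overwrite_pass(path, os.urandom)          # pass 1: random data
    overwrite_pass(path, os.urandom)          # pass 2: random data
    overwrite_pass(path, lambda n: bytes(n))  # pass 3: all zeroes
```

Each pass is flushed before the next begins so that a later pass cannot simply replace an earlier one in a write cache.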

Background - The VMFS File System

Simply put, VMFS is a cluster-aware, sharable file system that enables multiple ESX servers to access the same LUNs (Figure 2). It is the enabler of features such as vMotion, HA and DRS. It is optimized for virtual machine I/O and eliminates the bottleneck of a traditional file system.

Figure 2

The evolution of VMFS

VMFS version 1
  • VMFS1 is a flat file system with no directory structure, used with ESX Server v1
  • Officially named "VMware File System"

VMFS version 2
  • VMFS2 is a flat file system with no directory structure, used with ESX Server v2 and limited use with v3.x

VMFS version 3
  • Introduced directory structure in the file system
  • Older versions of ESX Server cannot read or write VMFS3 volumes
  • Officially named "VMware Virtual Machine File System"

VMFS version 5
  • 2TB VMFS volumes without using file extents
  • Greater than 2TB in physical RDM
  • Theoretical max of 2048 VMs
  • Unified 1MB block size
  • Large 60TB single extent volumes
  • Support for 100,000 files
  • Support for very small files > 1KB

VMFS version 6
  • Supports 512 emulation (512e) mode drives

As of this writing, VMFS officially remains proprietary and undocumented, so it is hard to fault forensic tool vendors for not fully supporting it. Simply put, forensic analysis involving a VMFS file system will at best remain difficult for the foreseeable future.


ESX/ESXi 3.5 Update 3 provided an experimental utility called vmfs-undelete (Figure 3). This utility was notably absent from ESX/ESXi 4.0. With version 4, VMware essentially eliminated any ability to easily recover deleted VMs stored on VMFS.

Figure 3

Traditionally, we would use the block list that remained in the file system after a file was deleted to gather the blocks and reassemble the file. In version 4, no copy of the block list is retained after file deletion, and there is no utility to manually create a block list before deletion for use in a later recovery.

Official statement from VMware:

"vmfs-undelete utility is not available for ESX/ESXi 4.0. ESX/ESXi 3.5 Update 3 included a utility called vmfs-undelete, which could be used to recover deleted .vmdk files. This utility is not available with ESX/ESXi 4.0. Workaround: None. Deleted .vmdk files cannot be recovered."

Is There Any Data Sanitization in VMware Virtual Disk Provisioning?

While VMware can overwrite data during disk creation (Figure 4), it provides no protection for the data if the VM is no longer in use. At best this feature only affords protection from the co-mingling of data on disk between uses by virtual machines.

Figure 4

Thick Provision Lazy Zeroed

Creates a virtual disk in a default thick format. Space required for the virtual disk is allocated when the virtual disk is created. Data remaining on the physical device is not erased during creation, but is zeroed out on demand at a later time on first write from the virtual machine.

Using the thick provision lazy zeroed format does not zero out or eliminate the possibility of recovering deleted files or restoring old data that might be present on this allocated space. You cannot convert a thick provision lazy zeroed disk to a thin disk.

Thick Provision Eager Zeroed

A type of thick virtual disk that supports clustering features such as Fault Tolerance. Space required for the virtual disk is allocated at creation time. In contrast to the thick provision lazy zeroed format, the data remaining on the physical device is zeroed out when the virtual disk is created. It might take much longer to create disks in this format than to create other types of disks.

Thin Provision

Use this format to save storage space. For the thin disk, you provision as much datastore space as the disk would require based on the value that you enter for the disk size. However, the thin disk starts small and at first, uses only as much datastore space as the disk needs for its initial operations.
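The practical difference between lazy and eager zeroing can be shown with a toy model. This is only a sketch: a Python bytearray stands in for the physical blocks of a LUN, and ToyDatastore and create_thick_disk are hypothetical names, not VMware APIs. It models what remains on the physical device, not what the guest operating system would read back.

```python
class ToyDatastore:
    """Toy model only: a bytearray stands in for physical blocks on a LUN."""

    def __init__(self, stale):
        self.blocks = bytearray(stale)  # leftover data from a previous VM


def create_thick_disk(store, eager):
    """Allocate the whole store as one virtual disk.

    eager=True  -> zero every block at creation time (eager zeroed)
    eager=False -> leave blocks untouched until first write (lazy zeroed)
    """
    if eager:
        store.blocks[:] = bytes(len(store.blocks))
    return store


lazy = create_thick_disk(ToyDatastore(b"old CVV2 record"), eager=False)
eager = create_thick_disk(ToyDatastore(b"old CVV2 record"), eager=True)

print(b"CVV2" in lazy.blocks)   # True: stale data is still on the device
print(b"CVV2" in eager.blocks)  # False: blocks were zeroed at creation
```

Neither mode sanitizes a disk that is being retired. Zeroing at creation only protects the next tenant of the space, which is the point made in the text.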

VMware Recommendation For Deleting VMDK Files Securely

To help prevent sensitive data in VMDK files from being read off the physical disk after it is deleted, write zeros to the entire contents of a VMDK file ("zero out") before you delete it, overwriting the sensitive data. When you zero out a file, it is more difficult for someone to reconstruct the contents.


  1. Shut down or stop the virtual machine
  2. On the ESXi host, locate the VMDK file by running vmware-cmd -l to list all virtual machine configuration files. By default, the virtual disk file has the same name as the VMX file but a .vmdk extension
  3. On the ESXi host, run the command vmkfstools -writezeroes filename.vmdk. Here, filename.vmdk is the name of the VMDK file
  4. Delete the file from the datastore

This method from VMware is a single overwrite of the vmdk file(s) which does not meet NIST compliance requirements and does not address other files such as snapshots. Further, it doesn't provide any ability to verify that the data in question was actually overwritten. It requires some very specific knowledge of which hypervisor files must be erased in order to securely erase a unit of storage. It may not be suitable in a high volume environment as there is no ability to throttle/start/stop/resume erase jobs. Lastly, it fails to meet compliance requirements for auditing, record-keeping and reporting.
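To make the missing verification step concrete, here is a minimal Python sketch that reads a zeroed file back and produces a simple record of the result. It is an illustration only: the function names and the record's field names are hypothetical, and it is not what vmkfstools, vErase or any compliance standard prescribes.

```python
import os

CHUNK = 1024 * 1024  # read in 1 MiB chunks


def first_nonzero_offset(path):
    """Return the offset of the first non-zero byte, or None if all zero."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return None
            stripped = chunk.lstrip(b"\x00")
            if stripped:
                return offset + len(chunk) - len(stripped)
            offset += len(chunk)


def verification_record(path):
    """Hypothetical report entry; field names are illustrative only."""
    bad = first_nonzero_offset(path)
    return {
        "file": path,
        "bytes_checked": os.path.getsize(path),
        "verified_zero": bad is None,
        "first_nonzero_offset": bad,
    }
```

A record like this would have to be produced for every file that belongs to the virtual machine, including snapshot deltas, before it could support an audit trail.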

Forensic Validation of vErase method(s)

Preparation of the hard drive

The hard drive was first formatted with VMFS and then mounted to an ESX server. A base virtual machine with a vanilla install of Ubuntu 16.04 was created and then cloned as 4 separate virtual machines.

Each virtual machine then had a file named /home/verase/secret-1.txt written to it, containing a single text string in a CSV-style format typical of PII and credit card data (an example row is given below). The format is "<firstname> <lastname> - <email> - <password> - <md5> ... etc". The string "CVV2" appears in each file and can be used as a search key.


Virgilia Schleicher - - UVu6UTYbe - b19c5865ae7fed2d01cd85992c4fd94b 7862 Silver Nook, Osoyoos, NU, X4Q-6K8, CA, (867) 998-1454 4663-1995-4274-7183, CVV2, 469, 04/2020

secret-1.txt is written directly to the virtual disk, and secret-2.txt is written after a snapshot is created.

The four virtual machines were then deleted in the following manner:

vm-1 : No deletion, provided for reference

vm-2 : Deleted using the standard VMware delete function

vm-3 : Deleted using vSector erase method #1 (vmdk sanitized but not the snapshot)

vm-4 : Deleted using vSector erase method #2 (complete sanitization of the virtual machine)

For the forensic analysis of the hard disk, the distro "LosBuntu" (Figure 5) was used. It has all of the tools needed to perform a sound forensic validation of vErase, including vmfs-tools to mount the VMFS file system. When vmfs-tools is used in LosBuntu to mount a disk, it mounts the disk read-only, thereby mitigating the need for a write blocker.

Figure 5

We connected the VMFS-formatted hard drive to our LosBuntu virtual machine. We then identified the drive using "fdisk" (Figure 6) and verified the VMFS volume using "blkid" (Figure 7).

Figure 6

Figure 7

We then used the command "sudo vmfs-fuse /dev/sdb1 /mnt/vmfs" to mount our hard drive. To verify that we now had access to VMFS, we did a quick "cd /mnt/vmfs" and an "ls" and could see the 4 virtual machines. Doing a "cd vm-1" revealed the files associated with virtual machine 1 (Figure 8).

Figure 8

Now that we have access to the file system, it is time to see what data can be retrieved. We know that each record in each virtual machine contains the word "CVV2", so we can create a quick command line that uses dd to read the disk and pipes the output through strings to search for "CVV2" (Figure 9).

Figure 9
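For readers who prefer a script to a shell pipeline, the same kind of search can be sketched in Python. This is a rough equivalent of reading the raw device and filtering printable strings for the marker, not the exact command used in the analysis. The function name find_records and its parameters are hypothetical, and the definition of a printable string (runs of 4 or more ASCII characters) only approximates what the strings utility reports.

```python
import re

# Runs of 4 or more printable ASCII bytes, roughly what strings(1) reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def find_records(path, marker=b"CVV2", chunk_size=1024 * 1024):
    """Yield printable-ASCII runs that contain marker, reading path in chunks."""
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            data = tail + chunk
            cut = len(data)
            if chunk:
                # Hold back a trailing printable run: it may continue
                # in the next chunk, so it is not complete yet.
                while cut > 0 and 0x20 <= data[cut - 1] <= 0x7E:
                    cut -= 1
            for match in PRINTABLE_RUN.finditer(data[:cut]):
                if marker in match.group():
                    yield match.group().decode("ascii")
            if not chunk:
                return
            tail = data[cut:]
```

Pointed at the raw partition (for example /dev/sdb1, read with sufficient privileges), the generator yields each candidate record as it is found. Scanning the raw device rather than the mounted files matters, because deleted data lives in unallocated space.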

The search returned 5 records (Figure 10), which makes sense. I expected to see 2 records from VM-1, 2 records from VM-2, 1 record from VM-3 and no records from VM-4. Let me explain in more detail:

Each VM had 1 record in its /home/verase directory and one in a snapshot.

vm-1 : No deletion, provided for reference only hence 2 expected records

vm-2 : Deleted using the standard VMware delete function — easily recovered the deleted records hence 2 records

vm-3 : Deleted using vSector erase method #1 vmdk sanitized but not the snapshot hence only 1 record recovered

vm-4 : Deleted using vSector erase method #2 complete sanitization of the virtual machine hence no records recovered
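The expected count of 5 follows directly from the list above and can be tallied in a few lines of Python. The dictionary below simply restates the test design; its layout and names are illustrative.

```python
# Which copies of the record each deletion method is expected to leave behind.
# Keys mirror the four test VMs in the text; True means "still recoverable".
EXPECTED = {
    "vm-1": {"flat": True,  "snapshot": True},   # not deleted (reference)
    "vm-2": {"flat": True,  "snapshot": True},   # standard VMware delete
    "vm-3": {"flat": False, "snapshot": True},   # method #1: vmdk only
    "vm-4": {"flat": False, "snapshot": False},  # method #2: vmdk and snapshot
}

per_vm = {vm: sum(copies.values()) for vm, copies in EXPECTED.items()}
total = sum(per_vm.values())

print(per_vm)  # {'vm-1': 2, 'vm-2': 2, 'vm-3': 1, 'vm-4': 0}
print(total)   # 5
```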

Figure 10

For the sake of clarity separate searches were then performed on the respective virtual machine directories:

For vm-1, the commands were "sudo dd if=/mnt/vmfs/vm-1.flat.vmdk | strings | grep CVV2 >> searchvm1.txt" (Figure 11) and "sudo dd if=/mnt/vmfs/vm-1-000001-delta.vmdk | strings | grep CVV2 >> vm1delta.txt" (Figure 12).

For vm-2, the commands were "sudo dd if=/mnt/vmfs/vm-2.flat.vmdk | strings | grep CVV2 >> searchvm2.txt" (Figure 13) and "sudo dd if=/mnt/vmfs/vm-2-000001-delta.vmdk | strings | grep CVV2 >> vm2delta.txt" (Figure 14).

For vm-3, the commands were "sudo dd if=/mnt/vmfs/vm-3.flat.vmdk | strings | grep CVV2 >> searchvm3.txt" (Figure 15) and "sudo dd if=/mnt/vmfs/vm-3-000001-delta.vmdk | strings | grep CVV2 >> vm3delta.txt" (Figure 16).

Figure 11

Figure 12

vm-1 : No deletion was performed; this VM was provided for reference only. Hence, as expected, 2 records were recovered for vm-1 (see Figure 11 and Figure 12).

Figure 13

Figure 14

vm-2 : Deleted using the standard VMware delete function, which fails to properly overwrite data. Hence two records were easily found in unallocated space.

Figure 15

Figure 16

vm-3 : Deleted using vSector erase method #1 (vmdk sanitized but not the snapshot), hence only 1 record was recovered. This is expected behavior, as only the flat.vmdk was overwritten and the snapshot was not.

Figure 17

vm-4 : Deleted using vSector erase method #2. Complete sanitization of the virtual machine was performed, hence no records could be recovered, and the required certification of secure data erasure was provided (Figure 18).

Figure 18

In conclusion:

The vErase software properly sanitized the deleted records in VM-4 using a three-pass erasure method in line with NIST 800-88: two passes of random data were written to the media, followed by a third pass overwriting the media with all zeroes. The data was not forensically recoverable from either allocated or unallocated space, and the required certification of secure data erasure was properly provided. More information on vErase is available at

In VM-3, vErase properly sanitized the records in the flat.vmdk file, but because the snapshot was not deleted or overwritten, a single record was retrieved. This speaks volumes for the need to always choose an erasure method that includes sanitization of any VM snapshot(s). Failing to sanitize snapshots can expose stale data that resides within them.

In VM-2 a standard VMware delete was performed, and all of the records were easily recovered from unallocated space. This speaks volumes for the need to always overwrite data (with multiple passes) in your efforts to sanitize data. In addition, VMware did not provide the required certification of secure data erasure, so there is no regulatory-required proof that a best practice was used to properly erase the VM data. The standard erasure offered by most cloud providers today (e.g. Amazon, Google or Microsoft) does not follow NIST 800-88 guidelines and will likely lead to a very costly and embarrassing data exposure event.


