With Windows Server 2012, Microsoft introduces a built-in, software-based data deduplication solution. Whereas some deduplication solutions work at the file level, deduplication in Windows Server 2012 is block-based: files are split into chunks, and identical chunks are stored only once.
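To make the difference from file-level deduplication concrete, here is a minimal Python sketch of chunk-level deduplication. It is a simplification, not Microsoft's implementation: it uses a hypothetical fixed chunk size of 4 bytes (chosen for readability) and SHA-256 hashes, whereas Windows Server 2012 splits files into variable-size chunks. The names `ChunkStore`, `optimize` and `read` are illustrative only.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical tiny chunk size so the example is easy to follow


class ChunkStore:
    """Stores each unique chunk once; files become ordered lists of chunk hashes."""

    def __init__(self):
        self.chunks = {}  # hash -> chunk bytes (each unique chunk stored once)
        self.files = {}   # file name -> ordered list of chunk hashes

    def optimize(self, name, data):
        """Split data into chunks and keep only the chunks not seen before."""
        hashes = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if new
            hashes.append(digest)
        self.files[name] = hashes

    def read(self, name):
        """Rebuild ("rehydrate") a file by concatenating its chunks in order."""
        return b"".join(self.chunks[h] for h in self.files[name])

    def stored_bytes(self):
        """Total bytes held in the chunk store."""
        return sum(len(c) for c in self.chunks.values())


store = ChunkStore()
store.optimize("a.txt", b"AAAABBBBCCCC")
store.optimize("b.txt", b"AAAABBBBDDDD")  # shares two chunks with a.txt
print(store.stored_bytes())  # 16 bytes stored for 24 bytes of file data
```

A purely file-level approach would keep both files in full because they are not identical; chunking finds and stores the shared parts only once.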
Deduplication in Windows Server 2012:
- Available only in Windows Server 2012.
- Deduplication is cluster-aware.
- Implemented as a file system filter driver, enabled per volume.
- Not supported on boot or system volumes; intended only for data volumes.
- Requires the NTFS file system; the new ReFS file system introduced in Windows Server 2012 is not supported.
- Does not work with Cluster Shared Volumes (CSV).
- Does not process compressed or encrypted files, files smaller than 32 KB, reparse points, or files with extended attributes.
- Not configurable through Group Policy.
- It is post-process deduplication: data is first written normally and deduplicated later by background jobs.
- The Windows cache is deduplication-aware.
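The per-file restrictions in the list above amount to a simple filter. The sketch below models them in Python; the `FileInfo` record and its fields are hypothetical stand-ins for the file attributes Windows checks, and the real optimizer applies further policy (such as file age) that is not modelled here.

```python
from dataclasses import dataclass

MIN_SIZE = 32 * 1024  # files smaller than 32 KB are not deduplicated


@dataclass
class FileInfo:
    # Hypothetical record of the attributes that matter for deduplication
    name: str
    size: int
    compressed: bool = False
    encrypted: bool = False
    reparse_point: bool = False
    extended_attributes: bool = False


def is_candidate(f: FileInfo) -> bool:
    """Return True if none of the listed exclusions applies to the file."""
    if f.size < MIN_SIZE:
        return False
    return not (f.compressed or f.encrypted or f.reparse_point or f.extended_attributes)


files = [
    FileInfo("report.docx", 200 * 1024),
    FileInfo("tiny.txt", 4 * 1024),
    FileInfo("secret.xlsx", 500 * 1024, encrypted=True),
]
print([f.name for f in files if is_candidate(f)])  # ['report.docx']
```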
Data deduplication – Possible Savings
Microsoft has researched its deduplication technology and published figures for the storage savings it can provide for different types of data:
| Usage | Possible saving |
|---|---|
| General | 50-60% |
| Documents | 30-50% |
| Application Library | 70-80% |
| VHD(X) Library | 80-95% |
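To translate these percentages into disk space: a saving of s% means data that would occupy D on disk needs roughly D × (1 - s/100) after deduplication. The short Python sketch below applies Microsoft's ranges to a hypothetical 1,000 GB of each data type; the input size is an assumption chosen for illustration, and actual savings depend on your data.

```python
# Saving ranges (low %, high %) taken from the table above
SAVINGS = {
    "General": (50, 60),
    "Documents": (30, 50),
    "Application Library": (70, 80),
    "VHD(X) Library": (80, 95),
}


def space_after_dedup(size_gb, saving_pct):
    """Disk space still needed after removing saving_pct percent of the data."""
    return size_gb * (100 - saving_pct) / 100


def estimate(size_gb):
    """Return (best case, worst case) space needed per usage type."""
    return {
        usage: (space_after_dedup(size_gb, high), space_after_dedup(size_gb, low))
        for usage, (low, high) in SAVINGS.items()
    }


for usage, (best, worst) in estimate(1000).items():
    print(f"{usage}: {best:.0f}-{worst:.0f} GB")
```

For 1,000 GB of VHD(X) files, this gives 50 to 200 GB of disk space actually used.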
Data deduplication – Performance
Data deduplication costs some performance, that is a fact, whether it is done at the storage level or in the operating system.
Microsoft has published some information about the impact:
Writes take no direct performance hit, because deduplication runs later in the background when the system is idle.
Reads do take a hit: around 3% when the file is not in the cache.
My real-life experience so far: the performance loss is negligible, and you will love the amount of data you can fit on that fast SSD!
Data deduplication and PowerShell
Deduplication can be configured, controlled and monitored through the new Server Manager GUI or with PowerShell.
To enable the deduplication feature with PowerShell:
Add-WindowsFeature -Name FS-Data-Deduplication
To enable deduplication on volume D:, use:
Enable-DedupVolume D:
To see the statistics for a volume (how much storage we actually saved), use:
Get-DedupStatus
By default, the deduplication process will only affect files that have not been changed for 30 days.
To change this value, for example to 0 (process files as soon as possible), use:
Set-DedupVolume D: -MinimumFileAgeDays 0
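The file-age rule can be thought of as a cutoff: a file is only considered once it has gone unmodified for at least MinimumFileAgeDays, so a value of 0 makes every file eligible right away. The Python sketch below models that rule; the function name `old_enough` and the use of the last-modified timestamp are assumptions for illustration, not the exact logic Windows uses.

```python
from datetime import datetime, timedelta


def old_enough(last_modified, minimum_file_age_days, now):
    """True if the file has not been modified for at least the minimum age."""
    return now - last_modified >= timedelta(days=minimum_file_age_days)


now = datetime(2013, 1, 31)
edited_last_week = datetime(2013, 1, 24)

print(old_enough(edited_last_week, 30, now))  # False: only 7 days old
print(old_enough(edited_last_week, 0, now))   # True: age limit disabled
```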
The deduplication process runs as scheduled tasks.
My advice: do not use the scheduled tasks if you are running virtual machines on the volume.
If this is your scenario (like mine), shut down the VMs and run a manual optimization after creating one or more VMs!
To start this process manually, use:
Start-DedupJob -Volume D: -Type Optimization
To view the status of a job, use:
Get-DedupJob
PowerShell can also undo deduplication on a volume. An Unoptimization job rehydrates the deduplicated files back to their full size, so make sure the volume has enough free space for the expanded data.
Use this:
Start-DedupJob -Volume D: -Type Unoptimization
To list the PowerShell cmdlets for deduplication, use:
Help Dedup