Chapter 12

Windows 2012 R2 Storage: Storage Spaces, SANish Abilities, and Better Tools

For those who have had some time to play with Storage Spaces in Windows Server 2012, I bet you are excited about the new changes in R2. But before we discuss them, let’s do a quick recap of Storage Spaces for those who are new to this feature.

Storage Spaces was introduced in Windows Server 2012 as a native feature. Remember, this is not RAID but is something new that was designed for full-blown enterprise use. The basic function of Storage Spaces is to allow you to take just a bunch of disks (JBOD) and configure them in a pool. From here you can create virtual disks (the actual storage space) and volumes with fault tolerances of various degrees. This type of configuration gives you great flexibility.

Imagine not having to invest in a large, expensive storage area network (SAN) or in the specialized training that your administrators would need to configure and maintain it. A core goal of Storage Spaces is to provide a cost-effective solution for mission-critical storage. Storage spaces and pools are designed to grow on demand. Here is a list of just some of the features in Storage Spaces that are included in Windows Server 2012:

In this chapter, you will learn to:

What’s New in Windows Server 2012 R2 Storage?

Since this book is about Windows Server 2012 R2, we’ll now tell you what’s new in storage. Microsoft has included technology in Storage Spaces that you previously saw only in expensive storage arrays. The following sections detail the new technology.

Tiered Storage Spaces

There are several classifications of disks in today's storage world, including Serial Advanced Technology Attachment (SATA), Serial Attached SCSI (SAS), Solid State Drive (SSD), and Fibre Channel. Choosing the correct storage for the job is essential. For example, SSD is usually not a good choice for a file server: SSDs are designed for speed rather than capacity, and because file servers generally need capacity more than speed, SATA, which was designed with capacity in mind, may be a better match.

In Windows Server 2012 R2 you can have a maximum of two storage tiers, essentially a fast tier and a slow tier: SSDs make up the fast tier, and SATA (or other spinning) disks make up the slow tier. The really clever thing here is that an administrator doesn't have to decide up front where to place the data. The Storage Tiers Management service automatically analyzes the data on your disks in 1 MB slices and assigns each slice to one of two categories: hot spots and cold spots. Hot spots are areas of data that are accessed frequently; the assumption is that because this data is active, it is a "hot topic." Cold spots are the opposite: data that has not been accessed regularly. After the analysis, hot spots are promoted to the SSD tier, and identified cold spots are demoted to the SATA tier. The analysis happens daily at 1:00 a.m. by default, but you can reschedule it if you like; see Figure 12.1. And if a file needs to be on the fast tier all the time, the administrator can "pin" the file to the fast tier, as sketched after Figure 12.1.

Figure 12.1 Storage Tiers Management service in the Task Scheduler

image
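If a particular file needs to stay on the fast tier regardless of how often it is accessed, it can be pinned from PowerShell. The following is a minimal sketch, assuming a tier named SSD_Tier (like the one we create later in this chapter) and an illustrative file path; the file is actually moved during the next optimization run:

$ssd = Get-StorageTier -FriendlyName SSD_Tier
Set-FileStorageTier -FilePath "E:\VMs\Busy.vhdx" -DesiredStorageTier $ssd

Clear-FileStorageTier removes the pin again if you change your mind.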

Write-back Cache

Write-back cache refers to how data is written to disk. Data is written to the cache first and stored there until it is about to be overwritten; at that point it is flushed to disk and committed. In general, writing and storing data in cache gives better performance: if an app writes to the cache, its I/O completes quickly and the app can get back to other tasks. Certain workloads traditionally don't use write-back caching because when the app writes data, it must be committed to disk to avoid corruption; Hyper-V, for example, requires write-through. Tiered storage can be used in conjunction with a virtual disk (not a VHD/VHDX belonging to a Hyper-V virtual machine in this case, but a virtual disk from a Storage Spaces perspective) to absorb any spikes in writes. The fast tier can then be used to overcome the spike and allow Hyper-V to benefit from write-back caching.

Parallelized Repair

When a disk fails in a traditional RAID set, if you have a hot spare (a disk that can instantly take over the job of a failed disk in a RAID set), this hot spare will kick in and the RAID array will start rebuilding the data onto this disk. A performance impact on the disk subsystem is inevitable during the rebuild because all the data is being written to a single disk. The parallelized repair process in Storage Spaces is a little different. If a disk fails, the remaining healthy disks that have suitable capacity take ownership of the data that was stored on the failed disk and serve users' requests across all available spindles. Since all disks are now helping out, there should be no noticeable performance impact. The administrator can replace the failed disk (or a hot spare can kick in), and in the background the new disk is brought into the storage space.

Low-level Improvement: Native 4K Sector Support

Originally, hard disks used a 512-byte-per-sector format, and with this came limitations in storage size and performance. With ever-increasing demand for capacity and speed, a change was needed, and over the course of a few years 4K sector disks became standard. However, software (such as file system utilities, operating systems, and database engines) was not necessarily quick to catch up. Most drive manufacturers shipped 4K sector drives but emulated 512-byte sectors for compatibility. This emulation carries overhead: the entire 4K physical sector is read into memory, the 512-byte portion is modified, and the whole sector is written back. Because of this read-modify-write cycle, emulated drives take a performance hit.

With native 4K support, the data storage industry no longer emulates 512-byte sectors, which means the performance impact is gone.
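A quick way to see which sector format a disk presents is to compare the logical and physical sector sizes that NTFS reports. From an elevated command prompt (the drive letter is just an example):

fsutil fsinfo ntfsinfo C:

In the output, compare Bytes Per Sector with Bytes Per Physical Sector: 512/512 is a legacy drive, 512/4096 is an emulated (512e) drive, and 4096/4096 is a native 4K drive.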

The following list shows some of the apps and scenarios for 4K sector support:

UEFI BIOS Support Allows GPT Drives

Master boot records are special areas located at the beginning of the partitioned space on a disk. They contain information on the underlying partition structure and some chained boot code to allow the operating system to start. Master boot records store their block addresses as 32-bit values. With 512-byte sectors, a 32-bit block address limits a disk to 2^32 × 512 bytes, or 2 TB, which obviously is no longer acceptable. Moving to 4K sectors raises that ceiling to 16 TB. It seems like we are getting there, but considering that in today's infrastructure we could be looking at petabytes of data, terabytes just don't seem to cut it.

The GUID Partition Table (GPT) provides a 64-bit addressing structure. With a 512-byte sector drive, that allows 2^64 × 512 bytes, roughly 9.4 zettabytes (9,444,732,965,739,290,426,880 bytes) of data; the same figure is commonly quoted as a maximum disk and partition size of 8 ZiB.

I can envision you now rushing to convert your drives to GPT, and you would be right! But be careful; not all operating systems support booting from GPT partitions using standard BIOS. See the following link:

http://en.wikipedia.org/wiki/GUID_Partition_Table

The Unified Extensible Firmware Interface (UEFI) is designed as a direct replacement for the legacy BIOS system. It essentially does the same job but adds functionality such as diagnostics and repair of computers that already have an operating system deployed. UEFI is designed to support booting from GPT, and Windows Server 2012 R2 fully supports UEFI firmware.

CHKDSK Gets Smarter

In all the years I have been computing, this has been one of my staples. CHKDSK has been with us for many generations of DOS and Windows, and it is (for me, anyway) great to see this tool upgraded.

One of the biggest problems that faced CHKDSK before its upgrade was its direct relationship to the number of files on a volume: the larger the number of files, the longer it took to run. Another problem that constantly plagued CHKDSK was that if it detected a problem, it usually had to dismount the volume, rescan everything, detect the problems all over again, and then fix them. As you can imagine, with large volumes this took a long time, and in our current always-on culture, downtime is simply not acceptable.

The CHKDSK code has been upgraded, and the NTFS health model has also been redesigned. We’ll discuss these upgrades in the next few pages, but they essentially lead to the simple conclusion that CHKDSK is no longer needed in its former capacity.

Online Self-healing

Although this feature of NTFS has been around since Windows Vista, the number of issues that it can detect and fix online has greatly increased. This in turn has decreased the actual need for CHKDSK because most issues will be self-healed. And if they self-heal, the volume doesn’t have to go offline.

Online Verification

In Windows Server 2012 you can verify whether an error represents actual corruption. Sometimes errors occur because of memory issues, which doesn't necessarily mean the disk is corrupt. Now, because of online verification, you can invoke a check to confirm. A new service called Spot Verifier is triggered by the file system driver to perform this check, as shown in Figure 12.2. It operates in the background and does not affect system performance.

Figure 12.2 Spot Verifier service

image

Online Identification and Logging

Once you find a real issue, an online scan of the file system is triggered. This scan is designed to run in conjunction with the operating system and will run only when the system is idle or when utilization is low. Once it finds the problem, it logs it for offline correction.

Precise and Rapid Correction

Because you have logged where the issues are, you don’t have to scan the entire file system again when you begin the offline process. This essentially means that when you do take a volume offline to repair the issues, it takes seconds to repair rather than potentially hours. This quick fix is called Spotfix. If you are using Cluster Shared Volumes, there is no downtime, giving you always-on volumes.

With these new improvements, the CHKDSK runtime is no longer based on the number of files but rather on the number of corruptions. Because so many issues can be repaired online (and with CSV the volume stays online throughout), there is less and less need to run CHKDSK at all. Figure 12.3 shows the new options that are available in CHKDSK.

Figure 12.3 New CHKDSK options

image

The new options are /scan, /forceofflinefix, /perf, /spotfix, /sdcleanup, and /offlinescanandfix. As you can see, they relate directly to the new health model described previously.
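For example, a routine online health check followed by a brief offline repair of anything it logged might look like this (the drive letter is illustrative):

chkdsk D: /scan
chkdsk D: /spotfix

The /scan switch checks the volume while it stays online and logs any corruption it finds; /spotfix then dismounts the volume just long enough to repair the logged corruptions.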

We must stress at this point that another key goal of upgrading CHKDSK was to ensure that users are kept informed of any corruption. Part of the reason was to allow users and administrators to stop actively running CHKDSK to verify the file system; now there is simply no need. The system uses the Action Center included in Windows to notify a user or administrator of file system corruption and recommends an action. See Figure 12.4 for the results of a sample online scan.

Figure 12.4 Message in Action Center for online scan

image

In Figure 12.5 you can see that if the issue cannot be repaired online, the Action Center will ask you to restart the computer to allow an offline repair.

Figure 12.5 Message in Action Center for offline scan

image

In-depth Look at Storage Spaces

In Windows Server 2012 R2, the concept of Storage Spaces is essentially the same as what was released in Windows Server 2012, which we discussed at the start of this chapter. The exceptions are the new features introduced earlier, which we will now cover in further detail.

Reusing Technology from Microsoft’s Cloud

Microsoft runs multiple cloud services; I'm sure you've heard of Windows Azure or Office 365. Imagine all the lessons Microsoft learned during the deployment, setup, and day-to-day operations of these environments. Also imagine how crippling it would be if Microsoft had to buy multimillion-dollar storage networks to cope with the ever-growing need for storage in those cloud environments.

Microsoft applies all this knowledge to the new technologies they release, including Storage Spaces. Microsoft needed a cost-effective way to increase storage and maintain essential features found in storage area networks, hence the birth of Storage Spaces. As cloud services develop, you will see improvements in Storage Spaces such as those you have seen between the releases of Windows Server 2012 and Windows Server 2012 R2.

Providing SAN-like Capabilities with Microsoft Management Tools

One of the really interesting things about Microsoft technologies is the familiar interface that they provide for managing their products. You are usually given two options: the GUI and PowerShell.

Using the GUI

Although the Microsoft Management Console (MMC)—the traditional console for most of the management plug-ins—still exists, most features within Windows Server 2012 R2 are managed via Server Manager, as shown in Figure 12.6. Storage Pools is enabled by default on all systems and can be found as a subfeature under File and Storage Services.

Figure 12.6 Server Manager, File and Storage Services

image

Once you choose File and Storage Services, you will see all the related options to this menu, including Storage Pools, as shown in Figure 12.7.

Figure 12.7 Suboptions for File and Storage Services

image

Clicking Storage Pools will bring you into the main configuration. Take a look at Figure 12.8.

Figure 12.8 Storage Pools configuration

image

This is the main configuration window; it is split into three main areas:

Storage Pools This area contains a section called Storage Spaces, and under it is listed Primordial. By default, all disks not assigned to a different pool are assigned to the Primordial pool. As you work with Storage Spaces you will notice that the Primordial pool will disappear when all the disks have been assigned.
As you can see, there are no other pools assigned or configured yet. In the top-right corner of this area, under Tasks, is the option to create a new storage pool. We will play with this later on, so don’t worry about it for now.
You can right-click the Primordial pool and examine its properties.
Virtual Disks Virtual Disks represent the volumes you will create inside a storage pool. Remember, this is not a VHD or VHDX file. You cannot create a virtual disk inside the Primordial pool; you must create a storage pool first.
Physical Disks Physical disks are the disks that are available to Storage Spaces to assign to storage pools. A disk can be assigned to only one pool at a time. If you right-click a disk in this list, you get the option to toggle (turn on/off) a drive light (toggling drive lights helps you find which physical disk you are working on in a storage array); this works only if the storage you are using is SCSI Enclosure Services (SES) compliant. SES also helps when a drive is failing: it communicates with Storage Spaces to let the administrator know what is happening.

Using PowerShell

As mentioned, you can use PowerShell cmdlets to quickly provision Storage Spaces. Some administrators prefer to work with a command-line environment when administering servers. Personally, I like to mix them.

Windows Server 2012 R2 has a new PowerShell module called Storage, which contains all the PowerShell cmdlets you need to work with Storage Spaces.

In Windows Server 2012 and above, a PowerShell module is automatically imported when you attempt to call a cmdlet that is part of that module. To review the cmdlets available to you under the Storage module, open an elevated PowerShell window and type get-command -module Storage. In Windows Server 2012 R2, there are 102 cmdlets available to you. Not all are storage pool related.

Hopefully you are familiar with PowerShell and understand its verb/noun structure. Based on the areas shown in Figure 12.8, you are looking for cmdlets related to physical disks, virtual disks, and storage pools. To identify the cmdlets for each of these, try typing get-command *StoragePool* |where {$_.modulename -eq "Storage"} and examine the result. Figure 12.9 shows the expected output; each cmdlet has its own set of options, which you can see by using get-help cmdletname.

Figure 12.9 Storage cmdlets

image

We are not going to go through every cmdlet here, but we will show you some sample output from a couple of them. For example, type get-storagepool and observe the output. Now type get-storagepool |fl * and see the difference. FL is a PowerShell alias that stands for Format-List; the * determines which properties to display, and in this case * means all properties. The output is shown in Figure 12.10. Repeat for the cmdlets get-physicaldisk and get-virtualdisk and observe the output.

Figure 12.10 Sample output for get-storagepool

image

Creating a Storage Space

Storage Spaces is an extremely powerful feature, and as we have mentioned, storage spaces bring many benefits to an organization. They are also simple to configure; no specialized training is required.

In the next few pages we’ll walk you through the process of creating a storage space and show you how simple it really is. The general process is as follows:

1. Obtain free physical disks.
2. Create storage pools.
3. Create virtual disks.

We will show you how to create a storage pool via the GUI and PowerShell. First, however, we will give you a quick introduction to our lab. We have a single server with multiple physical disks: two 100 GB SAS drives, two 150 GB SAS drives, and one 300 GB SATA drive.

Creating a Pool

When you create a storage pool you must decide on the physical disks you want allocated to the pool. It is important to think in terms of what the pool will be used for and, now with the storage-tiering feature, what type of disks should be part of the pool.

To create a pool, follow these steps:

1. Open Server Manager and click File and Storage Services.
2. Choose Storage Pools.
3. Right-click the primordial pool and select New Storage Pool, as shown in Figure 12.11.

Figure 12.11 Creating a new storage pool

image
This will open the New Storage Pool Wizard.
4. Click Next to bypass the welcome screen.
5. In the Storage Pool Name screen, shown in Figure 12.12, you must name your storage pool; in this case, just name it Test. You can optionally add a description.

Figure 12.12 Naming your storage pool

image
In the bottom half of the window you will see it is using the primordial pool for its available disk pool.
Next, you need to select the physical disks in your pool.
6. In this example, select all the disks, as shown in Figure 12.13.

Figure 12.13 Selecting disks for the storage pool

image
Notice that you can select both ATA and SAS disks. Figure 12.14 shows the Allocation options: Automatic, Hot Spare, or Manual.

Figure 12.14 Disk allocation options

image
7. In our example we are using Automatic allocation. Click Next to continue.

Mixing Manual and Automatic Disks
When allocating disks, you can have multiple hot spares in your pool, but you should not mix manual and automatic disks. Choosing Automatic on this screen will balance the pool automatically between hot spares and usable capacity.

You cannot change the disk allocation from manual to automatic after making an assignment in the GUI, but you can do so from PowerShell. See Figure 12.15 for an example of how to identify and change the allocation of a drive.

Figure 12.15 Changing the drive allocation type in PowerShell

image
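As a rough sketch (the disk name is whatever Get-PhysicalDisk reports in your pool), checking and changing the allocation in PowerShell looks like this:

Get-PhysicalDisk | Select-Object FriendlyName, Usage
Set-PhysicalDisk -FriendlyName "PhysicalDisk4" -Usage AutoSelect

Valid Usage values include AutoSelect, ManualSelect, HotSpare, and Retired.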
8. Finally, as with all wizards, you get a chance to review the options you selected before committing the change; see Figure 12.16. Click Create when you are satisfied with your choices.

Figure 12.16 Reviewing the configuration options before creating a pool

image
A progress screen will appear and show a status of Completed when the pool is created; see Figure 12.17.

Figure 12.17 Storage pool created successfully

image

Congratulations on creating your first storage pool.

Pool Limitations

As with all technologies, there are some limitations to storage pools, and although the technology is powerful, it most certainly is not optimal for every situation. With that in mind, let’s look at the limitations:

Viewing Drives in Disk Management

Essentially, a storage pool is a logical container for the disks. For example, in our demo environment we have several disks ready to be assigned to a storage pool. In the Disk Management screen shown in Figure 12.18, you can see all the physical disks listed before we pooled them.

Figure 12.18 Unallocated disks in Disk Manager

image

After we added the disks to a storage pool, we refreshed Disk Manager. As you can see in Figure 12.19, they have all disappeared. Where did they go? Remember that a storage pool is a container; you need to create virtual disks in order to see volumes in Disk Manager again. Disk 1 still remains because it is the operating system disk, and the OS disk can never be included in a storage space.

Figure 12.19 Drives no longer appearing in Disk Manager after being added to a storage pool

image

Pooling with PowerShell

We mentioned earlier that everything you can do in the GUI you can do via PowerShell. We will now show you how to create a storage pool using PowerShell:

1. First, you need to find which disks are available. Use the Get-PhysicalDisk cmdlet to retrieve a list of all disks in the system. Figure 12.20 shows the results for our example.

Figure 12.20 Displaying available physical disks

image
2. Look at the CanPool property. When its value is True, this disk can be used in a storage pool.
3. Filter on the physical drives that can be pooled, and store the results in a variable for later use with the following syntax:
$drivestopool = Get-PhysicalDisk | Where-Object {$_.CanPool -eq $true}
4. Next, identify the storage subsystem you are running, and again store it in a variable; in this case you are interested only in the FriendlyName property, as shown in Figure 12.21.

Figure 12.21 storagesubsystem cmdlet

image
Use the following syntax to capture the storage system’s FriendlyName:
$storagesystem = (Get-StorageSubSystem).FriendlyName
Now you can create the storage pool.
5. Use the following syntax to create the pool:
New-StoragePool -FriendlyName TestPool -StorageSubSystemFriendlyName $storagesystem -PhysicalDisks $drivestopool
The FriendlyName for the pool can be any string you wish. This syntax will create a pool named TestPool with the disks that can be added into it. See Figure 12.22.

Figure 12.22 Output of PowerShell when creating a new storage pool

image
6. Now use the Get-StoragePool cmdlet to find out more detailed information about your pool using the following syntax:
Get-StoragePool TestPool |fl *
See Figure 12.23 for our example output, and notice the amount of detail that PowerShell provides versus the GUI.

Figure 12.23 Output for Get-StoragePool cmdlet

image

Allocating Pool Space to a Virtual Disk

From the outset of this chapter we have made it clear that when we reference a virtual disk in relation to storage space, we are not talking about VHDX for a virtual machine. In fact, unless you place a virtual machine in the storage space or use it as an iSCSI target store, you will not see a VHD anywhere.

Virtual disks are essentially disks that you carve out of your storage pool. You previously created a pool for your physical disks, and we showed you that from a Disk Manager perspective all the disks disappeared because they now belong to the storage pool. In order to use some of the space contained within your storage pool, you have to create a virtual disk. It is not directly related to a physical disk in the storage pool, but it is representative of a chunk of space you are allocating out of the storage pool. How that chunk comes into existence depends on the options we will discuss next.

One of the great things about virtualization in general is that you maximize the use of the hardware. For example, organizations previously dedicated one physical server to a single role, which wasted resources; now multiple, completely isolated roles can be assigned to one server. Hopefully you are familiar with virtualization in general at this stage. A similar concept exists with storage pools and virtual disks.

As we have already said, storage pools are essentially logical containers for a set of physical disks that you want to aggregate. The virtual disks will be presented to a server for use as a volume. If you have three physical disks of 500 GB each combined in a storage pool, you have the potential for 1.5 TB of space. See Figure 12.24.

Figure 12.24 Storage pool allocation

image

Pretty cool. (We’re not taking into account redundancy just yet, because we will explain this shortly.) Now a system administrator gets a request from a new application team, and they require 2 TB of space for their application. However, when the system administrator reviews the projected growth, they realize that the 2 TB won’t be needed upfront, which is good because there is no budget for more disks. Sound familiar? The dilemma is what to do about it.

One of the first choices you have with virtual disks is whether they are fixed or thin provisioned:

Fixed With fixed, if you ask for 2 TB, you need 2 TB of capacity available for provisioning.
Thin Thin-provisioned disks use only what is needed at the moment. This is brilliant! In the previous example, the application team thinks they have the 2 TB capacity they asked for, but in fact they are using only a fraction of the 2 TB. See Figure 12.25.

Figure 12.25 Fixed and thin-provisioned disks

image

Managing Thin-provisioned Disks
Thin-provisioned disks can lead to overcommitment of resources and need to be managed. You need to create alerts to ensure that you monitor the free space left in the pool and in the virtual disk. The last thing you want is to have an outage because you overcommitted the resources. If used correctly, thin-provisioned disks can help system administrators mitigate storage costs and still meet the needs of the consumers.
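A minimal way to keep an eye on overcommitment, using the cmdlets we have already met, is to compare each pool's total size with what has actually been allocated, and each virtual disk's advertised size with its real footprint on the pool:

Get-StoragePool -IsPrimordial $false | Select-Object FriendlyName, Size, AllocatedSize
Get-VirtualDisk | Select-Object FriendlyName, Size, FootprintOnPool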

Determining Disk Layout

Next, you need to decide on the layout of the virtual disk. There are three resiliency options, as follows:

Simple In this design, data is striped across all disks in the pool. There is no reliability in this layout. If a disk fails, you potentially lose all your data.
Mirror Mirroring the data duplicates it across different disks; this gives you maximum reliability but greatly impacts the amount of space you can potentially use. To protect from a single disk failure, you need at least two physical disks in your storage pool; to support two disk failures, you need at least five physical disks.
Parity Parity essentially writes data in stripes across all the disks but also writes parity information, so if a disk fails the data can be rebuilt. This gives you good reliability with better capacity efficiency than mirroring, although writes are slower because the parity must be calculated. To support a single disk failure you need at least three disks.

Figure 12.26 gives you a visual representation of the different layouts. In green you can see the simple layout: data is striped across all disks. In yellow you can see mirroring: data written to one drive in a four-drive mirror is also written to a second disk. Finally, in blue, data is striped across all disks, with parity information written alongside it to allow for recovery.

Figure 12.26 Virtual disks layout

image

Creating a Virtual Disk in the GUI

The next step is to create a virtual disk. The easiest place to create the virtual disk is within the Storage Pools console of Server Manager, as shown in Figure 12.27.

Figure 12.27 Storage Pools console

image
1. In the lower left, under Virtual Disks, click Tasks ➢ New Virtual Disk, as shown in Figure 12.28.

Figure 12.28 Creating a new virtual disk

image
2. Click Next on the welcome screen of the New Virtual Disk Wizard.
As shown in Figure 12.29, you need to select the storage pool that you want to create the virtual disk from. In our example we are going to use TestPool.

Figure 12.29 Select a storage pool to use for creating a virtual disk

image
3. Select your storage pool and click Next.
You need to assign a name to the virtual disk. You can also enter a description of what the virtual disk will be used for.
4. In our example, we’re naming it File_Vdisk, as shown in Figure 12.30, and indicating that it is to be used for file storage.

Figure 12.30 Naming and describing the virtual disk

image
The next step is to choose your storage layout, as shown in Figure 12.31. You have three options: Simple, Mirror, and Parity.

Figure 12.31 Storage layout for virtual disks

image
5. For this example, choose Simple and click Next.
Now you need to choose the provisioning type. You have two options, Thin or Fixed, as shown in Figure 12.32.

Figure 12.32 Provisioning type for the virtual disk

image
6. In this example, choose Thin, because you want to maximize your space in the storage pool.
You now need to decide on the size of the virtual disk. Since it is thinly provisioned, you could in theory enter any value here.
7. Our storage pool is about 400 GB, so we’ll assign 500 GB to the virtual disk, as shown in Figure 12.33. Click Next to continue.

Figure 12.33 Setting the size of the virtual disk

image
8. Finally, review your choices and confirm the settings by clicking Create.
9. Review the Results screen and ensure that everything is complete, as shown in Figure 12.34. Click Close to end the wizard.

Figure 12.34 Results screen for the new virtual disk

image

Side Exercise
Check out Disk Manager now and see what suddenly appeared!

Creating a Virtual Disk in PowerShell

First, let’s use PowerShell to look at the virtual disk we created in the previous example using the GUI. In Figure 12.35 we use the Get-VirtualDisk cmdlet to retrieve all information on virtual disks that we’ve created. As you can see, we have created only one, the 500 GB virtual disk.

Figure 12.35 Output of Get-VirtualDisk

image

To create a new virtual disk you need to use the cmdlet New-VirtualDisk. But you need to know the storage pool friendly name before you start. Do you remember the command for getting the storage pool friendly name?

Once you have the friendly name, follow these steps:

1. Use the following syntax to store the friendly name of the storage pool in a variable:
$sp = (Get-StoragePool).FriendlyName
The next step is to create the virtual disk, but we’ll show you the full syntax first:
New-VirtualDisk -StoragePoolFriendlyName $sp[1] -ResiliencySettingName Simple -Size 500GB -FriendlyName TestVdisk -ProvisioningType Thin -NumberOfDataCopies 1 -NumberOfColumns 2
As you can see, there are a few more options to select. Let’s look at a few of them:
ResiliencySettingName Equivalent to the storage layout options of Simple, Mirror, Parity.
NumberofDataCopies The number of copies of the data you want to keep; this option is directly related to ResiliencySettingName. If you choose Simple, for example, NumberofDatacopies can only be 1. If you choose Mirror, the NumberofDataCopies will be at least 2, depending on the amount of disk space you have in the system.
NumberofColumns Directly associated with the number of disks you want to use. A storage pool may have hundreds of disks, but you may want to stripe or mirror or use parity across only five disks. This option gives you the choice. This option also is related to both resiliencysettingname and numberofDataCopies.
The values we selected for NumberofDataCopies and NumberofColumns are related to the options we selected. For example, if we want to mirror our data, we increase NumberofDataCopies; if we want to span our virtual disk across multiple disks, we increase NumberofColumns. In our case we want only one copy of the data, and we want to write the data across two disks.
2. The $sp[1] option selects one element out of all the storage pool names we captured with the get-storagepool command. For example, if we have five storage pools, get-storagepool returns all five; that's no good here, so [1] lets us select a single pool. The count starts from 0, so $sp[0] is the first pool and $sp[1] is the second.
In our lab environment we have multiple storage pools. If we just referenced $sp, the command would fail because it would try to insert (in our case) two storage pool friendly names.
See Figure 12.36, which shows you a sample run of the command we just outlined.

Figure 12.36 Sample output from creating Vdisk in PowerShell

image
3. Run get-virtualdisk now to review the output, and the disk you created should be listed.

Again, as an exercise, view the disk in Disk Manager.

Volumes from Virtual Disks

If you have ever provisioned a standard physical disk and created a volume and formatted it, then this should be very familiar territory.

There are several ways you can create volumes. Disk Manager and Diskpart are the two you are most familiar with, and there is absolutely nothing wrong with creating the volume from one of these if you so wish. However, for this example and to show you that you can do everything you need in relation to storage spaces directly from the Storage Pool UI, we will show you how to create a volume from there.

In Figure 12.37 you can see the Storage Spaces UI, and under Virtual Disks you can see File_Vdisk, which we created earlier.

Figure 12.37 Storage Spaces UI with our newly created virtual disk

image
1. Right-click the new virtual disk and select New Volume, as shown in Figure 12.38.

Figure 12.38 Creating a new volume from a virtual disk

image
2. Click Next on the welcome screen of the New Volume Wizard.
3. Select the server and disk—SS01 for the server and Disk 6, File_Vdisk, as shown in Figure 12.39—and click Next.

Figure 12.39 Selecting the server and virtual disk

image
As with normal disks, just because the full disk is 500 GB, the volumes you create don't have to be 500 GB. You can create multiple volumes of different sizes if you want; they just can't add up to more than 500 GB.
4. In our example, as shown in Figure 12.40, we’ll stay with the default size of 500 GB, which will be thinly provisioned.

Figure 12.40 Setting capacity for the volume

image
Next, as you would for a normal disk, select a drive letter or a folder where you want to mount the volume.
5. In our example, we’ll accept the default of E, as shown in Figure 12.41.

Figure 12.41 Selecting a drive letter for the volume

image
Now you can choose the file system. Notice you can choose only NTFS or ReFS, as shown in Figure 12.42.

Figure 12.42 File system settings

image
6. Select NTFS.
7. Rename the Volume Label to Test_Volume and click Next.
8. Confirm all the values and click Create.
9. Ensure in the Results screen, shown in Figure 12.43, that everything registers Completed, and click Close.

Figure 12.43 Results screen for creating the volume

image
10. Open Windows Explorer and notice that your new volume E: has appeared.
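If you prefer to script this step, a minimal PowerShell sketch that takes a freshly created virtual disk (using the File_Vdisk name from our example), initializes it, creates a partition, and formats it might look like the following; skip Initialize-Disk if the disk was already initialized in the GUI:

Get-VirtualDisk -FriendlyName File_Vdisk | Get-Disk |
    Initialize-Disk -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel Test_Volume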

New Disks Are Offline by Default

By default, when you add a physical disk or a VHD or even a new virtual disk, it will always be in an offline state. Normally this is OK, but in a cloud environment always having to bring the disk online is an extra step you could do without. In the command-line utility Diskpart you have the ability to configure an option so that if you create a virtual disk, it will automatically come online.

To set the policy status for SAN disks to Online, open an elevated command prompt, type Diskpart, and then at the DISKPART> prompt type the following:

san policy=OnlineAll

See Figure 12.44 for sample outputs of running Diskpart and setting the SAN policy to Online.

Figure 12.44 Sample output for setting Diskpart SAN policy

image

Side Exercise
Using the steps we previously outlined, create a new virtual disk in TestPool. Does it come online?

Making Disks Online with PowerShell

As with everything, the old command-line tools are being replaced with PowerShell. Can you guess the cmdlets used to find out what disks are offline and set them online?

First, let’s use get-disk to figure out what state our disks are in. Figure 12.45 shows a sample output for get-disk, and as you can see, there are two disks offline.

Figure 12.45 Output for get-disk

image
1. Simply type get-disk and press Enter.
2. You could also filter on just offline disks by typing the following:
Get-Disk | Where-Object {$_.OperationalStatus -eq "Offline"}
3. To bring them online, you would use the set-disk cmdlet.
4. To bring all the disks online at once, take the previous command, which filtered for offline disks, and pipe its output into the set-disk cmdlet.
The syntax is shown here, and Figure 12.46 shows the sample output:

Figure 12.46 Bringing all disks online using PowerShell

image
Get-Disk | Where-Object {$_.OperationalStatus -eq "Offline"} | Set-Disk -IsOffline $false

Storage-tiering Demo and Setup Using PowerShell

We have brought you through creating a storage pool, a virtual disk, and volumes for your environment. One of the things we mentioned at the beginning of this chapter was storage tiers. They can be of huge benefit to an environment because they allow you to split up your storage and charge back based on the resources that the end users require. Essentially, if the end users require high-speed storage, you can allocate and bill accordingly; if they don't, you can allocate low-end storage to serve their needs. Even if your company doesn't do charge-back, tiering benefits all end users because Storage Spaces will automatically move the more frequently accessed data to the fast storage tier and the less frequently accessed data to the slow tier.

Since you are now familiar with the Storage Spaces console, you will notice that there is no place to configure storage tiers within the UI. This feature can be configured only via PowerShell.

We’ve already created our storage pool named TestPool, so let’s use this as the friendly name.

As we’ve already said, you can create only two tiers in Windows Server 2012 R2. Solid State Drive (SSD) and Hard Disk Drive (HDD) are the two media types the system recognizes.

In our lab if we run the PowerShell cmdlet get-physicaldisk, we get the output shown in Figure 12.47.

Figure 12.47 Sample output of get-physicaldisk for creating storage tiers

image

Creating SSD and HDD Tiers

As you can see, we have SSD and HDD drives in our environment. Now we’ll create our storage tiers. We are going to create two tiers in our example (which is also the maximum supported), and then we’ll create a virtual disk that will be allocated across the tiers. We will then partition and format the disk for use.

Using the cmdlet New-StorageTier, here is the syntax for creating the SSD tier. We store the result in a variable for later use:

$ssdtier = New-StorageTier -StoragePoolFriendlyName "TestPool" -FriendlyName SSD_Tier -MediaType SSD

For the HDD tier we use the following syntax:

$hddtier = New-StorageTier -StoragePoolFriendlyName "TestPool" -FriendlyName HDD_Tier -MediaType HDD

The next step is to add a virtual disk and tie it to the storage tiers. Before you ask whether you can remap an existing virtual disk to a storage tier, the answer is no.

With that in mind, we’ll create a new virtual disk, which we will tie to our storage tiers. Here is the syntax to use:

New-VirtualDisk -StoragePoolFriendlyName TestPool -FriendlyName Tiered_VDisk -StorageTiers @($ssdtier, $hddtier) -StorageTierSizes @(10GB, 50GB) -ResiliencySettingName Simple

Figure 12.48 shows the output of successfully creating a disk.

Figure 12.48 Output of creating a virtual disk in storage tiers

image

Most of the options should be familiar from creating a virtual disk earlier. However, we have two options related to creating a virtual disk in a storage tier:

StorageTiers @($ssdtier, $hddtier) This specifies the tiers you can use. This is not a hash table, so be careful not to use curly brackets. We have stored our tiers in separate variables for ease of reference.
StorageTierSizes @(10GB, 50GB) This specifies the size of each tier, referenced in the same order you set in the -StorageTiers option. As you saw in Figure 12.48, the total size is 60 GB, which is 10 GB + 50 GB.
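If you are unsure what sizes to ask for, the Storage module can report the minimum and maximum size each tier supports for a given resiliency setting. As a sketch, assuming the tier names we created above:

Get-StorageTierSupportedSize -FriendlyName SSD_Tier -ResiliencySettingName Simple
Get-StorageTierSupportedSize -FriendlyName HDD_Tier -ResiliencySettingName Simple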

From here you would need to create a volume as before for storing data. We’ll show you a quick PowerShell trick you can use to create a 20 GB volume on the disk and format it all in one line! Here is the syntax:

Get-VirtualDisk | Get-Disk | New-Partition -Size 20GB -AssignDriveLetter | Format-Volume -Force -Confirm:$false

See Figure 12.49 for the output of the command. Now you can navigate to the drive letter and copy a file.

Figure 12.49 Creating a new partition and formatting it in PowerShell

image

Take a moment to review the properties of the virtual disk we previously created. In Figure 12.50, notice how the capacity is split between the different tiers, just as we specified.

Figure 12.50 Properties of a tiered virtual disk

image

Using the Write-back Cache

Now we’ll show you how to use one of the last major features of Storage Spaces in Windows Server 2012 R2, the write-back cache. As with storage tiers, you cannot enable this via the GUI; it must be done via PowerShell. Remember, the write-back cache can help speed up applications because writing to cache is quick and doesn’t have to wait for storage to catch up to commit the write.

Take the PowerShell command we used to build our previous tiered virtual disk and modify the pool name and the StorageTierSizes options. Then add the -WriteCacheSize option with a size setting; in this case you want a write-back cache size of 2 GB:

New-VirtualDisk -StoragePoolFriendlyName TestPool1 -FriendlyName Tiered_VDisk -StorageTiers @($ssdtier, $hddtier) -StorageTierSizes @(20GB, 70GB) -ResiliencySettingName Simple -WriteCacheSize 2GB

Voilà! You have now created a virtual disk that uses storage tiers, with write-back cache enabled.

Storage Tiers Optimization

The final thing we discussed about storage tiers at the start of this chapter is that every night at 1:00 a.m. a job runs to reprioritize the storage and move data between the fast tier and the slow tier as needed.

In Task Scheduler, choose Task Scheduler Library ➢ Microsoft ➢ Windows ➢ Storage Tiers Management. Task Scheduler lists a job called Storage Tiers Optimization, as shown in Figure 12.51.

Figure 12.51 Storage Tiers Optimization task

image

You can modify the task or manually trigger it if necessary.
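For example, to kick off the optimization outside the 1:00 a.m. window, you can start the scheduled task from PowerShell; the task path below is how it appears on our lab server, so verify it in your own Task Scheduler first:

Start-ScheduledTask -TaskPath "\Microsoft\Windows\Storage Tiers Management\" -TaskName "Storage Tiers Optimization"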

iSCSI on Storage Spaces

Storage spaces are incredibly useful for providing scalable and reliable back-end storage. Think of the amount of money a company would have to invest to get the abilities we have already outlined. What would be really useful now is to combine all this powerful storage technology with iSCSI so you can allow remote systems (such as file servers, mail systems, virtualization clusters, and the like) to also benefit from these features.

iSCSI requires a few elements to be configured in order for it to present logical unit numbers (LUNs) to remote machines. First, we’ll explain a few items that make up iSCSI from a host server and remote server perspective that you’ll need to know in order to understand the example we’ll use:

iSCSI Target Server This allows iSCSI initiators to make a connection to the target service, which in turn presents a virtual disk located on one of the target server's volumes. To the target server's operating system, this appears as a VHD file. You can configure access control to secure the disk appropriately.
iSCSI Virtual Disk The iSCSI virtual disk in this case is an actual VHD when viewed on the target server, but when viewed from a client server or initiator point of view, it appears as a disk that can be brought online or offline and have volumes created on it.
iSCSI Initiator The initiator is the client software used to connect to a target server and access whichever iSCSI virtual disks it has been presented with and is authorized to access.

This technology is commonplace in most businesses today, and it allows them to create clusters for all sorts of business reasons. In my previous place of employment we used a Windows server with the iSCSI target server to create a Hyper-V cluster to run our production network. In the next section we’ll walk you through an example of setting up the iSCSI target service, creating a virtual disk, and presenting it to a remote system.

Adding the iSCSI Target Service

By default, the iSCSI target service is not enabled. You must add it, and you’ll do this via PowerShell. The syntax for adding the Windows feature iSCSI Target server is:

Add-WindowsFeature FS-iSCSITarget-Server -IncludeManagementTools

A server reboot may be required after adding the feature, so make sure you are in a position to be able to complete the installation.

The iSCSI Target server is a File and Storage Services subfeature, and that means that you can administer it via the Server Manager console under File and Storage Services, as shown in Figure 12.52.

Figure 12.52 iSCSI Target server management

image

Creating an iSCSI Virtual Disk

As you can see from Figure 12.52, there are two main screens: iSCSI Virtual Disks and iSCSI Targets. As we have already said, a target will present the iSCSI virtual disks that have been created. (Don’t confuse them with virtual disks in Storage Spaces. They are different; iSCSI virtual disks appear as VHD files on the Target server.)

To demonstrate this, let’s create an iSCSI virtual disk. In our example we will be using the E drive we created earlier from our tiered storage pool. Don’t worry if you haven’t set it up; all you need is a drive and a folder to store the VHD you are going to create.

1. In the center of the iSCSI Virtual Disks window shown in Figure 12.52, click “To create an iSCSI virtual disk, start the new iSCSI Virtual Disk Wizard.”
2. Select the Target server that’s listed and a volume where you wish to store the iSCSI virtual disk.
In our example this will be E:, as shown in Figure 12.53.

Figure 12.53 Selecting a server and a volume for an iSCSI virtual disk

image
3. Give the iSCSI virtual disk a descriptive name; for example, if it’s for a Hyper-V cluster, type VMCluster_Vdisk. Notice the path in Figure 12.54.

Figure 12.54 iSCSI Virtual Disk Name screen

image
4. Next, enter the size of the virtual disk; for our example type 50 GB.
Notice the options; you can choose whether you want to provision all the space at once using the fixed option, provision a dynamically expanding disk, or use a differencing disk. These options will seem familiar if you are used to Hyper-V.
5. Choose Dynamically Expanding in this case.

Choosing the Fixed or Dynamically Expanding Option
It is worth noting that you need to be careful when choosing among Fixed, Dynamically Expanding, or Differencing. Choosing the wrong type can dramatically affect performance. As a rule of thumb, if you are unsure and do not know the type of workload that will eventually use that disk, choose Fixed.

6. Since this is a new server and you don’t have any iSCSI Target servers yet, you need to select the “New iSCSI target” option, as shown in Figure 12.55.

Figure 12.55 New iSCSI target

image
7. Give the iSCSI target a name; for this example type VMCluster_Target, as shown in Figure 12.56.

Figure 12.56 New iSCSI target name

image
Next, you need to configure access to the iSCSI virtual disk you are creating. You can authorize specific initiators based on their IQN or DNS name, IP address, or MAC address.
The IQN (iSCSI qualified name) is an automatically generated name. For Microsoft servers it usually is in the format iqn.1991-05.com.microsoft:servername.
8. On the Specify Access Servers screen of the wizard, click Add.
This will bring up the “Add initiator ID” window, as shown in Figure 12.57.

Figure 12.57 Add initiator ID

image
If you are familiar with iSCSI, you will notice a new option for Windows Server 2012 and above. If you do not know the iSCSI initiator qualified name, you can query a remote server for it. (IQN is simply a naming convention for iSCSI that is consistent with the format of the machine; usually it follows the format: iqn.1991-05.com.microsoft:server01.contoso.com.)
When deploying iSCSI in the past, I had a preference for IQNs because they do not change unless you change the machine name. Other options, as described earlier, have the ability to change easily in an environment, and if you are presenting these LUNs to remote machines, you don’t want that to happen.
9. As shown in Figure 12.57, select the ID from the initiator cache on the target server.
Next, you can challenge for authentication to a LUN using CHAP (Challenge-Handshake Authentication Protocol is an authentication protocol to control access to resources). In our example we will ignore this.
10. Finally, review all the settings and click Create.

Side Exercise
Use the iSCSI cmdlets to review the iSCSI Target server and virtual disk deployed.
The cmdlets you require are get-iscsitargetserver and get-iscsivirtualdisk.

Do you fancy creating a new virtual disk for iSCSI in PowerShell and presenting it to a target? Here are some sample cmdlets you can use:

1. Create the virtual disk using the New-ISCSIVirtualdisk cmdlet.
Here is an example:
New-IscsiVirtualDisk -Path e:\newdisk.vhdx -SizeBytes 20GB -ComputerName SS01
2. Add that disk to your target using the Add-IscsiVirtualDiskTargetMapping cmdlet.
Here is an example:
Add-IscsiVirtualDiskTargetMapping -TargetName VMCluster_Target -Path e:\newdisk.vhdx

Done!

Connecting to an iSCSI Virtual Disk from the Client Side

You have provisioned an iSCSI Target server and a new virtual disk, but they are of no use until a client connects to the LUN. Remember that if you set up access lists, you will be able to connect to the LUN only from that specified machine.

1. Select the iSCSI initiator located in the Tools menu under Server Manager, as shown in Figure 12.58.

Figure 12.58 Locating iSCSI initiator

image
The iSCSI Initiator Properties window should appear.
2. To follow along with our example, simply type 192.168.0.1 in the Quick Connect box and click Quick Connect, as shown in Figure 12.59.

Figure 12.59 iSCSI Initiator Properties – Quick Connect

image
A dialog box will appear showing the status as Connected. This confirms that the LUN can be seen and that you have set up the access rules correctly. See Figure 12.60. Click Done to continue.

Figure 12.60 Successful connection to iSCSI target

image
3. Next, select the Volumes and Devices tab and click Auto Configure, as shown in Figure 12.61.

Figure 12.61 Volumes and Devices – Auto Configure

image
This will autopopulate the volumes that are being presented to the client.
4. Finally, from Server Manager, under File and Storage Services, click Volumes ➢ Disks.

As shown in Figure 12.62, we have two new disks with Bus Type listed as iSCSI. They are now available to format and create standard volumes out of.

Figure 12.62 Displaying newly added iSCSI disks

image
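The client-side connection can also be scripted with the iSCSI initiator cmdlets. A minimal sketch using the lab IP address from our example, and assuming a single target, would be:

New-IscsiTargetPortal -TargetPortalAddress 192.168.0.1
$target = Get-IscsiTarget
Connect-IscsiTarget -NodeAddress $target.NodeAddress
Get-Disk | Where-Object {$_.BusType -eq "iSCSI"}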

NFS Shares

Network File System (NFS) allows you to share files between a Windows server and a Unix/Linux platform using the NFS Protocol. In Windows Server 2012 the following improvements were introduced:

NFS Version 4.1 Support This includes easier accessibility through firewalls, RPCSEC_GSS protocol for enhanced security, client and server security negotiation, Windows and Unix file semantics, better support for clustered file servers, and WAN-friendly compound procedures.
Improved Performance No more tuning is necessary because by using the new native RPC-XDR protocol, you should achieve optimal performance out of the box.
Easier Manageability You can manage via PowerShell and a unified GUI in Server Manager. RPC port 2049 makes it easier to configure firewalls. Another improvement is better identity mapping, and there is a new WMIv2 provider.
NFSv3 HA Improvements There are now improved failover times with the new per-physical disk resource and tuned failover paths. This makes failover time fast for NFS clients.

Where to Use an NFS Share

NFS is used in environments where you have a requirement for file shares in a mixed operating system environment (such as Windows and Unix/Linux). With the improvements in Windows Server 2012, you can now present a share with NFS and SMB at the same time.

A common use for this has been found in some third-party hypervisors using Windows Server 2012 NFS shares as data stores for templates and ISOs.

Quick NFS Share Setup

We’ll now show you how to provision an NFS share. Since we are in a hurry, let’s use PowerShell. Add the NFS service to Windows using the following syntax:

Add-WindowsFeature FS-NFS-Service

We have a directory we want to share in our lab under the path E:\shares. We will guide you through this process using the GUI:

1. Open Server Manager and navigate to File and Storage Services.
2. Click the Shares menu, shown in Figure 12.63, since we are going to be working with shares.

Figure 12.63 Share management in Server Manager

image
3. In the Shares area of the window, click Tasks ➢ New Share.
This will invoke the New Share Wizard, as shown in Figure 12.64.

Figure 12.64 New Share Wizard

image
4. Click NFS Share - Quick.
5. Select your server. In our lab it will be SS01.
6. In the Share Location screen click “Type a custom path,” and enter the path to the share.
In our lab it is e:\shares, as shown in Figure 12.65.

Figure 12.65 Server and path for share

image
7. Enter a share name. In our lab it is shares, as shown in Figure 12.66.

Figure 12.66 Enter a share name.

image
Selecting the right authentication mechanism is highly dependent on the environment you are integrating into. In our case we have not enabled our Linux client for Kerberos authentication because it is a stand-alone client, so we have chosen No Server Authentication (AUTH_SYS) and "Enable unmapped user access," as shown in Figure 12.67.

Figure 12.67 Authentication methods

image

More on Storage
The Microsoft Storage team has in-depth articles that you should reference for more detailed configuration and identity mapping if required:

8. Next, under Share Permissions click Add. Then in the Add Permissions dialog, select All Machines, and for “Language encoding” select ANSI. Choose Read / Write for “Share permissions,” as shown in Figure 12.68.

Figure 12.68 Adding share permissions

image
9. On the “Specify permissions to control access” screen, verify that the Everyone account exists and has been assigned Full Control, as shown in Figure 12.69.

Figure 12.69 Permissions to control access

image
10. Confirm the details and click Create to create the new NFS share.
11. Verify that it was successful and click Close.
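If you would rather stay in PowerShell, the share itself can also be created with the NFS cmdlets once FS-NFS-Service is installed. A rough sketch matching our lab path and name is:

New-NfsShare -Name "shares" -Path "E:\shares"
Get-NfsShare -Name "shares" | Format-List *

Authentication options and client permissions can then be adjusted with Set-NfsShare and Grant-NfsSharePermission to match what we selected in the wizard.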

Connecting to NFS from the Client Side

In our lab we have Linux Mint deployed. By default, we are able to connect to the share, but we get various errors when we try to browse the share or create a directory. Linux Mint, along with many other distributions, requires you to install the nfs-common package before you can read from an NFS share. Follow these steps to install the package:

1. From a terminal window type:
sudo apt-get install nfs-common
This will install the necessary items to allow you to browse the share.
Now you can mount the share that you previously created on your test Windows server.
2. Again from a terminal window type:
sudo mount -t nfs 192.168.0.1:/Shares /mnt/share
There is no output; rather, you have to browse to the directory or mount point (/mnt/share) you specified. Here’s an explanation of this syntax:
sudo Superuser do (privileged execution for performing certain tasks)
mount Used to mount various types of file systems
-t nfs Specifies the type of file system to mount, in this case NFS
192.168.0.1:/Shares The remote share you are mounting
/mnt/share The local mount point
3. Next, browse to the share by typing the following:
cd /mnt/share
4. Now type the following command to list the directory contents:
ls

On the Windows server we have created a file called Readme.txt in the share, and you should be able to see this file after you issue the ls command. The Readme.txt file is just an example; try placing some of your own files in the share on the Windows server and rerun the ls command on the Linux client.
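The mount command above lasts only until the Linux client reboots. To make the mount persistent, a typical /etc/fstab entry matching our lab addresses looks roughly like this:

192.168.0.1:/Shares  /mnt/share  nfs  defaults  0  0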

Deduplication: Disk and Network

Windows Server 2012 introduced Data Deduplication as a native storage feature. Data Deduplication is a more efficient way of storing data. With the ever-increasing need for storage in cloud technologies, you can imagine the amount of duplicate files that are stored. Even at home I have several copies of ISO files or virtual hard disks for my USB storage and servers. These files are 3–7 GB each. I’m wasting a lot of storage space by keeping multiple copies and not coming up with a proper library system.

This is a simple example but it brings up another point: the files all have similar parts and they all take up space. Wouldn’t it be cool if you could identify those common pieces, create a single master reference on disk, and then point to it for every other file that has that common piece? You have this ability in Data Deduplication.

Data Deduplication in Windows uses a concept called the chunk store. A file gets split into variable-size chunks, usually between 32 KB and 128 KB; on average a chunk is around 64 KB. These chunks are compressed and stored in the chunk store. Each chunk is stored in a chunk container, which grows to about 1 GB in size before a new container is created. You can view the chunk store and its containers at the root of the volume in a folder called System Volume Information. By default the folder is locked down to just the System account, so to look inside you must take ownership of it and ensure that the System account retains full control. On disk, the original file is replaced by a reparse point. When the file is accessed, the reparse point directs the file system to where the chunks are stored, and the file is reassembled transparently. See Figure 12.70.

Figure 12.70 Data dedup in action

image

Although not installed by default, Data Deduplication is designed to be easy to deploy. It has also been designed to have zero impact on users; in fact, they won’t even notice anything. You can turn on Data Deduplication on any of your primary data volumes with minimal impact on performance. It was designed not to interfere with files that are new or currently being written to. Rather, it waits and checks every hour for files that are eligible for deduplication. You can reschedule the process according to the needs of your company.

Eligibility for deduplication starts with files that are over three days old (again, this is configurable based on your needs), and it always excludes files that are smaller than 32 KB, have extended attributes, or are encrypted. If you have other files that you don’t want to be part of the dedup process, this is also configurable.

Deduplication happens on network traffic as well. As traffic is sent or received, it is assessed to see if it can be deduplicated, effectively reducing the potential amount of traffic that has to be sent or received. Unlike storage deduplication, you cannot modify a schedule or data type for the network dedup.

However, there are a few things to be aware of before continuing. Dedup is supported only on NTFS volumes, and you cannot dedup a boot or system drive. In Windows Server 2012 it can’t be used with CSV, live VMs, or SQL databases.

So what’s new in Windows Server 2012 R2 for Data Deduplication? The key focus was on allowing deduplication for live VMs. That’s right; you can dedup the VHDs and VHDXs that your live VMs are using. Primarily you can use it in VDI scenarios, with a further focus on remote storage. With these enhancements you can dedup your VDI environment. It is also worth noting that although it is not supported for other virtualized workloads, there are no specific blockers to stop you from enabling it. As always, the results cannot be guaranteed.
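If you want to try this from PowerShell, the VDI option in the GUI maps to the HyperV usage type that the dedup cmdlets gained in R2. A minimal sketch, assuming a hypothetical D: volume that holds only VDI virtual disks:

# Enable dedup on a volume dedicated to VDI virtual disks (hypothetical D: volume)
Enable-DedupVolume D: -UsageType HyperV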

This is an amazing technology to have natively within Windows Server, and it will provide substantial savings in terms of storage for a business. Next we’ll show you how to configure it.

First, you need to add Data Deduplication. You can add this feature using PowerShell. The syntax is as follows:

Add-WindowsFeature FS-Data-Deduplication

Then you can configure it via Server Manager or PowerShell.
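Once the feature is added, a quick check confirms its install state and lists the cmdlets it brings along (we’ll use several of them later in this chapter):

# Confirm the feature is installed and list the dedup cmdlets
Get-WindowsFeature -Name FS-Data-Deduplication
Get-Command -Module Deduplication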

Configuring Data Dedup with Server Manager

We’ll explore the Server Manager method first.

1. Open Server Manager, click File and Storage Services, and choose Volumes.
2. Right-click the volume you want to configure Data Deduplication on, and select Configure Data Deduplication, as shown in Figure 12.71.

Figure 12.71 Configuring Data Deduplication

image
3. Click the drop-down box beside “Data deduplication” and select “General purpose file server.”
Notice the other option, “Virtual Desktop Infrastructure (VDI) server,” as shown in Figure 12.72. Click OK.

Figure 12.72 Enabling Data Deduplication

image
Next, you need to decide how old the files must be before they are processed by the dedup engine. This means there will be no impact on newly created files for the specified time.
4. In our example, we’ll keep it at 3 days.
You can modify this value later if you change your mind. See Figure 12.73.

Figure 12.73 Configuring New Volume Deduplication Settings

image
Also on this screen, you can choose any extensions you wish to exclude from the dedup process. For example, you may not want to dedup a SQL database or an Access database. If you want multiple entries in the field, separate them with a comma. For example, if you wanted to exclude SQL database files and the Active Directory database file (ntds.dit), you would type mdf,dit in the field.
5. Exclude these extensions, as shown in Figure 12.73.
Excluding a file is great, but you may have some folders in the organization that are highly sensitive, and for that reason you can’t dedup them. They may have common chunks, but you can’t risk the possibility of a disk corruption on the chunk store, which could potentially affect the information. Of course, this is a highly unlikely scenario, but it does show you that you can exclude a folder of sensitive information.
6. In this case, we’ll exclude E:\shares because this is our previously created NFS share.
We are not 100 percent sure about what is being stored there, and we don’t want to take a risk without further investigation.
At the start of this chapter we said that the dedup engine has a background process that will run every hour by default. In Figure 12.73 you also have the option to change that schedule.
7. Click the Set Deduplication Schedule button, and you will see three check boxes, as shown in Figure 12.74.

Figure 12.74 Changing the dedup schedule

image
By default background optimization is turned on, but you can also enable throughput optimization, which forces the optimization job through when dealing with large amounts of data. Microsoft says a throughput job can process roughly 2 TB of data on a single volume in a 24-hour period, and if you have multiple volumes, the jobs can run in parallel. (A PowerShell equivalent of this schedule is sketched at the end of this section.)
8. For this example, leave the schedule as is.
When you review the volume in Server Manager, you will see the Deduplication Rate (measured in %) and Deduplication Savings (measured in bytes) columns, as shown in Figure 12.75.

Figure 12.75 Viewing deduplication information in Server Manager

image
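The throughput-optimization window described above can also be created from PowerShell. Here is a minimal sketch, assuming you want a weeknight window starting at 11:00 p.m. and running for up to six hours; the schedule name is our own choice:

# Create a weeknight throughput-optimization window and review the schedules
New-DedupSchedule -Name "NightlyThroughput" -Type Optimization -Days Monday,Tuesday,Wednesday,Thursday,Friday -Start "23:00" -DurationHours 6
Get-DedupSchedule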

Configuring Data Dedup with PowerShell

We’ll now show how to work with deduplication in PowerShell. Figure 12.76 shows the available PowerShell cmdlets.

Figure 12.76 PowerShell cmdlets for deduplication

image
1. To enable dedup for a volume, use the following syntax:
Enable-DedupVolume E:\
The output is shown in Figure 12.77.

Figure 12.77 Enabling dedup output with PowerShell

image
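The per-volume settings we chose through Server Manager earlier (the minimum file age, the excluded extensions, and the excluded E:\shares folder) can be applied from PowerShell as well. A minimal sketch using our lab values:

# Match the Server Manager settings: 3-day minimum age, excluded extensions, excluded folder
Set-DedupVolume -Volume E: -MinimumFileAgeDays 3 -ExcludeFileType mdf,dit -ExcludeFolder E:\shares
Get-DedupVolume -Volume E: | Format-List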
You know that dedup is enabled, but you’d also like to find out how much space it has saved you, among other details.
2. Use the Get-DedupStatus cmdlet, as shown in Figure 12.78.

Figure 12.78 Get-DeDupStatus output

image
On the volumes in our lab we haven’t had a lot of information to dedup, hence the values in the screenshots are at 0 percent, but there is a reason for this. In our configuration we excluded E:\shares because we’re not 100 percent sure what is stored there, and since it is an NFS share for Linux, we just don’t want to take the chance. (In practical terms it doesn’t matter what the client is; deduplication is transparent to it.) Inside E:\shares we copied a Win2012R2_Preview ISO, which is about 4 GB in size. We also created a folder at E:\TestData and copied into it two Win2012R2_Preview ISOs under different names, plus a Technical folder containing documents on technical information. See Figure 12.79 for our sample data.

Figure 12.79 Contents of E:\TestData

image
Remember, we said that although we have copied this data, we would normally have to wait three days for it to become eligible for deduplication. If you can’t wait that long, you can use the Start-DedupJob cmdlet to accelerate the process.
3. The syntax is as follows:
Start-DedupJob –Type Optimization –Volume E:
As you can see in Figure 12.80, dedup has started a manually scheduled job, which is currently in the Queued state. You can accelerate it from here if you want.

Figure 12.80 Output of Start-DedupJob

image
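If you’d rather stay in PowerShell while the job works through the queue, you can poll it with Get-DedupJob. A rough sketch, assuming the job object exposes State and Progress properties:

# Poll the dedup job on E: every 30 seconds until it drops out of the queue
do {
    $job = Get-DedupJob -Volume E: -ErrorAction SilentlyContinue | Select-Object -First 1
    if ($job) { "{0} {1}% complete" -f $job.State, $job.Progress }
    Start-Sleep -Seconds 30
} while ($job)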
4. In Task Scheduler, choose Task Scheduler Library ➢ Microsoft ➢ Windows ➢ Deduplication.
As shown in Figure 12.81, you will see three jobs listed (we will explain the last two jobs later in this chapter):

Figure 12.81 Manually invoking BackgroundOptimization

image
5. Right-click BackgroundOptimization and select Run; then return to PowerShell and use the Get-DedupJob cmdlet.
See Figure 12.82 for sample output.

Figure 12.82 Sample output of the Get-DedupJob cmdlet

image
Figure 12.83 shows just how much it has saved already, and it is not even finished.

Figure 12.83 Output of Get-DedupStatus while Get-DedupJob is running

image
6. Now compare the output in Figure 12.84 when the Get-DedupJob cmdlet has completed.

Figure 12.84 Output of Get-DedupStatus when optimization is complete

image
In our test lab we have already saved 4.64 GB, which is great because storage is tight!
7. Try using the Get-DedupVolume cmdlet for a different output view (a sample view is sketched after this walkthrough).
8. As an additional exercise, remove E:\Shares from the excluded folder selection and rerun the optimization job.
How much space is freed up now?
9. Finally, right-click the E:\TestData folder and view its properties.
Figure 12.85 shows the properties of the TestData folder in our lab. Notice the difference in the Size and Size on disk values?

Figure 12.85 Folder properties after running dedup

image
So now you have seen dedup in action. But where has the data actually gone?
10. Use the Get-DedupMetadata cmdlet to view information on the chunk store we discussed in the introduction to this section.
See Figure 12.86 for the output of the cmdlet.

Figure 12.86 Get-DedupMetadata output

image
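For step 7, one way to get that different view is to pick out the savings figures from Get-DedupVolume. A sketch; the property names shown here are assumptions based on typical Get-DedupVolume output and may vary by build:

# Per-volume savings summary
Get-DedupVolume | Format-Table Volume, SavingsRate, SavedSpace -AutoSize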

As you saw in Figure 12.81, Task Scheduler has two other jobs available that run on a weekly basis. We’ll discuss both of them now. First, we’ll talk about GarbageCollection.

GarbageCollection is configured to run on a weekly basis by default, but you can invoke it as needed. The GarbageCollection job cleans up the chunk store by removing unused chunks, which releases disk space. You can see that it is an important job.

To manually invoke a garbage collection, use the Start-DedupJob cmdlet as follows:

Start-DedupJob –Type GarbageCollection –Volume E:

This will queue the job until the system is idle, or you can run the job from within Task Scheduler to accelerate it.
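You can also review the built-in schedules, including the weekly GarbageCollection job, directly from PowerShell; a quick sketch:

# Review the built-in dedup schedules, including the weekly GarbageCollection job
Get-DedupSchedule | Format-Table -AutoSize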


Side Exercise
Delete all the ISO files you used throughout this lab and empty the Recycle Bin. Run an optimization job and then run a garbage-collection job. View the chunk store size after the jobs are complete using the cmdlet Get-DedupMetadata. The following illustration shows the reduction in our chunk store when we performed this exercise in the lab.
image
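If you want to script the side exercise, here is a minimal sketch. It assumes the test ISOs live in E:\TestData and that your build of Start-DedupJob supports the -Wait switch; if it doesn’t, drop the switch and watch progress with Get-DedupJob instead:

# Delete the test ISOs, re-optimize, reclaim unused chunks, then inspect the chunk store
Remove-Item E:\TestData\*.iso -Force
Start-DedupJob -Type Optimization -Volume E: -Wait
Start-DedupJob -Type GarbageCollection -Volume E: -Wait
Get-DedupMetadata -Volume E: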

Checking for Corrupt Volumes

The last thing in relation to dedup that we will talk about in this chapter is volume corruption checks. As you can imagine, the more commonality found in files, the more the chunk store will grow and the more reparse points will exist on disk.

Imagine if the disk sector where part of a chunk lives became corrupted; you would risk losing potentially hundreds or thousands of files. Although this is a rare occurrence, especially when combined with resiliency techniques, there is a potential for it to happen. Dedup has some special built-in checks to protect against it.

For example, dedup keeps redundant copies of critical metadata, and it also keeps redundant copies of the most heavily used chunks (if a chunk is referenced more than 100 times, it is treated as a hot spot). It records the details of any corruption in a log file, and later, through scrubbing jobs, it analyzes the log and makes repairs.

Repairs to the critical metadata or to hot-spot chunks can be made from these redundant copies. If you have deduplicated a mirrored storage space, dedup can use the mirrored data to repair the chunk.

As with optimization jobs and garbage-collection jobs, scrubbing jobs happen on a scheduled basis and can be configured to happen more often than the default of one week.

You can trigger a job with PowerShell using the following syntax:

Start-DedupJob –Type Scrubbing –Volume E:

This will invoke a verification job against the E: drive volume but will check only the entries in the corruption log file.

To check the integrity of the entire deduplicated volume, use the following command:

Start-DedupJob –Type Scrubbing –Volume E: –Full

To review the output of the scrubbing, check Event Viewer. All output for a scrubbing job is stored in Event Viewer ➢ Applications and Services Logs ➢ Microsoft ➢ Windows ➢ Deduplication ➢ Scrubbing. See Figure 12.87.

Figure 12.87 Event Viewer Scrubbing log

image
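If you prefer to pull those events from PowerShell instead of clicking through Event Viewer, Get-WinEvent can read the same channel. A sketch; the log name here is an assumption derived from the Event Viewer path shown above:

# Read the most recent scrubbing events (log name assumed from the Event Viewer path)
Get-WinEvent -LogName "Microsoft-Windows-Deduplication/Scrubbing" -MaxEvents 20 |
    Format-Table TimeCreated, Id, LevelDisplayName, Message -AutoSize -Wrap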

The Bottom Line

Create a storage pool on a virtual disk. Storage is an ever-growing business requirement. If you were constantly buying SAN solutions to meet this need, it would prove very costly. Also, it is very hard to predict what you may need in a year’s time. How would you manage your storage to get the most out of it and to meet your future storage needs?

Master It In your lab create a storage pool using the GUI with three disks. Create a virtual disk three times the size of the total usable capacity of the disk. Format it and get it ready to use.
Create additional storage on a virtual disk. A common occurrence in enterprises today is last-minute requests for provisioning of applications that require large amounts of storage. Often the storage available locally in the server is not large enough to meet the need. How can you get additional storage onto the server without adding local storage?
Master It In your lab deploy an iSCSI target, create a virtual disk, and then connect your server to use the newly created storage.
Use deduplication techniques to reduce file size. Part of the reason behind data growth in today’s environments is the ready availability of storage, but storage will become a problem sooner rather than later. A high percentage of the files being stored contain a large degree of identical data patterns, and using deduplication techniques can dramatically reduce the disk space required and make better overall use of the storage already in place.
Master It In your lab copy an ISO multiple times into different shares, and repeat for office documents that are not located on your System volume. Enable Deduplication on the data drive, and exclude a share of importance in your environment.