Unmanaged SMB 3.0 Shares in SCVMM 2012 R2

With Windows Server 2012, Microsoft introduced support for SMB 3.0.  With that introduction also came support for using an SMB 3.0 share as a placement option for application files, including Hyper-V VHDs and VHDXs.  This came about because the new SMB protocol added scalability and high-availability enhancements, which were discussed in an earlier post.

With storage arrays like EMC’s Unified VNX and VNXe, a file gateway is included, and with the appropriate VNX OE installed, SMB 3.0 is supported.  Using that as a storage target, customers can deploy standalone and clustered Windows Server 2012 (and later) implementations and use these SMB 3.0 targets as valid locations for applications (SQL Server, for example) and Hyper-V VMs.  Because the fileshare is seen by all nodes in the cluster, and because persistent file handles are supported, Hyper-V virtual machines with VHD/VHDX files located on these fileshares can be enabled as Clustered Roles, and will support all normal Windows cluster functionality, including Live Migration.

For many customers, however, usage of the SMB 3.0 share model also includes a management aspect.  What I mean by this is that they want to put the Hyper-V hosts under the control of System Center Virtual Machine Manager 2012 R2.  Which, of course, is all goodness, as the management features add a great deal of value and function.  But then comes the desire to have the fileshares seen, and possibly managed, by SCVMM 2012 R2.  Without introducing awareness of the file shares to SCVMM 2012 R2, these entities are not known, which means that they are not considered targets for VM deployments … and that will never do.

Management Versus Tolerance

I’m going to make a distinction between two aspects, which may be referred to as Management and Tolerance.  Management means the proactive operations against an object, which include, amongst other things, creation and deletion of an object (in this case a fileshare).  Tolerance is simply the introduction of awareness of an object to a system, and the ability of the two to coexist.  SCVMM 2012 R2 can be tolerant of an entity such as a third-party, unmanaged fileshare used by Windows Server 2012 running Hyper-V.  Thus SCVMM 2012 R2 can be aware of and use certain objects without needing to manage them.  This may not work for all requirements, but for many, it may be sufficient.

When we talk about Management of an SMB 3.0 fileshare, we are either talking about native Windows Server SMB management (if it’s a Windows Server 2012 system that’s creating the SMB 3.0 share, which, for these scenarios, is typically a Scale-Out File Server), or, for third-party arrays with SMB 3.0, an SMI-S File Provider is required.  Note that this is a FILE SMI-S provider.  I say that because you can have separate FILE and BLOCK providers.  Most people will be familiar with the block providers – EMC’s had a block provider for quite some time, and this provides the interface for management of block storage devices from within SCVMM 2012 R2.

For many existing storage arrays, there may not be an SMI-S file provider, so we may not be able to manage the SMB 3.0 fileshares, but we can introduce a level of tolerance for an SMB 3.0 fileshare into SCVMM 2012 R2 without the need for an SMI-S File Provider.

It is the File SMI-S Provider that will invariably trip up most customers.  EMC introduced a File Provider with the VNX2 product (also applicable to VMAX with the appropriate gateway).  Since VNX and VNXe preceded the definition of the SMI-S File specification, they do not have a File Provider at this time.

Configuring System Center 2012 R2 for Unmanaged Fileshares

We’ll assume that the target Windows Server 2012 cluster (this also works for a standalone Windows Server 2012 host) has previously been deployed, and is known to the SCVMM 2012 R2 infrastructure.

SCVMM 2012 R2 Account requirement

One item that many deployments fail to implement is the appropriate account requirement for the systems (hosts and clusters) that SCVMM manages.  The requirement is to use a RunAs account when adding a cluster or host to the SCVMM 2012 R2 environment.  What invariably happens is that administrators add the resource without assigning a “RunAs” account to manage the host/cluster.  That’s a problem here, because we need to use SCVMM 2012 R2 to assign the fileshare, and it will proxy user access to the share from the node using this defined RunAs account.  If you did not do this when adding the cluster resource, various failures will typically occur when attempting to access the fileshare.

If the cluster/host has already been added to SCVMM 2012 R2, you will find that the option is grayed out and inaccessible via the UI.  You can see whether this RunAs account has been assigned by looking at the properties of an individual cluster node in the Fabric -> Servers area of VMM; an example is shown below.  The “Browse” option to associate a different RunAs account is grayed out and unavailable.

00 NodePermission

The most direct way to assign a RunAs account is to remove the cluster from SCVMM 2012 R2, and then re-add it specifying the RunAs account.  While that is one way to do it, it can cause side-effects if there are existing resources already deployed in the cluster.  An alternative is to use PowerShell, which saves some angst and mitigates the impact to existing resources.

The following script, run on the SCVMM 2012 R2 server, will assign a RunAs account.  This account (the actual domain user account) needs to be the same account that was used, or will be used, to assign access permissions on the share.  In this example, the user account is defined within the RunAs account “DomainUserForFileShare”, and the cluster is “CLOUD2-CLUSTER”.  You will need to define your own RunAs name and assign the correct domain user account.

# Look up the cluster and the RunAs account as known to VMM
$MyCluster = Get-SCVMHostCluster -Name CLOUD2-CLUSTER
$MyRunAsAccount = Get-SCRunAsAccount -Name DomainUserForFileShare
# Assign the RunAs account as the management credential for the cluster
Set-SCVMHostCluster -VMHostCluster $MyCluster -VMHostManagementCredential $MyRunAsAccount

In the event that this does not work, it may be necessary to resort to removing the cluster from SCVMM and re-adding it (using a RunAs account).

FileShare Security/Access Requirement

The account defined for use as the RunAs account when adding the cluster also needs to be assigned permissions on the fileshare itself.  For VNX/VNXe arrays this can be done by using the Computer Management MMC from any Windows host and connecting to the VNX/VNXe (as is the case here).  The remote connection is specified via a right-click on the root-level item in the MMC, as shown below:

4 FileShare User

Once connected, expand the tree to show the fileshare on the VNX/VNXe, and select the Properties option as shown below:

6 FileShare Setting

It will be necessary to configure appropriate permissions on the share by adding the domain account used within the RunAs account.  I would recommend adding the computer accounts (all nodes, and the cluster object) with the same permissions.  These accounts should all be assigned the permissions required by SCVMM 2012 R2 (a hedged PowerShell sketch for the NTFS portion follows the list):

  • Share permissions: Full
  • NTFS permissions: Modify, ChangePermissions, DeleteSubdirectoriesAndFiles
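
As a rough illustration only (this is not the only way to do it, and the account name and share path below are assumptions you would substitute with your own), the NTFS portion of these rights can also be applied from PowerShell against the share’s UNC path:

# Assumed values: replace with your RunAs domain user and your share's UNC path
$Account = "DOMAIN\ScvmmShareUser"
$SharePath = "\\vnx-filer\HyperVShare1"
# Build an access rule carrying the rights SCVMM 2012 R2 expects at the NTFS level
$Rights = [System.Security.AccessControl.FileSystemRights]"Modify, ChangePermissions, DeleteSubdirectoriesAndFiles"
$Rule = New-Object System.Security.AccessControl.FileSystemAccessRule($Account, $Rights, "ContainerInherit, ObjectInherit", "None", "Allow")
# Read the current ACL, add the rule, and write it back
$Acl = Get-Acl -Path $SharePath
$Acl.AddAccessRule($Rule)
Set-Acl -Path $SharePath -AclObject $Acl

The share-level Full permission is still set through the share properties themselves (the Computer Management MMC in this case).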

This will then provide appropriate access for the connections made from the cluster nodes for any services being accessed.  As an aside, if operations appear to fail when executed from within SCVMM (for example, when deploying a virtual machine), it can be useful to use the Computer Management MMC to check for security failures.  You will find these in the “Event Viewer” node, under the “Security” log.

Add the Unmanaged File Share to SCVMM 2012 R2

With all the appropriate security requirements implemented, the fileshare may be added so that SCVMM 2012 R2 becomes aware of its existence and allows it to be used as a deployment target for virtual machines.  To do this, we add the fileshare to the cluster entity itself from within SCVMM 2012 R2.  In the image below, we first select the cluster object in the “VMs and Services” section (although the same option is accessible in the Fabric section for the cluster), and select the “File Share Storage” item.

1 VMM Cluster

From this location, shares can then be added via the “Add” option, as shown in the example below.  It is assumed here that the VNX/VNXe has been appropriately joined to the domain and is correctly defined within the DNS being used.

2 VMM Add Unmanaged
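
If you prefer scripting, VMM exposes the same operation through PowerShell.  The following is a hedged sketch only – the share path is an assumption, and the Register-SCStorageFileShare parameters should be verified against your VMM 2012 R2 installation:

# Register an existing (unmanaged) SMB 3.0 share against the cluster
$MyCluster = Get-SCVMHostCluster -Name CLOUD2-CLUSTER
Register-SCStorageFileShare -FileSharePath "\\vnx-filer\HyperVShare1" -VMHostCluster $MyCluster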

It is recommended to watch the job execution within SCVMM 2012 R2 after the share is added to ensure that there are no errors logged for the cluster nodes.

If successfully added, the fileshare will be defined against all nodes, and can now be used for virtual machine placement.  The VNX/VNXe systems support Continuous availability and multi-channel connections, and therefore fulfill all requirements for implementing scalable and highly available virtual machine storage.

Virtual Machine deployments from within SCVMM 2012 R2 can now be executed through the UI, or via scripting, and specify the fileshare location for the VHD/VHDX storage.  In the example below, a virtual machine was deployed to this unmanaged share from within SCVMM 2012 R2, and the VHDX location is highlighted.

VM Config

 

Caveats

Because this is an unmanaged fileshare, SCVMM 2012 R2 will show an alert next to the assigned fileshare, as shown below.  This is a result of its unmanaged state, and the inability of SCVMM 2012 R2 to set and change account permissions directly on the share.

FileShareWarning

In addition, because there is no management of the fileshare, SCVMM 2012 R2 is unable to define new shares or remove existing ones.  These operations will need to be effected manually by the administrator.  EMC does provide a number of PowerShell utilities to control and manage the fileshares on VNX and VNXe systems.

Maintenance at 3AM in the Morning


The trouble with computers is that they don’t sleep (I’m talking metaphorically here – not sleep states) … the trouble with people is that they generally don’t think at scale.  When those two worlds collide, bad things can start to happen. Windows Server 2012 and Windows Server 2012 R2 (as well as the Windows 8 and 8.1 clients) include various Task Scheduler jobs that optimize, and otherwise attempt to do good things to the system … mostly.

I’ve previously posted on using Thin devices for Windows Server deployments, and specifically about the improved value proposition when Windows Server 2012 (and beyond) detects that the storage is thin.  Windows will start to issue UNMAP operations against those LUNs, with a view to deallocating space that was previously allocated to a file that perhaps got deleted.  That part of the story doesn’t change.  You might have heard or seen passing references to the fact that there’s also a scheduled job that helps clean up deleted files.  If not, there is, and it’s accessible through the Optimize Drives command and UI.

Windows Server 2012 (R2) and The Scheduler

By default, this “Optimize Drives” task runs once a week.  If you look at the Task Scheduler UI you’ll see all the jobs that Windows Server 2012 & R2 will run.  In the image below, we see the “ScheduledDefrag” job.  When the weekly scheduled task for “Optimize Drives” gets executed, it results in this process being run.  You can see in the “Actions” pane that the Defrag utility will be executed.  So, once a week, this task will run a defrag on all drives, and that operation will also result in some UNMAP execution for Thin volumes.

DefragDisk

If you look at that ScheduledDefrag task in the Task Scheduler, you will note that it does not itself have a date/time trigger.  The task can be started manually, or executed by another task – for example, the “Regular Maintenance” job.

This generically named “Regular Maintenance” job, shown in the image below from a Windows Server 2012 R2 environment, is an interesting beast.  In the image, the default trigger of “At 2:00 AM every day” can be seen.  If you’re seeing a 3:00 AM job instead of 2:00 AM, it seems the default start time changed – in Windows Server 2012 the start appeared to be set to 3:00 AM, while in Windows Server 2012 R2 the default start time is 2:00 AM.  As a separate aside, the client versions of Windows 8 and Windows 8.1 also have a similar scheduled job with similar functions – however, I’m only talking about the Server versions here.

Regular Maintenance
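
If you want to see what a given server is actually configured to do, a quick way (a minimal sketch, run locally on the server) is to query the task with the ScheduledTasks module:

# Inspect the "Regular Maintenance" task: its configured start time and last/next run
$Task = Get-ScheduledTask -TaskPath "\Microsoft\Windows\TaskScheduler\" -TaskName "Regular Maintenance"
$Task.Triggers | Select-Object StartBoundary
$Task | Get-ScheduledTaskInfo | Select-Object LastRunTime, NextRunTime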

But what does the “Regular Maintenance” task actually do?  Turns out, it does a good deal.

Exposing “Regular Maintenance” Operations

If you turn on logging for the Task Scheduler, you will get a breakdown of the various steps executed by any and all jobs that the Task Scheduler executes.  The execution history is then available in the “History” tab of the task, and logged in the event log under the Task Scheduler.
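
Task history is not always enabled by default.  It can be turned on from the Task Scheduler MMC (“Enable All Tasks History” in the Actions pane), or – as a minimal sketch – by enabling the operational log directly:

# Enable the Task Scheduler operational log so task steps are recorded in the event log
wevtutil set-log Microsoft-Windows-TaskScheduler/Operational /enabled:true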

“Regular Maintenance” launches according to its trigger on a daily basis, so you can see the activity associated with it by looking at the logs.  It also only runs for an hour – an attribute set in the “Settings” tab.  Scrubbing the logs of some systems (Windows Server 2012 R2), the steps proceed (roughly) in the logical order shown below.  Some components actually have multiple phases or side steps.  The Customer Experience Improvement components, for example, collect and upload in multiple steps … if you have opted into the CEIP during installation.

RegularMaintSteps

Highlighted in blue is a task which executes each day, and is part of the NTFS improvements added to Windows Server 2012.  It’s also possible that there are other steps that could get executed during that phase.  I don’t, for example, use DeDup with NTFS – we have arrays that do that 🙂 – but it’s likely that other checks might fire off depending on what NTFS features are in use.  These could end up amounting to additional work for the system and the targeted volumes.

The Defrag job, highlighted in red, will execute once a week for each volume on a given server and will definitely add more work.  If the volumes are thin, then there are also going to be UNMAP operations happening as a result.  It’s not clear that this is going to be the same day for every volume on every server.  However, this quickly becomes a challenge at scale when you consider dozens, if not hundreds or even thousands, of Windows Server instances in an environment.  Remember that virtual machines (irrespective of the hypervisor) are servers too, and if they are all time-synched (domain members are by default), they’re all going to start doing busy work at roughly the same time – every day.  The aggregate workload that they all generate could be quite the challenge.  So, be forewarned.

Well that’s all very nice – But, what do I do?

In doing some investigation on various blogs, etc., I’ve seen some people advocating disabling the Regular Maintenance task from executing at all.  I don’t know that this is a wise choice.  It would seem that some of the steps executed are of value, and disabling them wholesale would be detrimental to the system.  So what’s a good Systems Administrator to do?  Having them all fire off at roughly the same time would also seem to be counterproductive.  My only suggestion would be to stagger the time that this job is executed.  Understandably, that is easier said than done.  But I’m including a rather utilitarian PowerShell script that might help (use at your own risk).

The script obtains a list of all servers from Active Directory (so you will need the Active Directory PowerShell module).  If you’re running a Windows 8 / 8.1 client, then you will need the Remote Server Administration Tools (RSAT) … a free download from Microsoft.  Remote execution capabilities are assumed to be available and valid.  Also, you had better be a Domain Admin, or have the necessary privileges … or there will be a lot of errors when you attempt to remotely change the settings of servers.

I am the first to admit that my PowerShell scripts are “raw”, and I’m certain that there are a multitude of better ways to do this.  But this worked.  It grabs the list of servers from AD, and attempts to pick up IP addresses as well (because a server without an IP address is likely not there). For every server entry found, it will attempt to randomly select a time from the list assigned to $TimeOptions, and then set that time as the Task Start time for the “Regular Maintenance” task for the selected server.

# Candidate start times, on the hour, to spread the maintenance window
$TimeOptions = "10:00 PM", "11:00 PM", "12:00 AM", "1:00 AM", "2:00 AM", "3:00 AM", "4:00 AM", "5:00 AM"
# Pull all Server 2012 / 2012 R2 computer objects from AD
$MyComputers = Get-ADComputer -LDAPFilter "(OperatingSystem=*Server 2012*)" -Properties Name,IPv4Address,OperatingSystem
foreach ($Computer in $MyComputers)
{
    # Pick a random start time for this server
    $NewStartTime = Get-Random $TimeOptions
    if ($Computer.IPv4Address -ne $null)
    {
        # Remotely reset the daily trigger on the "Regular Maintenance" task
        Invoke-Command -ComputerName $Computer.Name -ScriptBlock { param ($AtTime)
        $MyTaskStart = New-ScheduledTaskTrigger -Daily -At $AtTime
        Set-ScheduledTask -TaskPath \Microsoft\Windows\TaskScheduler\ -TaskName "Regular Maintenance" -Trigger $MyTaskStart } -ArgumentList $NewStartTime
    }
    else
    { Write-Host "No Address for:" $Computer.Name "-- SKIPPED" }
}

The intent is that, if the distribution is truly random, servers will launch evenly over the selected times.  As the task is potentially going to run for up to an hour, the start times have been set to be on the hour.

This does not take into account other operations that might be going on, like backups or production workloads.  Those are going to be unique for any given environment, and sites will need to determine what the best course of action is for themselves.  I just didn’t want to leave you without a suggestion.

 

Evolution of the Microsoft Windows Private Cloud – Windows Azure Pack


For a rather long period of time EMC has been working on enabling various Cloud solutions.  In the Microsoft space, we have been developing and delivering Private Clouds of varying styles – all of them providing high levels of performance, all of them elastic, all of them automated … but it always felt like there could be more.  The infrastructure piece was always robust, but the way that users would consume resources from the system seemed to need more attention.  We have demonstrated multiple ways to consume these services, and even Microsoft System Center took multiple runs at this interaction.  Self-service portals are now becoming a necessary component of Private Cloud solutions.

Consumers of cloud services should not be burdened with the details of which physical server runs their service, or other physical aspects that relate to the infrastructure.  They should be more concerned about access to, and availability of, their services.  They may also care about service levels for access and performance.  But they should not be concerned about which physical server, or even which Cloud their services are running in … rather that they are running, and running optimally.

For those that have used a public cloud service, you will rarely have been presented details on the physical server your service will be running on.  Certainly you may have geographic information, but not that your service is running on a particular physical server in a given datacenter.  You do have choice about characteristics of the service, for example, if you are deploying a Virtual Machine in an Infrastructure as a Service model, you will want to define CPU, memory and possibly storage sizing.

Ideally, Private Cloud solutions should abstract away the physical infrastructure from the consumer, leaving them with choices that matter to their service.  For hybrid cloud solutions, this would in fact be a mandatory requirement.  Services in a hybrid world should be able to move dynamically between an on-premises solution and a public cloud solution, so the choices presented would need to be limited to those valid in both offerings (or you would need to be able to translate between characteristics in one versus the other).

So it’s great to see that the next evolution from Microsoft seems like it’s going to fit the self-service bill!  This incarnation is called the Windows Azure Pack, and it’s delivered on a Windows Server 2012 R2 and System Center 2012 R2 family.  While much of the discussion around the Microsoft sites talks in terms of using the Windows Azure Pack (WAP) for Service Providers/Hosters … an Enterprise style customer is also acting very much like a Service Provider internally to their business groups.  It’s a great way to deliver services internally!

From the Public to the Private Cloud

With the introduction of Windows Azure Pack, Microsoft Private Cloud consumers can now enjoy many of the benefits of an “As a Service” model.  Be that as an Infrastructure As A Service (IaaS) or Platform as a Service (PaaS), Windows Azure Pack can fit the bill.

Implementing both an Administrative Portal and a Consumer (Tenant) Portal, IT organizations are now able to behave much like a Service Provider to their internal customers.  Customers, as consumers, can then select from a gallery the service offerings that their IT organization develops for them.  The mechanics of what happens during deployment are then fully automated within the System Center 2012 R2 framework.  For example, should virtual machines need to be deployed to a cloud, then System Center Virtual Machine Manager 2012 R2 will execute the required steps to deploy the necessary templates from its library and execute any required customizations.  The consumer can then access the resources once they are deployed.  Importantly, no IT operations involvement is required – it’s fully automated.

For IT staff, they are now able to focus on building service offerings to meet the requirements of the consumers.  They are also able to look at overall status of their environment, including consumption rates, availability, etc.  They are also enabled with tools to allow for Chargeback services to the consumers of the provided services.  These are the sorts of functions that Public Clouds have provided for some time – features that Private Clouds have been wanting to deliver.

Being Scalable and Elastic

There’s still a very important role for the infrastructure in all of this.  Private Clouds, like Public Clouds, are assumed to have limitless scale and elasticity … and a good degree of automation.  Nothing will derail a good Private Cloud more than having to call an IT person at each and every step.  Indeed, in many cases the scale and size of a consumer’s environment may change over time, and they may want to mitigate costs by sizing their system appropriately for different events.  Classically, finance systems need to scale to a much larger degree as they approach end-of-month, end-of-quarter and end-of-year processing.  Allowing the customer to increase and decrease resources dynamically is ideal (of course this assumes that the service itself is designed for such functionality).

If the Private Cloud needs to have the elasticity, scale and automation that the consumers are looking for, then so too does the underlying infrastructure.  Given that the Private Cloud solution offering is based on Windows Server 2012 R2 and System Center 2012 R2, features like Offloaded Data Transfer (ODX), SMB 3.0 and even UNMAP operations can benefit the solution, providing performance, flexibility and optimizations that the environment can utilize.  We’ve dealt with many of these features in earlier posts, and they all apply to Windows Azure Pack, as it consumes these services implicitly.

Deploying Windows Azure Pack

As mentioned, Windows Azure Pack consumes the services of the underlying infrastructure, both hardware and software.  As a result, the minimum requirement is to have a System Center 2012 R2 deployment that manages one or more Clouds as defined within System Center Virtual Machine Manager.  These clouds are surfaced up to the Windows Azure Pack through integration with System Center Orchestrator and its Service Provider Foundation service (a separately installable feature within Orchestrator).

There is guidance provided at the TechNet site here.  A minimal installation can be a great starting point, and that’s available as the “Azure Pack: Portal and API Express” option.

Summary of EMC & Windows Azure Pack

Demo of Windows Azure Pack on an EMC Private Cloud

New Microsoft Windows Server Catalog


I’m certain that everyone was sitting on the edge of their seats waiting for this to go live, right?  Well, perhaps not, but it’s worth noting for some of the changes that the Server Catalog brings – apart from listings that now include Windows Server 2012 R2.

For those that don’t know what I’m talking about, this is THE list of certified components (H/W and S/W) for the Microsoft Windows platform … both server and client.  It’s at (unsurprisingly) http://www.windowsservercatalog.com/.  The site (or former versions of it) has been around for a good many years, and if you want to know whether your platform is supported for the version of Windows that you are running, this is the place to go.

Sure, it covers the server side of the house, but it also covers the storage subsystems.  For EMC, we have a selection of both hardware and software that we certify for the Windows platform, but I’m going to focus on just the hardware part.

One of the new features that has been added for Storage Devices is a break out of the supported Windows Storage features.  I’ve previously blogged about features like ODX and UNMAP, and invariably the first question is, “Is my platform supported?”  Now, the Windows Server Catalog can answer that.

For a given storage configuration, you can easily see the supported features. In the image below, I have selected the VMAX 20K product. Within its certification details are the features that are supported.  Amongst the feature list, you can see that both ODX and Thin Provisioning (this is where UNMAP comes in) are supported.

ServerCatalog

Our certification team in Elab has been hard at work completing the certifications and providing feedback to our product teams.  At the new Server Catalog site launch, EMC had the broadest set of storage certifications for external RAID storage devices on Windows Server 2012 R2.

This is where you can start to see EMC’s commitment to the Windows platform for our storage devices.

SQL14 Buffer Pool Extension & EMC ExtremSF/ExtremSW


Microsoft SQL Server is introducing a good number of new features with the upcoming “SQL14” release.  Most notably, it implements an in-memory solution (code-named Hekaton), but there are other features included as well.  The SQL14 Community Technology Preview has been available for some time, and the CTP1 media was used for this exercise, running on Windows Server 2012.  I decided to look at one of these new features, called Buffer Pool Extension (BPE), and wondered how it would compare with some similar EMC technology that we’ve used and validated previously for SQL Server environments.

Buffer Pool Extension

As the name suggests, BPE is a mechanism to extend the buffer pool space for SQL Server.  The buffer pool is where data pages reside as they are being processed to execute a query, and it’s generally limited by the main memory (DRAM) available on the server itself.  While available memory in servers has been on the increase, so have database sizes.  Having an adequately sized buffer pool helps keep data pages around, and keeping them around means that you don’t have to go to disk to execute a subsequent query that references these same pages.  In general, the less disk I/O you have to generate, the better the performance.

Performance is of course based on the speeds and feeds.  DRAM is typically very, very fast; disks, on the other hand, are orders of magnitude slower than DRAM.  More recently a rash of new “Server Flash” solutions have come to market – these are generally PCIe-based solutions.  These server flash solutions fall (in terms of performance) between DRAM and disks.  This is because they sit on the PCIe bus, and subsequently have a more efficient means to service I/O … it helps that they are also flash based (no moving parts).  These devices can also have very large throughput characteristics, and generally have pretty low latencies because of the performance characteristics of the PCIe bus.  The other thing that they deliver is large amounts of storage at a lower cost than something like DRAM.  Arguably Solid State Disks (SSDs, or even Enterprise Flash Drives) have some of these characteristics, but drives of this type are much slower than Server Flash, because they live behind IDE or SAS controllers, Fibre Channel controllers or some other HBA.

So if you want to expand the SQL Server buffer pool so as to keep data pages around (beyond what fits in DRAM), you will be able to use SQL14 BPE to help with that.  Effectively, you define a Buffer Pool Extension as a physical file.  You specify where the file lives (so the storage needs to be seen as an NTFS volume), and once defined, SQL14 will start to use this space to keep data pages around.  Which data pages, and for how long, depends on the active dataset size and the space defined.  One interesting rule is that “dirty pages” cannot exist on the BPE device.  A dirty page is a page that has been updated, but has not been flushed to disk yet (of course the change is always written to the log file).  Dirty pages are flushed to disk by something like the lazy writer, or a checkpoint operation.  Once a page no longer has changes to flush, it can be moved to the BPE storage.  Equally, a page that is read to satisfy a query, but is not updated, may be put out on the BPE storage – if you’ve already read the page, then it’s trivial to push it to the BPE.

Configuring BPE is done via the ALTER SERVER CONFIGURATION T-SQL statement, for example, in this environment:

ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'F:\BPE\EMCExtremeSF.BPE', SIZE = 200GB);
GO
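
Once configured, a quick sanity check (a hedged sketch; it assumes the SQLPS module providing Invoke-Sqlcmd is available on the server, and that the DMV is as documented for this release) is to query the buffer pool extension configuration:

# Confirm the BPE file is in use and shows the expected size and state
Invoke-Sqlcmd -ServerInstance "localhost" -Query "SELECT path, current_size_in_kb, state_description FROM sys.dm_os_buffer_pool_extension_configuration;"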

You can also imagine that there are various algorithms in place to age the pages on the BPE storage, and discard the oldest/unused pages in preference to data that is just read, and might be re-read.  It’s a complex beast when you consider the various activities going on.  But the goal is simple … keep more data available on high performance (and I would argue, low latency) storage, such that you can improve the overall efficiency of the database environment.

EMC Server Flash

For the testing, EMC’s ExtremSF device was used – in this instance, a 300 GB SLC version of the ExtremSF line, which now includes eMLC versions that provide over 1 TB of storage.  The ExtremSF card was used in two ways to look at optimizing the SQL14 environment.  First, it was used as a traditional storage device which, through Disk Management, was partitioned to provide a 200 GB storage allocation (actually the volume was a little larger than 200 GB; the BPE file itself was created at 200 GB).  In the second set of tests, the ExtremSF card was used in combination with the ExtremSW product to cache the SQL14 data files – more on that later.

The Test System

Because the testing needed to put some pressure on the SQL14 environment, I did what I would probably not recommend in any production environment, and reduced the available amount of memory for the SQL14 instance to 20 GB.  That severely limits buffer pool space, and limits scale as a result of requiring much more I/O.  It also forces behavioral changes on SQL Server, forcing more writing of data pages, etc. … again, I would never recommend doing this in practice!  But these limits were constant across the tests, as was the workload; the only variables ended up being the use of BPE, and subsequently ExtremSW.

The database itself contained around 750 GB of data and index.  Thus the dataset was 37.5 times larger than the total amount of memory allocated to SQL Server.  Of the 20 GB allocated to SQL Server, only a portion is used by the SQL instance for the buffer pool, so the ratio of data to buffer pool would actually be a little more extreme.  But what’s more important to consider is the “active” dataset – for example, you could have a 1 TB set of data and index, but if you are only actively accessing a very small portion of it, then it doesn’t really matter how large the dataset is … it matters more how much data you are actively touching.  In this case, the OLTP workload was fairly random across all the data/index.

Another aspect that remained constant throughout all testing was the underlying storage used for the database itself.  This was, unsurprisingly, an EMC array.  Here I even limited the total number of spindles, thus forcing more I/O to the data files, and subsequently increasing latency.  

Prior to each test, the database was restored from a backup.  Multiple runs were executed for each configuration (BaseLine, BPE and ExtremSW), and the average was used in the presentation of results (unless stated otherwise).

So what did the relative performance look like?

The Results

The performance is presented in terms of relative difference, since the actual numbers themselves do not matter – just how the workload changed for the various configurations.  The numbers used are the Transactions per minute (tpm).

Results

So for the same workload and the same configuration of the DB, but varying the usage of server flash – as SQL14 Buffer Pool Extension, or as EMC ExtremSW cache – the system processed 1.72 times more when using Buffer Pool Extension, and 2.32 times more when using the same infrastructure with ExtremSW.

But there’s more ….

Efficiency Vs Time

How long it takes to make effective use of the performance-enhancing server flash is also interesting, so here is a quick comparison of the workloads against the ExtremSF card for both the BPE implementation and the ExtremSW implementation.

SQL Server Batch Requests per second is one metric that may be used to identify how much “work” is being done by SQL Server, as it measures the number of statements being executed.  Given that the workload is the same in every run, this gives a useful comparison, and doing so, we see the following (in this case, these are the numbers from two specific runs – not averaged across runs).

BatchReqComparison

The X-axis in this case shows the time, in Hrs:Min:Sec, from the start of each run.  Since the two runs executed for different periods of time (the test with BPE enabled was run for much longer to allow the utilization to reach steady state), you can see that the ExtremSW test was terminated after about 7 hours, although steady state was attained after only about 3 hours.  The test run with BPE reached its steady state after about 10 hours.  Also worth noting is the slope of the change – ExtremSW had a much more aggressive improvement over a shorter period of time.  Overall, at steady state, the ExtremSW environment was processing more batches/sec than the Buffer Pool Extension implementation.

If it’s the same card, why is there such a difference?

Given that the hardware used was the same, there are implementation characteristics that will alter the performance – not the least of which is the aforementioned fact that the BPE storage can only hold non-dirty pages.  As a result, any updated pages will need to be pushed to durable media before being moved into the BPE.  That’s likely to be a small, but not necessarily trivial, impact.

ExtremSW, on the other hand, is a rather different beast.  In the Windows environment ExtremSW is implemented effectively as a filter driver.  When configured as cache, the storage allocation on the ExtremSF card is used as a central storage (cache) pool by the driver.  In the following image, the ExtremSF card is seen as HardDisk0 (Disk0 in the GUI). The “ExtremeSF” NTFS volume was created to consume space from the device, such that when the ExtremSW implementation was activated, it would only use 200 GB, which is the “OEM Partition” seen on that device.

ExtremSW

Individual LUNs (HardDisks as seen by Windows) are then bound to this cache pool.  As data is read from the disks, that data is stored in the cache immediately, and remains there until it becomes stale, at which point it is effectively discarded.  Data that is updated is also stored in the cache pool, and as of this release of ExtremSW, all writes are implemented as pass-through, so the write has to go to the backing disk in all cases … but the updated state is retained in cache (you don’t need to re-read what you have written).

Thus all data is cached on reads and writes, so there’s a tendency to be more efficient – at least when compared with mechanisms that need to destage data out.

Again, in this configuration, the ExtremSW cache size was limited to 200 GB, so it was effectively the same space on the ExtremSF card as the BPE file.  There were 12 data LUNs in use (HardDisk4 through HardDisk15, having NTFS volumes DATA01 through DATA12, as seen in the previous image), and these were bound to the ExtremSW cache pool by executing the following calls to the VFCMT utility (the management tool for ExtremSW).

vfcmt add -source_dev harddisk4 -cache_dev harddisk0

vfcmt add -source_dev harddisk5 -cache_dev harddisk0

vfcmt add -source_dev harddisk6 -cache_dev harddisk0

vfcmt add -source_dev harddisk7 -cache_dev harddisk0

vfcmt add -source_dev harddisk8 -cache_dev harddisk0

vfcmt add -source_dev harddisk9 -cache_dev harddisk0

vfcmt add -source_dev harddisk10 -cache_dev harddisk0

vfcmt add -source_dev harddisk11 -cache_dev harddisk0

vfcmt add -source_dev harddisk12 -cache_dev harddisk0

vfcmt add -source_dev harddisk13 -cache_dev harddisk0

vfcmt add -source_dev harddisk14 -cache_dev harddisk0

vfcmt add -source_dev harddisk15 -cache_dev harddisk0
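
The same twelve bindings can, of course, be generated in a loop rather than typed out – a trivial sketch, assuming vfcmt is on the PATH of the server:

# Bind harddisk4 through harddisk15 to the cache device (harddisk0)
4..15 | ForEach-Object { & vfcmt add -source_dev "harddisk$_" -cache_dev harddisk0 }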

SQL Server transaction log devices don’t really benefit from ExtremSW, and it’s not really recommended to include the transaction log in such instances.  In this environment, the transaction log was on a separate LUN – HardDisk16 – and that was left out of the ExtremSW configuration.

Conclusions?

It’s clear that Buffer Pool Extension has a positive impact to this SQL Server workload.  Its performance impact is definitely related to the characteristics of the storage used for the BPE file.  Server based Flash storage devices, like ExtremSF, have the performance characteristics to improve the throughput of SQL Server environments. This testing was based on CTP1 of the SQL14 product, and much change could be expected in the intervening time before launch.  As a result, performance may change with respect to efficiencies of BPE.

ExtremSW definitely is very efficient in improving the performance of SQL Server databases – there’s a number of papers that cover solutions using SQL Server 2008, etc.  It’s also true that ExtremSW is not specifically tied to SQL Server.  As mentioned, it’s a filter driver that binds to LUNs.  What those LUNs are used for, is irrelevant to ExtremSW, because the implementation is simply going to cache the data on those devices.  So if there’s an application that re-reads the same data, then it will see a benefit.

SQL Server Buffer Pool Extension is obviously a SQL Server feature, so its benefits are limited to this application.  Conversely, BPE is included in the appropriate editions of SQL Server, so you get it with those editions.  ExtremSW is an incremental cost, as it is a separately licensed solution.  ExtremSF (the server flash card) is assumed to be common in both BPE and ExtremSW implementations.

In the end, the overall efficiency is also tied to the application and the overall active dataset size.  Again, in this case the dataset size was around 750 GB, and the cache size (both BPE and ExtremSW) was 200 GB.  As the ExtremSF card size, the dataset size and/or the active portion of the data changes, so will the results, and the overall effect on any given environment.  Alas, it is the great “It depends” … because it does.

Windows Server 2012 – SMB 3.0 and VNX Unified


With the advent of Windows Server 2012, support for the next iteration of Server Message Block (SMB) was released.  SMB 3.0 introduced a slew of new capabilities that provided scalability, resiliency and high availability features.  With these, a much broader adoption of file-based storage solutions for Windows Server 2012 and layered applications is possible.

It’s probably worthwhile to reiterate the point that SMB functionality is a combination of client and server component parts.  SMB 3.0, for example, is implemented as a part of Windows Server 2012 and the Windows 8 client.  Existing Windows 7 clients will not support SMB 3.0, nor will Windows Server 2008 R2 and earlier.  What ends up happening with server/client combinations is that they will negotiate to the highest level of SMB that they can (together) support.  So a Windows 7 client connecting to a Windows Server 2012 file share will be negotiated down to an SMB 2.x level, and the same is true for a Windows Server 2008 R2 client connecting to a Windows Server 2012 server.  In these combinations, none of the SMB 3.0 features are supported.  As a result, you should assume that the following only applies to Windows Server 2012 and Windows 8 – acting as file servers and clients as appropriate.
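
If you want to verify what was actually negotiated from a Windows Server 2012 or Windows 8 client, a minimal check is:

# Show the SMB dialect negotiated for each current connection from this client
Get-SmbConnection | Select-Object ServerName, ShareName, Dialect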

Now while saying that it’s only Windows Server 2012 … there are solutions like the VNX Unified platform that provide SMB 3.0 services.  The VNX file implementation has been upgraded to support SMB 3.0 – in fact, EMC was the first storage vendor to bring an SMB 3.0 solution to market.

SMB 3.0 … The Gift that keeps giving

The feature set of SMB 3.0 is rather large, and we’re not going to dissect each feature here, but there are features that add functionality, like Remote VSS for backup/restore, and SMB 3.0 Encryption, which protects communications over IP networks by effectively scrambling the data for anyone trying to eavesdrop on the network link.  There are even a number of caching solutions, like BranchCache and directory lease caching, that help accelerate performance for remote office users.  All these features are also part of the VNX Unified File implementation.

There are however, some rather important features …

Continuous Availability

This feature is implemented to support high-availability configurations.  If you are planning on running applications that consume storage from a file share, then there’s an expectation that it provides a degree of high availability such that it does not become a single point of failure.

In the Windows world, Continuous Availability is provided when implementing the Scale-Out File Server role within Windows Failover Clustering.  This role allows all nodes in the cluster to service file share requests, and protects against outage of the file share in the situation where a single server fails, or goes offline during something like an OS patch installation.
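
On the Windows side, continuous availability is a property of the share itself.  As a minimal sketch (the share name, path and access group are assumptions), a continuously available share on a Scale-Out File Server can be created with:

# Create a continuously available SMB share for Hyper-V / SQL Server workloads
New-SmbShare -Name "VMStore1" -Path "C:\ClusterStorage\Volume1\Shares\VMStore1" -ContinuouslyAvailable $true -FullAccess "DOMAIN\HyperVHosts"

For VNX, the equivalent is a property set on the file share on the array, as described next.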

For VNX, Continuous Availability is a feature implemented against the specific file shares.  When enabled, the file share implements additional functionality, and is protected against outage of a single DataMover (the component providing the share) by persisting information to a redundant, secondary DataMover.  In the event that the primary DataMover fails, the persisted information for files open on the shares is resumed by the secondary DataMover.  File locks, application handles, etc. … all are transparently resumed on the secondary DataMover.  As DataMovers are implemented in an Active/Passive configuration, both scale and incremental redundancy are added.

Multi-Channel Connections

The main theme for SMB 3.0 is to provide the storage services for a range of applications. The support for SMB 3.0 extends from general purpose file shares to SQL Server databases and even as the location of Hyper-V Virtual Machines and the applications that they run.  As a result, the workloads can be significant.  It’s clear that scalable performance is a requisite.

For a Windows Scale-Out File Server implementation, SMB 3.0 clients can connect to the various IP addresses advertised by the servers themselves. For a VNX File Share, any given DataMover can have multiple connections into the environment, and as a result, scale is provided by connectivity to these different end-points (that is shown in the video below). The discovery of the multiple end-points and utilizing them is automatically managed by the client/server connection.

But Multi-Channel connectivity not only provides scale, it also provides high availability.  Should a single network connection fail, communications will remain active on any remaining connections.
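
From a Windows Server 2012 client, you can see which connections and interfaces SMB Multichannel is actually using:

# List the active multichannel connections and the client NICs SMB considers usable
Get-SmbMultichannelConnection
Get-SmbClientNetworkInterface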

Remote Direct Memory Access (RDMA)

Networking, in general terms, generates work for server CPUs.  This happens because the TCP/IP traffic needs to be assembled and disassembled, and that’s generally done by the CPU.  As a result, the more data you push to and pull from a file share, the more work the CPUs tend to have to do.  But the whole point is to put lots of data out on SMB 3.0 targets … so what’s the admin to do?

The recommendation from Microsoft is to move to RDMA deployments.  The RDMA implementation mitigates the additional work for the CPU in constructing and deconstructing packets by essentially turning the transfers into memory requests (that would be the “direct memory” part).  There is still work to get the data into memory on the source, and to extract it on the target, but all the packet overheads are removed.

RDMA is implemented by a number of vendors using varying technology.  The one that currently seems to tout the best performance (that’s a moving target) would seem to be InfiniBand.  This solution does require InfiniBand (IB) controllers in the servers, and a switch or two for communications, but it is very low latency and has large bandwidth capabilities.

IB is not the only RDMA game in town .. other vendors are delivering RDMA over Converged Ethernet (RoCE), which may be able to use existing converged infrastructure.

Because we are talking about the VNX, we should add that, as of the date of this post, VNX does not have an RDMA solution – at least not natively in the File head.  But if there were a desperate need to implement RDMA, then the storage could certainly be surfaced up as block storage to a Windows cluster and shared out via Scale-Out File Server services.  If those servers had IB controllers, then you can have your cake and eat it too.  It’s just that it’s a layer cake 🙂

[youtube https://www.youtube.com/watch?v=3DqDjBk1jYQ&hl=en&fs=1]

Windows Server 2012 – Thin LUNs and UNMAP


Amongst the various storage enhancements in Windows Server 2012, we have seen the introduction of support for Thin storage devices.  In the past, Windows Server environments would tolerate a Thin device, in that these operating systems did nothing special for Thin storage.  As long as the LUN was able to deal with Read and Write operations, it could happily work with Windows Server.  But that limited some of the benefits that Thin devices brought … specifically around storage pool efficiencies.

Why Thin devices at all?

The real benefit of Thin storage is the fact that Thin devices only allocate actual storage when needed.  Effectively, a Thin LUN is connected to some backend storage pool.  The pool is where all the data really lives.  The Thin LUN itself might be considered a bunch of pointers that represent blocks (or Logical Block Address ranges).  Where no data has been written against the LUN, the pointers don’t point to anything.  When data is written, blocks are allocated from the pool, the pointer then points to that block, and the data is written.

Thin implementations are efficient, since they don’t consume any storage until required, and let’s face it, in the non-Thin world, people padded volumes with a bunch of additional space “for growth”.  Such padding created “white space” and meant that storage efficiency went down.  Thin LUNs fix that because, for the most part, it doesn’t matter how big they appear, it only matters how much storage they allocate.  You can have a 20 TB Thin LUN, but if you only ever write 5 GB to it, then only 5 GB will ever be consumed from the Pool.

Of course, if the Pool is actually smaller than the sum of the “advertised” space of the LUNs, and everyone starts to allocate all that space, you do end up with an issue.  But that’s a discussion for another time.

What does Thin device support help us do?

In effect, Windows Server 2012 does a couple of things if it detects a Thin storage LUN.  The first is that it supports a concept of Threshold notifications.  These notifications are actually Windows log entries that are generated when new allocations are made against a LUN and a certain percentage of allocation is crossed.  As an example, consider that you have a 1 TB Thin LUN, and there’s a threshold notification for when 80% of the “advertised” size of the LUN is consumed – that would mean that when the next write over 800 GB occurs, a notification is sent from the storage array back to the Windows Server 2012 instance, and it will log the event in the Event log.

The second (and honestly, more interesting) piece is that Windows Server 2012 will now send UNMAP operations to the LUN when a filesystem object is deleted, or when it attempts to optimize the volume.  In the past, Windows Server environments did nothing to tell the LUN that a file had been removed, and that the space that was allocated for it was no longer required.  That meant that any blocks that were at some point written to would always remain allocated against the Thin device.  The only way to resolve this was to use a variety of manual techniques to free this now unused space.  Windows Server 2012 removes the need for manual intervention, and makes this space reclamation automatic.

When is Thin device detection enabled?

Thin LUN detection occurs dynamically between a storage array that supports the Windows Server 2012 specification, and the installation of Windows Server 2012.  As this is a new feature, you will find that existing arrays with their pre-existing version of firmware or microcode may not support it.  In general, customers should expect that they will need to update the firmware or microcode on their systems to get this feature.  Of course each product is different, so it will be necessary to check with the specific vendor.

For EMC storage arrays, this functionality support is being made available in the VNX and VMAX product lines. For both products, this will become available as FLARE and Enginuity updates for VNX and VMAX, respectively. Unfortunately, for customers with prior generations of CLARiiON and Symmetrix DMX products, there are currently no plans to offer Windows Server 2012 Thin compliance implementations.  Thin devices will work in the same ways as always for these earlier versions of EMC arrays – but they will not be seen as Thin from Windows Server 2012.

How does Thin device support execute?

For the most part, the interactions between a Windows Server 2012 instance and a Thin LUN are automatically managed.  Nothing new really happens when a write is sent to a Thin LUN; this will, as always, be serviced by the LUN as it always has been, notwithstanding the threshold notification activity.

It is really the activity that happens after a file is deleted that is where the difference in behavior occurs.  After a file deletion, Windows Server 2012 NTFS will identify the logical block address ranges that the file occupied, and will then submit UNMAP operations back to the LUN.  This will cause the storage array to deallocate (UNMAP) the blocks from the pool used for the LUN.

While this is automatic for the most part, it is also possible to manually get the Windows Server 2012 instance to re-check the volume and, where blocks could be deallocated, issue those deallocations.  This is done either through the “Optimize-Volume” PowerShell cmdlet, or from the Optimize Drives GUI.  There is also an automatic task that is executed on a regular cycle by Windows Server 2012 – this is also visible from the Optimize Drives GUI.
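
As a minimal sketch, the manual re-check for a given volume (the drive letter is an assumption) looks like this:

# Ask Windows to re-scan the volume and issue UNMAPs for any reclaimable space
Optimize-Volume -DriveLetter F -ReTrim -Verbose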

The Fix Is In

Invariably there are items that need some tweaking when you release a new feature, and support for Thin LUNs is no different in this regard.  Optimizations have been added to the support of Thin devices and UNMAP operations, and these have been made available in KB2870270 (as of July 2013).  Go straight over to http://support.microsoft.com/kb/2870270, download it, test it in your environment, and when you get to deploy a server don’t forget to include it.

Thin Device support – When It All Works

When fully implemented, Thin device support happens quietly in the background … but in the following demonstration, we’ve attempted to show you how this all happens.

Windows Server 2012 R2 – Shared VHDX


One of the new features available in Windows Server 2012 R2 is the ability to create a Shared VHDX configuration.  This is about being able to attach a single VHDX to more than a single Hyper-V guest.  But why?  To be able to run a cluster in the guest OS, and use these devices as “Shared Disks”, of course!  Previously, the only choices were to use iSCSI-based storage in the guests, or implement virtual HBAs (an option introduced in Windows Server 2012).

In Hyper-V environments, up to and including Windows Server 2012, a VHD and/or a VHDX could only be accessed by a single guest OS at any given time.  Administrators were blocked by file locking from attaching a single VHDX to more than one active guest OS.  But in Windows Server 2012 R2, for a VHDX, you can now enable a sharing option (sharing a VHD is not supported).

Enabling Shared VHDX

The option to share is provided under “Advanced Features” after expanding the VHDX object in the VM’s settings, as seen below.

WS2012R2 Shared VHDX Option

You just have to check “Enable virtual hard disk sharing” on all Hyper-V VMs that will be part of this cluster, and you’re all set, right?  Well, almost.  Certainly if you are using Windows Server 2012 R2 as the guest OS (the OS instance running in the VM itself) this is sufficient.  However, if you were to install, say, Windows Server 2012 (not the R2 version) … then you could be in for some not so obvious issues.
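
As an aside, the same checkbox can be set from PowerShell on the Hyper-V host.  This is a hedged sketch only – the VM names and VHDX path are assumptions, and the VHDX needs to live on shared storage such as CSV or an SMB 3.0 share:

# Attach the same VHDX to both guest cluster nodes with sharing enabled
$SharedDisk = "C:\ClusterStorage\Volume1\GuestClusterDisk1.vhdx"
foreach ($VM in "GUEST-NODE1", "GUEST-NODE2")
{
    Add-VMHardDiskDrive -VMName $VM -ControllerType SCSI -Path $SharedDisk -SupportPersistentReservations
}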

Speaking from experience, it was slightly frustrating to share a VHDX file between VMs and be able to see the “LUN” representing the shared VHDX in the VMs, but then run the Cluster Validate option, only to see that there were no available storage devices.  Huh?  The volume could be partitioned, and the changes were reflected on the other VMs; even the disk signatures were the same … but nothing would make the “disks” available for clustering.

Could this only be an option for Windows Server 2012 R2 guests?  Surely not.

Using Shared VHDX Storage for Guests prior to Windows Server 2012 R2

It finally dawned on me that perhaps the guest needed updating – not OS updating, but the Guest Integration Services.  These services have versions that are built in for each Windows OS version (well, certainly for recent releases).  However, as Windows Server 2012 R2 is only a Beta at this point, older guests won’t have the new integration components built in … and these may leverage new functionality.

Updating these components can be done a number of ways, but the easiest is to use Hyper-V Manager (or Failover Cluster Manager) on the parent, and connect to the VM itself.  Then select the option from the Action menu item, as shown below … of course you do have to run the setup within the VM itself.

WS2012R2 Shared VHDX Integration Services

After the update to the Integration components, a recheck of the Cluster Validate in the guest running Windows Server 2012 showed the shared VHDX storage.  This may not work for other OS versions earlier than Windows Server 2012 – at least references on TechNet would seem to indicate limited support.  That would not be surprising, given that most of those older Windows Server OS versions have gone, or are about to go, end of life.  Might be interesting to see if this is supported on some non-Windows OS instances.

WS2012R2 Shared VHDX Cluster Disks

Such a simple issue, but not so obvious the first time around.

Windows Server 2012 – ODX


Many new features have been made available within Windows Server 2012.  They range across all product areas of Windows Server 2012, from user interfaces through to kernel changes.  Some of these new features you will be able to explicitly interact with, enable and use, and others will be implicitly made available.  Windows Server 2012 Offloaded Data Transfer (ODX) is one of these implicit features … well, where it’s supported.

What does ODX help us do?

The key is in the name: Offloaded Data Transfer does just that … it offloads the data transfer activity to the storage array.  The power of ODX is in leveraging the resources of the array itself.  Systems expend a good deal of resource copying data up from a storage array only to turn around and push the same data back out to the array.  This happens when users make duplicate copies of data sets (files in this case), or when someone copies a file from one LUN to another.  Users will often complain about the time it takes to copy data from one LUN to a LUN on a different server, where the data has to travel from the storage array to the originating server, across an Ethernet connection to the receiving server, and then finally down to the target LUN.

Not only does this sort of copy activity consume bandwidth from HBAs that might be better served dealing with transactional workloads, these operations also consume CPU resources on the servers and can congest network resources when transferring between servers.  Freeing these resources helps systems run much more efficiently, allowing compute resources to be spent servicing business workloads rather than dealing with the overhead of moving these large file objects.

There is one other area where there’s been a need to move large file objects around, and that is the world of virtualization.  Virtual machines are in large part defined by the hard drives that comprise their storage.  In some instances, deployments may wish to copy these virtual machine hard drives (VHDs in Hyper-V … or now the new VHDX format) between systems.  This is especially true if the VHDs contain an image of a ‘sysprepped’ OS instance, be it Windows Server or Windows client.  Deploying VDI-style implementations, where you may want to copy a large number of VHDs, is one such example.

What ODX effectively does is let the array do the actual data transfer internally.  No more pulling the data back to the server only to subsequently send it back down to the same array.  Rather, Windows Server 2012 coordinates with the array to transfer the blocks that contain the data from one location on a LUN to a different location on the same LUN, or to another LUN – depending on what the copy actually requires.  The transfer of the blocks is executed by the array.  The coordination of NTFS meta-data changes is handled by the Windows Server 2012 instances involved.

When is ODX enabled?

The enabling of the ODX feature occurs dynamically between a storage array that supports ODX and an installation of Windows Server 2012.  As this is a new feature, you will find that existing arrays with their pre-existing versions of firmware or microcode may not support it.  In general, customers should expect that they will need to update the firmware or microcode on their systems to obtain this feature.  Of course, each product is different, so it will be necessary to check the exact requirements with the specific vendor.
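
On the Windows side, you can at least confirm that offload support has not been switched off in the operating system itself.  Windows exposes this through the documented FilterSupportedFeaturesMode registry value, which a quick PowerShell check will show:

    # Check that Windows has not disabled offloaded data transfers
    # (a value of 0 means ODX is allowed; 1 means it has been turned off)
    Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "FilterSupportedFeaturesMode"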

For EMC storage arrays, support for this functionality is being made available in the VNX and VMAX product lines, arriving as FLARE and Enginuity updates respectively.  Unfortunately, for customers with prior generations of CLARiiON and Symmetrix DMX products, there are currently no plans to offer ODX implementations.

How does ODX work?

The primary problem for storage arrays is that they are, by and large, oblivious to the existence of volume managers and these things called files.  At the array level, the only important structures are the Logical Block Address ranges specified in SCSI commands. It’s the volume manager that deals with the location of the file within the filesystem, which itself lives within a partition on a LUN.  So telling an array to copy a file is a rather complicated problem … from the perspective of the array itself.

The implementation of ODX deals with this by breaking the file into LBA ranges. These are the physical locations of the data comprising the file.  There can be one or more ranges for any given file. If a filesystem is highly fragmented, then the file may be resident in lots of different locations across the filesystem.

The point here is that there’s a set of ranges that represent the data of the file. Similarly, it’s necessary to have a complementary set of ranges that will hold the data when it is transferred. Given a set of source LBA ranges and a set of target LBA ranges, the array can now effectively copy those areas from source to target without transferring via the host.  The array still doesn’t know what the data is, or that it represents a “file” … it is just dealing with address ranges.

But that is only part of the solution. Remember that there is NTFS meta-data involved.  Simply copying blocks of data to a target location does not make a file appear within the NTFS volume. That meta-data change – the creation of the file entity, and pointing its allocated blocks at the corresponding target LBA address ranges – is the other piece of ODX.

How do you call ODX into action?

Executing ODX operations requires a “copy engine”.  One exists within Windows Server 2012 itself, and it gets invoked whenever the copy API is used.  That sounds more complicated than it is – just think of it as any copy operation automatically calling ODX where it can.  So if you do a Copy (CTRL-C) and a Paste (CTRL-V) … that will execute ODX if it can.  Using File Explorer and dragging a file from one NTFS volume to another … again, ODX will be used if possible.  Even XCOPY.EXE, and any PowerShell cmdlet that drives the copy API, will.
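
In other words, nothing special is needed in the command itself; a plain file copy between two array-backed NTFS volumes will be offloaded if both LUNs support it.  A simple way to see the effect is just to time an ordinary copy – the paths below are examples only:

    # Copy a large file between two array-backed NTFS volumes and time it;
    # if both LUNs support ODX, Windows offloads the transfer automatically
    # (paths are examples only)
    Measure-Command { Copy-Item -Path "E:\Library\GoldImage.vhdx" -Destination "F:\VMs\NewGuest.vhdx" }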

I keep saying “where possible”, and this is a rather important point.  ODX requires that the source LUN and the target LUN both support ODX.  Clearly, doing a copy and paste within the same NTFS volume on a single LUN from an array running ODX-capable code will execute ODX operations.  But it gets tricky across LUNs on different arrays.  Here, the arrays would need to talk to each other and have some way of transferring the blocks between them.  Today, there is no support for using ODX across different arrays.  It’s also typically not possible to use ODX from local storage on a server to a LUN on an array.  In these cases, copy operations will use the traditional Read/Write operations.

ODX and Legacy Copy

Perhaps it’s best to think of ODX as a privilege rather than a right.  ODX operations, by design, can fail.  Storage arrays are physical entities and they have their limits.  It is entirely possible to overwhelm an array with ODX operations, for example, or for the array to simply have a great deal of other work to get done.  Arrays are typically expected to service all workloads with some implicit level of performance, and well-behaved systems are expected not to crush the other user workloads.  So there are cases where ODX calls could end up being rejected.

A copy engine, when it receives an ODX operational failure, will revert to legacy copy (Read/Write) for the remainder of the current “file” copy.  A background timer is also started, set for three minutes.  The copy engine will continue to process the current file as a legacy operation until it is completed (irrespective of the three-minute timer); if it then moves on to a new file copy and the three minutes have elapsed, it will attempt ODX again.  Any other copy engines will continue to process ODX operations independently – they are not affected by an ODX failure against a different copy engine.

The Fix Is In

Invariably there are items that need some tweaking when you release a new feature.  You would be well served to install the latest Windows hotfix that includes updates to this feature, which is KB2870270 (as of July 2013).  Go straight over to http://support.microsoft.com/kb/2870270, download it, test it in your environment, and when you deploy a server, don’t forget to include it.

ODX – When It All Works

Notwithstanding all the limitations and caveats … once ODX is in operation, it can certainly deliver on the promise of optimized operations.  Most people like to focus on the speed aspect, but it is not always just about speed … it’s about not consuming HBA (Fibre Channel) bandwidth, and not choking Ethernet networks with data transfers.

But here’s a sample of what happens when you get ODX right … this is for an EMC VMAX array … don’t assume all arrays will provide this level of performance!

Then there are the rules

Well, there’s really only a single rule, and it applies to VMAX.  The requirement is that the NTFS volumes for both source and target destinations have been formatted with an Allocation Unit Size of 64 KB.  This is one of those options that is selected when you create the NTFS volume.  Without it set, you might find that ODX doesn’t happen very often.  The requirement stems from the fact that cache slots on a VMAX array are 64 KB, and we need at least the start of the file to begin on a cache slot boundary – which happens when the file starts at a 64 KB boundary.  If the NTFS volume starts at a 64 KB boundary (this happens by default in Windows Server 2012 and later), and you then set the Allocation Unit Size to 64 KB (this is the manual intervention required when formatting the volume) … files will start on a boundary, and everything should work.
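
As a reminder of how that 64 KB allocation unit gets set, here is a minimal sketch; the drive letter and volume label are examples only, and the same thing can of course be done by picking 64K in the Format dialog:

    # Format the volume with a 64 KB allocation unit size
    # (WARNING: formatting destroys existing data; drive letter and label are examples only)
    Format-Volume -DriveLetter F -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel "VMAX_Data"

    # Verify the cluster size afterwards ("Bytes Per Cluster" should report 65536)
    fsutil fsinfo ntfsinfo F: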
