Set up Multi-AZ infrastructure for a SQL Server Always On FCI by using Amazon FSx
Manish Garg, T.V.R.L.Phani Kumar Dadi, Nishad Mankar, and RAJNEESH TYAGI, Amazon Web Services
Summary
If you need to migrate a large number of Microsoft SQL Server Always On Failover Cluster Instances (FCIs) quickly, this pattern can help you minimize provisioning time. By using automation and Amazon FSx for Windows File Server, it reduces manual efforts, human-made errors, and the time required to deploy  a large number of clusters.
This pattern sets up the infrastructure for SQL Server FCIs in a Multi-Availability Zone (Multi-AZ) deployment on Amazon Web Services (AWS). The provisioning of the AWS services required for this infrastructure is automated by using AWS CloudFormation templates. SQL Server installation and cluster node creation on an Amazon Elastic Compute Cloud (Amazon EC2) instance is performed by using PowerShell commands.
This solution uses a highly available Multi-AZ Amazon FSx for Windows file system as the shared witness for storing the SQL Server database files. The Amazon FSx file system and EC2 Windows instances that host SQL Server are joined to the same AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD) domain.
Prerequisites and limitations
Prerequisites
- An active AWS account 
- An AWS user with sufficient permissions to provision resources using AWS CloudFormation templates 
- AWS Directory Service for Microsoft Active Directory 
- Credentials in AWS Secrets Manager to authenticate to AWS Managed Microsoft AD in a key-value pair: - ADDomainName: <Domain Name>
 
- ADDomainJoinUserName: <Domain Username>
 
- ADDomainJoinPassword:<Domain User Password>
 
- TargetOU: <Target OU Value>
 
 
- You will use the same key name in AWS Systems Manager automation for the AWS Managed Microsoft AD join activity. 
- SQL Server media files for SQL Server installation and Windows service or domain accounts created, which will be used during cluster creation 
- A virtual private cloud (VPC), with two public subnets in separate Availability Zones, two private subnets in the Availability Zones, an internet gateway, NAT gateways, route table associations, and a jump server 
Product versions
Architecture
Source technology stack
Target technology stack
- AWS EC2 instances 
- Amazon FSx for Windows File Server 
- AWS Systems Manager Automation runbook 
- Network configurations (VPC, subnets, internet gateway, NAT gateways, jump server, security groups) 
- AWS Secrets Manager 
- AWS Managed Microsoft AD 
- Amazon EventBridge 
- AWS Identity and Access Management (IAM) 
Target architecture
The following diagram shows an AWS account in a single AWS Region, with a VPC that includes two Availability Zones, two public subnets with NAT gateways, a jump server in the first public subnet, two private subnets, each with an EC2 instance for a SQL Server node in a node security group, and an Amazon FSx file system connecting to each of the SQL Server nodes. AWS Directory Service, Amazon EventBridge, AWS Secrets Manager, and AWS Systems Manager are also included.
Automation and scale
AWS services
- AWS CloudFormation helps you set up AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle across AWS accounts and Regions. 
- AWS Directory Service provides multiple ways to use Microsoft Active Directory (AD) with other AWS services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon Relational Database Service (Amazon RDS) for SQL Server, and Amazon FSx for Windows File Server. 
- Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the AWS Cloud. You can launch as many virtual servers as you need and quickly scale them up or down. 
- Amazon EventBridge is a serverless event bus service that helps you connect your applications with real-time data from a variety of sources. For example, AWS Lambda functions, HTTP invocation endpoints using API destinations, or event buses in other AWS accounts. 
- AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them. 
- AWS Secrets Manager helps you replace hardcoded credentials in your code, including passwords, with an API call to Secrets Manager to retrieve the secret programmatically. 
- AWS Systems Manager helps you manage your applications and infrastructure running in the AWS Cloud. It simplifies application and resource management, shortens the time to detect and resolve operational problems, and helps you manage your AWS resources securely at scale. 
Other tools
Code repository
The code for this pattern is available in the GitHub aws-windows-failover-cluster-automation repository.
Best practices
Epics
| Task | Description | Skills required | 
|---|
| Deploy the Systems Manager CloudFormation stack. | Sign in to your AWS account, and open the AWS Management Console.Navigate to the CloudFormation console, and create the Systems Manager CloudFormation stack by uploading the ssm.yamltemplate. Provide values for the following parameters: StateUnJoinAssociationLoggingBucketName – Provide a name for the S3 bucket that the template will create for logging purposes.SSMAssociationADUnjoinName – Provide a name for the AWS::SSM::Associationresource.SSMAutomationDocumentName – Provide a name for the Systems Manager Automation runbook.EventBridgeName – Provide a name for the EventBridge event bus.
Deploy the Systems Manager CloudFormation  stack by launching the ssm.yamlCloudFormation template. The template will create the Systems Manager Atomation runboook that is initiated when a new EC2 instance with the tagADJoined: FSXADDlaunches. The Automation runbook will add the instance to AWS Managed Microsoft AD directory.
 | AWS DevOps, DevOps engineer | 
| Deploy the infrastructure stack. | After successful deployment of the Systems Manager stack, create the infrastack, which includes EC2 instance nodes, security groups, the Amazon FSx for Windows File Server file system, and the IAM role. Navigate to the CloudFormation console and launch the infra-cf.yamltemplate. To deploy this stack, the following parameters are required: ActiveDirectoryId– ID for AWS Managed Microsoft AD
ADDnsIpAddresses1– Primary DNS IP address of AWS Managed Microsoft AD
ADDnsIpAddresses2– Secondary DNS IP address of AWS Managed Microsoft AD
FSxSecurityGroupName– Name of the Amazon FSx security group
FSxWindowsFileSystemName– Name of the Amazon FSx drive
ImageID– ID of the base Windows 2012 R2 image or Amazon Machine Image (AMI) used to create the SQL Server instance node
KeyPairName– Key-value pair to attach to the EC2 instance nodes for access
Node1SecurityGroupName– Name of the first node security group
Node2SecurityGroupName– Name of the second node security group
OUSecretName– Name of the secret that contains the AWS Managed Microsoft AD information
PrivateSubnet1– ID of the first private subnet
PrivateSubnet2– ID of the second private subnet
SqlFSxFCIName– Name of the tag applied to the primary and secondary nodes and to Amazon FSx.
SqlFSxServerNetBIOSName1– Name of the primary EC2 instance node (maximum of 15 characters)
SqlFSxServerNetBIOSName2– Name of the secondary EC2 instance node (maximum of 15 characters)
VPC– VPC ID
WorkloadInstanceType– Type of EC2 instance
Deploy the infrastack. The stack will create all the infrastructure components that are required to set up Windows SQL Server FCI.After the EC2 instance nodes are launched, the Systems Manager Automation document will be invoked to join these instances to AWS Managed Microsoft AD. You can track the progress on the Systems Manager console Automation page.
 | AWS DevOps, DevOps engineer | 
| Task | Description | Skills required | 
|---|
| Install Windows tools. | Log in to the primary EC2 instance, which is node 1. To install the Windows features (Active Directory and FCI Tools), run the following PowerShell script. Install-WindowsFeature -Name RSAT-AD-Powershell,Failover-Clustering -IncludeManagementTools
Install-WindowsFeature -Name RSAT-Clustering,RSAT-ADDS-Tools,RSAT-AD-Powershell,RSAT-DHCP,RSAT-DNS-Server
Log in to the secondary EC2 instance, which is node 2, and run same script to enable features on node 2.
 | AWS DevOps, DevOps engineer, DBA | 
| Prestage the cluster computer objects in Active Directory Domain Services. | To prestage the cluster name object (CNO) in Active Directory Domain Services (AD DS) and prestage a virtual computer object (VCO) for a clustered role, follow the instructions in the Windows Server documentation. | AWS DevOps, DBA, DevOps engineer | 
| Create the WSFC. | To create the Windows Server Failover Clustering (WSFC) cluster, do the following: Log in to the primary EC2 instance, which is node 1. To create the Amazon FSx file share and grant full access to the listed AD service account, run the following code. Invoke-Command -ComputerName "<FSx Windows Remote PowerShell Endpoint>" -ConfigurationName FSxRemoteAdmin -scriptblock {
New-FSxSmbShare -Name "SQLDB" -Path "D:\share" -Description "SQL Databases Share" -ContinuouslyAvailable $true -FolderEnumerationMode AccessBased -EncryptData $true
grant-fsxsmbshareaccess -name SQLDB -AccountName "<domain\user>" -accessRight Full
}
 This command will also create the continuously available (CA) file share, which is optimized for use by Microsoft SQL Server.To create the failover cluster on the primary instance (node 1), run the following command. New-Cluster -Name <CNO Name> -Node  <Node1 Name>, <Node2 Name> -StaticAddress <Node1 Secondary Private IP>, <Node2 Secondary Private IP>
 The command requires the following parameters: Name– The name of the cluster (CNO)
Node– The names of the primary and secondary nodes, respectively
StaticAddress– The secondary IP addresses of the primary and secondary nodes, respectively
A domain administrator or regular user must have administrator permission on both the nodes to create the Windows Server Failover Clustering (WSFC) cluster. Otherwise, the previous command will fail and return the message, You do not have administrator privilege on servers.After the cluster is created, run the following command to attach the file share witness. Set-ClusterQuorum -FileShareWitness \\<FSx Windows Remote PowerShell Endpoint>\share\witness
 | AWS DevOps, DBA, DevOps engineer | 
| Install the SQL Server failover cluster. | After the WSFC cluster is set up, install the SQL Server cluster on the primary instance (node1). In the T drive on both nodes, create tempdbandlogfolders. The folders are used in the PowerShell commands.After you copy the SQL Server media files for SQL Server installation on both nodes, run the following PowerShell command on node 1 to install SQL Server on node 1.
D:\setup.exe /Q  `
/ACTION=InstallFailoverCluster `
/IACCEPTSQLSERVERLICENSETERMS `
/FEATURES="SQL,IS,BC,Conn"  `
/INSTALLSHAREDDIR="C:\Program Files\Microsoft SQL Server"  `
/INSTALLSHAREDWOWDIR="C:\Program Files (x86)\Microsoft SQL Server"  `
/RSINSTALLMODE="FilesOnlyMode"  `
/INSTANCEID="MSSQLSERVER" `
/INSTANCENAME="MSSQLSERVER"  `
/FAILOVERCLUSTERGROUP="SQL Server (MSSQLSERVER)"  `
/FAILOVERCLUSTERIPADDRESSES="IPv4;<2nd Sec Private Ip node1>;Cluster Network 1;<subnet mask>"  `
/FAILOVERCLUSTERNETWORKNAME="<Fail over cluster Network Name>"  `
/INSTANCEDIR="C:\Program Files\Microsoft SQL Server"  `
/ENU="True"  `
/ERRORREPORTING=0  `
/SQMREPORTING=0  `
/SAPWD="<Domain User password>" `
/SQLCOLLATION="SQL_Latin1_General_CP1_CI_AS"  `
/SQLSYSADMINACCOUNTS="<domain\username>" `
/SQLSVCACCOUNT="<domain\username>"  /SQLSVCPASSWORD="<Domain User password>" `
/AGTSVCACCOUNT="<domain\username>"  /AGTSVCPASSWORD="<Domain User password>" `
/ISSVCACCOUNT="<domain\username>" /ISSVCPASSWORD="<Domain User password>"  `
/FTSVCACCOUNT="NT Service\MSSQLFDLauncher"  `
/INSTALLSQLDATADIR="\\<FSX DNS name>\share\Program Files\Microsoft SQL Server"  `
/SQLUSERDBDIR="\\<FSX DNS name>\share\data"  `
/SQLUSERDBLOGDIR="\\<FSX DNS name>\share\log" `
/SQLTEMPDBDIR="T:\tempdb"  `
/SQLTEMPDBLOGDIR="T:\log"  `
/SQLBACKUPDIR="\\<FSX DNS name>\share\SQLBackup" `
/SkipRules=Cluster_VerifyForErrors `
/INDICATEPROGRESS
 | AWS DevOps, DBA, DevOps engineer | 
| Add a secondary node to the cluster. | To add SQL Server to the secondary node (node 2), run the following  PowerShell command. D:\setup.exe /Q  `
/ACTION=AddNode `
/IACCEPTSQLSERVERLICENSETERMS `
/INSTANCENAME="MSSQLSERVER"  `
/FAILOVERCLUSTERGROUP="SQL Server (MSSQLSERVER)" `
/FAILOVERCLUSTERIPADDRESSES="IPv4;<2nd Sec Private Ip node2>;Cluster Network 2;<subnet mask>" `
/FAILOVERCLUSTERNETWORKNAME="<Fail over cluster Network Name>" `
/CONFIRMIPDEPENDENCYCHANGE=1 `
/SQLSVCACCOUNT="<domain\username>"  /SQLSVCPASSWORD="<Domain User password>" `
/AGTSVCACCOUNT="domain\username>"  /AGTSVCPASSWORD="<Domain User password>" `
/FTSVCACCOUNT="NT Service\MSSQLFDLauncher" `
/SkipRules=Cluster_VerifyForErrors `
/INDICATEPROGRESS
 | AWS DevOps, DBA, DevOps engineer | 
| Test the SQL Server FCI. | On the Windows instance for one of the nodes, in Administrative Tools, launch the Failover Cluster Manager.Navigate to Nodes, and confirm that the node status is Status Running.Select Roles, open the context (right-click) menu for SQL Server (MSSQLSERVER), and select Move and Select Node.After the node selection, SQL Server should be running on the other node.
 | DBA, DevOps engineer | 
| Task | Description | Skills required | 
|---|
| Clean up resources. | To clean up the resources, use the AWS CloudFormation stack deletion process: Open the AWS CloudFormation console.On the Stacks page, select the infrastack. The stack must be currently running.In the stack details pane, choose Delete.Select Delete stack when prompted.Repeat steps 2-4 for the ssmstack.
After the stack deletion is complete, the stacks will be in the DELETE_COMPLETEstate. Stacks in theDELETE_COMPLETEstate aren’t displayed in the CloudFormation console by default. To display deleted stacks, you must change the stack view filter as described in Viewing deleted stacks on the AWS CloudFormation console. If the deletion failed, a stack will be in the DELETE_FAILEDstate. For solutions, see Delete stack fails in the CloudFormation documentation. | AWS DevOps, DBA, DevOps engineer | 
Troubleshooting
| Issue | Solution | 
|---|
| AWS CloudFormation template failure | If the CloudFormation template fails during deployment, do the following: | 
| AWS Managed Microsoft AD  join failure | To troubleshoot the join issues, follow these steps: Open the Systems Manager console.Select the deployment Region.In the left pane, choose Automation, and locate the failed Automation runbook.Open the Automation runbook, and check for the Execution status and Execution steps.Investigate the details of the failed step to see the exact error or failure.
 | 
Related resources