# Amazon FSx Active Directory Validation Troubleshooting Guide

This guide explains the scenarios that may cause failures or warnings
during validation and how to fix them.

This guide is organized into sections by test name and then into subsections
by failure ID (the key in either the `Failures` or `Warnings` structure of the
validation result). Use text search to jump quickly to the relevant section.

## Validate EC2 Subnets

### GetEc2Subnet

Failure to get EC2 subnets is most likely because the EC2 instance this is
being called on lacks the 'ec2:DescribeSubnets' permission described
in the prerequisites of the README.

### GetEc2Vpc

Failure to fetch the EC2 VPC is most likely because the EC2 instance this is
being called on lacks the 'ec2:DescribeVpcs' permission described
in the prerequisites of the README.

### InvalidSubnetCount

Amazon FSx for Windows supports either **ONE** subnet for Single-AZ deployment
types or **TWO** subnets for Multi-AZ deployment types. Passing in a different
number of subnet IDs is invalid.

### InvalidSubnetIds

One or more of the supplied subnet IDs **DO NOT** exist.

### SubnetsInSameAZ

Multiple subnets are currently only supported for Multi-AZ deployment type.
To ensure availability and durability, these subnets **MUST** be in different AZs.

### SubnetsInSeparateVPCs

If multiple subnets are provided, they **MUST** belong to the same VPC.
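
The subnet rules above can be sketched as a small check. This is an illustrative model only, not the validation tool's own code; the function name `check_subnets` is hypothetical, and the dictionary keys are assumed to follow the shape returned by the EC2 `DescribeSubnets` API.

```python
def check_subnets(subnets):
    """Return the failure IDs that apply to a list of subnet descriptions.

    Each subnet is a dict with the keys 'SubnetId', 'AvailabilityZone'
    and 'VpcId' (assumed shape, as in a DescribeSubnets response).
    """
    failures = []
    # Single-AZ needs exactly one subnet, Multi-AZ exactly two.
    if len(subnets) not in (1, 2):
        failures.append("InvalidSubnetCount")
    if len(subnets) > 1:
        zones = [s["AvailabilityZone"] for s in subnets]
        # Any repeated Availability Zone violates the Multi-AZ rule.
        if len(set(zones)) != len(zones):
            failures.append("SubnetsInSameAZ")
        # All subnets must share one VPC.
        if len({s["VpcId"] for s in subnets}) > 1:
            failures.append("SubnetsInSeparateVPCs")
    return failures
```

For example, two subnets in the same Availability Zone and the same VPC would yield only `SubnetsInSameAZ`.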

## Validate connectivity with DNS Servers

### DnsCommunication

This is a warning if **ONE** of the DNS servers provided cannot be reached
and a failure if **ALL** of the DNS servers provided cannot be reached.
Please verify that the provided DNS servers are in either the VPC CIDR range
or the RFC-1918 IP ranges, and that TCP and UDP port 53 on the DNS servers
is reachable from this machine. Try running `Test-FSxADDnsConnection`.
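
As a rough illustration of the two checks described above, the following sketch tests whether a DNS server address falls in the VPC CIDR range or an RFC-1918 range, and classifies reachability results as a warning or a failure. The function names are hypothetical and this is not the tool's implementation; the reachability results are assumed to have been gathered separately (for example with `Test-FSxADDnsConnection`).

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def in_allowed_range(ip, vpc_cidr):
    """True if ip is inside the VPC CIDR or any RFC-1918 range."""
    addr = ipaddress.ip_address(ip)
    networks = RFC1918 + [ipaddress.ip_network(vpc_cidr)]
    return any(addr in net for net in networks)

def dns_severity(reachable):
    """Classify results given a dict of DNS server IP -> reachable (bool).

    Returns None when every server is reachable, "Failure" when none is,
    and "Warning" when only some are.
    """
    results = list(reachable.values())
    if all(results):
        return None
    if not any(results):
        return "Failure"
    return "Warning"
```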

### DnsResolution

A failure indicating that we were able to connect to one or more DNS servers
but encountered an error resolving records. Please verify that the servers
are correctly configured.

### InferredDomainName

In some cases, the provided domain DNS root `a.b.c` does not resolve but `b.c`
does. We'll provide `InferredDomainName` = `b.c` to present this information.
This scenario may occur when the DNS name of a domain controller is provided
instead of the domain DNS root. Either confirm this inference by specifying the
inferred value as the domain DNS root in a new call, or provide an alternate value.
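
One plausible way to arrive at such a candidate is to strip leading labels from the name that failed to resolve. The sketch below assumes that approach; the tool's actual inference logic is not documented here, and `parent_domain_candidates` is a hypothetical name.

```python
def parent_domain_candidates(dns_root):
    """List parent domains of dns_root, most specific first.

    Single-label results are excluded because they are not usable
    as a domain DNS root.
    """
    labels = dns_root.strip(".").split(".")
    # Drop one leading label at a time, stopping while two labels remain.
    return [".".join(labels[i:]) for i in range(1, len(labels) - 1)]
```

With the example above, `parent_domain_candidates("a.b.c")` yields `["b.c"]`, which would then be checked for resolution.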

### InvalidDnsIpCount

Amazon FSx will accept either **ONE** or **TWO** DNS server IP addresses with **TWO** being
preferred for high-availability.

### InvalidDomain

We're unable to find any domain controllers for the domain. Please verify that
the correct domain name is entered. In some cases, `InferredDomainName` may be
populated with a possible alternative.

## Validate FSx service user credentials

### DomainControllerNetworkValidation

If a domain controller is either in VPC CIDR range or RFC-1918 IP ranges but
is not reachable, a full network validation is run to help pinpoint required
port connectivity that is missing. This test will only run on the first such
domain controller in the interest of keeping test time reasonably short for
large AD setups. If you wish to run this network validation against any other
domain controllers, try `Test-FSxADControllerConnection`.

### InvalidCredentials

Service account credentials provided are not valid.

### NoCommunicationWithDCs

This is a warning if **ONE** or more domain controllers are not reachable.
This is an error if **ALL** of the domain controllers are not reachable.
Please validate that the domain controllers have requisite
network connectivity.

See https://docs.aws.amazon.com/fsx/latest/WindowsGuide/self-manage-prereqs.html

Also see DomainControllerNetworkValidation.

### UnreachableDomainControllers

This is an error if any of the domain controllers are unreachable.
This test will execute a ping request and also validate it can bind to the ports {**88**, **389**, **686**, **9389**} on every Domain Controller.
In the interest of keeping the test time reasonably short for large AD setups, this test will only validate the connectivity of the first 100 domain controllers found.
This test will fail if any of the domain controllers is unresponsive to the ping request or if we cannot bind to the ports listed above.
Please validate that the domain controllers have requisite network connectivity.

See https://docs.aws.amazon.com/fsx/latest/WindowsGuide/self-manage-prereqs.html
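
To narrow down which ports are blocked for a particular domain controller, a TCP probe along the following lines may help. This is a minimal sketch, not the tool's implementation: the function names are hypothetical, the ping (ICMP) step is not covered, and the port list is left as a parameter to be filled in from the description above.

```python
import socket

def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False

def closed_ports(host, ports, probe=tcp_probe):
    """Return the subset of ports on host that could not be reached.

    The probe argument exists so the check can be exercised without
    network access; by default it opens real TCP connections.
    """
    return [port for port in ports if not probe(host, port)]
```

An empty result means every probed port accepted a connection from this machine.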

### UnauthorizedReadOnUsersContainer

Failed to resolve service account user in the Active Directory. Does the service
account have 'Read' permission on the Users container for the directory?
While this is typically granted by default, please verify access. See Appendix A: Debug Active Directory Permissions.

## Validate domain properties

### OldDomainFunctionalLevel

Domain functional level **MUST** be Windows Server 2008 R2 or higher.

### SingleLabelDomain

Amazon FSx **DOES NOT** support Single Label Domains (SLDs).
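
A single-label domain is one whose DNS name has no dot-separated suffix, such as `corp` rather than `corp.example.com`. A quick check, offered as a sketch with a hypothetical function name:

```python
def is_single_label_domain(name):
    """True if the DNS name consists of a single label.

    A trailing dot (fully qualified form) is ignored.
    """
    return "." not in name.strip(".")
```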

## Validate organizational unit

### InferredOrganizationalUnit

A common mistake is supplying the name of the organizational unit
rather than its distinguished name. Matches are presented as possible
alternatives for convenience. The distinguished name is preferred because
it is an unambiguous reference.
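
For reference, a distinguished name looks like `OU=FSx,DC=corp,DC=example,DC=com`, whereas the plain name would be just `FSx` (both values are made-up examples). The heuristic below is a sketch for telling the two apart before calling the validation; it is an assumption of this guide's editor, not part of the tool, and it does not handle escaped commas inside values.

```python
import re

# One relative distinguished name component, e.g. "OU=FSx" or "DC=corp".
_RDN = re.compile(r"^\s*[A-Za-z][A-Za-z0-9-]*\s*=\s*[^,]+$")

def looks_like_distinguished_name(value):
    """True if every comma-separated part has the form attribute=value."""
    return all(_RDN.match(part) is not None for part in value.split(","))
```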

### InvalidOrganizationalUnit

Unable to locate the organizational unit by the provided distinguished name.
Does this organizational unit exist, and does the service account have
permissions to it? In some cases, InferredOrganizationalUnit may be populated
with a possible match.

### UnauthorizedReadOnComputersContainer

When no organizational unit is provided, the Computers container for the
Active Directory is used as a fallback. This error occurs if we are
not able to read the default Computers container.
While this is typically granted by default, please verify access. See Appendix A: Debug Active Directory Permissions.

## Validate Admin Group

### InvalidAdminGroup

We're unable to retrieve the FSx admin group. Is the group name correct,
and does the service account user have permissions to read the
admin group?

### UnauthorizedReadOnDefaultContainers

Service account must have Read permission to both the Users and
Computers containers to be able to successfully look up the FSx admin group.
While these are typically granted by default, please verify access. See Appendix A: Debug Active Directory Permissions.

### AdGroupInBuiltinContainer

This occurs if the specified Domain Admin group exists but is in the 'Builtin'
container. Amazon FSx does not support Domain Admin group located in the
'Builtin' container. For more information, please see our documentation: https://docs.aws.amazon.com/fsx/latest/WindowsGuide/creating-joined-ad-file-systems.html

## Validate that provided EC2 Subnets belong to a single AD Site

For all of the failures below, refer to Appendix B: Subnet and Active Directory Site configuration.

### MultipleAdSitesWithoutSubnets

This occurs if there are no explicit mappings of EC2 subnets to
Active Directory sites.

### NoAdSites

This occurs when there are no Active Directory sites at all within the domain,
including the default site.

### NoAdSubnetForEc2Subnet

It is required that each EC2 subnet is mapped to an Active Directory subnet.
This provides a list of subnet IDs which are not mapped.

### SubnetsInSeparateAdSites

The EC2 subnets for Amazon FSx integration are
mapped to Active Directory subnets in different sites. This can cause
high latency due to inter-site replication delay. It is recommended to map
the subnets to a single Active Directory site.
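
The mapping between EC2 subnets and Active Directory sites can be reasoned about with CIDR containment. The sketch below assumes that an EC2 subnet maps to the most specific Active Directory subnet that contains it; whether the validation tool uses exactly this rule is an assumption, and the function names and input shapes are hypothetical.

```python
import ipaddress

def site_for_subnet(ec2_cidr, ad_subnets):
    """Return the site of the most specific AD subnet containing ec2_cidr.

    ad_subnets maps an AD subnet CIDR to its site name. Returns None
    when no AD subnet contains the EC2 subnet.
    """
    ec2_net = ipaddress.ip_network(ec2_cidr)
    best = None
    for cidr, site in ad_subnets.items():
        ad_net = ipaddress.ip_network(cidr)
        # subnet_of raises TypeError across IP versions, so compare first.
        if ad_net.version == ec2_net.version and ec2_net.subnet_of(ad_net):
            if best is None or ad_net.prefixlen > best[0].prefixlen:
                best = (ad_net, site)
    return best[1] if best else None

def check_site_mapping(ec2_subnets, ad_subnets):
    """Map failure ID -> details for a dict of EC2 subnet ID -> CIDR."""
    sites = {sid: site_for_subnet(cidr, ad_subnets)
             for sid, cidr in ec2_subnets.items()}
    unmapped = sorted(sid for sid, site in sites.items() if site is None)
    if unmapped:
        return {"NoAdSubnetForEc2Subnet": unmapped}
    if len(set(sites.values())) > 1:
        return {"SubnetsInSeparateAdSites": sites}
    return {}
```

An empty result means every EC2 subnet is mapped and all of them resolve to one site.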

## Looking up DNS entries for domain controllers in site

This performs DNS resolution similar to the
"Validate connectivity with DNS Servers" step, except scoped to the Active Directory site
that is associated with the subnets provided. See the explanation there for
possible error cases. The most likely scenario for errors is that
no domain controllers are resolved for the site.

## Validate connectivity with at least one AD Domain Controller in AD sites

This performs domain controller reachability checks similar to the
"Validate FSx service user credentials" step, except scoped to the Active Directory site
that is associated with the subnets provided. See the explanation there for
possible error cases. The most likely scenario for errors is that
there are no reachable domain controllers within the site.

### DomainControllerNetworkValidation

For this test, we will run the full network validation against
a domain controller even if it is reachable to ensure the full
set of required ports is open.
If you wish to run this network validation against any other
domain controllers, try `Test-FSxADControllerConnection`.

## Validate 'Create Computer Objects' permission

For all of the failures below, OU refers to either the OrganizationalUnit
parameter supplied or the Computers container for the domain if the parameter
was not supplied.

### NoAvailableTestComputerName

We failed to generate an unused random computer name consisting of the prefix
"amznfsxtest" followed by hex digits. Please remove all computer objects
with this prefix from the OU.
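
The naming scheme can be illustrated as follows. The prefix comes from the description above; the number of hex digits (4, which keeps the name within the 15-character NetBIOS limit), the retry count, and the function name are assumptions made for this sketch and may differ from what the tool does.

```python
import secrets

PREFIX = "amznfsxtest"

def pick_test_computer_name(existing, attempts=10, hex_digits=4):
    """Return an unused name of the form PREFIX + hex digits, or None.

    existing is an iterable of computer names already present in the OU.
    hex_digits is assumed to be even; None signals that no free name
    was found within the allowed number of attempts.
    """
    taken = {name.lower() for name in existing}
    for _ in range(attempts):
        # token_hex(n) returns 2 * n lowercase hex digits.
        candidate = PREFIX + secrets.token_hex(hex_digits // 2)
        if candidate not in taken:
            return candidate
    return None
```

If many leftover objects with this prefix exist in the OU, the chance of every attempt colliding grows, which is why removing them resolves the failure.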

### ReadOnlyDCsInSite

All the domain controllers in the mapped Active Directory site are read-only.
Please add writable domain controllers to the site or remap the
Active Directory subnets to another site with writable DCs.

### UnauthorizedCreateComputerObjectsOnOU

Please make sure the service account has Create computer objects permission on the OU.
See Appendix A: Debug Active Directory Permissions.

### UnauthorizedListChildrenOnOU

Please make sure the service account has ListChildren permission on the OU.
See Appendix A: Debug Active Directory Permissions.

## Validate 'Validated write to DNS host name' permission

### UnauthorizedWriteDnsHostNameOnOU

Please make sure the service account has Validated write to DNS host name permission on the OU.
See Appendix A: Debug Active Directory Permissions.

## Validate 'Validated write to service principal name' permission

### UnauthorizedWriteServicePrincipalNameOnOU

Please make sure the service account has Validated write to service principal name permission on the OU.
See Appendix A: Debug Active Directory Permissions.

## Validate 'Reset Password' permission

### UnauthorizedResetPasswordOnOU

Please make sure the service account has Reset Password permission on the OU.
See Appendix A: Debug Active Directory Permissions.

## Validate 'This Organization' list children permission

### UnauthorizedThisOrganizationListChildrenOnOU

Please make sure 'This Organization' has List Children permission on the OU.
See Appendix A: Debug Active Directory Permissions.

## Validate 'Read and write Account Restrictions' permission

### UnauthorizedRestrictAccountsOnOU

Please make sure the service account has Read and write Account Restrictions permission on the OU.
See Appendix A: Debug Active Directory Permissions.

## Validate 'Delete Computer Objects' permission

### UnauthorizedDeleteComputerOnOU

Please make sure the service account has Delete computer objects permission on the OU.
See Appendix A: Debug Active Directory Permissions.

Note: because of the lack of permissions, the computer object specified in this
field was not deleted. Please delete it manually.

## Validate health of computer objects
 
### InvalidComputerObject

We're unable to successfully retrieve the computer object. Is the computer
object name correct? Has the computer object been deleted?

### MisplacedComputerObject

This occurs if the computer object has been moved to another OU after file
system creation.

### DisabledComputerObject

Please make sure that the computer object is enabled.

# Appendix

## Appendix A: Debug Active Directory Permissions

### Active Directory Permissions

Active Directory permissions consist of Access Control Entries (ACEs), which
can be either allow or deny and can be either inherited or explicit.
For a given location in the tree and type of rule (explicit vs. inherited),
deny takes precedence over allow. Explicit ACEs take precedence over
inherited ACEs. Inherited permissions from a nested container take precedence
over inherited permissions from its ancestors. These rules may be non-trivial
to reason through manually. Thus, it is recommended to use tooling to
visualize effective access for a given principal to a target entity.
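
The precedence rules above can be modeled in a few lines. This is a deliberately simplified sketch for building intuition: it ignores group membership, object-type scoping, and other details of real Active Directory access checks, and the `Ace` type and `effective_access` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ace:
    principal: str
    right: str
    allow: bool   # True for an allow ACE, False for a deny ACE
    depth: int    # 0 = explicit on the object, 1 = inherited from the
                  # parent, 2 = from the grandparent, and so on

def effective_access(aces, principal, right):
    """Decide whether principal holds right under the simplified rules."""
    matching = [a for a in aces
                if a.principal == principal and a.right == right]
    if not matching:
        return False  # nothing grants the right
    # Nearest ACEs win; at the same depth, deny (False) sorts before allow.
    matching.sort(key=lambda a: (a.depth, a.allow))
    return matching[0].allow
```

Under this model, an explicit allow on the OU overrides a deny inherited from its parent, while an allow and a deny at the same level resolve to deny.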

### Visualizing Effective Access

One helpful way to visualize this is with ADSI Edit (adsiedit.msc)
[Microsoft docs](https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc773354(v=ws.10)).

Right-click the target entity on which you're granting the permission(s)
(typically an organizational unit), and go to `Properties`, `Security`,
`Advanced`, `Effective Access`. Click `Select a user` and enter the principal
to whom you're granting the permissions (typically the
service account for Amazon FSx in this case). Hit `Check Names`, `OK`.
Hit `View effective access` to confirm that the requisite permissions are
granted.

Note: the above operations should be performed by a user who
has permissions on the target entities.

### Delegating Service Account Permissions

Please consider using Add-FSxADPermissions to delegate service account
permissions. This is a self-contained script (no dependencies on other files)
which can be sent to your domain administrator and executed to set up the
minimum privileges necessary for Amazon FSx. It is idempotent and supports
the `-WhatIf` flag to preview changes without performing any mutations.

## Appendix B: Subnet and Active Directory Site configuration

We recommend having a 1:1 mapping between your EC2 subnets and Active Directory
subnets, with all subnets belonging to the same Active Directory site.

You may skip this step if you only have a single site defined (there is one
created by default called "Default-First-Site-Name").

See more in [Understanding Active Directory Site Topology](https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/plan/understanding-active-directory-site-topology).
