Version 2.0 - Correcting Failure Condition Level explanations
After having created our first Availability Group, we now want to go behind the scenes and look at the details of what happens when an Availability Group is created. As we discussed in earlier parts, WSFC plays an important role. We will look at those components more closely in this blog post.
In Part 6 we described how we created the AG using the wizard. In the last screen, before finally creating the AG, you might have noticed the 'Script' button, as seen here:
So let's look at the script and go step by step through what is happening in each of those steps.
As mentioned earlier in Part 6, the creation of an AG needs to be performed by a login which is a member of the sysadmin server role.
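If you want to quickly verify that the login you are using has that permission, a small check like the following can be run (just a sketch using the built-in role membership function):

-- Returns 1 if the current login is a member of the sysadmin server role
SELECT IS_SRVROLEMEMBER(N'sysadmin') AS is_sysadmin;
GO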
Step 1: Creation of the Communication Endpoints
The communication between the different instances running replicas of the same Availability Group is performed over endpoints, which need to be created. If endpoints from earlier use of Database Mirroring already exist, some attributes of those endpoints might be changed accordingly. When creating the endpoints freshly on each of the instances running replicas, the following SQL commands are executed:
USE [master]
GO
CREATE ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATA_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES)
GO
IF (SELECT state FROM sys.endpoints WHERE name = N'Hadr_endpoint') <> 0
BEGIN
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED
END
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [<Domain>\<user name>]
After this execution, the endpoint should be found here:
Using the 'New Availability Group' wizard, the preset default port is always 5022, as already used by Database Mirroring. The name of the endpoint is preset to 'Hadr_endpoint'. In contrast to Database Mirroring, where some of the endpoints could have different roles (like Witness), all endpoints are assigned the role 'ALL'. As with DBM, encryption is enabled for the traffic between the replicas. The next commands check whether the endpoint is in the started state, start it if necessary, and then grant CONNECT permission on the endpoint to the domain login shown in the script.
The sequence of these commands is performed on each instance running a replica.
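If you prefer to verify the endpoint configuration with T-SQL rather than in the GUI, a query against the catalog views like the following can be used (a small sketch; adjust the filter if you chose a different endpoint name):

-- Shows the AlwaysOn/Database Mirroring endpoint, its state and TCP port
SELECT name, state_desc, port
FROM sys.tcp_endpoints
WHERE type_desc = N'DATABASE_MIRRORING';
GO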
Step 2: Creation of the AlwaysOn Health monitoring session
The next object created is an Extended Events session which monitors the health of the AlwaysOn configuration. The AlwaysOn_health session is already delivered with SQL Server; hence it only needs to be started at the server level. We will talk later about how to analyze the events of this collection. On all instances running replicas, the following statements are executed:
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='AlwaysOn_health')
BEGIN
ALTER EVENT SESSION [AlwaysOn_health] ON SERVER WITH (STARTUP_STATE=ON);
END
IF NOT EXISTS(SELECT * FROM sys.dm_xe_sessions WHERE name='AlwaysOn_health')
BEGIN
ALTER EVENT SESSION [AlwaysOn_health] ON SERVER STATE=START;
END
The X-Event session can be found here:
The files are persisted to disk and can be found in the log directory, beside the errorlog files of SQL Server, as seen here:
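To check from T-SQL whether the AlwaysOn_health session is defined with autostart and is currently running, queries like these can be used (a small sketch against the Extended Events catalog view and DMV):

-- Session definition including the autostart flag
SELECT name, startup_state
FROM sys.server_event_sessions
WHERE name = N'AlwaysOn_health';
GO
-- Sessions currently running on the instance
SELECT name, create_time
FROM sys.dm_xe_sessions
WHERE name = N'AlwaysOn_health';
GO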
Step 3: Create the Availability Group on the Primary
The first command, which is executed only on the future primary instance of the Availability Group, is the command which actually creates the AG. This command will also create the service for the AG in the Windows Cluster configuration. The command which is executed looks like this:
USE [master]
GO
CREATE AVAILABILITY GROUP [ERPPRODE64]
WITH (AUTOMATED_BACKUP_PREFERENCE = PRIMARY)
FOR DATABASE [E64]
REPLICA ON
N'<name of replica #1>' WITH (ENDPOINT_URL = N'TCP://<name of replica #1>.<domain name>.local:5022',
    FAILOVER_MODE = AUTOMATIC, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    BACKUP_PRIORITY = 50,
    SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)),
N'<name of replica #2>' WITH (ENDPOINT_URL = N'TCP://<name of replica #2>.<domain name>.local:5022',
    FAILOVER_MODE = AUTOMATIC, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    BACKUP_PRIORITY = 50,
    SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)),
N'<name of replica #3>' WITH (ENDPOINT_URL = N'TCP://<name of replica #3>.<domain name>.local:5022',
    FAILOVER_MODE = MANUAL, AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    BACKUP_PRIORITY = 50,
    SECONDARY_ROLE(ALLOW_CONNECTIONS = NO));
GO
As you can see, the first part defines the name of the AG and also defines the database(s) which go into the AG. Then we see a section for each of our replicas in the command. Since we have one primary and two secondary replicas, we get three sections. Each of the sections defines the endpoint URL for communication, the availability mode and the failover mode. It also defines whether the replica, once it is in the role of a secondary replica, accepts read-only connections. In our case, as seen in the wizard in Part 6, we didn't allow read-only connections, hence the part 'ALLOW_CONNECTIONS = NO'. Checking the Windows Cluster Administrator, we see the AlwaysOn service with the name of the AG:
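Besides the Windows Cluster Administrator, the newly created AG and the state of its replicas can also be checked with the AlwaysOn catalog views and DMVs, for example like this (a sketch; the column selection is just an example):

-- Availability Group, its replicas and their current role/connection state
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ars.role_desc,
       ars.connected_state_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ag.group_id = ar.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ar.replica_id = ars.replica_id;
GO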
Step 4: Create the Availability Group Listener
The fourth step in the setup of an Availability Group is to create the listener. It is created on the same server on which the AG was created (the future primary replica). In our case we were running one secondary replica in another subnet. Hence two static IP addresses will be assigned to the listener.
USE [master]
GO
ALTER AVAILABILITY GROUP [ERPPRODE64]
ADD LISTENER N'ERPPRODE64vn' (
WITH IP
((N'xxx.xxx.xx4.50', N'255.255.255.0'),
(N'xxx.xxx.xx8.148', N'255.255.252.0') ) , PORT=1433);
GO
In Part 6 we decided to have the listener listen on port 1433, as if it were a normal SQL Server instance. With this we avoided having to define a different port in the SAP connection string, since SQL Server SNAC on the client side will always use port 1433 as the default for a first connection. The execution of this command can take one to two minutes since the following steps are executed:
- A service with the name of the AG listener is created. It gets two network resources assigned to it. Each network resource represents an IP address of one of the subnets. If all replicas are running in one subnet, only one network resource will be assigned to this cluster service. The service representing our Availability Group will take a dependency on this new network resource.
- The two IP addresses are registered in DNS with the name of the listener. In order to register both addresses in DNS, the cluster property RegisterAllProvidersIP is used. For more details on this functionality, please check out: http://support.microsoft.com/kb/947048
Checking the Windows Cluster Administrator, the picture has changed compared to before:
As you can see, the new network resource is up and running with one IP address online and the other IP address offline. If all replicas were within one subnet, only one IP address would exist and it would be online all the time. In our case with two subnets, only the IP address of the subnet the current primary replica is in will be online. The IP address of the subnet not running the primary is offline.
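The listener definition, including both IP addresses and their state, can also be queried with T-SQL (a small sketch using the listener catalog views):

-- Listener DNS name, port and the state of each of its IP addresses
SELECT l.dns_name,
       l.port,
       lip.ip_address,
       lip.ip_subnet_mask,
       lip.state_desc
FROM sys.availability_group_listeners AS l
JOIN sys.availability_group_listener_ip_addresses AS lip
    ON l.listener_id = lip.listener_id;
GO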
The cluster service of our AG listener is dependent on either one of our two network resources being online as seen here:
The cluster service of our AG again depends on the cluster service of the AG listener as seen here:
Checking the 'possible owners' of the Availability Group service, we realize that so far this service is only allowed to run on the node of the primary replica, as seen here:
This is in contrast to the AG listener service, which already has all three nodes set as 'possible owners'. To enable the Availability Group service to run on at least one other node, we go through step 5.
Step 5: Have the secondary replicas join the Availability Group
Now it is time for the secondary replicas to join the Availability Group, which then enables the Availability Group service to run on the nodes of the secondary replicas as well, or in other words to fail over to those nodes.
The command which is executed on each of these nodes is:
ALTER AVAILABILITY GROUP [ERPPRODE64] JOIN;
GO
After this command is executed on both nodes, the configuration should be ready to use. Checking the possible owners of the AG service, we now see this:
We realize that our node sapdenali7 still is not listed as a possible owner. The reason for this is that we defined that sapdenali7 should not be a target of an automatic failover. Hence it is not listed as a possible owner. If we changed the configuration of our AlwaysOn setup so that all replicas have a failover mode of 'manual', only the node running the primary replica would be checked as a possible owner. With this, our AlwaysOn configuration is as we created it in Part 6 in the wizard.
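Once the secondaries have joined and the databases are synchronizing, the synchronization state of the database(s) in the AG can be monitored with a query like this (a sketch; run it e.g. on the primary replica):

-- Synchronization state and queue sizes of the databases in the AG
SELECT DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc,
       drs.log_send_queue_size,
       drs.redo_queue_size
FROM sys.dm_hadr_database_replica_states AS drs;
GO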
Please note: We listed the commands here not to motivate you to execute the creation of an AG with single SQL commands, but just to show what is going on behind the scenes. We still recommend using the excellent 'New Availability Group' wizard to create an AG. It simply is the easiest way to get to an AG and also the least complex.
Please note: We also explained some of the background of what happens on the WSFC level in order to convey the principles, what happens in which step of the setup, and what you will find as services in the WSFC configuration at the end. AlwaysOn is not designed to have failover activities or any other administration activities issued through the Windows Cluster Administrator. These kinds of activities should be executed out of SQL Server Management Studio (SSMS).
A sixth step you may want to configure
Now that we have an Availability Group configured and have made all the changes to the SAP configuration as described in Part 6, one of the first things we want to do is to see whether the failover promised by the AlwaysOn SQL Server feature really does work with SAP. Hence we have some users play around in the SAP system and issue a failover manually (see Part 8 of the series). As expected, it all works. Since it worked so great the first time, we try it a second time and a third time. All is great. Now we are getting a bit braver and also want to check whether an unexpected failover will work. In order to simulate an unexpected failover, we either reboot the server the primary replica is running on or simply stop the SQL Server instance. So we perform the first of such simulated 'unexpected' failovers. Since it works very well, we do the same 'simulation' again. Since we still don't fully trust AlwaysOn, we think of yet another way of simulating a condition which forces an unexpected failover, immediately execute it, and are shocked. On this third attempt within the last few hours to simulate an unexpected issue, nothing works anymore. The failover is not executed. The SAP application is not accessible anymore. And the Availability Group goes into the resolving state, as seen here:
Checking in the Windows Cluster Administrator, we will see this situation:
All attempts to bring up the AG service will fail. Checking for critical events associated with this cluster service in the cluster manager will also not reveal a lot, except that the start of the service on the designated primary failed. Checking the developer traces of the SAP work processes, we can find masses of these entries:
C ERROR: -1 in function SQLConnectWithRetry (SQLConnectWithRetry) [line 2304]
C (983) [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Unable to access database 'E64' because its replica role is RESOLVING which does not allow connections. Try the operation again later.
So obviously our third simulation of an unexpected failover did not work, despite the first two 'simulations' working out perfectly. The reason has nothing to do with AlwaysOn or SAP, but with the configuration of the Availability Group cluster service. The problem is demonstrated very well below: we are allowing only two unexpected failovers within a 6-hour time window.
This means that if we want to put AlwaysOn through its paces, and eventually also for later productive purposes, we need to change these settings in order to perform our simulations of unexpected issues causing automatic failovers within a shorter time frame. Depending on the preferences and priorities for production systems, we also recommend adjusting these settings. You can also think about allowing or disallowing immediate failback, or defining a time range in which it is allowed. What we certainly don't want to end up with is some kind of toggling behavior where a failover happens between the two replicas configured for automatic failover every other minute. On the other side, a failover every hour in case of a shaky infrastructure could be less of a pain than having a standstill of the system after two successful failovers. As our tests showed, manual failovers are not affected by these settings.
For example, in Microsoft's SAP ERP deployment leveraging AlwaysOn, we changed the settings to allow 5 failovers within 1 hour.
Please note: In the currently deployed SAP systems running against AlwaysOn configurations, no WSFC properties other than the ones described above were changed.
A seventh step you may want to configure
As with Database Mirroring, there is a session timeout definition for the communication between the primary and the secondary replicas. Some of our customers needed to change this default value of 10 seconds in DBM to higher values due to the network conditions they experienced in their infrastructure. In AlwaysOn we have the exact same default for these communication sessions again. The timeout values can be changed in the properties of the Availability Group as seen here:
The values should be set to the same value for all the different replicas.
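Instead of using the SSMS property dialog, the session timeout of a replica can also be changed with T-SQL on the primary, for example like this (a sketch; the value of 20 seconds and the replica name are just placeholders and need to be adjusted to your configuration):

-- Raise the session timeout of one replica to 20 seconds
ALTER AVAILABILITY GROUP [ERPPRODE64]
MODIFY REPLICA ON N'<name of replica #1>'
WITH (SESSION_TIMEOUT = 20);
GO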
Please note: The value of the session timeout in AlwaysOn has a somewhat different impact than with DBM. In DBM, hitting this timeout multiple times in a row could have resulted in a failover or other consequences, since the communication over the endpoints was also used to detect whether an instance was responding healthily or not. This is different in AlwaysOn, as we will see in the next two sections.
Number Eight: HealthCheckTimeout, Failure Conditions and other settings
AlwaysOn allows setting failure conditions for a failover. Or in other words, one can influence the conditions which need to be met for a failover. There is a default setting applied by SQL Server when generating the AG. After the AG has been generated, one can change those conditions. To check what the settings are, go into the Windows Cluster Manager, navigate to our AG service and mark the service under 'Other Resources'. Get to the properties via a right click, as seen here:
As you can see, the FailureConditionLevel is 3, which is in the middle of the available levels 1-5 for non-shared disk deployments (please note that the failure condition levels for shared disk deployments range from 0-5). For the exact definition of the levels, please check out the conditions associated with the different levels in this article: http://msdn.microsoft.com/en-us/library/hh710061.aspx#FClevel
In order to explain that article in a bit more depth, we need to dive a little deeper into the way failure detection is done. It has changed quite a bit compared to earlier SQL Server releases. As with the shared-disk cluster scenario (FCI), in the non-shared cluster scenario the Windows resource DLL is the instance ultimately triggering failovers based on the health of the AlwaysOn configuration. The Windows-side resource DLL is complemented by the SQL Server-side resource DLL, which contains all the details on the health checks to be executed and decides whether to initiate a failover based on the test results it gets when checking the health of the primary replica. The SQL Server resource DLL and the Windows-side resource DLL communicate over a set of interfaces. On the SQL Server side, we radically changed the tests with which we evaluate the health of SQL Server primary replicas. Here is roughly how it works (a small example of calling the health procedure follows the list):
- The SQL Server resource DLL establishes a lasting connection to SQL Server.
- It calls the stored procedure sp_server_diagnostics with a repeat interval (see also the documentation of the stored procedure at: http://msdn.microsoft.com/en-us/library/ff878233.aspx ).
- The resource DLL now expects the result sets to be sent in the configured interval.
- Based on the result sets hitting timeout thresholds, the issues reported in the result sets of the sp_server_diagnostics procedure, and the FailureConditionLevel, the resource DLL decides whether the primary replica is healthy or not.
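Just to illustrate what the resource DLL consumes, the health procedure can also be called manually for a one-time snapshot of its result set (a sketch; the resource DLL itself calls it with a repeat interval instead):

-- One-time execution returning one row per health component
EXEC sp_server_diagnostics;
GO
-- With a repeat interval (in seconds) the result sets keep streaming until the session is cancelled
-- EXEC sp_server_diagnostics @repeat_interval = 10;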
This means some of the settings in the properties below are related to each other:
- The resource DLL calls the stored procedure sp_server_diagnostics with a repeat interval which is 1/3 of the HealthCheckTimeout (default = 30000ms). If the HealthCheckTimeout threshold is exceeded and not a single result set has been received by the resource DLL, the server is reported as hung.
- Additionally, the resource DLL will check the FailureConditionLevel and analyze the result sets of sp_server_diagnostics. If you check the documentation for those condition levels, you see that:
- Level 1 checks if SQL Server service is down.
- Level 2 additionally checks if the SQL Server instance is not responsive (Resource DLL cannot receive data from sp_server_diagnostics within the HealthCheckTimeout settings).
- Level 3 additionally checks if System stored procedure sp_server_diagnostics returns ‘system error’.
- Level 4 additionally checks if System stored procedure sp_server_diagnostics returns ‘resource error’.
- Level 5 additionally checks if System stored procedure sp_server_diagnostics returns ‘query_processing error’.
The checks are cumulative; that means the checks done at level 5 include all checks of the lower levels plus the checks of level 5. Using AlwaysOn Availability Groups, there are additional special checks done from Level 1 and Level 2 on: one check is whether the lease timeout is hit, and another one is whether the HealthCheckTimeout is hit. Both values can be adjusted as shown in the graphics above.
As the standard condition level, SQL Server installs an AG with level 3. That means it checks whether the SQL Server process is up, whether the sp_server_diagnostics stored procedure responds within the HealthCheckTimeout, and whether there were 'system errors' reported in the result set. This pretty much comes closest to what earlier releases of SQL Server did.
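Both settings can also be inspected and changed with T-SQL on the primary replica instead of going through the cluster property dialog, for example like this (a sketch; the values shown are simply the defaults mentioned above):

-- Set the failure condition level (default 3)
ALTER AVAILABILITY GROUP [ERPPRODE64] SET (FAILURE_CONDITION_LEVEL = 3);
GO
-- Set the health check timeout in milliseconds (default 30000)
ALTER AVAILABILITY GROUP [ERPPRODE64] SET (HEALTH_CHECK_TIMEOUT = 30000);
GO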
There is another setting to keep in mind. As mentioned with the connection timeout between the endpoints, there is a change in how that timeout is used. In DBM this timeout threshold could be used as an indication that a SQL Server instance was isolated network-wise, and hence we could decide to move an instance out of the principal role to avoid a split-brain scenario. In AlwaysOn that connection timeout does not affect these kinds of decisions. However, we still need to exclude split-brain scenarios. One of the constructs we use to avoid such split-brain scenarios is a lease timeout, which by default is 20000ms. The lease timeout needs to cover a scenario like this:
- The SQL Server instance running the primary replica of an AG is detected as unresponsive by the failure detection logic described above.
- As a result, a failover is issued.
- However, the resource DLL does not get through to the unresponsive SQL Server instance anymore.
- Therefore the SQL Server instance would assume that it still is in the role of the primary replica and would sooner or later accept and respond to queries again.
- To avoid such a scenario, the SQL Server instance running a primary replica of an AG needs to obtain a lease from the resource DLL before the timeout threshold is hit. If the lease is not renewed within that threshold, the replica will be moved from the primary role into the resolving state and (if it has not happened yet) a failover to a secondary will be initiated.
Note: All the SAP tests and deployments were done with the default settings of those properties so far.
Well, this was it for part #7 of this series. In the next part we go deeper into the SAP Failover behavior which also will explain a bit how the SAP re-connect mechanism works.