Month: October 2014

Synchronous DB Mirroring and running in an ‘exposed’ state

A few people I have spoken to as of late have been quite surprised to find that their ‘zero data loss’ Synchronous Mirroring solution can actually allow data loss!

In fact, data loss is ALWAYS a possibility, regardless of your budget and the HA solution in place. It’s just that if you spend enough money it is very, very unlikely.

I fear that some businesses are running under the assumption that they cannot have any data loss if they have a Synchronous DB Mirroring solution in place. So I want to demonstrate here that it is possible for Mirroring to open up a window of vulnerability, whereby it is running exposed and you are no longer safeguarded against data loss.

Consider the following example:

A basic Synchronous DB Mirroring configuration from Site A to Site B, with manual failover. This scenario DOES NOT include the use of a WITNESS server, and was previously known as “High Protection” mode.

This could be represented by the following simple diagram.

[Diagram: Principal database at Site A, Synchronous Mirroring to the Mirror database at Site B]
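
For reference, that configuration boils down to a handful of commands. The following is a minimal sketch only, assuming an example database called [SalesDB], partner servers SITEA-SQL and SITEB-SQL, and port 5022 for the Mirroring endpoints:

    -- On each partner: create an endpoint for Database Mirroring traffic.
    CREATE ENDPOINT Mirroring
        STATE = STARTED
        AS TCP (LISTENER_PORT = 5022)
        FOR DATABASE_MIRRORING (ROLE = PARTNER);

    -- On the Mirror (Site B), after restoring the database and log backups WITH NORECOVERY:
    ALTER DATABASE [SalesDB] SET PARTNER = 'TCP://SITEA-SQL:5022';

    -- On the Principal (Site A): point it at the Mirror and run in high-safety (Synchronous) mode.
    ALTER DATABASE [SalesDB] SET PARTNER = 'TCP://SITEB-SQL:5022';
    ALTER DATABASE [SalesDB] SET PARTNER SAFETY FULL;

    -- No WITNESS is configured, so any failover is manual.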

Now let’s take a brief refresher on Synchronous Mirroring so that we can set the stage… when an application submits a transaction to the database in Synchronous mode, the transaction is first committed at the Mirror, then at the Principal, before the ACK is finally given back to the application. Therefore we have two copies of our data hardened to the transaction log on two different servers, at two different sites.

This obviously comes at the cost of latency, so a DB Mirror in Synchronous mode should not have a large geographical separation from its Principal.
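
If you want to confirm that a database really is running in Synchronous (high-safety) mode, the sys.database_mirroring catalog view will tell you. A quick check, with the same example database names in mind:

    -- Run on either partner; mirroring_safety_level_desc = 'FULL' means Synchronous.
    SELECT  DB_NAME(database_id)        AS database_name,
            mirroring_role_desc,
            mirroring_state_desc,
            mirroring_safety_level_desc
    FROM    sys.database_mirroring
    WHERE   mirroring_guid IS NOT NULL;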

So let’s look at a couple of potential disaster scenarios:

  1. The server hosting the Principal database is lost. The application starts chucking errors and users cannot connect. You can redirect your application to the Mirror database without a single lost transaction, albeit with some small downtime (less if you use a Witness!). The mechanics of this are sketched just after this list.
  2. The server hosting the Mirror database is lost (or even just the connection to the Mirror). The application cannot commit at both databases, so it will wait for the period of time specified in the PARTNER_TIMEOUT database Mirroring property (default 10 seconds) before committing any transactions.
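
For the first scenario, bringing the Mirror online with no Witness in place looks roughly like the following. This is a sketch only, reusing the example [SalesDB] name; despite its alarming name, forced service should lose nothing here because the partners were synchronized when the Principal went down:

    -- Run on the MIRROR server (Site B) once the Principal is confirmed lost.
    -- Because the partners were SYNCHRONIZED, no committed transactions should be lost.
    ALTER DATABASE [SalesDB] SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;

    -- Alternative: break the Mirroring session and recover the Mirror copy instead.
    -- ALTER DATABASE [SalesDB] SET PARTNER OFF;
    -- RESTORE DATABASE [SalesDB] WITH RECOVERY;

Either way, you then repoint the application at Site B (or rely on the Failover Partner attribute in the connection string, if your client supports it).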

At this point we have no scope for data loss within the Mirroring scenario. But what happens after that 10 seconds is where things get interesting…

After the amount of time specified by the DB Mirroring PARTNER_TIMEOUT (10 seconds by default, remember), the Principal database will start accepting transactions again, but it is now operating in an ASYNCHRONOUS state!

It is stockpiling all of the transactions on the Principal database so that it can apply them to the Mirror when it comes back online. Until such time, it is no longer keeping two copies of your data.
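
You can watch this happen from the Principal. A rough check, again using example names only, showing the Mirroring state and the unsent log queuing up for the Mirror:

    -- On the Principal: the state shows DISCONNECTED (or SUSPENDED) while running exposed.
    SELECT  DB_NAME(database_id)        AS database_name,
            mirroring_state_desc,
            mirroring_safety_level_desc
    FROM    sys.database_mirroring
    WHERE   mirroring_guid IS NOT NULL;

    -- Transaction log waiting to be sent to the Mirror, in KB.
    SELECT  instance_name, cntr_value AS log_send_queue_kb
    FROM    sys.dm_os_performance_counters
    WHERE   object_name LIKE '%Database Mirroring%'
      AND   counter_name = 'Log Send Queue KB';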

You can change your PARTNER_TIMEOUT value using the following TSQL:

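Something along these lines, run on the Principal (the [SalesDB] name and the 30 second value are just examples):

    -- Set the partner timeout, in seconds.
    ALTER DATABASE [SalesDB] SET PARTNER TIMEOUT 30;

    -- The current value is visible in sys.database_mirroring.
    SELECT  DB_NAME(database_id)          AS database_name,
            mirroring_connection_timeout  AS partner_timeout_seconds
    FROM    sys.database_mirroring
    WHERE   mirroring_guid IS NOT NULL;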

Do consider, however, that there is generally not a good reason to change this value. For example, if you do not want data loss, then whether you revert to ASYNCH after 10 seconds or 30 seconds is irrelevant.

Note that a value of less than 5 seconds cannot be specified (this would too easily cause false failovers if using a Witness), and a value greater than 30 seconds is likely to cause clients to receive timeouts, since transactions cannot commit while the Principal is waiting on the Mirror.

So what are the implications of PARTNER_TIMEOUT? Well, it means that if we lose the Mirror, after 10 seconds we are running ‘exposed’. Or in other words, if we also lose the server hosting the Principal database while running exposed, we can no longer guarantee zero data loss.

This behaviour makes sense by design: DB Mirroring is a High Availability technology and is designed to keep our service available and online.

But what if the risk of experiencing data loss was catastrophic to the business!? Perhaps to some businesses an outage is the preferred result over running the risk of losing a single transaction…

So when your Mirror is unavailable, after investigating the problem you may decide to:

  • Take no action and allow the Mirror to catch up when it returns, if for example it was just a temporary network issue. (running exposed)
  • Completely re-establish the Mirroring session to the same or another server if it was completely lost, as sketched after this list. (running exposed)
  • Stop the application until such time as you have another Mirror. (no data loss)
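
For the second option, re-establishing the session looks roughly like this in outline. Again a sketch only, assuming the [SalesDB] example and a replacement Mirror server SITEC-SQL:

    -- On the Principal: tear down the broken session.
    ALTER DATABASE [SalesDB] SET PARTNER OFF;

    -- On the new Mirror server: restore a recent full backup and all subsequent
    -- log backups WITH NORECOVERY (paths omitted here), then point it at the Principal.
    -- RESTORE DATABASE [SalesDB] FROM DISK = N'<full backup path>' WITH NORECOVERY;
    -- RESTORE LOG      [SalesDB] FROM DISK = N'<log backup path>'  WITH NORECOVERY;
    ALTER DATABASE [SalesDB] SET PARTNER = 'TCP://SITEA-SQL:5022';

    -- Back on the Principal: complete the new partnership and return safety to FULL.
    ALTER DATABASE [SalesDB] SET PARTNER = 'TCP://SITEC-SQL:5022';
    ALTER DATABASE [SalesDB] SET PARTNER SAFETY FULL;

Remember that you remain exposed the whole time this is going on, right up until the new Mirror reports SYNCHRONIZED.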

In most cases it is unlikely that you would stop the application and prevent users from connecting just because of ‘potential’ data loss, but this is something to keep in mind if you are in a shop where the target RPO is zero.

Note that the above is just an example scenario. If you have a REAL need to recover to a point in time, then you will want to couple your DB Mirroring with a SAN and other technologies such as Failover Clustering or AlwaysOn Availability Groups.