High Availability | Disaster Recovery solutions in SQL 2012


High Availability|Disaster Recovery solutions in SQL 2012

HA – DR in SQL 2012 and Windows 2012, let see the landscape of Available Options for SQL HA-DR specific architectures is improving significantly. Below is a summary of the high availability and disaster recovery solutions available for a SQL environment.

Start with, it is important to outline some key terminology.

*       RTO (Recovery Time Objective): The duration of acceptable application downtime, whether from unplanned outage or from scheduled maintenance/upgrades. The primary goal is to restore full service to the point that new transactions can take place.

*       RPO (Recovery Point Objective): The ability to accept potential data loss from an outage. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending upon the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.

*       RLO ( Recovery Level Objective): This objective defines the granularity with which you must be able to recover data — whether you must be able to recover the whole instance, database or set of databases, or specific tables.

*       High Availability (HA): The principal goal of a high availability solution is to minimize or mitigate the impact of downtime.

*       Disaster Recovery (DR): Disaster recovery efforts address what is done to re-establish availability after an outage. It refers to restoring your systems and data to a previous acceptable state in the event of partial or complete failure of computers due to natural or technical causes.

*       Quorum: Generally speaking, a quorum is the minimum number of members of an assembly, who must be present before the members can conduct business. The requirement for a quorum protects against unrepresentative action in the name of the body by a disproportionately small number of individuals (adapted from Bing Dictionary and Wikipedia).

*   In the context of Windows Clustering, the quorum configuration determines the number of failures the cluster can sustain.

*   In the context of SQL database mirroring or availability groups with automatic failover, quorum makes sure that a database is owned by only one partner or replication at a time.

Single Site SQL Failover Clustered Instance

**     Primarily an HA feature

2.     Uses time tested Windows Clustering technology. Advanced heartbeat, failover, quorum model, and quorum voting options available in the latest versions of Windows.

3.     Protects against Machine failures (ex. CPU/motherboard causes machine to become unresponsive or failed)

4.     Protects against OS failures (ex. blue screens)

5.     Protects against SQL Instance failures (ex. SQL hangs/ AV’s)

6.      Application connects using one virtual server name, which is not tied to a specific machine name. The current owning node is abstracted via the virtual server name.

7.     Works well with most other features  such as log shipping, replication,  and asynchronous availability groups

8.     Manages external dependencies well upon failover. All system and user databases reside on a shared drive – failover simply provides access to the same shared drives to the new owning node. Registry keys are replicated via cluster checkpoints.

9.       Supports rolling upgrades scenarios

10.       Instance-level protection without data redundancy (Instance [-]DATA RLO)

**     Considerations:

a   There is no concept of a secondary database. Does not maintain a redundant copy of the data and so does not protect against  an I/O subsystem failure

b   No special requirements with respect to database recovery models

c   Must ensure nodes are maintained properly (patching levels should match)

d   A major outage like a datacenter power failure, or failure of the network link to the primary data center is not addressed because all nodes are within the same datacenter.

Database Mirroring_01

Automatic Failover_01

1.     Primarily a DR solution. Can be used as HA when used with synchronous/automatic failover options.

2.     Protects against I/O subsystem failure on the primary server or datacenter

3.     Log stream compression available for increased performance

4.     Automatic page repair

5.     Automatic failover is possible, requires a 3rd witness server

6.      Automatic redirection to secondary is possible

7.     Can choose between several mirroring and failover options: synchronous or asynchronous streaming, automatic or manual failover

8.     DB Level protection (database RLO)

9.       Considerations:

*   Database must be using the FULL recovery model

*   Database on secondary is not writeable

*   Database on secondary is not readable (can create database snapshots to work around this, but this can quickly become burdensome)

*   Additional management overhead of third witness server

*   Deprecated in SQL 2012

*   Only 1 secondary allowed

*   Can only failover at database granularity – cannot group a set of databases to failover together

*   If automatic client redirection is required, the client must change the connection string to include the new FAILOVER_PARTNER option. Also, the client connectivity components must be at a version which supports the new connection string – so may not be suitable for legacy applications.

*   Database failover occurs at the user database level, not at the server level. Note that the system databases master, tempdb, and model cannot be mirrored and that only one secondary database is supported with database mirroring in SQL Server 2008. Databases have dependencies on objects that reside in other databases such as the master database. These objects often include logins, jobs, certificates, and server audits but can include other objects. A process (normally involving scripts) needs to be adopted to keep objects that are external to the database synchronized across the servers. Suitable for environments where these external dependencies are not changed very often (which tends to be true in most environments).

*   Some features are not supported, such as cross database transactions.

*   Special configuration considerations when using Replication. Visit here for more info

Log Shipping in 2012

tlogshipping

1.     Primarily a DR solution

2.     You Can use compressed backups

3.     Very good way to validate the transaction log backups you are already taking

4.     You Can run in a delayed restore mode to protect secondary from human error on primary site (ex. If TableX was deleted on Primary by mistake, and there is a 2-hour delay, you may have enough time to be able to recover TableX from the Secondary)

5.     Database Level protection – RLO

6.     You Can have multiple secondaries

7.     Very good option when other DR options are not possible or supported by a vendor since it is essentially a constant transaction log backup and restore sequence.

8.     Considerations as fellows:

**   Database must be in Full or Bulk Logged recovery model

**   No automatic failover

**   No automatic redirection

**   Can only failover at database granularity – can’t group a set of databases to failover together

**   Database on secondary not readable during restore

**   Database on secondary not writeable

**   Databases have dependencies on objects that reside in other databases such as the master database. These objects often include logins, jobs, certificates, and server audits but can include other objects. A process normally involving scripts needs to be adopted to keep objects that are external to the database synchronized across the servers. Suitable for environments where these external dependencies are not changed very often which tends to be true in most environments.

Peer 2 Peer T Replica :

p2p-t-replica

1.     Primarily a scalability solution

2.     With Peer-To-Peer replication, we have the capability to keep multiple copies of the data on multiple sites, providing a scale-out, HA (if bandwidth permits) and DR solution. Because data is maintained across the nodes, peer-to-peer replication provides data redundancy, which increases the availability of data.

3.     Database can be in Simple recovery model

4.     Database on secondary is both readable and writeable

5.     No automatic redirection

6.      No concept of a failover since all  “nodes” are always “active”

7.     Can have multiple nodes participate in the topology

8.     Can choose to replicate only a subset of tables within the database (table level or filtered rows from table RLO)

9.       Since all databases are writeable, you can create different indexes optimized for reporting environment, optimal solution if you are running very expensive reporting queries which need custom indexes

10.       Considerations are fellows:

**   Client will have to change connection string to an available node if the node it is connected to becomes unavailable.

**   Replication is mainly intended for reporting offloading. Therefore the architecture builds an interim queue in the distribution database, causing higher latencies than would be observed with the Availability Groups architecture.

**   Does not handle conflicts gracefully, application owner will need to ensure data modified at different sites does not conflict (app should logically partition the updates). P2P replication will simply overwrite data (lost updates) unless configured to fail when a conflict occurs, which requires manual intervention Visit here to see more info.

**   Does not support initialization and re-initialization via a snapshot

***   Replicated tables must have a primary key

Multi-Site Clustering with SAN Replication:

Multi-Site Clustering with SAN Replication

Multisite Clustering with SAN Replication

1.     HA and DR solution using Failover Clustering in combination with SAN replication

2.     Protects against I/O subsystem failure on the primary datacenter

3.     Data disks can be synchronously or asynchronously mirrored

4.     External dependencies handled well as in typical single site failover clusters

5.     Instance-level protection WITH  data redundancy (Instance [+] DATA RLO)

6.     MS SQL 2012 supports multi-subnet environments, eliminating the need for a VLAN

7.     Considerations are fellows:

*   Additional expense of SAN replication technology

*   Additional complexity of managing and configuring SAN  replication technology

*   Quorum model and node vote options should be configured to ensure an outage at the DR site will not cause the cluster to lose quorum.

Multisite Clustering with NO SAN Replication and with NO shared storage:

Multi-Site Clustering with NO SAN Replication1.     SQL 2012 supports multi-subnet environments, eliminating the need for a VLAN

2.     Will become even more relevant:

*      SQL 2012 now supports databases on SMB

*      Windows 2012 clusters support 64 nodes

*      Hyper-V Replicas feature on Windows 2012

*      Hyper-V support for SMB

3.     Used often to support DR within a SQL 2012 Availability Group topology.

MS SQL Availability Groups:

SQL Availability Groups:

SQL Availability Groups

1.     HA (synchronous, local datacenter) and DR (asynchronous, remote datacenter) solution

2.     Protects against I/O subsystem failure on the primary server or datacenter

3.     Log Stream compression available for increased performance

4.     Automatic Page Repair

5.     Automatic redirection to secondary is possible via an AG listener

6.     Automatic failover is possible, does not require a 3rd witness server, windows cluster is used for quorum

7.     Up to 4 secondaries

8.     Quorum is based on all nodes of the underlying Windows Cluster

10.    You Can failover a set of databases as a group (database set RLO)

11.    You Can choose between several mirroring and failover options: synchronous or asynchronous streaming, automatic or manual failover

12.    You Can offload T-Log backups  on secondary

13.    Considerations are fellows:

*   Database must be using the FULL recovery model

*   Database on secondary is not writeable

*   Database on secondary can be set as readable (Readable or READ-INTENT secondary options)

*   Note that the system databases master, tempdb, and model cannot be part of an AG. Databases have dependencies on objects that reside in other databases such as the master database. These objects often include logins, jobs, certificates, and server audits but can include other objects. A process (normally involving scripts) needs to be adopted to keep objects that are external to the database synchronized across the servers. Suitable for environments where these external dependencies are not changed very often (which tends to be true in most environments).

*   Some features are not supported, such as cross database transactions visit here

*   Special configuration considerations when using Replication visit here

*   Quorum model and node vote options should be configured to ensure an outage at the DR site will not cause the cluster to lose quorum.

References:                  

–        High Availability and Disaster Recovery (OLTP)—a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

–        High Availability Solutions (SQL Server)

–        High Availability and Disaster Recovery at ServiceU: A SQL Server 2008 Technical Case Study

–        Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recover

–       Quorum: How a Witness Affects Database Availability (Database Mirroring )

–        Quorum vote configuration check in AlwaysOn Availability Group Wizards (Andy Jing)

–        WSFC Quorum Modes and Voting Configuration (SQL Server)

–        Cross-Database Transactions Not Supported For Database Mirroring or AlwaysOn Availability Groups (SQL Server)

–        Database Mirroring (OLTP)—a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

–        How to create multiple listeners for same availability group (Goden Yao)

–        Peer-to-Peer Transactional Replication

–        Selecting a High Availability Solution

–        Automatic Page Repair (Availability Groups/Database Mirroring)

–        http://blogs.technet.com/b/josebda/archive/tags/smb3/

–        MMS 2013 Demo: Hyper-V over SMB at high throughput with SMB Direct and SMB Multichannel

–        Configure Microsoft SQL Server to Use Scale-Out File Server

–        Support policy for Microsoft SQL Server products that are running in a hardware virtualization environment

 

Mehboob

Microsoft Certified Solutions Associate (MCSA)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s