Exchange BC/DR Part 1: Choosing an Email Continuity Solution

These days few phrases are hotter in the technology industry than “Business Continuity and Disaster Recovery”, or “BC/DR” for those of you who love a good buzzword. While BC/DR is nothing new, disasters such as 9/11 and Hurricane Katrina have highlighted the fact that almost anything can, and given enough time will, happen. Whether it’s widespread power outages, flooding, hurricanes, tornadoes, or terrorism, if your business doesn’t have a BC/DR plan, your business doesn’t have a prayer.

Background

Recently I was asked to lead my company’s BC/DR initiative. The first phase of the larger project was to find and implement a BC/DR solution for our Microsoft Exchange Messaging systems. What I learned during the evaluation process is that, for the most part, current Exchange Continuity offerings can be classified within 3 categories:

  1. 3rd-party hosted systems: A 3rd party hosts a full or partial copy of your Exchange data in remote data centers. The size of the available archive varies depending on the provider and how much you’re willing to spend.
  2. Software-based, internally hosted systems: Just like many other applications in your environment, you buy a server (or several servers), install and configure the application, and you’re good to go.
  3. Appliance-based solutions: Install an appliance, perform a very simple configuration, and you’re good to go.

Before I go into details about why we chose the solution we did, here’s a quick breakdown of our messaging environment.

Our current architecture consists of a single Microsoft Exchange server with 3 mailbox stores, as well as public folder data. The total size on disk of all of our Exchange data is under 200GB. We also have a single front-end server running Outlook Web Access. In addition to our corporate headquarters, we have 2 remote sales offices, and nearly 100 field reps working out of their homes. In total we have fewer than 500 users, the majority of whom connect to the server via Microsoft Outlook (2003, 2007) running in cached mode. Outlook connections are initiated from local PCs, Terminal Servers, and remotely over our VPN or via RPC over HTTP. We do not use Blackberry devices, but we do rely heavily on Windows Mobile Messaging from Smart Phones and other mobile devices.


What We Didn’t Choose

I don’t want to name the solutions we didn’t choose. I think it’s important that you start any evaluation process with a clean slate. Besides, every environment is different, and what worked for us very likely will not be what makes sense for you. I will, however, explain why solutions were eliminated, and how we arrived at our eventual decision.

3rd Party Hosted

The 3rd party solution we looked at was recommended to me by a former coworker. It is a Linux-based solution. There’s nothing wrong with Linux, of course, but it’s important to point out that with this type of solution you’re replicating Windows data to a Linux environment, which means it’s not going to work exactly like Exchange on Windows does. Depending on what features of the Exchange platform you utilize, this may or may not matter.

The footprint of the hosted solution was relatively small. We would have needed to add one server to facilitate replication. The simplicity, however, comes at a cost. There is no support for OWA, Windows Mobile Messaging, or integrated VoIP applications such as Cisco Unity. While we might have been able to live without OWA, the lack of Mobile Messaging and Unity support was a showstopper.

On the positive side, the solution is hosted in highly secure, tier 3 data centers. It’s about as secure as an offsite solution can be. While not relevant for my environment, Blackberry support is extensive. A variety of archival and discovery options are available as well. Neither is part of the standard continuity package, however, and the costs rise quickly as you add those features to your annual plan (we were quoted roughly 20K annually, with a 30-day available archive).

Software-based

The two software-based solutions we looked at were remarkably similar. Even the user interfaces had a number of frightening similarities. Both solutions would require us to place bridgehead servers at both our headquarters and our disaster recovery site to facilitate replication. Additionally, one of the solutions we considered would have required that we place a Microsoft SQL Server in each location.

Both solutions provided explicit support for Blackberry devices as well as OWA. In each case the vendors told us that, although they had no explicit support for Mobile Messaging or Cisco Unity, they didn’t see any reason why they wouldn’t work. I was a little put off by the fact that neither could say for sure if they had any customers using either Mobile Messaging or Unity services.

As I said, both of the software-based solutions worked in a similar manner. When a disaster event is triggered, the software, for lack of a better description, rewrites a number of Active Directory Exchange attributes, such that the new values point to the server running the BC/DR software instead of your (presumably unavailable) Exchange server. Replication is achieved by replaying Exchange log files on the BC/DR server. There are validation checks in place that prevent corrupt data from being replicated to the DR device. Neither solution required any additional software to be installed on the production Exchange server. This is important for two reasons. First, there’s no additional load placed on your Exchange server. Second, if an agent were installed and you had any issues with your Exchange server, it’s a safe bet that Microsoft wouldn’t work with you until after you’d uninstalled it.

While corruption prevention should be a no-brainer, not all email BC/DR solutions do it. Be sure that any solution you’re looking at can prevent corrupt data from being replicated to your DR site.

The solution that requires SQL servers also has available archiving and e-discovery options. As is always the case, the additional options will cost you more than a little bit extra. We looked at a demo of the archiving options, and although it appears to be an adequate solution, we weren’t keen on getting locked into anything without taking a look at similar products from Zantaz, Symantec, and EMC.

While we didn’t have any major reservations about the multiple Active Directory rewrites, my team, almost to a man, felt like it perhaps wasn’t the most graceful way to achieve a seamless failover.

Among the software-based solutions, the feature we liked most was the ability to fail over on multiple levels. The appliance-based solution I’m about to discuss is an all-or-nothing failover solution, meaning that to fail over a single mailbox, you have to fail over the entire server. The software-based solutions, because they work by modifying Active Directory attributes, would allow us to fail over an individual mailbox or a single store, or to target failover based on other attributes such as group membership. All of us thought it was a really cool feature, but at the same time, none of us could come up with a realistic scenario where we’d utilize it.
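To make the idea of scoped failover concrete, here is a minimal Python sketch. It is not how either vendor implements it. It only models the concept of repointing a per-mailbox directory attribute at a DR server. The `Mailbox` class, the `home_server` field, and the server names `EXCH01` and `DR01` are all hypothetical stand-ins for whatever Active Directory attributes the real products rewrite.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    user: str
    store: str
    groups: set = field(default_factory=set)
    home_server: str = "EXCH01"  # hypothetical production server name


def fail_over(mailboxes, dr_server, *, user=None, store=None, group=None):
    """Repoint every mailbox matching the given scope at dr_server.

    With no scope arguments, every mailbox is moved (whole-server failover).
    Returns the list of users whose mailbox was repointed.
    """
    moved = []
    for mb in mailboxes:
        if user is not None and mb.user != user:
            continue
        if store is not None and mb.store != store:
            continue
        if group is not None and group not in mb.groups:
            continue
        mb.home_server = dr_server  # stand-in for the directory attribute rewrite
        moved.append(mb.user)
    return moved


# Hypothetical directory contents, for illustration only.
directory = [
    Mailbox("alice", "Store1", {"Sales"}),
    Mailbox("bob", "Store2", {"Sales", "Field"}),
    Mailbox("carol", "Store2", {"IT"}),
]

print(fail_over(directory, "DR01", user="alice"))    # a single mailbox
print(fail_over(directory, "DR01", store="Store2"))  # a single store
```

Calling `fail_over` with no scope at all moves everything, which is the only mode an all-or-nothing solution offers.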

In the end, the single most important factor that steered us away from the software-based solutions was cost. Although software licensing itself was reasonable enough, the real cost of procuring 4-6 additional servers, 4-6 Windows Server licenses, 1 additional Exchange license, and in the case of one solution, 2 Microsoft SQL Server licenses, coupled with the fact that we’d have an additional 4-6 devices to maintain on a daily basis, was more than a little prohibitive.
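If you want to run the same arithmetic for your own environment, a rough Python sketch follows. The item counts come from the paragraph above. Every unit price in `placeholder` is an invented placeholder, not a quote we received, and the function name `added_cost` is just for illustration. It also ignores the ongoing cost of maintaining the extra devices.

```python
def added_cost(servers, prices, needs_sql=False):
    """Total hardware and licensing cost for one software-based option.

    servers: number of additional servers (4 to 6 in our case).
    prices: dict of unit prices; every value is supplied by the caller.
    """
    total = servers * prices["server"]            # hardware
    total += servers * prices["windows_license"]  # one Windows Server license per server
    total += 1 * prices["exchange_license"]       # one additional Exchange license
    if needs_sql:
        total += 2 * prices["sql_license"]        # two SQL Server licenses
    return total


# Placeholder unit prices, NOT real quotes: substitute your own numbers.
placeholder = {
    "server": 5000,
    "windows_license": 1000,
    "exchange_license": 700,
    "sql_license": 3000,
}

low = added_cost(4, placeholder)                   # fewest servers, no SQL
high = added_cost(6, placeholder, needs_sql=True)  # most servers, with SQL
print(f"added cost range: {low} to {high}")
```

Plug in your own hardware and licensing prices to see how quickly the infrastructure around the software outweighs the software itself.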

What We Chose

The appliance-based Teneros ACA

After taking a close look at 4 solutions, we decided on the V3000 series appliance from Teneros. In terms of how it actually works, the Teneros solution is unique. Instead of having to worry about multiple AD rewrites, the Teneros appliance simply assumes the IP address of your production Exchange server. To every device on your network (with one very important exception - more on this in part 2), the Teneros ACA looks like the Exchange server. It’s able to do this because, unlike every other Exchange Continuity solution we looked at, the Teneros appliance is actually a Microsoft Exchange server.

In terms of the architecture, the Teneros continuity appliance (a separate appliance is required for DR) sits between your network switch and your Exchange server (the NIC cable from Exchange plugs directly into the appliance, not into a patch panel and not into your switch). It’s admittedly a little bit (lot) scary at first, but once you see how well it works, any lingering concerns you have will dissipate quickly. The details of how it does what it does are somewhat complex, but I’ll cover them in a little bit more detail in part 2 of this series.

There were several things that sold us on the Teneros solution:

  • Because it’s an Exchange server, to end users it behaves EXACTLY like your production Exchange server.
  • The appliance is 100% supported by Teneros from their NOC. They take care of monitoring, maintenance, and updates (more about this in part 2). From our perspective it’s near-zero administrative overhead.
  • Teneros provides explicit support for OWA, Mobile Messaging, and believe it or not, Cisco Unity services. Explicit support is also provided for Symantec Enterprise Vault, GoodLink, and of course Blackberry.
  • When factoring in the cost of procuring additional servers, Microsoft licensing, and administrative overhead, pricing was a wash.
  • Like the other solutions we looked at, the Teneros appliance has built in corruption prevention, and does not rely on agents to achieve replication.
  • Failover can happen either manually or automatically based on thresholds you specify. Actual failover is guaranteed within 60 seconds, although in practice it happens much faster.
  • The thoroughness of their quoting process. Before Teneros would give us any concrete pricing, they asked us to complete a fairly extensive site survey which covered all common aspects of our messaging infrastructure, as well as some pretty important but lesser-known stuff that none of the other vendors bothered to ask about. In short, they didn’t assume anything about my environment.
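
For readers who haven’t worked with threshold-based failover, the manual-or-automatic point in the list above can be pictured with a small Python sketch. This is a generic illustration and not the Teneros mechanism, which I’ll describe in part 2. The `FailoverMonitor` class and the "three consecutive failed health checks" threshold are assumptions made up for the example.

```python
class FailoverMonitor:
    """Trigger failover after a run of consecutive failed health checks."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures  # hypothetical threshold
        self.consecutive_failures = 0
        self.failed_over = False

    def record(self, healthy):
        """Record one health-check result; return True once failed over."""
        if self.failed_over:
            return True
        if healthy:
            # Any successful check resets the run of failures.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                self.failed_over = True
        return self.failed_over

    def manual_failover(self):
        """An administrator can force failover regardless of the counter."""
        self.failed_over = True


monitor = FailoverMonitor(max_failures=3)
# Simulated health-check results; two failures followed by a success do not trigger.
for result in [True, False, False, True, False, False, False]:
    if monitor.record(result):
        print("failing over to the standby")
        break
```

Requiring a run of consecutive failures, rather than reacting to a single missed check, is one common way to keep a brief network blip from causing an unnecessary failover.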

As we were working through our decision process, the Teneros engineering team was extremely responsive to the multitude of questions I threw at them. While you’d expect exactly that type of service from anyone who was hoping to take a large sum of cash from you, we’ve discovered this is far from always the case.

In part 2 I’ll cover installation, configuration, and use of the Teneros Application Continuity Appliance in our production environment.

Feel free to comment on how you have fulfilled your Exchange BC/DR requirements.


author

Tony works as a Systems Administrator for an Internet content provider. When he's not working at his "real job", he spends as much time as he possibly can playing and writing about golf. He also enjoys photography and spending time with his wife and 2 dogs.
