Exchange BC/DR Part 2: Installing Teneros in the Production Environment
I mean no disrespect to the guys over at Information Week, InfoWorld, and Windows & .Net Magazine, who do an outstanding job with product reviews. It’s just that most of those reviews contain a sentence that starts with “We tested <insert product name> in our lab and…” That’s all well and good, but for the rest of us, even if we’re fortunate enough to have a test environment that’s even remotely on the scale of our production environment, we eventually have to move new products out of the safety of the lab and into our production environments, where they can do some real damage. As any experienced Systems Administrator knows, things don’t always work as they did in the lab, and when they don’t, well, that’s when the real fun starts.
As if the transition from test to production isn’t nerve-racking enough, any time you’re working with mail systems, the fear factor is amplified. Let’s face it, end-users know when email isn’t working, and they won’t hesitate to let you know that they know that it’s not working…and that they’re pretty sure it’s your fault. I can deal with corrupt SQL servers, downed web servers, and countless other issues that crop up from time to time, all without skipping a beat, but nothing, and I mean nothing, makes me sweat like an Exchange problem. It’s an undeniable fact that the communications systems are the lifeblood of most every company. Mail has to flow, or people can’t work, and your business grinds to a halt. That’s where Teneros comes in.
The Teneros ACA (Application Continuity Appliance) guarantees that mail will continue to flow, even when your Exchange server is less than healthy. And the best part…your users probably won’t even notice.
In Part 1 of this post I detailed some of the reasons why we chose Teneros over competing solutions. It also didn’t hurt that the company won Microsoft’s 2007 Partner of the Year award in the appliance division. My thinking was that if Microsoft is willing to say that Teneros works for Exchange, chances are it does. What follows is an overview of my personal experience implementing the Teneros ACA for High Availability in my production environment. We also ordered the Disaster Recovery Appliance, which I will cover in Part 3.
Phase 1: Procurement
Purchasing Teneros wasn’t unlike purchasing any other technology solution. We volleyed numbers back and forth, and were given a date when we could expect the appliances to arrive. We were also told that an engineer would be on site to help with the installation shortly thereafter.
While our Teneros appliances didn’t arrive ahead of schedule as we had hoped, they did arrive within a day or two of the target date. Unfortunately our HA unit arrived with two right rails (no left), and the accompanying manual CDs were completely blank. If that wasn’t bad enough, we later learned that the wrong code had been installed on our appliance. The actual Teneros software differs slightly depending on whether you have an HA or DR appliance, as does the hardware. Ours got crisscrossed. Needless to say, I wasn’t pleased.
Our own research taught us that Teneros outsources the assembly and packaging of the appliances to a 3rd party (who shall remain nameless). The screw-ups by their assembler certainly didn’t make Teneros look good, and definitely made me a little less secure in our decision, but I tried to remain steadfast to the idea that the guys who design, engineer, and support the solution aren’t the same guys who put the parts in the cardboard box.
What I will say is that the Teneros guys were fantastic. They apologized without trying to make excuses (how could they?). The Engineers got me the manuals, a new set of rails, and perhaps most impressively, the guys in the Teneros NOC were able to remotely flash the code on the appliances to correct the factory screw-up. Oh, and it took them less than 15 minutes to do it. Call it the silver lining. If nothing else, we were impressed that Teneros could remotely manage the ACA as efficiently and effectively as they had promised.
Phase 2: Installation
Teneros installation, like so many other network devices, starts with the rails. While the rails that ship with the Teneros appliance aren’t quite as sexy as Dell’s rapid rails or similar snap-in-place rails from HP, they still make for a relatively simple installation (assuming you have both a left and a right rail). You will need your own cage nuts, and if our installation is any indication, the screws that ship with the rails aren’t very useful either. If you have any spare rack hardware on hand (most of you probably have drawers full), this won’t be a problem at all.
Once you have the Teneros ACA in your rack, and both power supplies plugged in, the Teneros Engineers will provide you with a pre-installation checklist (and will probably ask you to power the box on) so that you’re fully prepared and ready to go when their Engineer arrives on-site to help with the initial configuration.
Make sure you have two outlets available for your new ACA. Given the importance of the device, it’s no surprise that it comes with redundant power supplies. It’s also worth noting, the Teneros ACA will make horrible noises if you don’t plug them both in. If you have the power infrastructure to do so, I recommend distributing the power over multiple feeds.
The good news about the installation is that Teneros provides an on-site Engineer to assist with the configuration process. The better news is that the configuration is so incredibly simple, you probably don’t need the Engineer, but he’s there just in case.
Perhaps the scariest part of the initial setup is having to unplug your Exchange Server from your network and plug it directly into the Teneros appliance. Even with their Engineer looking over your shoulder, these are tense moments. In 30 seconds or less though, you should be back up and running normally.
Now is probably a good time to discuss what Teneros calls the “Instant-On Network Switch”, which, among other things, is what allows the pass-through to the Exchange server to actually work. The clever design allows the Teneros ACA to hijack/steal/borrow the IP address of your Exchange server so that all network traffic destined for Exchange finds its way to the Teneros appliance. Apart from allowing your users to keep sending and receiving mail (as well as pretty much anything else they would be doing in Outlook) when Exchange is down, this design eliminates the need to reconfigure periphery mail services such as Anti-Spam servers, mail relays, and other applications that rely on Exchange for POP/IMAP services. I wish I had more time to spend on it, because it’s kind of a big deal. The key thing to remember is all the switching and re-routing happens on the Teneros appliance. No changes are made to your Active Directory during failover and failback.
When your Exchange server is running normally, the Instant-On Switch, as if it wasn’t even there, allows all traffic to pass through to the Exchange server. The appliance itself has been designed such that even if the Teneros appliance loses power, or fails completely, it won’t interfere with Exchange-bound network traffic.
Phase 3: Configuration
If configuring the Teneros ACA was any easier you’d probably think you were doing it wrong. All you need is a username and password, and some pretty basic network information (name server IPs, the IP of your Exchange server, and the IP address of your backup server). You will also need 4 new IP addresses for the Teneros appliance. These IPs need to be on the same subnet as your Exchange box. Without getting too specific, one of those IPs is for your administrative web interface, 2 are more or less for the Teneros NOC, and the 4th is the IP address you use to access your actual Exchange server after Teneros has assumed its primary IP.
Providing an alternative address for Exchange allows you direct access to your server to do the things you need to do in order to get Exchange back up and running (investigate, diagnose, repair, etc.).
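Since all four new addresses have to sit on the same subnet as the Exchange box, it can be worth sanity-checking your IP plan before the Engineer shows up. Below is a minimal sketch using Python's standard `ipaddress` module. The role names (`admin_web`, `noc_1`, `noc_2`, `exchange_maintenance`) and the 10.0.5.x addresses are placeholders I made up for illustration. They are not Teneros terminology, and this is not part of the Teneros tooling.

```python
import ipaddress


def validate_ip_plan(exchange_ip, subnet, appliance_ips):
    """Return a list of problems found in a proposed appliance IP plan.

    appliance_ips maps a (hypothetical) role name to an address string.
    An empty list means the plan passed these basic checks.
    """
    network = ipaddress.ip_network(subnet)
    exchange = ipaddress.ip_address(exchange_ip)
    problems = []

    if exchange not in network:
        problems.append(f"Exchange IP {exchange} is not in {network}")
    if len(appliance_ips) != 4:
        problems.append(f"expected 4 appliance IPs, got {len(appliance_ips)}")

    seen = {exchange}  # the new addresses must not collide with Exchange
    for role, raw in appliance_ips.items():
        addr = ipaddress.ip_address(raw)
        if addr not in network:
            problems.append(f"{role}: {addr} is not in {network}")
        if addr in seen:
            problems.append(f"{role}: {addr} is already in use in this plan")
        seen.add(addr)
    return problems


# Placeholder addresses, for illustration only.
plan = {
    "admin_web": "10.0.5.21",
    "noc_1": "10.0.5.22",
    "noc_2": "10.0.5.23",
    "exchange_maintenance": "10.0.5.24",
}
print(validate_ip_plan("10.0.5.10", "10.0.5.0/24", plan))  # prints []
```

The check only looks at the plan itself. It does not tell you whether an address is already taken by some other device on your network.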
By providing the IP address of your backup server you allow the Instant-On network switch to route packets from that backup server to Exchange directly. In short, the backup server is the only device on your network that knows that Teneros isn’t really your Exchange server. Every other device is totally clueless, and that’s a good thing. It’s a fairly simple idea, but the implications are tremendous. You are able to follow normal recovery procedures without having to mess with things like alternative restore points, host files, or anything else that could complicate your efforts to get your Exchange server back up and running quickly.
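To make the routing behavior described above a bit more concrete, here is a toy model of the decision in Python. To be clear, this is my own mental model of what I observed, not Teneros code, and I have no visibility into how the Instant-On switch is actually implemented. The `SwitchConfig` and `route` names are hypothetical, and I am assuming the maintenance address reaches the real server in both states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchConfig:
    exchange_ip: str       # primary address that clients and mail relays use
    maintenance_ip: str    # alternate address for reaching the real server
    backup_server_ip: str  # the one host that is let straight through


def route(config, failed_over, src_ip, dst_ip):
    """Return 'exchange', 'appliance', or 'other' for a single packet."""
    if dst_ip == config.maintenance_ip:
        return "exchange"   # assumption: maintenance IP always hits the real box
    if dst_ip != config.exchange_ip:
        return "other"      # not Exchange-bound traffic, outside this model
    if not failed_over:
        return "exchange"   # normal operation: plain pass-through
    if src_ip == config.backup_server_ip:
        return "exchange"   # backup server still sees the real server
    return "appliance"      # everyone else is served by the appliance


cfg = SwitchConfig("10.0.5.10", "10.0.5.24", "10.0.5.50")
print(route(cfg, True, "10.0.5.77", "10.0.5.10"))  # prints appliance
print(route(cfg, True, "10.0.5.50", "10.0.5.10"))  # prints exchange
```

The point of the sketch is that the failover state lives entirely in one place. Nothing else on the network needs to be told that anything changed.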
Once you’ve entered all of your configuration data, the Teneros ACA will run through some tests to validate the information you provided. Assuming everything checks out, the ACA begins the process of replicating all of your Exchange data. Our initial replication took a little over 3 days total. Not included in that time was an 18 hour period where replication got hung up.
When I noticed that the progress meter hadn’t moved for the better part of a day I called support. They were able to quickly diagnose an anomaly in our OWA configuration. Their developers offered to add a bit of custom code to our appliances to account for it, but given that it would have caused ongoing maintenance issues, we elected to fix the problem ourselves (by no means is this a knock on Teneros, there was most definitely a configuration problem, and it was most definitely on our end). The problem itself was actually relatively minor, and within seconds of resolving the issue, the progress meter began to move.
Phase 4: Testing in the Production Environment
With all my data replicated and verified it was time to test. We developed a detailed testing plan and scheduled some after hours time to test Teneros.
Immediately prior to pressing the failover button, I sent a couple of messages out, scheduled some appointments, and created a new task as well as a new public folder. This would allow me to verify that newly created objects had been replicated to the appliance. With the preliminary work squared away, it was time to press the button.
I’m going to be perfectly honest. I was really hoping it was going to work. From a “continuing my employment” perspective I felt like I needed it to work. That said, I’d be lying if I told you that I thought the odds were any better than 50/50 that it was actually going to work. Having come this far, however, I didn’t really have a choice, so I took a deep breath, exhaled, and pressed the button.
The ensuing moments were tense as we waited to see what was actually going to happen. In less than 15 seconds, the Teneros Appliance reported that it was active, and that my users were connected to it. I clicked on my Outlook client and immediately got the “you must restart Outlook” message. So far things were on track.
The need to restart Outlook is due to a limitation in Exchange 2003’s MAPI implementation. Exchange 2007 clients do not need to be restarted in order to establish connections to the Teneros Appliance.
As soon as Outlook restarted I began verifying that my new calendar items, tasks, and public folders were available through Teneros. My coworker quickly verified that his mailbox contained the message I had sent him only seconds before failover. So far, so good. Everything was there. Breathing ever-so-slightly easier we worked our way through our test plan.
We sent messages to, and from, outside of our network. We verified that Windows Mobile clients were able to send and receive mail. We created additional tasks, and public folders, and calendar items. We deleted messages from our mailboxes. Thus far everything was running beautifully. In fact one of my coworkers remarked that he felt like Outlook was actually running a little faster.
Things weren’t perfect, however. Along the way we did hit 2 snags. When testing our OWA front-end we discovered that, although we were able to authenticate, OWA forms (like the message box, and contacts window for example) were not functional (they actually looked like missing images - big red X). We also discovered that remote users connecting via RPC over HTTP (as well as a 3rd party POP application) were unable to authenticate with the Teneros Appliance.
With everything else working normally, I called Teneros support to discuss the two specific errors. The OWA issue, it turns out, is isolated to Vista clients (of which we only have a few - but of course my client is among them). It also turns out that they had a patch available and would be installing it during our next scheduled maintenance.
I should mention that maintenance windows are specified during the initial configuration. When they occur is totally up to you (the NOC is, of course, 24/7), and they can be changed/postponed at any time.
Teneros support also had a quick answer for our authentication issues. It turns out that for whatever reason, the ACA cannot handle authentication requests using UPN format (logging in with your email address), which is exactly what we were doing. They couldn’t tell me for sure if it would ever be fixed, but the workaround is simple enough. We changed the credentials to the DOMAIN\USERNAME format, and sure enough, everything else started working as expected.
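If you have a pile of stored credentials to convert, the UPN to DOMAIN\USERNAME change is easy to script. Here is a rough sketch in Python. It makes two assumptions that may not hold in your environment: that the part of the UPN before the @ matches the user's pre-Windows 2000 logon name, and that you supply your own map from UPN suffix to NetBIOS domain name. The `example.com` and `EXAMPLE` values are placeholders.

```python
def upn_to_down_level(upn, netbios_domains):
    """Convert 'user@suffix' to 'DOMAIN\\user'.

    netbios_domains maps a lower-case UPN suffix to a NetBIOS domain name.
    Assumes the UPN prefix is also the user's logon name in that domain.
    """
    user, sep, suffix = upn.rpartition("@")
    if not sep or not user or not suffix:
        raise ValueError(f"not a UPN: {upn!r}")
    try:
        domain = netbios_domains[suffix.lower()]
    except KeyError:
        raise ValueError(f"no NetBIOS domain known for suffix {suffix!r}") from None
    return f"{domain}\\{user}"


# Placeholder suffix and domain, for illustration only.
print(upn_to_down_level("jsmith@example.com", {"example.com": "EXAMPLE"}))
# prints EXAMPLE\jsmith
```

Where the two names differ for a given account, look the logon name up in Active Directory instead of deriving it from the email address.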
We continued testing (in a much more relaxed state) for another 45 minutes or so. We verified that we were able to connect to, and work on, our Exchange server on the maintenance IP. Most importantly, we verified that our backup server was able to connect to our actual Exchange server without any reconfiguration.
Once we were satisfied that the Teneros ACA for High Availability had performed as advertised, I clicked the failback button and once again hoped that things would go smoothly.
Failback is not an instantaneous process. First Teneros has to verify that Exchange is healthy. Then it has to replicate any changes (new objects, deletions, etc.) back to the primary Exchange server. Once that is complete, it runs through a series of checks to validate that it can actually failback without causing you any headaches.
In total we were failed over for about an hour and a half. The failback took a little over 3 hours. Upon completion of failback operations, I received an E-mail message notifying me that failback was successful and that my Exchange server was once again running the show. I also got another application popup asking me to restart Outlook.
After restarting Outlook we ran through our tests a second time, verifying not only inbound/outbound mail, but also that any objects we created prior to, and during the failover were present on the production server. Mail items, calendar items, tasks, public folders; every last thing that should have been on Exchange was. The stuff we deleted, wasn’t. From a replication and availability standpoint Teneros performed near flawlessly (only the OWA issue remained).
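The post-failback check boils down to simple set arithmetic: everything that existed before or was created during the test, minus whatever was deleted, should be on the production server, and nothing deleted should have come back. We did this by hand, but a sketch of the same comparison in Python looks like this. The `check_failback` function and the object names are made up for illustration. Gathering the actual lists from Exchange is left out.

```python
def check_failback(before, created, deleted, after):
    """Compare objects found after failback with what the test plan expects.

    before:  objects that existed before the test started
    created: objects created just before or during failover
    deleted: objects deleted during failover
    after:   objects found on the production server after failback
    """
    expected = (set(before) | set(created)) - set(deleted)
    found = set(after)
    return {
        "missing": sorted(expected - found),          # should exist but does not
        "resurrected": sorted(found & set(deleted)),  # deleted but came back
    }


result = check_failback(
    before=["msg-old-1", "msg-old-2"],
    created=["task-new", "folder-new", "appt-new"],
    deleted=["msg-old-2"],
    after=["msg-old-1", "task-new", "folder-new", "appt-new"],
)
print(result)  # prints {'missing': [], 'resurrected': []}
```

Two empty lists is the result you are hoping for.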
The following week, after verifying that the OWA/Vista patch had been installed, we ran a second test to confirm the Teneros ACA’s ability to auto-detect a failure and subsequently initiate failover without intervention. Let’s face it, Exchange server failures are probably going to occur at a time when you’re not logged into the console, so any BC/DR solution needs to be able to handle things on its own. The appliance provides several monitoring points, any of which can be used to trigger an automatic failover. Our test case was to dismount one of our mail stores, which is exactly what I did.
Within moments of the 30 second failure threshold being breached, Teneros did exactly what it was supposed to do: it took control. Our second round of tests went almost exactly as our first, including another issue with OWA. This time around support traced the problem to a patch level discrepancy between our Exchange server and the Teneros ACA. We were told that the patch would be installed during our next maintenance window (our logs indicate that a patch was in fact applied).
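For anyone wondering what a failure threshold means in practice, the general idea is that a single failed health check is not enough. The failure has to persist for the whole threshold period before failover kicks in. The Python sketch below illustrates that general technique. It is not how the Teneros appliance is implemented (I have no idea what the internals look like), and the `FailureDetector` class is purely hypothetical. Only the 30 second figure comes from our configuration.

```python
class FailureDetector:
    """Signal failover once checks have failed continuously for `threshold` seconds."""

    def __init__(self, threshold=30.0):
        self.threshold = threshold
        self.failing_since = None  # time of the first check in the current failing run

    def observe(self, healthy, now):
        """Record one health-check result taken at time `now` (in seconds).

        Returns True when the failure has lasted at least `threshold` seconds.
        """
        if healthy:
            self.failing_since = None  # any healthy check resets the clock
            return False
        if self.failing_since is None:
            self.failing_since = now
        return now - self.failing_since >= self.threshold


detector = FailureDetector(threshold=30.0)
print(detector.observe(True, now=0))    # prints False
print(detector.observe(False, now=10))  # prints False (store just dismounted)
print(detector.observe(False, now=25))  # prints False (only 15 seconds so far)
print(detector.observe(False, now=40))  # prints True  (30 seconds of failure)
```

Resetting the clock on any healthy check is what keeps a brief hiccup from triggering a failover nobody wanted.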
We kept the failover period during our 2nd test to under an hour. As expected, Teneros detected that our store was back online within seconds of us remounting it. It quickly verified the health of our Exchange server, and presented the option to failback. Because we were failed-over for only a very short time, full failback to our Exchange server took less than 1 hour.
With two very successful tests under our belt, we consider our Teneros ACA for High Availability to be fully implemented and the HA portion of our Exchange Continuity Project complete. As I mentioned, we’ll be testing out the Teneros ACA for Disaster Recovery device within the next couple of weeks. I will cover implementation and testing of the Teneros ACA for Disaster Recovery in part 3 of this series.
Final Thoughts
Cons
- Quality control from the 3rd party that builds the appliances for Teneros needs improvement
- Presently there is no support for UPN-based logins, so remote users accustomed to logging in with their email addresses must learn to use DOMAIN\USERNAME format
- Have not yet been able to get OWA to work for 100% of our users (we have not tested since the latest patch was installed)
- Certain updates/hot fixes must be consistent between your Exchange server and the Teneros appliance, which requires some coordination with the Teneros Support Team.
- Every Exchange server in your environment will require its own Teneros appliance (for this reason Teneros may not be the best fit for very large Exchange environments)
- A complete Teneros BC/DR Solution requires 2 devices. Although it is possible to purchase only the DR component, the HA appliance offers total protection against higher probability localized failures.
- No E-Discovery option
- Annual support costs are more expensive than most *
- Locked down system allows local administrators very limited access *
Pros
- Extensive support for 3rd party applications and devices, including Cisco Unity, Blackberry, GoodLink, Symantec Enterprise Vault, and Windows Mobile Messaging.
- Provides access to a 100% archive of all Exchange objects, including mail, calendar items, tasks, etc. If it’s in Exchange, Teneros replicates it
- Object-based replication with corruption prevention keeps corrupt data off the Teneros Appliance
- Multi-pass failback guarantees no data is lost during failback operations
- Simple installation, even simpler configuration - You will not believe how easy this actually is
- Outstanding on-site support during the installation and configuration
- Amazing technical support. A knowledgeable human answers the phone within 3 rings, and is ready to help you
- As mentioned in the Cons section, annual support is higher than most, but Teneros handles 100% of the maintenance and upkeep. They install all the software updates. They also monitor the device and will notify you if there are any issues. For any environment, but especially those with smaller staffs, this is a tremendous plus.
- Full support for Exchange 2007 (this will require an update to the ACA, which Teneros Support can do remotely for no additional cost)
- Extensive remote management capabilities allow the team at the NOC to do whatever it takes to keep you up and running.
- Locked-down environment prevents you from making any mistakes that could jeopardize the stability of your E-mail continuity environment
- It works as advertised which gives your IT staff confidence, and peace of mind
Overall I’m extremely satisfied with the performance of the Teneros Application Continuity Appliance for High Availability. Since no two environments are identical, I’ll stop short of saying that Teneros is absolutely the right solution for you (there is no “one size fits all” solution). However, if the capabilities of the Teneros solution are in line with your business requirements, I wouldn’t hesitate to recommend that you let the Teneros team run you through a product demo. The Teneros appliance-based approach to Exchange Continuity is unique in the marketplace, and well worth a look. With a couple of minor procurement and OWA issues aside (this is why we test things, right?), the Teneros appliance has performed absolutely as promised, and perhaps equally as important, their engineering and support teams have exceeded my expectations.
If you haven’t done so already, please see part 1 of this series, which details why my company chose the Teneros solution.
While I can’t provide any specific details about my implementation above and beyond what I’ve already said, if you have any questions related to real-world implementation of the solution, please leave a comment and I’ll do my best to respond.












Good job this was really useful & I did really enjoy, many thanx
Will you be publishing the 3rd part of this series covering the ACA DR piece? Are you still satisfied with the Teneros solution after 6 months or so? Thanks!