Get a holistic view of your entire Microsoft Exchange organization with a single script

The longer I work in IT, the more I realize how critical monitoring is to preventing outages and recovering from them quickly. There’s a catch. In many IT organizations, your monitoring expert is not your Exchange expert, and vice versa. In theory, they should collaborate to build the most complete and useful Exchange monitoring possible. That’s not the real world, though.

In the real world, these things take time to iron out, time that neither person has to spare. In anything but a large IT organization that can afford a full-time monitoring team, the monitoring expert will have other duties to perform. The same goes for the Exchange expert, who will have other duties or, at the very least, other user-facing projects and deliverables. Monitoring is often put on the back burner because basic templates already provide server up/down and resource monitoring, along with some application-specific checks. From a man-hours perspective, that’s good enough until an outage. Then whatever caused that outage is added to the monitoring solution and likely never causes an issue again. Round and round this cycle goes…


Monitoring isn’t alone in this regard. Most admin-only improvements are shelved in favor of user-facing objectives, and rightfully so in most cases. There are only so many man-hours available, so they are spent where they deliver the most benefit. Improving the life and productivity of an entire company, or a large part of it, is more practical than improving them for a small admin team.

I took it upon myself to create a monitoring dashboard using the Exchange module for PowerShell. I decided to have the script email the dashboard to recipients as HTML, because I think it’s much better to have something show up in your inbox. Plenty of days get too hectic to remember to visit a website and look at a dashboard. Instead, this dashboard comes directly to you.
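The original script is PowerShell and isn’t reproduced here. As an illustration of the delivery step only, here is a minimal Python sketch using the standard `email` and `smtplib` modules. The sender, recipients, and SMTP relay host are hypothetical placeholders, and authentication and TLS are deliberately omitted.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(html_body: str, subject: str, sender: str,
                       recipients: list[str]) -> EmailMessage:
    """Wrap an HTML report in a MIME message with a plain-text fallback."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    # Plain-text part first, then the HTML alternative that mail clients prefer.
    msg.set_content("This report is best viewed in an HTML-capable mail client.")
    msg.add_alternative(html_body, subtype="html")
    return msg


def send_report(msg: EmailMessage, smtp_host: str, port: int = 25) -> None:
    """Hand the message to an SMTP relay (hypothetical host; no auth or TLS shown)."""
    with smtplib.SMTP(smtp_host, port) as smtp:
        smtp.send_message(msg)
```

Building the message separately from sending it makes it easy to reuse the same HTML for a file-only run, and to inspect the report without touching a mail server.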

I selected the criteria that I felt would give me the most bang for my buck in heading an outage off at the pass or responding quickly to one. The report color-codes everything: green for pass, red for failure, and orange for warning (used for DAG database copies). This makes for easy scrolling on a mobile device. Just flick a finger and look for anything that isn’t green.
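To show how a pass/fail/warning scheme like this maps onto an HTML table, here is a small Python sketch. The status keys (`pass`, `fail`, `warn`) and the inline-style approach are assumptions for illustration. Inline styles are used because many mail clients strip `<style>` blocks, but this is not necessarily how the author’s script renders its cells.

```python
from html import escape

# Colors mirror the report's convention: green = pass, red = failure,
# orange = warning (the post uses warnings for DAG database copies).
STATUS_COLORS = {"pass": "green", "fail": "red", "warn": "orange"}


def status_cell(status: str, text: str) -> str:
    """Render one table cell, colored by status, with the text HTML-escaped."""
    color = STATUS_COLORS[status]
    return f'<td style="background-color:{color};color:white">{escape(text)}</td>'


def render_row(name: str, checks: list[tuple[str, str]]) -> str:
    """Render a server row: its name followed by one colored cell per check."""
    cells = "".join(status_cell(status, text) for status, text in checks)
    return f"<tr><td>{escape(name)}</td>{cells}</tr>"
```

For example, `render_row("EX01", [("pass", "Services OK"), ("fail", "2 components offline")])` yields a row with one green cell and one red cell.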


Additionally, the subject line tells you whether there are any active alerts, which saves you some scrolling. I also added a few parameters for when I run the script outside its schedule. One outputs the report only to the console, one sends it only by email, and one writes the HTML email report to a specified file. A regular run outputs the report to both the console and email. Depending on the size of your environment, this may take some time. In my current environment, it takes about 30 minutes to run, and most of that time is spent on RPC calls to remote-site servers. On-screen output shows which step the script is on and the results of previously completed steps, again color-coded.
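The post doesn’t name its parameters, so the flags below (`--console-only`, `--email-only`, `--html-file`) and the subject format are hypothetical. This Python sketch shows one way to model mutually exclusive output modes with a console-plus-email default, and a subject line that surfaces the alert count. It also assumes that a file run writes only the file.

```python
import argparse


def build_subject(alert_count: int, base: str = "Exchange Health Report") -> str:
    """Hypothetical subject format: show the alert count so readers can skip scrolling."""
    if alert_count:
        return f"{base} - {alert_count} ALERT(S)"
    return f"{base} - All checks passed"


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Exchange health dashboard (sketch)")
    # At most one output-mode flag may be given; none means a regular run.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--console-only", action="store_true",
                      help="print the report to the console only")
    mode.add_argument("--email-only", action="store_true",
                      help="send the HTML email only")
    mode.add_argument("--html-file", metavar="PATH",
                      help="write the HTML report to PATH instead of emailing it")
    return parser.parse_args(argv)


def output_targets(args: argparse.Namespace) -> set[str]:
    """Map the parsed flags to the set of outputs this run should produce."""
    if args.console_only:
        return {"console"}
    if args.email_only:
        return {"email"}
    if args.html_file:
        return {"file"}
    return {"console", "email"}  # default scheduled run: console and email
```

A mutually exclusive group lets `argparse` reject contradictory combinations, such as console-only together with email-only, before any slow checks start.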

I can’t claim all the credit here. When I first started building DAGs on Exchange 2013, I used Paul Cunningham’s get-daghealth script to find out whether those DAGs had any issues. I pared down the database reporting, since it was too verbose for an overall dashboard. I added checks for services, components, and health sets, as well as discovery of non-DAG members such as standalone CAS and Mailbox servers. I reworked the DAG member replication health checks, and I added a bit of intelligence based on the discovery and on the results of the various health checks. I also added some on-screen output so you can see how far along the script is with its checks. I left some database reporting relics in the DAG database portion in case I decide to go back and include them. All in all, it’s over twice as long as the original.

The checks performed are:

  • Checks for standalone CAS servers. If found, it:
    • Acquires the site name of each server
    • Checks for components that are not online and attempts to start them
    • Checks for Exchange-specific services that are not running
    • Checks for health sets that are not in a healthy state
  • Checks for standalone Mailbox servers. If found, it:
    • Acquires the site of each server
    • Determines whether the server is also a CAS
    • Acquires the databases on each server, their mount state, and their content index state
    • Checks for components that are not online and attempts to start them
    • Checks for Exchange-specific services that are not running
    • Checks for health sets that are not in a healthy state
  • Checks for DAGs. If found, it:
    • Acquires the site of each DAG member
    • Determines whether the server is also a CAS
    • Acquires all DAG databases and database copies with their mount state, copy health, content index state, and copy queue status
    • Checks for components that are not online and attempts to start them
    • Checks for Exchange-specific services that are not running
    • Checks for health sets that are not in a healthy state
    • Tests each DAG member’s replication health

A section is included in the final report only if the corresponding servers are discovered. For example, if you have no standalone CAS servers, you won’t get an empty standalone CAS section.
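Here is one way to structure the "only report what was discovered" behavior together with the alert count that drives the subject line. This minimal Python sketch is not the author’s PowerShell logic. The `CheckResult` and `Report` types and the section titles are hypothetical, and real results would come from the Exchange checks listed above.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    server: str
    check: str        # e.g. "components", "services", "health sets"
    passed: bool
    detail: str = ""


@dataclass
class Report:
    # Keyed by section title, e.g. "Standalone CAS servers", "DAG members".
    sections: dict[str, list[CheckResult]] = field(default_factory=dict)

    def add(self, section: str, result: CheckResult) -> None:
        """Record a result; a section only exists once something is added to it."""
        self.sections.setdefault(section, []).append(result)

    def alert_count(self) -> int:
        """Count failed checks across all sections (usable for the subject line)."""
        return sum(not r.passed for results in self.sections.values() for r in results)

    def render_text(self) -> str:
        """Render a console-style report, skipping sections with no results."""
        lines = []
        for title, results in self.sections.items():
            if not results:  # nothing discovered for this section: omit it entirely
                continue
            lines.append(title)
            for r in results:
                status = "PASS" if r.passed else "FAIL"
                lines.append(f"  [{status}] {r.server}: {r.check} {r.detail}".rstrip())
        return "\n".join(lines)
```

Because sections are created lazily by `add`, an environment with no standalone CAS servers never produces that heading. The explicit empty-list check also covers sections that were pre-created but never populated.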

I’ve uploaded the script to the Microsoft TechNet repository here.

How it looks:

[Figure: full report example]
