Thursday, November 3, 2011

Building Monitors: Why not to use ‘Critical’ status change all the time…

When people use SCOM to build their own Monitors, they tend to use the ‘Critical’ status change too easily. It’s better not to use that status too often or without giving it much thought.

Why? Of course, when people start building Monitors of their own, they deem something important enough to monitor with SCOM. That goes without saying.

However, SCOM has a Health Model to reckon with. That Health Model has three different Health Statuses:

  1. Healthy
    Color: Green. All is well and running smoothly. Server/Application/Service is functioning as expected.

  2. Warning
    Color: Yellow. An issue has occurred. Something is not OK. However, neither the overall functionality nor the availability is directly affected. Action is still required to prevent a real outage or downtime.

  3. Critical 
    Color: Red. A server/application/service is down and an outage is happening. Functionality is severely affected. Immediate action is required.

This is the WHY…
Keep the last Status (Critical) in the back of your mind. SCOM treats it as REAL downtime (Critical State = Unavailable for SCOM!). So when you create a Monitor targeted at a certain Class (SQL Server, for instance) and that Monitor raises an Alert and triggers a Critical State, SCOM regards the targeted SQL Servers as UNAVAILABLE.
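
To make that concrete, here is a minimal Python sketch of the idea. It is not SCOM code: the names HealthState, rollup_worst and counts_as_down are hypothetical, and the sketch assumes the object simply takes the worst state of its Monitors. Your own management pack may roll up health differently.

```python
from enum import IntEnum


class HealthState(IntEnum):
    """The three health states, ordered from best to worst."""
    HEALTHY = 1
    WARNING = 2
    CRITICAL = 3


def rollup_worst(monitor_states):
    """Assumed rollup rule: the object gets the worst state of its monitors."""
    return max(monitor_states, default=HealthState.HEALTHY)


def counts_as_down(object_state):
    """Only a Critical state is treated as 'unavailable' in this sketch."""
    return object_state == HealthState.CRITICAL


# Hypothetical custom monitor on a SQL Server that is otherwise healthy.
built_in_monitors = [HealthState.HEALTHY, HealthState.HEALTHY]

with_critical = rollup_worst(built_in_monitors + [HealthState.CRITICAL])
with_warning = rollup_worst(built_in_monitors + [HealthState.WARNING])

print(with_critical.name, counts_as_down(with_critical))
print(with_warning.name, counts_as_down(with_warning))
```

In this sketch a single custom Monitor going Critical is enough to mark the whole server as down, while the same Monitor going to Warning leaves it available.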

So when you run a Report on the availability of your SQL Servers, the time spent in that Critical State lowers their overall Availability percentage. That is a bad thing, especially when you only wanted to monitor something that is important but not CRITICAL to that SQL Server.
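
A small worked example shows the effect on the numbers. This is a simplified Python sketch with a hypothetical availability_percent function and made-up hours. It assumes that only time in the Critical State counts as downtime, and it ignores things a real report also takes into account, such as maintenance mode.

```python
def availability_percent(intervals):
    """Return availability in percent for a list of (state, hours) pairs.

    Assumption for this sketch: only 'Critical' hours count as downtime.
    """
    total_hours = sum(hours for _, hours in intervals)
    if total_hours == 0:
        raise ValueError("no monitored time in the reporting period")
    down_hours = sum(hours for state, hours in intervals if state == "Critical")
    return 100.0 * (total_hours - down_hours) / total_hours


# One week (168 hours) in which a custom monitor fired for 12 hours.
week_as_critical = [("Healthy", 156), ("Critical", 12)]
week_as_warning = [("Healthy", 156), ("Warning", 12)]

print(round(availability_percent(week_as_critical), 2))  # 92.86
print(round(availability_percent(week_as_warning), 2))   # 100.0
```

The condition being monitored is the same in both weeks. Only the chosen state differs, and that alone moves the reported availability from 100% to roughly 92.86%.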

Conclusion
Be careful when creating Monitors and assigning Critical statuses to them. Think about the consequences for the availability Reports you run against those servers. In many cases a Warning condition does the job just as well, without the negative side effect on the overall availability of those servers.
