Thursday, October 13, 2011

OLE DB Data Sources: Oracle and some potential pitfalls and false-positives

There are many commercials out there how to get some hair back. I know, I am almost bald, but I don’t care. But sometimes there are issues which make you loose hair or – the hair one still got – to go grey.

Recently I had such an issue. All about monitoring an Oracle database, running on a Windows server. The customer wanted to monitor that database with SCOM without purchasing a MP for it.

With the OLE DB Data Source wizard one can connect to such a database, run a query against it and collect some performance counters as well. How nice! At least, this is what theory tells us. But real life is a bit harder as it turned out.

The Beginning Of It All
First I used Maarten Damen’s blog posting, to be found here. A really good posting it is since it contains really good information. However, while creating the OLE DB Data Source, I added a query and I decided to populate BOTH Run-as Profiles which are created by the same wizard. Both decisions turned out to be wrong. However, SCOM didn’t show that directly to me.

It took a while for the OLE DB Data Source to show up in the SCOM Console and even a it longer to enter a monitored status (Healthy! Yeah baby, I made it!) but all seemed to be well. After all, it got into a Monitored status and a GREEN one at that! Nice!

The Day After…
But in reality things weren’t that OK. For sure, the Monitor was OK and GREEN. But after a week or so, the Health Explorer didn’t show any state change. Nothing. The Monitor was green, but nowhere to be found since when. And that triggered me to take a look into the OpsMgr event log of the Watcher Node.

Ouch!

SNAFU is paying us a visit
EventID 1102 all over the place. Descriptions like: Rule/Monitor "OleDbCheck_a28c517f4fdc4038b880b2ac02796256.NoConnectionMonitor" running for instance "XYZ Oracle DB Check" with id:"{F8C4A05D-AAD4-56CA-7CC4-9AC095323F83}" cannot be initialized and will not be loaded. Management group "XYZ"

When I recycled the Health Service cache, the error came back and the Health Explorer of the same OLE DB Data Source showed now a Healthy status with a date and time: exactly matching the same date and time the first EventID 1102 was logged!

So this is a real false-positive. Yes, all is OK sir! And all the while, nothing is OK! Like SNAFU

What Caused it all
Deep and thorough investigation turned out this issue:

  • Used Run-as-Profile
    When the OLE DB Data Source wizard has done it’s job, TWO Run-as-Profiles are created. And one must only configure ONE of them. Not both! While running the wizard one has the option to select Simple Authentication:
    image

    So now you only have to configure that Run-as-Profile and not the other! The Synthetic Transaction Action Profile is only used when Integrated Security is being used. But when one has selected the option to use Simple Authentication, one can skip this Run-as-Profile and only configure the Simple Authentication RunAs Profile.

Time For Another Error…
And yes, things got moving now. EventID 1102 went away and instead we got another EventID 11852: OleDb Module encountered a failure 0x80040e14 during execution and will post it as output data item. : ORA-00903: invalid table name. Because of this the Health Explorer for this OLE DB Data Source went into a critical state.

The Oracle DBA checked everything: yes, the account used by SCOM is OK and logged on to the Oracle DB. And yes, the query looked for an existing table and YES the account has enough permissions to run that query against that table. Tried many other queries, the DBA even created a table for that account and SCOM OLE DB Data Source check, but still the same error!

So somehow along the lines the query got garbled which caused Oracle to pull the plug.

However, we already had spend a lot of time in order to get rid of the first EventID 1102 (even by Microsoft CSS we were told to configure BOTH Run-as-Profiles so it was trial and error here…) so we decided to remove the query as a test.

And guess what? All became well now! So the presence of the data source is successfully tested but not the responsiveness by running a query. At the end the customer decided it to leave it at that. So only the presence of the data source is tested and not a single query is being used.

So after a long journey we came back to the posting of Maarten Damen (by removing all the fluff we had added by ourselves) and got it working.

Still it’s interesting to know what happens to the query fired from SCOM to the Oracle database. Guess it will be another time to check that one out.

2 comments:

Roselyn said...

Good article, it helped me get rid of the EventID of 1102. Like your article says, I am now getting continuous EventID 11852 even though I am not performing any query tests. The performance view does show connection times, but the state of this monitor is Critical. How do I solve this?

OleDb Module encountered a failure 0x80004005 during execution and will post it as output data item. Unspecified error
: ORA-12560: TNS:protocol adapter error

Workflow name: OleDbCheck_28635f17fb5d42ab8937d63fd7b886b6.InitTime
Instance name: AMF OLE DB DS Monitor1
Instance ID: {703A0F6C-FE79-E2E4-5F5C-A0FC1CDA2595}
Management group: MG

Marnix Wolf said...

Hi Roselyn.

Is the correct Oracle provider present on the server which runs the transaction? Almost sounds like the wrong provider is used or called upon.

Cheers
Marnix