Tuesday, April 28, 2015

Troubleshooting Event IDs 33333, 11502 & 4502

Issue
A SCOM 2012 R2 environment was logging many Event IDs 33333, 11502 & 4502 in the OpsMgr event log of the Management Server designated to send out the Alerts. And believe me, you DON’T want to see these events, ever:

  • Event ID 33333
    Data Access Layer rejected retry on SqlError.

  • Event ID 11502
    The Microsoft Operations Manager Connector Framework Alert Forwarding module failed to mark an alert for forwarding because the connector the module is configured for no longer exists.

  • Event ID 4502
    A module of type "Microsoft.EnterpriseManagement.Mom.Modules.McAlertWriteAction" reported an exception System.Data.SqlClient.SqlException (0x80131904).

These events were logged almost every minute, since Alerts are filtered for Notifications once per minute. So the OpsMgr event log of this particular server looked like an X-mas tree. Red & yellow all over the place…

Time for some deep troubleshooting.

Cause
First thing I noticed was that these events were ‘only’ logged on the MS server responsible for the Notifications. So the issue was directly related to it.

Secondly, the three Event IDs mentioned earlier gave more information in their descriptions:

  • Event ID 33333
    the UPDATE statement conflicted with FOREIGN KEY constraint "FK_Alert_ConnectorId". The conflict occurred in database "OperationsManager", table "dbo.Connector", column "ConnectorId".

  • Event ID 11502
    the connector the module is configured for no longer exists. Connector Id: [GUID].

  • Event ID 4502
    the UPDATE statement conflicted with the FOREIGN KEY constraint "FK_Alert_ConnectorId". The conflict occurred in database "OperationsManager", table "dbo.Connector", column 'ConnectorId'.

Apparently this SCOM 2012 R2 MG was trying to use Channels/Connectors/Subscriptions which aren’t present (anymore), causing the Notifications component to turn sour and light up the OpsMgr event log of that particular MS server like an X-mas tree…
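Since the descriptions quote the Connector Id as a GUID, collecting the distinct GUIDs is easy to script once the event descriptions are available as plain text. What follows is only a minimal sketch in Python: the function name and the sample descriptions are made up for illustration, and how you export the events to text is left open.

```python
import re

# Standard 8-4-4-4-12 hexadecimal GUID layout.
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def unique_connector_ids(descriptions):
    """Return the distinct GUIDs found in the descriptions, in first-seen order."""
    seen = []
    for text in descriptions:
        for guid in GUID_RE.findall(text):
            guid = guid.lower()  # GUIDs are case-insensitive, so compare in lower case
            if guid not in seen:
                seen.append(guid)
    return seen


# Made-up sample descriptions; the GUIDs are placeholders, not real connector IDs.
sample_events = [
    "the connector the module is configured for no longer exists. "
    "Connector Id: 11111111-2222-3333-4444-555555555555.",
    "Connector Id: 11111111-2222-3333-4444-555555555555.",
    "Connector Id: AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE.",
]

if __name__ == "__main__":
    for guid in unique_connector_ids(sample_events):
        print(guid)
```

The sketch matches any GUID in the text, so it assumes the descriptions you feed it contain no GUIDs other than Connector Ids.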

Close to finding the cause of it all…
In SCOM all the Notifications you create (Channels, subscribers, subscriptions AND the subscriptions of the Connectors) are saved in this MP: Notifications Internal Library (ID: Microsoft.SystemCenter.Notifications.Internal).

As it turned out, this MP was exported from an old SCOM 2007x environment and imported straight into the SCOM 2012 R2 environment. So ALL the configuration for the Notifications in SCOM 2007x was carried straight into the SCOM 2012 R2 environment, whether all Connectors were in place or not…

So finally the culprit was found. But now the question was: how to solve it? The Quick & Dirty way or the more challenging way?

Resolution
Yes, you CAN import the default Notifications Internal Library MP from the SCOM 2012 R2 installation media. This way the ‘old’ MP will be overwritten and replaced by a completely empty Notifications model.

For this customer that was totally unacceptable, since many other Channels, Subscribers, Subscriptions and Connector configurations were already in use. So a total reconfiguration – involving missing Notifications – was out of the question.

So no Quick & Dirty approach here. Time for the more challenging approach, editing the Notifications Internal Library in Notepad++ and removing all the misconfigurations one by one.

The VERY helpful Event ID 33333
Even though I DON’T like Event ID 33333 one bit, in this particular case this very same event provided me with very good information: the GUID of the ConnectorId, masked in yellow in this screendump out of respect for this customer:
[Screendump: Event ID 33333 with the ConnectorId GUID masked in yellow]

This GUID is also present in the Notifications Internal Library. So I exported this MP, saved a copy of it (in case I would wreck this MP), opened the copy and searched for the GUID of the ConnectorId.

For every entry I found I deleted the WHOLE Rule section, so from <Rule ID up to </Rule>. This can take up to 100+ rows…

But this wasn’t enough, since every Rule also has a Rule ID or a ProductConnectorSubscription[GUID] that shows up in other entries of the MP. All those entries must be deleted as well. If you don’t, you break this MP. Believe me, I’ve been there :-)
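I did all of this by hand in Notepad++, but the two-pass idea (first drop the Rule sections that mention the ConnectorId, then drop whatever still points at those Rule IDs) can be sketched in Python. Treat this strictly as an illustration under assumptions: the sample XML is a made-up, heavily simplified stand-in for an exported MP, the function name is hypothetical, and it assumes that the leftover entries point at a Rule through an ElementID attribute and that the MP has no default XML namespace. Check your own export before relying on any of that.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified stand-in for an exported MP; element names inside <Rule> are invented.
SAMPLE_MP = """
<ManagementPack>
  <Monitoring>
    <Rules>
      <Rule ID="ProductConnectorSubscriptionaaaaaaaa_bbbb_cccc_dddd_eeeeeeeeeeee">
        <WriteActions>
          <ConnectorId>11111111-2222-3333-4444-555555555555</ConnectorId>
        </WriteActions>
      </Rule>
      <Rule ID="SubscriptionKeepMe">
        <WriteActions />
      </Rule>
    </Rules>
  </Monitoring>
  <LanguagePacks>
    <DisplayStrings>
      <DisplayString ElementID="ProductConnectorSubscriptionaaaaaaaa_bbbb_cccc_dddd_eeeeeeeeeeee" />
      <DisplayString ElementID="SubscriptionKeepMe" />
    </DisplayStrings>
  </LanguagePacks>
</ManagementPack>
"""


def remove_connector_rules(root, connector_guid):
    """Remove every <Rule> mentioning the GUID plus entries that reference it by ID."""
    needle = connector_guid.lower()
    # ElementTree has no parent pointers, so build a child -> parent lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    removed_ids = set()
    # Pass 1: drop the whole Rule section, from <Rule ID ...> up to </Rule>.
    for rule in list(root.iter("Rule")):
        if needle in ET.tostring(rule, encoding="unicode").lower():
            if rule.get("ID") is not None:
                removed_ids.add(rule.get("ID"))
            parent_of[rule].remove(rule)

    # Pass 2: drop entries that still point at a removed Rule ID (assumed: ElementID).
    for elem in list(root.iter()):
        if elem.get("ElementID") in removed_ids and elem in parent_of:
            parent_of[elem].remove(elem)
    return removed_ids


if __name__ == "__main__":
    mp_root = ET.fromstring(SAMPLE_MP)
    gone = remove_connector_rules(mp_root, "11111111-2222-3333-4444-555555555555")
    print("Removed rules:", sorted(gone))
```

Keep in mind that ElementTree drops XML comments when parsing by default, so working on a copy of the exported MP still applies here.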

So the first deletion took me some time in order to get it 100% right. But when I imported that MP, Event IDs 33333, 11502 & 4502 for that particular ConnectorId were GONE! By then I knew what to do…

  1. Checked ALL Event IDs 33333 and copied each unique ConnectorId;
  2. For every unique ConnectorId I adjusted the MP accordingly by removing the related Rule sections AND the related Rule ID or ProductConnectorSubscription[GUID] entries;
  3. Repeated Step 2 for every unique ConnectorId I found in Event ID 33333;
  4. When done I incremented the version number of the MP, saved it and imported it;
  5. And PRESTO, when the new configuration became active, NO MORE Event IDs 33333, 11502 & 4502. Yeah!!!
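The version bump in Step 4 is a one-line edit in the MP, but it is easy to forget. A minimal Python sketch is shown below, assuming the version is a dotted number stored in a Version element under Manifest/Identity. The sample XML and its version number are made up, the function names are hypothetical, and bumping the last part is just one possible choice.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified manifest; the version number is a placeholder.
SAMPLE_MANIFEST = """
<ManagementPack>
  <Manifest>
    <Identity>
      <ID>Microsoft.SystemCenter.Notifications.Internal</ID>
      <Version>1.0.0.0</Version>
    </Identity>
  </Manifest>
</ManagementPack>
"""


def bump_version(version):
    """Increment the last part of a dotted version string, e.g. 1.0.0.0 -> 1.0.0.1."""
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def bump_mp_version(root):
    """Bump the version in place (assumed location: Manifest/Identity/Version)."""
    node = root.find("./Manifest/Identity/Version")
    if node is None or not node.text:
        raise ValueError("No Version element found where this sketch expects it")
    node.text = bump_version(node.text.strip())
    return node.text


if __name__ == "__main__":
    mp_root = ET.fromstring(SAMPLE_MANIFEST)
    print("New version:", bump_mp_version(mp_root))
```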

Recap
When moving from SCOM 2007x to SCOM 2012x, where SCOM 2012x is a brand new environment, think twice before importing core MPs like the Notifications Internal Library straight from SCOM 2007x into SCOM 2012x.

Chances are things won’t work and will even break stuff. So be careful here.

And when all is working as expected and you remove a Connector, first make sure there are NO subscriptions any more for it. When there are AND you remove the Connector WITHOUT removing the related Subscriptions before that action, you’ll find yourself in the same situation. At least you know now how to solve it…
