Thursday, March 3, 2011

Zenoss: Dr SNMP Extend or How I Learned to Stop SSHing and Love the OID

So there are all these scripts that you run on your remote hosts for monitoring. Since Zenoss has a built-in nagios parser, you basically run all of these scripts via SSH. What sucks is that establishing X SSH sessions for each of Y devices in Zenoss builds up a significant number of TCP connections and other wonderfully painful bottlenecks on your Zenoss system. Now, because all of these scripts already exist on the remote host, you can just as easily run them as root with a simple SNMP Extend line in your snmpd.conf. The line you would need to add looks something like this:

extend remote_command /customdir/customscript custom args -s 123

Do not forget to restart snmpd for the changes to take effect. The next step is to create a Zenoss DataSource for this. Bearing in mind that Zenoss works better with OID numbers than MIB names, the straightforward approach is simply to walk the SNMP tree and convert to OID:

snmpwalk -v2c -cpublic hostname 'NET-SNMP-EXTEND-MIB::nsExtendResult."remote_command"' -On

Two things to note in that command:
  1. To see the MIB name instead, simply remove the -On.
  2. I used snmpwalk instead of snmpget because sometimes I like to walk the entire Extension tree using nsExtendObjects, and snmpget would just barf on that.

Rather than painstakingly reproduce all of that with screenshots, I will illustrate the template creation steps using zenossYAMLTool syntax, which is how I normally make changes to my Zenoss system (it avoids tons of images here as well as tedious clicking through the GUI). Assuming that I will not be graphing in this template, the YAML needed to create the template looks like this:

- action: add_template
  description: Result Threshold retrieving Output Summary
  targetPythonClass: Products.ZenModel.Device
  templateName: Result2Output
  templatePath: /Server/Linux/TestCase
  GraphDefs: []

Now that we have the OID, adding a DataSource is fairly easy:

DataSources:
  - dsName: remote_command
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    oid: 1.3.6.1.4.1.8072.1.3.2.3.1.4.x.114.101.109.111.116.101.95.99.111.109.109.97.110.100
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.SNMP
    DataPoints:
    - dpName: remote_command
      isrow: true
      rrdtype: GAUGE

You will find that the x in the OID above differs from one Extension to the next. As far as I can tell from how NET-SNMP-EXTEND-MIB indexes its tables, it is the length of the extend name (14 for remote_command), followed by the ASCII code of each character, rather than a position in the OID table. I originally had a script that would generate the OID value for a specified extend name, but the safest way to confirm the exact value is still snmpget/snmpwalk -On.
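
A minimal sketch of such an OID generator, assuming the index is the standard SNMP encoding of a string (the length of the extend name, then the ASCII code of each character). The function names encode_token and result_oid are made up for this example, and the output is worth checking against snmpwalk -On on a real host:

```python
# Column OID for nsExtendResult in NET-SNMP-EXTEND-MIB.
NS_EXTEND_RESULT = "1.3.6.1.4.1.8072.1.3.2.3.1.4"


def encode_token(name):
    """Encode an extend name as a string index: its length, then ASCII codes."""
    codes = [ord(ch) for ch in name]
    return ".".join(str(n) for n in [len(codes)] + codes)


def result_oid(name):
    """Build the numeric OID of nsExtendResult for the given extend name."""
    return "%s.%s" % (NS_EXTEND_RESULT, encode_token(name))


if __name__ == "__main__":
    print(result_oid("remote_command"))
```

Under that assumption, the x for remote_command comes out as 14, since the name is 14 characters long.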

In order for it to trigger an alert, create a MinMax Threshold that will trigger an event (to keep it simple, we will assume 0=success and 1=failure here):

Thresholds:
  - thresholdName: remote_failure_output
    enabled: true
    escalateCount: 0
    eventClass: /Perf/Snmp
    maxval: '0'
    minval: ''
    severity: 3
    dsnames:
    - remote_command_remote_command
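
Since nsExtendResult carries the exit status of the script, the script itself has to follow the same 0=success, 1=failure convention. A rough sketch of what such a script could look like; the free-disk-space check, the function names, and the 10% limit are purely hypothetical stand-ins for whatever your real script tests:

```python
import shutil


def check_free_space(path, min_free_percent):
    """Return (exit_code, message), where 0 means success and 1 means failure."""
    usage = shutil.disk_usage(path)
    free_percent = 100.0 * usage.free / usage.total
    if free_percent >= min_free_percent:
        return 0, "OK: %.1f%% free on %s" % (free_percent, path)
    return 1, "FAIL: only %.1f%% free on %s" % (free_percent, path)


def main():
    """Print the status message and return the exit code for the caller."""
    code, message = check_free_space("/", 10.0)
    print(message)
    return code
```

Ending the file with sys.exit(main()) (after an import sys) hands the code back to snmpd, so a failure returns 1, which is above the maxval of 0 and trips the threshold.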

For the complete YAML file, please read my next blog post on how to use transforms to extract the nsExtendOutputFull after the nsExtendResult triggers an event.

And that is it for now. Of course, the die-hard nagios plugin fanatic may be apt to point out that using SSH enables you to pass DataPoints via stdout, like so:

STATUS: Some useful output message here|data1=100;;; data2=10;20;30

This would be added to Zenoss with a single DataSource that contains multiple DataPoints. For my zenossYAMLTool syntax, the YAML would look something like this:

DataSources:
  - dsName: remote_command
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.COMMAND
    usessh: true
    DataPoints:
    - {dpName: data1, isrow: true, rrdtype: GAUGE}
    - {dpName: data2, isrow: true, rrdtype: GAUGE}

To recreate the same effect using SNMP Extend, you would need to add multiple DataSources, each with a single DataPoint, as the OIDs are mapped directly to each DataSource. The script should output the message and the data values on separate lines:

STATUS: Some useful output message here
100
10
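
If the script already prints nagios-style output, a small wrapper can do the conversion. This is only a sketch: to_extend_lines is a hypothetical helper, and it assumes a single line of output, one value per label, and no unit suffixes on the values:

```python
def to_extend_lines(nagios_line):
    """Turn 'message|label=value;warn;crit ...' into [message, value, value, ...]."""
    message, _, perfdata = nagios_line.partition("|")
    lines = [message.strip()]
    for item in perfdata.split():
        # Keep only the value: drop the label and the warn/crit/min/max fields.
        _, _, fields = item.partition("=")
        lines.append(fields.split(";")[0])
    return lines


if __name__ == "__main__":
    sample = "STATUS: Some useful output message here|data1=100;;; data2=10;20;30"
    print("\n".join(to_extend_lines(sample)))
```

Each element of the returned list ends up on its own output line, which is what lets SNMP Extend expose it under its own line index.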

And you would make use of the multiple nsExtendOutLine."remote_command".# OIDs to gather your DataSources:

DataSources:
  - dsName: remote_command
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.14.114.101.109.111.116.101.95.99.111.109.109.97.110.100.1
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.SNMP
    DataPoints:
    - dpName: remote_command
      isrow: true
      rrdtype: GAUGE
  - dsName: remote_data1
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.14.114.101.109.111.116.101.95.99.111.109.109.97.110.100.2
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.SNMP
    DataPoints:
    - dpName: remote_data1
      isrow: true
      rrdtype: GAUGE
  - dsName: remote_data2
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.14.114.101.109.111.116.101.95.99.111.109.109.97.110.100.3
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.SNMP
    DataPoints:
    - dpName: remote_data2
      isrow: true
      rrdtype: GAUGE

Yes, the YAML appears longer, but it is not more complex, merely an exercise in copy-and-paste repetition. The multiple DataSources can easily be written as a loop in a simple generator script. So you have to decide which way you want to handle it, though I suppose these examples make it seem like just a choice between configuration and performance.
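
As a rough idea of what that generator loop could look like, here is a sketch that prints the DataSources YAML for one extend name. The helper names are hypothetical, the fixed field values are copied from the examples above, and the OID index assumes the usual SNMP string encoding (name length, ASCII codes, then the output line number):

```python
# Column OID for nsExtendOutLine in NET-SNMP-EXTEND-MIB.
NS_EXTEND_OUT_LINE = "1.3.6.1.4.1.8072.1.3.2.4.1.2"

TEMPLATE = """\
  - dsName: {name}
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    oid: {oid}
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.SNMP
    DataPoints:
    - dpName: {name}
      isrow: true
      rrdtype: GAUGE
"""


def out_line_oid(extend_name, line_number):
    """Build the nsExtendOutLine OID for one output line of an extend command."""
    codes = [ord(ch) for ch in extend_name]
    parts = [len(codes)] + codes + [line_number]
    return NS_EXTEND_OUT_LINE + "." + ".".join(str(p) for p in parts)


def generate_datasources(extend_name, ds_names):
    """Emit one DataSource per output line, numbered from 1 in the order given."""
    blocks = [
        TEMPLATE.format(name=ds_name, oid=out_line_oid(extend_name, number))
        for number, ds_name in enumerate(ds_names, start=1)
    ]
    return "DataSources:\n" + "".join(blocks)


if __name__ == "__main__":
    names = ["remote_command", "remote_data1", "remote_data2"]
    print(generate_datasources("remote_command", names))
```

Adding a fourth output line to the script then only means adding one more name to the list.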
