Wednesday, June 1, 2011

Automating Zenoss Multi-Graph Reports

Surprisingly, I discovered that the spur-of-the-moment python/TAL hack I wrote for Zenoss 3 years ago still shows up on their message board. However, given the modifications made to the Zenoss Reports architecture since then, I do not think I can revitalize the Dynamic Zenoss Graph Reports.

Previously, I had posted about my zenossYAMLTool, which can be used to import and export Zenoss objects as YAML for the sake of manipulating the database without ZenPacks. I made some minor changes to the original zenossYAMLTool so that it can be imported into another python script. Using the various export methods, you can easily generate Multi-Graph Reports by gathering the necessary information, manipulating the data, and then importing it back into Zenoss.

Here is an example:

import zenossYAMLTool as z
class NextGraph(Exception): pass
for dclass in sorted(list(set(['/'.join(dc.split('/')[0:-1])
    for dc in z.list_devices() if dc.startswith('/Server')]))):
    seq = -1
    graphs = []
    groups = []
    for gpName in [ 'laLoadInt15', 'ssCpuUser', 'ssIORawReceived' ]:
        seq = seq + 1
        for t in z.export_templates(dclass):
            try:
                for g in t['GraphDefs']:
                    for p in g['GraphPoints']:
                        if p['gpName'] == gpName:
                            g['gdName'] = '%s (%s)' % (g['gdName'], p['legend'])
                            newgraph = g
                            p['legend'] = '${dev/id | here/name | here/id} ${graphPoint/id}'
                            p['lineType'] = 'LINE'
                            p['sequence'] = seq
                            newgraph['GraphPoints'] = [p]
                            graphs.append(newgraph)
                            newgroup = { 'collectionId': dclass.split('/')[-1],
                                'combineDevices': True, 'ggName': g['gdName'],
                                'graphDefId': g['gdName'], 'sequence': seq }
                            groups.append(newgroup)
                            raise NextGraph
            except NextGraph:
                break
    colls = [{ 'collectionName': dclass.split('/')[-1],
        'CollectionItems': [{ 'collectionItem': 'Item',
            'compPath': '', 'deviceId': '',
            'deviceOrganizer': '/Devices' + dclass,
            'recurse': False, 'sequence': seq }] }]
    report = {
        'action': 'add_multireport',
        'numColumns': 2, 'title': '',
        'reportName': dclass.split('/')[-1],
        'reportPath': '/Multi-Graph Reports/Screens',
        'GraphGroups': groups,
        'Collections': colls,
        'GraphDefs': graphs, }
    z.import_yaml([report])

Download my python script that will build a Multi-Graph Report for all DeviceClasses in your system, displaying Load, CPU, Memory, IO, Throughput, Disk and OSProcesses. Modify it accordingly, either by refactoring the python code or by altering the related cf file that controls the script.

Friday, May 20, 2011

Neustar: BASHing UltraDNS

Last time I posted my python script that extracted all the methods from the PDF file and provided an interactive interface to the API here.

When I created the generic script, I needed a way to view and update CNAMEs, so I only needed the following methods: UDNS_UpdateCNAMERecords and UDNS_GetCNAMERecordsOfZone. Of course, since the associated Create and related ANAME methods were identical in nature, they were simple to test and, thus, my parse_xml techniques were based solely on these.

Yesterday, I needed to create about a score of new zones as well as grant permissions to a few unprivileged users. As such, I had the chance to revisit the WebUI and, while adding a zone is not too taxing, granting permissions was like extracting wisdom teeth. First off, the tiny 15x15 (is it even that?) lock icon was not intuitive for me, and it probably took half an hour just to figure out HOW to add permissions (I did not actually keep track of time, so I may be exaggerating the elapsed time). Even then, I was rewarded with a clunky collapsible folder browser that reset itself after I went through all the check boxes for each individual user. After adding just one, I realized that I had to use the API and not subject myself to any more torture (not that the API is significantly less painful, mind you, but at least it provided relief from the mundane repetition).

The UltraDNS_API.py contains a few examples at the bottom of how you would automate using python, but I decided to whip up a quick bash script for kicks. While doing so, I discovered that the output was not consumable by my existing parse_xml techniques, so I cloned the iterator to aid in the "unmatched" case, such as this one, and will revisit it later if need be. I also removed my login credentials from the autogenerated UltraDNS_API.cf file and provided a way to pass them in via a command line switch. The resulting UltraDNS_API.py can be found in the same GitHub location.

The BASH script itself utilizes its own basename to create the user and zone lists, which it lets you edit prior to asking for your username and password; it then passes the parameters to the python script:


#!/bin/bash
# Using the UltraDNS_API.py to create multiple zones and their permissions

SCR=${0##*/}
DIR=${0%/*}
[[ $DIR == '.' ]] && DIR=$PWD
cd $DIR

ZONE=${SCR%.sh}.zones
USER=${SCR%.sh}.users
TOOL=UltraDNS_API.py
CONF=${TOOL%.py}.cf

# Edit zones and users first
vi $ZONE $USER

# Ask for username and password (read -s keeps the password off the screen)
echo -n "Username: "; read username
echo -n "Password: "; read -s password; echo

for zone in $(cat $ZONE); do
    python $TOOL -M UDNS_CreatePrimaryZone -c 'n' \
        -a "{'username': '$username', 'password': '$password'}" \
        -d -p "{'zonename': '$zone', 'forceimport': 'False'}"
    for user in $(cat $USER); do
        python $TOOL -M UDNS_GrantPermissionsToZoneForUser -c 'n' \
            -a "{'username': '$username', 'password': '$password'}" \
            -d -p "{'user': '$user', 'zone': '$zone',
            'allowcreate': 'True', 'allowread': 'True',
            'allowupdate': 'True', 'allowdelete': 'True',
            'denycreate': 'False', 'denyread': 'False',
            'denyupdate': 'False', 'denydelete': 'False'}"
    done
done

Worked like a charm! Of course, if you run the python script without parameters for the two methods utilized in the BASH script, you will be prompted interactively, with the last answers used as the defaults.

Tuesday, May 3, 2011

Python API: UltraDNS

I needed to update a CNAME in our UltraDNS account last week, and the WebUI was simply too much to bear given the number of objects we have in a single domain.  Obviously it was not in alphabetical (or any) order. Nor does the search appear to work. So I finally decided to look into the XML-RPC API to see if there was a less painful way to handle it.

Like most people, I like to see what is available "out there" before I build something from scratch. I found Josh Rendek's pyUltraDNS, but it appears that it was built just to create A records. There is a method named 'generic_call' in the UDNS class, but it appears to be limited only to the CreateAName methodName (or any methodName that uses the exact same parameters). Additional methodNames could be incorporated if you duplicate the call and retrieve methods but, since creating one-offs of each methodName goes against my philosophies of automation and scale, I decided to keep looking. Though it is rather irksome that the uber-chic pyUltraDNS name is associated with a python module that does not cover ALL the UltraDNS methods, the script did teach me a little bit about retrieving data from an OpenSSL socket, so it was worth investigating.
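
For the curious, the retrieval pattern boils down to something like this. This is a minimal sketch only; the hostname, port and request body below are placeholders, not the actual UltraDNS endpoint:

import socket, ssl

HOST, PORT = 'api.ultradns.example', 443   # placeholder endpoint, not the real one
request = '<?xml version="1.0"?>...'       # an XML-RPC methodCall body goes here

# send the request over an SSL-wrapped socket
sock = ssl.wrap_socket(socket.create_connection((HOST, PORT)))
sock.sendall(request)

# read until the server closes the connection
chunks = []
while True:
    data = sock.recv(4096)
    if not data:
        break
    chunks.append(data)
sock.close()
response = ''.join(chunks)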

I decided to follow the same logic as my ongoing AWS script, which was to make it interactive --- if the user does not pass parameters to the class methods, then the script will prompt you each step of the way. Halfway through writing my interactive script, I also discovered Tim Bunce's UltraDNS perl module at CPAN, which does a really cool job of extracting the Methods from the PDF documentation. So, using pyPDF to spare the user the "save/export as plain text" step, I added something similar. In addition, I also have the script download the PDF file from UltraDNS if it is not found in the same directory as the script.
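
The pyPDF portion is conceptually just this (a minimal sketch, assuming the PDF sits next to the script and that every documented method carries the UDNS_ prefix, as the ones above do):

import re
from pyPdf import PdfFileReader

# pull the raw text out of every page of the API documentation
pdf = PdfFileReader(open('NUS_API_XML.pdf', 'rb'))
text = ''
for page in range(pdf.getNumPages()):
    text += pdf.getPage(page).extractText()

# collect the documented method names
methods = sorted(set(re.findall(r'UDNS_\w+', text)))
print '\n'.join(methods)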

All the methodName configurations are based on the NUS_API_XML.pdf documentation, since I have neither the need nor the resources to test every single methodName. Since I was mostly dealing with the Create and Update CNAME methods, I added 2 "parsers" that will strip the useful information out of the XML responses (I am normally not a fan of XML, but UltraDNS responses make me even less of one). If you add any parsers or want me to add additional parsers, please send me a copy of the XML response and I will try to incorporate it. The same goes for any bugs, since the XML response will help me rework the script without having to reproduce the problem. Hope you find the script useful; download it from my github repository:

     UltraDNS_API.py
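
To give a flavor of what the parse_xml routines do, here is a simplified sketch; the tag and attribute names are invented for illustration and are not the actual UltraDNS response schema:

from xml.dom.minidom import parseString

def parse_cname_records(xml_response):
    # 'CNAMERecord', 'domainName' and 'CName' are hypothetical names
    records = []
    dom = parseString(xml_response)
    for node in dom.getElementsByTagName('CNAMERecord'):
        records.append((node.getAttribute('domainName'),
                        node.getAttribute('CName')))
    return records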

P.S. I have also NOT tested it on all versions of python, only 2.6, which is what I tried to constrain my MacPorts installations to for the sake of consistency.  Feel free to comment if any of the methodNames do not work correctly.

Tuesday, April 12, 2011

XenServer: Citrix Cobbler, Part Two

Originally, I was only going to cover cobbler kickstarts of XenServer. But then, I figured, if I explained how I was automating my ant farm, it would be irresponsible of me not to cover how to automate your ants. Otherwise, you would build one ant manually (aka virtual image) and clone it repeatedly for your additional ants (and, morally, I object to ant cloning). As stated in the previous blog, cobbler is predominantly for RedHat-based installations as it is built around PXE kickstarts (though there are many people who have tried to adapt Debian-based pre-seeds into a kickstart-like infrastructure), so this blog will discuss how I provision CentOS on my XenServer using the xen portion of the CentOS distribution.

The most important script necessary for this to work is my xen virtual machine manager bash script, which is essentially a bunch of Citrix xe command line calls wrapped into a bash getopts script. My suggestion would be to copy it to /usr/local/bin so that it can be executed without having to specify the path --- use whatever configuration management software you want (e.g. cfengine, puppet, chef, etc) or have it installed as part of your cobbler post-install from /var/www/cobbler/aux/citrix, as specified in Part One of this blog. Of course, you may also want to just copy it over manually.

The second most important thing is to configure a privileged user that can ssh to the Citrix XenServer and execute the xen virtual machine manager script as root. Originally, I had configured all the ssh commands to run as root@xenserver, but have since altered everything to utilize sudo and non-root ssh keypairs. We will not be discussing sudoers or authorized_keys in this blog, so if you do not know how to handle that, you should not continue reading. For the sake of the rest of this example, we will refer to this privileged user as 'uber'.

Now, some people may just create a basic xen virtual machine and clone it as needed for future installations. That will not be covered here, though the technique is similar; if you understand all the steps, you should be able to make the cobbler and bash modifications to handle that as well. I prefer to kickstart everything from scratch and then apply configuration management post-install to customize the process accordingly.


1. create Citrix XenServer and install xen virtual machine manager script

I did mention before that this was the MOST important step, so I am listing it again as Step 1. Build your own Citrix server or use the technique I described in Part One. Create your own script, or just use mine.


2. create and configure 'uber' user

On the cobbler host, you need to generate an ssh keypair (this example uses RSA) under ~uber/.ssh, while on the Citrix host, you need to add the public key to authorized_keys. You will also have to make sure that 'uber' is allowed in sudoers to execute the xen virtual machine manager script.


3. create custom cobbler power template

You can manipulate almost anything in cobbler using cheetah templates. Create /etc/cobbler/power/power_citrix.template as follows:

# By default, cobbler builds the kickstart tree for all systems
# more reliably than the distro being attached to the interface
#set treelist=$mgmt_parameters['tree'].split('/')

#if $power_mode == 'on'
ssh -i ~$power_user/.ssh/id_rsa $power_user@$power_address sudo xenvm_mgr.sh \
    -S "$server" -D "$treelist[-1]" \
    -m "xenbr0=$interfaces['eth0']['mac_address'],xapi1=$interfaces['eth1']['mac_address']" \
    -s "$virt_path" \
    -V "$virt_file_size" \
    -C "$virt_cpus" \
    -M "$virt_ram" \
    -c "$name"
#end if

# xenvm_mgr must exit 0 when VM exists, or poweroff will fail for new VMs
#if $power_mode == 'off'
ssh -i ~$power_user/.ssh/id_rsa $power_user@$power_address sudo xenvm_mgr.sh \
    -d \
    -x "$name"
#end if

All the $variables specified are specific to cobbler, but you may need to adjust "Local storage" according to how you configured the Citrix XenServer, or modify xenbr0/xapi1 to different interface numbers, depending on your architecture. My xen virtual machine manager script has wildcard matching, so you do not have to know the precise storage name (but be careful, as it will take the first match listed). There are also tests that check that sufficient cpu, memory and disk are available; if not, the new virtual machine is destroyed and xenvm_mgr exits non-zero, causing the poweron to retry a few times before failing.


4. configure the cobbler profile

Assuming you imported a Redhat-based distribution, e.g. RHEL 5.5, the import should have created two distinct profiles like:

   rhel5-arch
   rhel5-xen-arch

because the distribution contains the kernel and initrd for xen virtual instances. You will "cobbler system add" your new virtual instance using the xen profile.


5. configure the cobbler system (aka, the xen virtual instance)

Whether you utilize the following options when you do the "cobbler system add" or update them afterwards using "cobbler system edit" is of no consequence to me; I will illustrate using the edit method:

cobbler system edit --name vmname --power-user=uber --power-type=citrix --power-address=citrixdns

What you need to note here is that the power-type corresponds directly to the power_citrix.template created in Step 3, and the power-address is the dns name or ip address of your Citrix XenServer. If you need to modify the VM settings, you can override them via cobbler as well:

cobbler system edit --name vmname --virt-cpu=2 --virt-file-size=100 --virt-ram=4096 --virt-path="Local storage"


That's all! The kickstart info is retrieved from the imported profile. The cheetah template now handles the following (so be careful how you use this):

create a new virtual machine using the xen virtual machine manager:

cobbler poweron --name vmname

destroy an existing virtual machine using the xen virtual machine manager:

cobbler poweroff --name vmname


If you do not want the poweroff to destroy, modify the cheetah template. If you want to add a reboot case in the cheetah template, go right ahead. This just about covers it, I think. Have fun extending the cobbler power templates to handle other virtualizations.

Thursday, April 7, 2011

XenServer: Citrix Cobbler, Part One

Ah, blissful zen when eating a citrus cobbler...no, wait, I guess I misread that title, lol. Seriously, though, like many of you out there who have played with Citrix XenServer, you have probably found dozens of sites that will teach you how to automate a Citrix XenServer installation (also referred to as an unattended or PXE installation, etc). In Part One of this topic, I will explain how I manipulated cobbler, which is predominantly for RedHat-based "kickstart" installations, to install Citrix XenServer 5.6 using the automated answer file. Part Two will discuss "kickstarting" the Xen distribution of CentOS via cobbler. Both will assume that you have some rudimentary knowledge of how cobbler works (if not, the cobbler documentation is fairly straightforward and easy to find online).

First, there are plenty of references on how to automate a XenServer installation scattered throughout the web. From what I can tell, the process has not changed much across the 4.x and 5.x versions. The version of Citrix XenServer that I started using was 5.6.0, so my knowledge of the Citrix PXE boot install comes from this document.

Second, I will assume you have some rudimentary knowledge of cobbler if you are thinking of using the technique I will be discussing in this blog. By rudimentary, I mean that you've at least tried a few "cobbler import" and "cobbler profile add" calls along with the creation of one or two kickstart templates. If you are more advanced, perhaps you've even written some cheetah templates and/or created your own sync-triggers. Regardless, you should understand that cobbler was designed first-and-foremost to handle Redhat-based distributions. XenServer is compatible with Redhat, but the PXE syntax used in the kickstart is neither compliant nor safe for cobbler to manage automatically, as will be explained below.

For the purposes of this exercise, we will assume that all distros, profiles and related files will utilize the following name variable:

    XNAME=citrix560


1. cobbler distro

Citrix XenServer comes with two ISO files: the XenServer-5.6.0-install-cd.iso and the XenServer-5.6.0-linux-cd.iso. While you can import the installation ISO, I would not recommend importing the Linux ISO, because it will be created in a different location in the ks_mirror than you need it to be. The best method for "importing" the distribution into the cobbler ks_mirror is to simply mount the ISOs and copy them appropriately, as follows:

    KSDIR=/var/www/cobbler/ks_mirror/$XNAME
    mount -o loop XenServer-5.6.0-install-cd.iso /mnt/xen
    mount -o loop XenServer-5.6.0-linux-cd.iso /mnt/sub
    mkdir $KSDIR
    rsync -av /mnt/xen/ $KSDIR/
    rsync -av /mnt/sub/packages.linux $KSDIR/

The problem with NOT using the import function is that it will not create the required distro json settings --- of course, since this is not really Redhat compliant, the assumptions that cobbler makes regarding the kernel and initrd would be invalid anyway, so you will need to follow these steps regardless:

    cobbler distro add --name=$XNAME --initrd=$KSDIR/boot/xen.gz --kernel=$KSDIR/boot/isolinux/mboot.c32

For convenience of the profile kickstart scripts, it is also advisable to create the symlink and kickstart metadata that the cobbler-import step does:

    ln -s $KSDIR ${KSDIR/ks_mirror/links}
    cobbler distro edit --name=$XNAME --ksmeta="tree=http://@@http_server@@/cblr/links/$XNAME"


2. cobbler profile

The automatic import also creates a default profile for each distro, which can be done manually with the following command:

    KSFILE=/var/lib/cobbler/kickstarts/$XNAME.ks
    cobbler profile add --name=$XNAME --distro=$XNAME --kickstart=$KSFILE

If you also have public/custom repos that you have retrieved/created that will be compatible with XenServer, append the following to the profile command:


    --repos='epel5 elff5 yum5'

where the repo names above represent repos that you arbitrarily created (in the example above, the names stand for Extra Packages for Enterprise Linux 5.x, Enterprise Linux Fast Forward 5.x, and a custom yum repo for RHEL/CentOS 5.x).


3. cobbler kickstart as answerfile

In normal Redhat kickstarts, the "tree" metadata defined above serves as the location where the ISO can be retrieved via HTTP. For our XenServer process, we will utilize this HTTP method to retrieve the answerfile. So, instead of a normal $XNAME.ks "kickstart" file, the simplest "answerfile" would use DHCP:

<installation>
    <primary-disk>sda</primary-disk>
    <keymap>us</keymap>
    <root-password>topsecretword</root-password>
    <source type="url">http://$server/cblr/links/$distro</source>
    <post-install-script type="url">
        http://$server/cblr/aux/citrix/post-install
    </post-install-script>
    <admin-interface name="eth0" proto="dhcp" />
    <timezone>UTC</timezone>
    <hostname>$hostname</hostname>
</installation>

The $variables listed will be resolved by cobbler to the values within the profile report, and the post-install script is simply a file you place in /var/www/cobbler/aux/citrix. Obviously, sda and eth0 can be altered accordingly, depending on your preferences.

If you have detailed static interface info in your cobbler system, you may want to utilize that instead of DHCP; the cheetah syntax would then be:

<installation>
    <primary-disk>sda</primary-disk>
    <keymap>us</keymap>
    <root-password>topsecretword</root-password>
    <source type="url">http://$server/cblr/links/$distro</source>
    <post-install-script type="url">
        http://$server/cblr/aux/citrix/post-install
    </post-install-script>
    <admin-interface name="eth0" proto="static">
        #set $nic     = $interfaces["eth0"]
        #set $ip      = $nic["ip_address"]
        #set $netmask = $nic["subnet"]
        <ip>$ip</ip>
        <subnet-mask>$netmask</subnet-mask>
        <gateway>$gateway</gateway>
    </admin-interface>
    <timezone>UTC</timezone>
    <hostname>$hostname</hostname>
</installation>

4. cobbler sync/triggers

With what you have set up thus far, you would be able to create the PXE configuration just by running:

    cobbler sync

This basically clears out the old data files and regenerates all the dns, dhcp, distro images, and pxelinux configurations. As such, the following entry will appear in /tftpboot/pxelinux.cfg/default:

LABEL citrix560
    kernel /images/citrix560/mboot.c32
    MENU LABEL citrix560
    append initrd=/images/citrix560/xen.gz ksdevice=bootif lang=  kssendmac text  ks=http://1.2.3.4/cblr/svc/op/ks/profile/citrix560
    ipappend 2

As you know, the PXE configuration for Citrix XenServer is not identical to that of a Redhat kickstart; the append line should read:

    append /images/citrix560/xen.gz dom0_mem=752M com1=115200,8n1 console=com1,vga --- /images/citrix560/vmlinuz xencons=hvc console=hvc0 console=tty0 answerfile=http://1.2.3.4/cblr/svc/op/ks/profile/citrix560 install --- /images/citrix560/install.img

Those of you well-versed in standard kickstarts were probably wondering earlier why I did not set kernel=vmlinuz and initrd=install.img above, but now you see why. Those of you who are well-versed in cobbler are probably now considering adding those missing append fields into the kernel-options --- but I have already tried that and the results are not what you want. Basically, the options are space-delimited, parsed as a list and rearranged alphabetically, which does not work properly with Citrix automated installs (believe me, I tested this extensively). The syntax of that append line is very specific, which is why xen.gz was configured as the initrd option, so that it would appear first. Also, the /images/ directory is missing the remaining files needed for PXE installation. Both of these factors need to be corrected AFTER cobbler syncs up the default PXE file, so we can easily make use of cobbler's post-sync triggers.

Just make a /var/lib/cobbler/triggers/sync/post/citrix.sh script that contains the following:

#!/bin/bash
libdir=/var/lib/cobbler
srcdir=/var/www/cobbler
dstdir=/tftpboot
for profile in $( grep -l citrix560.ks $libdir/config/profiles.d/* ); do
    json=${profile##*/}
    name=${json%.json}
    [[ -d $dstdir/images/$name ]] && \
    rsync -av $srcdir/ks_mirror/$name/{install.img,boot/vmlinuz} \
        $dstdir/images/$name/
done
for file in $( grep -l xen.gz $dstdir/pxelinux.cfg/* ); do
    sed -i'' -e 's!initrd=\(/images/.*/\)\(xen.gz \)ks.*ks=\(.*\)$!\1\2dom0_mem=752M com1=115200,8n1 console=com1,vga --- \1vmlinuz xencons=hvc console=hvc0 console=tty0 answerfile=\3 install --- \1install.img!;' $file
done

Basically, this post-sync trigger finds all occurrences of the specified .ks file in the cobbler profiles.d directory and clones all the necessary XenServer files to /tftpboot/images/$profile (this example assumes that my distro and profile share the same name, which they do). Then it locates all the PXE configurations that reference xen.gz and rewrites the Redhat append line into a Citrix append line. That's pretty much all you need to automate a Citrix XenServer installation. The next part covers the post-install scripts that the Citrix answerfile referenced above, in case you plan on running things after the XenServer comes up.


5. post-installation

Now, the answerfile referenced http://$server/cblr/aux/citrix/post-install, which means that you will need a /var/www/cobbler/aux/citrix/post-install file. Since I use DHCP, I can locate the cobbler server name in the syslogs and use that to disable netboot (to prevent the host from PXE booting upon the next reboot):

#!/bin/bash
server=$( grep DHCPACK /var/log/messages | tail -1 | awk '{ print $NF }' )
system=$( hostname )
# Disable pxe (disable netboot)
wget -O /dev/null "http://$server/cblr/svc/op/nopxe/system/$system"

If you need access to certain system information during the post-install, you can make curl/wget references to cheetah templates within the /var/www/cobbler/aux/citrix directory to retrieve those variables in a script. For instance, let's assume you want to set the default gateway using the cobbler info. You can create /var/www/cobbler/aux/citrix/gateway.template:

#!/bin/bash
# retrieve cobbler variables
#set $gateway = $interfaces['eth1']['static_routes'][0].replace(':','/').split('/')[-1]
route add default gw $gateway

Then add it to the profile or the system using:

cobbler system edit --name=$system --template-files="/var/www/cobbler/aux/citrix/gateway.template=/alias"

Then you can reference it in your post-install script:

curl http://$server/cblr/svc/op/template/system/$system/path/_alias

If you want to add post-install processes that need to take place after the initial reboot, then have the curl destination drop them into /etc/firstboot.d as initrc scripts (of course, if you do this during the Citrix automated installation phase, you need to chroot it as /tmp/root/etc/firstboot.d/99-*). I create a lot of XE scripts in firstboot to handle bonding, the default gateway, etc, but that is a topic for another day.

Friday, March 25, 2011

Zenoss: Automated LDAP Authentication via twill

Call me a nitpicker, but after getting my Zenoss core packages installed via an automated configuration management tool, like cfengine or puppet, I dislike having to manually click through the Zope UI in order to attach an external user validation module, like the LDAP Authenticator. Luckily, since I was using Zenoss Enterprise, the enterprise ZenPacks for Zenoss include both the LDAP Authentication plugin as well as the Synthetic Web Transaction plugin. By stepping through the twill-sh steps, I was able to create a twill sequence that would allow me to add LDAP Authentication with the same script that I used to install Zenoss and the subsequent ZenPacks.

If you execute the twill commands before you go through the Getting Started UI, then you can also take advantage of the default zenoss username and password:

# Set all your custom variables here:
setlocal ldapserver1 <primary_ldap_server_hostname>
setlocal ldapserver2 <secondary_ldap_server_hostname>
setlocal ldapserver3 <tertiary_ldap_server_hostname>
setlocal ldapouuser <ldap_base_ou_users>
setlocal ldapougroup <ldap_base_ou_groups>
setlocal ldapgrouptype <group_object_class>
setlocal ldapzenmanager <ldap_group_zenmanager>

go localhost:8080
fv 1 __ac_name admin
fv 1 __ac_password zenoss
submit

go /zport/acl_users/manage_addProduct/LDAPMultiPlugins/addLDAPMultiPlugin
fv 1 id LDAP
fv 1 title "OpenLDAP Login"
fv 1 LDAP_server $ldapserver1
fv 1 users_base $ldapouuser
fv 1 groups_base $ldapougroup
fv 1 roles ZenUser
submit

go /zport/acl_users/LDAP/manage_activateInterfacesForm
fv 1 interfaces:list IAuthenticationPlugin
fv 1 interfaces:list ICredentialsResetPlugin
fv 1 interfaces:list IPropertiesPlugin
fv 1 interfaces:list IGroupsPlugin
fv 1 interfaces:list IRolesPlugin
fv 1 interfaces:list IUserEnumerationPlugin
fv 1 interfaces:list IGroupEnumerationPlugin
fv 1 interfaces:list IRoleEnumerationPlugin
submit

go /zport/acl_users/LDAP/acl_users/manage_main
fv 1 obj_classes $ldapgrouptype
submit
fv 3 host $ldapserver2
submit
fv 3 host $ldapserver3
submit

go /zport/acl_users/LDAP/acl_users/manage_grouprecords
fv 3 group_name $ldapzenmanager
fv 3 role_name Manager
submit

Here is a simple bash snippet that I use in my installation script to run this ldap_zenoss.tw, regardless of which version of Zenoss you are using:

twshell=$(find $ZENHOME/ZenPacks -iname twill-sh)
PYTHONPATH=${twshell%/bin/twill-sh}/lib $twshell $ZENHOME/bin/ldap_zenoss.tw

And voila, automated installation without GUI representation...

Thursday, March 24, 2011

Zenoss: Stairway to Events

Even though it is doable (and incredibly easy to replicate, especially when using zenossYAMLTool), I make it a practice not to create multiple alerting rules for each individual event. So, instead of creating numerous alerting rules for each severity for each event, I try to lump them together as much as possible. For instance, in an operations group, this is probably the normal escalation procedure:
  1. send an email to the entry level support
  2. send an email to higher level support
  3. send an SMS to higher level support
One way to handle this would be to create three separate alerting rules that send out emails according to the event's duplication count. Another way would be to escalate the severity of the alert as well, so that the above actions are associated with 3 global alerting rules with the following severities:
  1. Warning
  2. Error
  3. Critical
Anything lower in severity just appears in the Event Console but does not send out any alerts.
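
For instance, the three global rules can then differ only in their where clauses, along these lines (a sketch; the exact field names and any production-state filter depend on your setup):

# where clauses for the three global alerting rules, one per rule
severity >= 3 and eventState == 0   # 1. email entry-level support
severity >= 4 and eventState == 0   # 2. email higher-level support
severity >= 5 and eventState == 0   # 3. SMS higher-level support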

However, suppose I wanted the escalation to occur over certain periods of time instead of duplication counts (which may or may not accumulate in 5 minute intervals). Then I would need to utilize the python API within a Zenoss transform. During the course of researching how to do this, I came upon many similar posts with varying degrees of success. I am posting mine now in the hopes that it will spare others all the trial and error that it took to clean this up and make it work for me:

import time
em = dmd.Events.getEventManager()
# look up the original Info-severity (2) event to find when this alert began
mydedupid = '|'.join([ evt.device, evt.component, evt.eventClass, evt.eventKey, '2' ])
try:
    ed = em.getEventDetail(dedupid=mydedupid)
    first = int(time.mktime(time.strptime(ed.firstTime, '%Y/%m/%d %H:%M:%S.000')))
except:
    first = 0
# walk severities from Critical down to find the latest escalated occurrence
for sev in range(5,1,-1):
    mydedupid = '|'.join([ evt.device, evt.component, evt.eventClass, evt.eventKey, str(sev) ])
    try:
        ed = em.getEventDetail(dedupid=mydedupid)
        mycount = ed.count
        last = int(time.mktime(time.strptime(ed.lastTime, '%Y/%m/%d %H:%M:%S.000')))
    except:
        mycount = 0
        last = 1
    if mycount > 0: break
diff = last - first
if first == 0: evt.severity = 2
elif diff > 3600: evt.severity = 5
elif diff > 1800: evt.severity = 4
elif diff >  900: evt.severity = 3

The original event should be set to Info (severity=2) and will escalate to Warning after 15 minutes, Error after 30 minutes, and Critical after an hour. Where you place this transform in the Event Class tree depends on how the events should be affected by this suppression/escalation logic.

Other tricks that I implement in combination with this are:
  • Combine loadbalanced devices into a single event --- using the DeviceClass shortname:
    evt.device = evt.DeviceClass.split('/')[-1]
  • For SNMP Traps, set the Event Key (and thus the deduplication id) to the second tab-delimited field to combine certain traps into the same event:
    evt.eventKey = evt.fullLogLine.split('\\t')[1]
Once they escalate, they will trigger the appropriate alerting rule and send out the proper notification.

Wednesday, March 23, 2011

Zenoss: Maintenance Windows

I have noticed that when I set a maintenance window for a device, it stops any related alerts for that device, but it does not stop the events from appearing with varying severity levels. In order to suppress the events to the same severity as the Maintenance Window notification, I use the following transform:

# Suppress any events during maintenance window
if evt.severity > 2:
    for mw in device.getMaintenanceWindows():
        if mw.started is not None:
            evt.summary = 'Maintenance: %s' % evt.summary
            evt.severity = 2

As you can see, the transform uses the same object classes that the local python API uses, so you should be able to retrieve just about anything from the system.
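
For instance, here is a quick zendmd sketch that pokes at the same maintenance-window objects (the device name is a placeholder):

# run inside zendmd; 'myhost' is a hypothetical device id
dev = dmd.Devices.findDevice('myhost')
if dev:
    for mw in dev.getMaintenanceWindows():
        # mw.started is None unless the window is currently active
        print mw.id, mw.enabled, mw.started

Stay tuned next time, when I outline the time-based suppression/escalation code that I use for certain events.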

Friday, March 4, 2011

Zenoss: The SNMP Chinese Wall, Separating Integers from Strings

In Zenoss, one of the issues that one finds when one devotes oneself to a pure SNMP implementation, avoiding SSH altogether, is that the Zenoss SNMP DataSource only really handles numerical results. That means that it can handle the INTEGER value from nsExtendResult, as well as convert a purely numerical STRING value from nsExtendOutLine, but it cannot deal with the stdout from an entire nsExtendOutputFull OID.

To remedy this, simply create a DataSource that uses one of the INTEGER OIDs, then a Threshold that triggers a specific Event Class, and finally a Transform within that Event Class that will replace the INTEGER OID with a STRING OID and snmpget the entire stdout from the SNMP Extend script. Assuming that you have read my SNMP Extend post, let us continue using the same OID for remote_command as well as the exact same template as before. All we need now is a transform for the /Perf/Snmp Event Class that will look something like this:

import re
alert = re.search('threshold of (\w+)_failure_output', evt.message)
if alert:
    ds = evt.eventKey.split('|')[0]
    for t in device.getRRDTemplates():
        for s in t.getRRDDataSources():
            if ds == '%s_%s' % (s.id, s.id):
                 # evt.summary = snmpget of (nsExtendResult OID
                 # rewritten as nsExtendOutputFull OID)

As I was having trouble figuring out how to utilize the existing Zenoss python libraries (and not have to install one of the various third-party pysnmp/snmppy modules) to perform a pure python snmpget, I originally created a python-based subprocess call to the system's net-snmp tools:

from subprocess import *
outoid = s.oid.replace('8072.1.3.2.3.1.4','8072.1.3.2.3.1.2')
cmd = 'snmpget -Ov -%s -c%s %s %s' % (
    device.zSnmpVer, device.zSnmpCommunity,
    device.manageIp, outoid)
proc = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE)
evt.summary = proc.stdout.readlines()[0].replace('STRING: ','')
break

Eventually, after some research and speaking with Zenoss support, this was the shortest pure python implementation I was able to figure out:

from twisted.internet import reactor
from pynetsnmp import twistedsnmp

def snmpget(proxy, oids):
    data = proxy.get(oids)
    data.addCallback(snmpvalue)

def snmpvalue(result):
    global snmpdata
    snmpdata = result
    reactor.stop()

where the evt.summary would be populated like this:

proxy = twistedsnmp.AgentProxy(
    community=device.zSnmpCommunity,
    snmpVersion=device.zSnmpVer,
    ip=device.manageIp)
proxy.open()
oids = [s.oid.replace('8072.1.3.2.3.1.4','8072.1.3.2.3.1.2')]
reactor.callWhenRunning(snmpget, proxy, oids)
reactor.run()
proxy.close()
evt.summary = snmpdata[snmpdata.keys()[0]]
break

However, this produces the following zenperfsnmp output:

yyyy-mm-dd HH:MM:SS,123 ERROR zen.zenperfsnmp: [Failure instance: Traceback (failure with no frames): : Connection was closed cleanly.
]
Traceback (most recent call last):
  File "/opt/zenoss/Products/ZenHub/PBDaemon.py", line 382, in pushEvents
    driver.next()
  File "/opt/zenoss/Products/ZenUtils/Driver.py", line 64, in result
    raise ex
PBConnectionLost: [Failure instance: Traceback (failure with no frames): : Connection was closed cleanly.
]

which results in killing zenhub. When posted to the Zenoss Support portal, the Zenoss "engineers agree that the deferred is a bad thing to use inside the event transform." So, even though I find the subprocess method distasteful, it is the optimal (or only?) transform that can be performed. If someone knows otherwise, please feel free to comment.

So, either dump the transform into the GUI panel or inject it via the add_transforms feature in my zenossYAMLTool and you should be set. For the complete YAML, download my Result2Output.yaml and modify it accordingly --- it contains additional python code for the transform to handle SNMPv3 using zConfigurationProperties as well as some simple escalation parsing logic.

Thursday, March 3, 2011

Zenoss: Dr SNMP Extend or How I Learned to Stop SSHing and Love the OID

So there are all these scripts that you run on your remote hosts for monitoring. Since Zenoss has a built-in nagios parser, you basically run all of these scripts via SSH. What sucks is that establishing X number of SSH sessions for Y number of devices in Zenoss builds up a significant number of TCP connections and other wonderfully painful bottlenecks on your Zenoss system. Now, because all of these scripts already exist on the remote host, you can just as easily run them as root with a simple SNMP Extend line in your snmpd.conf. The line you would need to add would look something like this:

extend remote_command /customdir/customscript custom args -s 123

Do not forget to restart snmpd for the changes to take effect. The next step would be to create a Zenoss DataSource for this --- bearing in mind that Zenoss works better with OID numbers than MIB names --- and the straightforward approach would simply be to walk the SNMP tree and convert to OID:

snmpwalk -v2c -cpublic hostname 'NET-SNMP-EXTEND-MIB::nsExtendResult."remote_command"' -On

Two things to note in that command:
  1. If you want to see the MIB name, just simply remove the -On
  2. If you noticed that I used snmpwalk instead of snmpget, then you will understand that I do so because sometimes I like to walk the entire Extension tree using nsExtendObjects and snmpget would just barf on that
Rather than painstakingly reproduce all of that in screenshots, I will illustrate the template creation steps using zenossYAMLTool syntax, which is how I normally make changes to my Zenoss system (to avoid tons of images here as well as tedious clicking through the GUI). Assuming that I will not be graphing in this template, the YAML needed to create the template looks like this:

- action: add_template
  description: Result Threshold retrieving Output Summary
  targetPythonClass: Products.ZenModel.Device
  templateName: Result2Output
  templatePath: /Server/Linux/TestCase
  GraphDefs: []

Now that we have the OID, adding a DataSource is fairly easy:

DataSources:
  - dsName: remote_command
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    oid: 1.3.6.1.4.1.8072.1.3.2.3.1.4.x.114.101.109.111.116.101.95.99.111.109.109.97.110.100
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.SNMP
    DataPoints:
    - dpName: remote_command
      isrow: true
      rrdtype: GAUGE

You will find that the number in place of the x above is the index of this extension in the OID table: the extend name encoded as its length followed by the ASCII codes of its characters. I originally had a script that would generate the OID value for a specified extend command but, when in doubt, snmpget/snmpwalk -On will confirm it.
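
As a quick sanity check of that encoding, the following spells out remote_command, matching the OID tail above:

# encode an extend token as its SNMP table index: length, then ASCII codes
name = 'remote_command'
print '.'.join([str(len(name))] + [str(ord(c)) for c in name])
# -> 14.114.101.109.111.116.101.95.99.111.109.109.97.110.100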

In order for it to trigger an alert, create a MinMax Threshold that will trigger an event (to keep it simple, we will assume 0=success and 1=failure here):

Thresholds:
  - thresholdName: remote_failure_output
    enabled: true
    escalateCount: 0
    eventClass: /Perf/Snmp
    maxval: '0'
    minval: ''
    severity: 3
    dsnames:
    - remote_command_remote_command

For the complete YAML file, please read my next blog on how to use transforms to extract the nsExtendOutputFull after the nsExtendResult triggers an event.

And that is it for now. Of course, the die-hard nagios plugin fanatics among you may be apt to point out that using SSH enables you to pass DataPoints via stdout, as such:

STATUS: Some useful output message here|data1=100;;; data2=10;20;30

This would be added to Zenoss with a single DataSource that contains multiple DataPoints. For my zenossYAMLTool syntax, the YAML would look something like this:

DataSources:
  - dsName: remote_command
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.COMMAND
    usessh: true
    DataPoints:
    - {dpName: data1, isrow: true, rrdtype: GAUGE}
    - {dpName: data2, isrow: true, rrdtype: GAUGE}

To recreate the same effect using SNMP Extend, you would need to add multiple DataSources, each with a single DataPoint, as the OIDs are mapped directly to each DataSource. The script should output the message and the data values on separate lines:

STATUS: Some useful output message here
100
10

And you would make use of the multiple nsExtendOutLine."remote_command".# OIDs to gather your DataSources:

DataSources:
  - dsName: remote_command
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.14.114.101.109.111.116.101.95.99.111.109.109.97.110.100.1
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.SNMP
    DataPoints:
    - dpName: remote_command
      isrow: true
      rrdtype: GAUGE
  - dsName: remote_data1
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.14.114.101.109.111.116.101.95.99.111.109.109.97.110.100.2
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.SNMP
    DataPoints:
    - dpName: remote_data1
      isrow: true
      rrdtype: GAUGE
  - dsName: remote_data2
    cycletime: 300
    enabled: true
    eventClass: /Cmd/Fail
    oid: 1.3.6.1.4.1.8072.1.3.2.4.1.2.14.114.101.109.111.116.101.95.99.111.109.109.97.110.100.3
    parser: Auto
    severity: 3
    sourcetype: BasicDataSource.SNMP
    DataPoints:
    - dpName: remote_data2
      isrow: true
      rrdtype: GAUGE

Yes, the YAML appears longer, but it is not more complex, merely an exercise in copy-and-paste repetition. The multiple DataSources can easily be written as a loop in a simple generator script, as sketched below. So you have to decide which way you want to handle it --- though, I suppose, these examples make it seem like just a choice between configuration and performance.
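
Here is a minimal sketch of such a generator, assuming the same extend token as above; only the trailing line number in each nsExtendOutLine OID changes:

import yaml

BASE = '1.3.6.1.4.1.8072.1.3.2.4.1.2'   # nsExtendOutLine
token = 'remote_command'
index = '.'.join([str(len(token))] + [str(ord(c)) for c in token])

datasources = []
for line, ds in enumerate(['remote_command', 'remote_data1', 'remote_data2'], 1):
    datasources.append({
        'dsName': ds, 'cycletime': 300, 'enabled': True,
        'eventClass': '/Cmd/Fail', 'parser': 'Auto', 'severity': 3,
        'sourcetype': 'BasicDataSource.SNMP',
        'oid': '%s.%s.%d' % (BASE, index, line),
        'DataPoints': [{'dpName': ds, 'isrow': True, 'rrdtype': 'GAUGE'}],
    })
print yaml.dump({'DataSources': datasources}, default_flow_style=False)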

Wednesday, March 2, 2011

Zenoss: The Straw That Broke...The YAMLs Back

I remember when my buddy over at Linux Dynasty first introduced me to Zenoss 1.x a few years back.  At the time, I knew nothing about python or any of the open source projects related to it.  I only knew that I was not a fan of Zope, if only because it had caused me a lot of grief at a previous job.  After looking at it for a few minutes, I decided that the reporting aspect was severely lacking, so I spent a week hacking up some python and TAL code in order to create what I called zenoss-organized-graphs, a few dynamic reporting templates that allowed you to select what to graph from standard combo dropdowns.  My employer decided not to use Zenoss, and when Zenoss launched 2.0, the whole backend (including the reporting engine) changed completely, and my one-week-hack project died right then and there.  Fast forward to last year when, right before I started my new position, the Operations VP had already decided on and purchased Zenoss Enterprise 2.5.x.  Thus began my reignited love/hate relationship with Zenoss.

Originally, because of the nature of the hosting provider, we had to run Zenoss using purely SSH/Nagios commands and scripts.  The performance and overhead of all those SSH connections, despite our attempts to keep all the scripts short and zippy, was a headache.  One day, while trying to update the default behavior of the Solaris Device SSH templates, I discovered that deleting ZenPacks is BAD.  So bad that the only way to recover from the situation it caused was a ZODB restore.  Mind you, this was not the first issue I had regarding ZenPacks, but it was the straw that broke the camel's back.  I made a solemn vow to myself that I would avoid ZenPacks as much as possible.

Let me clarify that last statement just a bit.  I believe that ZenPacks have a useful purpose, as I do install the Core and Enterprise ZenPacks.  That purpose is to add tabs/menu views, like the Zeus ZenPack, or to add global functionality, like LDAP Authentication or Holt-Winters prediction.  But due to what I consider its volatile nature (of removal/undo), I simply do not think it is the best way to perform simple updates to templates, reports, devices, etc.  Again, Linux Dynasty, a much more devout follower of Zenoss than I, had a solution --- probably based loosely on our earlier carefree Zenoss 1.x python discussions --- and it came in the form of the Zenoss_Template_Manager.  I am certain that it is a great tool, and if you have been using it and you love it, then read no further.  However, even though I am a big fan of getopts and passing parameters to a script, the prospect of having to pass multiple DataSources, Thresholds and GraphDefs did not readily appeal to me.  I needed a way to manipulate the fairly complex templates I had built using configuration files, which also serve (for me, at least) as a better backup technique than the ZenPack or the ZenBackup.  Zenoss itself uses XML to import changes into the ZODB but, while I believe in the power of XML, there had to be a better way for humans to process the data quickly and easily enough to effect rapid changes for template duplication, etc.  And thus began my conversion to the YAML religion.

Originally, I wrote my zenossYAMLTool for adding and removing devices and templates.  Throughout the course of the year, I extended it to cover alerts, devices, event commands, event mappings, os processes, reports, templates, transforms, users/groups and maintenance windows.  As I needed more functionality to handle minor changes, reproductions and migrations, I added features as I went.  I had mentioned this tool to a few people here and there whenever the topic of Zenoss came up, and it occurred to me that perhaps, by sharing it, I could get some feedback and identify any bugs I overlooked.

So here it is: zenossYAMLTool.py

Zenoss has all the python dependencies that are required to use this tool, except for the python YAML module (PyYAML).  So install that and, if you want to make it convenient, place the script into $ZENHOME/bin and run it as the zenoss user.  In order to understand the YAML variables that are used, you should simply use the appropriate export feature on an existing object in your Zenoss instance.  Using my latest addition (-w) as an example, here is a snippet of what a YAML configuration for this tool looks like:

- action: add_window
  windowName: Overnight Maintenance
  windowPath: /Server/Linux/
  duration: 120
  repeat: Daily
  skip: 1
  start: 1299139200.0
  enabled: true

I actually use this tool to generate grouped Multi-Graph Reports from the existing DeviceClasses --- perhaps one day I will revitalize zenoss-organized-reports for the 3.x release, as I am not sure the current Reporting engine covers all the features that my zenoss-organized-graphs had back then.

This Is Tech

In order to avoid munging all my thoughts together into a single blog, I have decided to create a tech blog where I will simply post technobabble when I feel like sharing tidbits of technical issues that have caused me grief and how I resolved them.

This will mostly be stuff I come across moving forward.  I will dredge up stuff from the past should I re-encounter it or if I find myself reminiscing about past jobs and past lives.