Category Archives: Tips & Tricks

Automatically Restarting a Failed Windows Process

Whether it’s a critical process running on a server or an application on your desktop, processes sometimes terminate and need to be restarted – immediately.

With EventSentry & EventSentry Light you can do just that: Automatically restart processes immediately after they terminate.

In the past, one drawback of EventSentry launching a process was the side effect that any process started by the EventSentry agent would run under the same account as the EventSentry agent itself (usually a privileged domain account or LocalSystem).

In this post I’ll discuss how you can work around that limitation in a secure manner using a scheduled task. When the critical process fails, instead of launching the process directly through a process action, EventSentry will trigger a scheduled task instead. Why? Because scheduled tasks allow you to configure under which user a task will run – and the user’s password is securely stored in Windows.

The recipe for accomplishing this feat is as follows:

  • Process Monitoring monitors the process
  • An event log filter looks for the “failed process” event and triggers a process action
  • The process action starts a scheduled task

Let’s look at this in detail. First, on the host where the critical but unstable process is running, create a scheduled task in the Windows “Task Scheduler”. Under General, give the task a descriptive name (“Start Super Important App”) and change the user under which the program should run. In most cases you will also want to make sure that you configure the task to run whether a user is logged on or not. Then, under “Actions”, add a new action “Start a program” which points to the executable that should be launched. After you click “OK” you will be prompted for the password of the user.

Scheduled Task

Creating a scheduled task
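If you prefer scripting to clicking, the same task can also be created from an elevated command prompt with schtasks.exe /Create. Below is a minimal sketch of assembling that command; the task name, program path and account are example values for this post, not requirements:

```python
import subprocess

# Sketch: assemble the schtasks.exe /Create call that mirrors the GUI steps
# above. /SC ONCE with a nominal start time creates a task we only ever run
# on demand; /RU sets the run-as account and /RP * prompts for its password.
# The path and account name below are placeholders.
create_args = [
    "schtasks.exe", "/Create",
    "/TN", "Start Super Important App",   # task name
    "/TR", r"C:\Apps\critical_app.exe",   # program to launch (example path)
    "/SC", "ONCE", "/ST", "00:00",        # nominal schedule; we trigger it manually
    "/RU", r"DOMAIN\svc_app",             # account the task runs under
    "/RP", "*",                           # prompt for the account's password
]

def create_task(execute=False):
    """Return the command line; only run it when execute=True (Windows only)."""
    if execute:
        subprocess.run(create_args, check=True)
    return subprocess.list2cmdline(create_args)
```

Either way – GUI or command line – the password ends up stored securely by Windows rather than in the EventSentry configuration.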

The next step is to set up process monitoring in EventSentry. Right-click “System Health” and create a new package, then assign it to the computer(s) in question. Right-click the newly added package and select “Add – Processes”. Click the newly added object and add the name of the process which should be monitored. You can configure how many instances of the process are required, and with which severity the event will be logged when the process is inactive.

process monitoring

Configuring process monitoring

Now we create a new “Process” action. Right-click the “Actions” container, select “Add” and enter a descriptive name (e.g. “Trigger Super Important App”). In the Filename field specify:

%SYSTEMROOT%\system32\schtasks.exe

And for the Command Line Arguments enter:

/Run /TN "Start Super Important App"

This uses the built-in Microsoft utility schtasks.exe to run the task we created in our first step. At this point EventSentry will monitor the specified process and log an event if the process is inactive. And while we do have an action to trigger the scheduled task, we still need to tell EventSentry when to launch that action.

EventSentry Process Action

Configuring a process action to start a scheduled task

For the next step, right-click the “Event Logs” container, select “Add Package” and give that package a descriptive name. Then assign the package to the same host. Right-click the newly added package and add a filter by clicking “Add Filter”. In the filter dialog, add the “Trigger Super Important App” action to the action list and configure the following fields:

Event Log Include Filter Rule

Setting up a rule to trigger the process action

Event Severity: Information
Event Log: Application
Event Source: EventSentry
Category: Process Monitoring
Event ID: 10401
Content Filter (wildcard): *critical_app.exe*

Important Notes: The event severity will need to match whichever severity you selected when adding the process monitoring object in the system health package. The content filter can also be configured to match insertion string #1, in which case the wildcards are not necessary.
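The content filter’s wildcard match behaves like a simple glob against the full event message text. A rough Python approximation of that logic (the sample event text below is made up for illustration, not the exact EventSentry wording):

```python
from fnmatch import fnmatchcase

# "*critical_app.exe*" matches the process name anywhere in the message;
# without the surrounding wildcards the pattern would have to match the
# ENTIRE event text, which is why they are needed here.
event_text = ("The process critical_app.exe is inactive "
              "(1 instance(s) required, 0 found).")

def content_filter_matches(text, pattern):
    """Glob-style match of a wildcard pattern against an event message."""
    return fnmatchcase(text, pattern)
```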

And that’s all there is to it – simply save the configuration when you are done. If the process is running on a remote host, don’t forget to push the configuration to that host.

CryptoLocker Defense for Sysadmins

It seems as if CryptoLocker has been making the rounds lately, much to the dismay of users who don’t have working backups of their precious office documents.

While I admire CryptoLocker’s simplicity and effectiveness from a purely technical and entrepreneurial standpoint, what the software is doing does appear to be illegal in most countries, so I’d like to offer some advice on how to tame the beast. If you’re looking for a 5-minute fix then I have bad news: implementing the CryptoLocker defense I have outlined below, while completely free, will take a little more than 5 minutes. But knowing that you have an effective defense against CryptoLocker may very well be worth it. After all, CryptoLocker seems to find its way into a lot of private networks these days.

CryptoLocker Screenshot

The ideas set forth in this blog post apply mostly to Windows-networks with file servers, but could be adapted for individual computers as well (though this is not covered here – let me know if you’d like me to include this scenario).

About CryptoLocker
For those who have not heard of CryptoLocker yet, it is a piece of software which encrypts pretty much all common office-type documents, including Microsoft Office, AutoCAD, PDFs, images and more. This blog article from MalwareBytes has a complete list of extensions. Once encrypted, CryptoLocker charges you to decrypt (your own files) again. It’s public key cryptography gone wrong; I wonder if Diffie & Hellman saw this one coming. And to make the whole spiel even more interesting, you only get a limited amount of time to pay before your files remain encrypted. Forever. Oh – and the longer you wait, the more you have to pay. And with recent Bitcoin exchange rates in excess of USD 1000, the amount that needs to be paid can be uncomfortably high.

It is pretty difficult to defend against something like CryptoLocker other than through the usual means of AntiSpyware software, user education and strict policies against opening and downloading files from the Internet, email attachments and such. In most cases CryptoLocker comes in the form of a ZIP attachment disguised with a PDF icon.

One reason CryptoLocker is so effective – yet difficult to block – is because it exhibits the same behavior as users would: It “simply” accesses and modifies files like a user would. And infecting a machine isn’t all that difficult since CryptoLocker doesn’t require any elevated permissions to run. On the contrary, it wants to run in the same context the user does, so that it can access and see the same files a user does. As such, security features like UAC are utterly useless against ransomware like CryptoLocker – it’s a whole new type of software.

Backups
The most effective defense against CryptoLocker is to have a working, tested backup. Let me repeat this: A WORKING and TESTED backup. Users have lost all their data because they thought that they had a backup in place when their backup was broken in some way.

We’ve seen posts of users who deleted all the files CryptoLocker encrypted, thinking they had a working backup. They had a backup, but it was apparently not recently tested and as a result the user lost all of their data.

Naturally, CryptoLocker does not like backups. It dislikes them so much that when CryptoLocker runs, it even tries to delete any Windows Shadow Copy backups. Cloud backup services (including Dropbox, SkyDrive, Google Drive, etc.) which keep versions of your files offer some protection, but restoring older versions of your files may be a tedious process.

The Defense
The most obvious defense against CryptoLocker is AntiSpyware software, e.g. MalwareBytes. Most AntiSpyware & AntiVirus software still uses signatures however, so new versions of the ransomware often remain undetected, at least for a few days.

So instead of detecting CryptoLocker itself, we can sniff its tracks, so to speak. CryptoLocker’s predictable behavior can be used against it. CryptoLocker’s objective is of course to encrypt and hold hostage as many files as possible, so as to increase the likelihood of the user purchasing the decryption key from the thugs.

And it is that very pattern that we will try to exploit and use as a trigger to detect and take corrective measures. The approach consists of measuring how many files are being changed in a certain time interval, and if a certain threshold is being exceeded (say more than 10 files modified in 1 minute) we assume that CryptoLocker found its way into our castle. Even though users modify their documents on a regular basis, users can usually make only so many changes at a time and most likely at a much slower rate than any sort of script / software would.
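The “X changes in Y seconds” idea boils down to a sliding-window counter. EventSentry implements this internally through its filter thresholds; the sketch below only illustrates the underlying logic:

```python
from collections import deque

class ChangeRateMonitor:
    """Fire when more than `limit` changes occur within `window` seconds."""

    def __init__(self, limit=10, window=60.0):
        self.limit = limit
        self.window = window
        self.events = deque()  # timestamps of recent file changes

    def record(self, timestamp):
        """Register a file change; return True if the threshold is exceeded."""
        self.events.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while self.events and timestamp - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) > self.limit
```

With limit=10 and window=60, the eleventh change inside any 60-second span trips the alarm, while the same eleven changes spread over a full day never would.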

Another approach would be to create one or more honeypot or canary files, which we know (or hope) a user would not modify. If a checksum change in one of those files were detected, we could (more or less) safely assume that CryptoLocker was on one of its rampages again and take corrective measures. The honeypot file would have to be modifiable by users (otherwise CryptoLocker would also not be able to modify it), which makes accidental modifications by users possible (although somewhat unlikely).
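A canary check like this is nothing more than a stored checksum that gets re-compared periodically. A bare-bones sketch using SHA-256 (nothing here is specific to EventSentry; the file layout is up to you):

```python
import hashlib

def checksum(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def canary_tripped(path, known_good_digest):
    """True if the honeypot file no longer matches its recorded checksum."""
    return checksum(path) != known_good_digest
```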

This 2nd approach isn’t quite as solid in my opinion, since CryptoLocker is most certainly adapting to changes, and may skip files that it may suspect are a trap. For example, it could skip small files or skip directories with a very small number of files and so forth.

A more sophisticated approach, where we detect an unusually large number of files changes in a small time period, is going to be harder to circumvent by CryptoLocker. The good news is that we have a free (it’s really free, not a trial) software tool available which can do just that. It can:

  • detect file changes
  • measure the rate of file changes (through event log alerts)
  • stop/start services or launch processes
  • send out alerts

EventSentry (Light) to the rescue
EventSentry Light is the free version of our full-spectrum monitoring & compliance solution EventSentry. The features we can utilize to come up with a defense are:

  • File Checksum Changes (part of System Health Monitoring)
  • Filter Thresholds (part of Event Log Monitoring)
  • Action (control services, send out emails)

File Checksum Monitoring
Monitors any folder and detects file size changes, checksum changes and file additions and deletions. EventSentry Light will log file checksum changes to the event log (its big brother can also log them to a database), which in turn is monitored by the real-time event log monitoring component.

Event Log Monitoring & Thresholds
This component supports a variety of sophisticated features, one of which is thresholds. The thresholds feature essentially lets you detect event log entries that occur at a certain pace. For example: if 10 specific events occur in 1 minute, then let me know and/or take corrective action.

Service Action, Email Action
EventSentry supports a variety of action types to be triggered when an event occurs, with email usually being the most commonly used one. You can also control services, use REST APIs, launch processes and much more. We’ll use the service control capability to stop the file sharing service (LanmanServer) when we have determined that CryptoLocker is on the loose.

I will go into step-by-step instructions on how to configure EventSentry at the end of the post.

The Baseline
The most difficult thing to determine is the maximum rate of file changes we deem normal, as we need to have a baseline in order to configure the threshold slightly above that. This number will vary from network to network, with file servers serving lots of users obviously requiring a larger threshold. I’d like to repeat that determining the right threshold is very important. If it is too low, then normal user activity will trigger an alarm; if it is too high then the alarm may never be triggered and CryptoLocker won’t be caught in time.

The best approach is to set up file monitoring and let it do its job for 1-2 days to determine a baseline. Once the baseline is established, we can increase it by a certain factor (say 1.5) and use that as the threshold.

Setting up the trap requires 3 steps. In this case we assume that EventSentry is either installed directly on the file server, or an agent is deployed on the file server (in which case you will need to make sure that configuration updates are pushed to the file server(s) in question).

Step 1: Monitoring the directory/ies
In EventSentry, right-click the system health packages and add a new package. Right-click the package, select “Assign” and assign it to all file servers. Right-click the package again and add a “File Monitoring” object, then click the new object. Directories are monitored in real time by default, but EventSentry requires a recurring scan as well, in case Windows doesn’t send real-time notifications. This is usually a good thing, but when you are monitoring large directories it’s best to set the interval very high (future versions will allow for this to be unchecked).

File Checksum Monitoring Settings

In the package, add all the folders which should be monitored and only check the “checksum change” check box. Do not check any of the other check boxes in the bottom left section at this time. Since we haven’t established a baseline yet, we’ll set the severity of the event log alerts to “Information”. If the monitored folders contain a lot of non-Office files then it may be a good idea to adjust monitoring so that only office files (e.g. .doc, .xls, etc.) are monitored. If you prefer to monitor all files, simply change the setting to the green PLUS icon and make sure the list of exclusions is empty (or specifies files that should be excluded, e.g. *.tmp). Below is a screenshot of how this can be configured.

File Checksum Monitoring Settings

When you save the configuration, EventSentry will enumerate all files in the folder and create an initial checksum for every file. The agent will log event 12215 when the scan starts, and event 12216 when the scan is complete. When that happens, EventSentry is essentially “armed” and will detect, and log, all checksum changes to any of the files in the monitored directories.

At this point we’ll want to let this run for at least 24 hours during a “normal” work day, so as to determine how many file changes occur on average. You are going to be at a bit of an advantage if you are running the full or trial version with database support, as it will be a lot easier to determine the number of file changes through the web-based reporting.

Step 2: Setting up the trap
Now that we have established a baseline, we’re ready to set up a threshold. This time we’ll create a new event log monitoring package. Right-click “Event Log Packages”, add a new package and call it “CryptoLocker Rules”. Like before, assign it to the file servers we are monitoring. Right-click the package again and add a new event log filter. Configure the filter as shown in the screenshot below. Note that we are triggering an email action for now. The content filter can be used to restrict the filter further, e.g. to only match certain directories if you are monitoring several directories with EventSentry.

Event Log Filter Setup

Now things are getting interesting. The goal is to create an error event in the event log when X file checksum changes occur in a given time period. To get there, we’ll start with the “General” tab where we tell the filter what type of event we are interested in (see below). Once that event is defined, we’ll move on to the “Threshold” tab which is where we specify the threshold parameters. For the purpose of an example, let’s assume that we have established a baseline of 100 file checksum changes per day, with a work day starting at 8am and ending at 7pm. Assuming that activity is somewhat spread throughout the day, this amounts to about 9 file changes per hour. Naturally we’ll have to assume that file changes aren’t always evenly spread out throughout the day, but setting up a rule along the lines of “if 20 checksum changes occur in 1 minute, shut file sharing down” is probably reasonable. Configure the threshold as shown in the screenshot below, with whichever threshold you came up with.

Event Log Filter Threshold Setup
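Spelled out, the arithmetic behind this example threshold looks like this (the 100-changes/day baseline and the 20-per-minute trigger are the example values from above):

```python
# Baseline: 100 checksum changes per 11-hour work day (8:00 - 19:00).
baseline_per_day = 100
work_hours = 11

changes_per_hour = baseline_per_day / work_hours   # ~9.1 changes per hour
changes_per_minute = changes_per_hour / 60         # ~0.15 changes per minute

# The chosen trigger of 20 changes in 1 minute sits far above the average
# rate, leaving plenty of headroom for normal bursts of user activity.
trigger_per_minute = 20
headroom = trigger_per_minute / changes_per_minute
```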

Step 3: Triggering corrective action
When our threshold is reached, EventSentry will log an error to the event log with event id 10601 and trigger the specified action(s) from the “General” tab (Default Email) once per threshold interval.

At this point we would merely receive an alert when we suspect that CryptoLocker is at it again. If you are cautious then you can retain this setup for a little while (e.g. a day or two) to ensure that you are not getting any alerts about the threshold being met (assuming that CryptoLocker is not active on your network in which case you should get the emails).

To go all in and trigger a server service shutdown, we’ll need to create a service action now. On Windows, file sharing services are provided by the “Server” service, which uses the internal name of “LanmanServer”. The service action allows you to control any service (start/stop/restart), and in this case we’ll obviously want to stop the server service, so that clients cannot access the file shares on your server anymore. We’ll trigger an email action at the same time of course, so that the sysadmin in charge is aware of what is going on. While shutting down all file services seems a bit extreme, it’s unfortunately the most effective way to prevent more files from becoming encrypted.

So for the next step, right-click the “Actions” container and select “Add Action”. At the selection dialog choose the “Service” action, enter a descriptive name (e.g. “Stop File Sharing”) and hit enter.

Selecting an EventSentry Notification

Then, configure the settings of the service as shown in the screenshot below.

Action to stop the LanmanServer service

The last step of our setup (congratulations if you’ve made it this far) is to assign the service action to the filter we previously created. After all, a service action which isn’t referenced anywhere doesn’t do much good. So head back to the Event Log Packages, find the “CryptoLocker Rules” package and edit the filter in the package. In the action list on top, click the “Add” button and add the action you just created.

Testing
If at all possible I’d recommend testing the EventSentry setup at a time when your users are not interrupted. Adding a few template files to one of the monitored folders and changing them in short succession (a script may be necessary depending on how short your threshold interval is) should trigger the file services shutdown procedure. Once verified, you can just start the “Server” service again.
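A quick-and-dirty script along these lines can generate the burst of modifications needed to trip the threshold. Run it only against a test folder inside one of the monitored directories; the file names, count and delay are arbitrary:

```python
import os
import time

def simulate_burst(directory, count=25, delay=0.5):
    """Create `count` small template files in `directory`, wait briefly so
    their initial checksums can be recorded, then modify every file in
    quick succession - enough to exceed a '20 changes per minute' rule."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, "canary_%02d.docx" % i)
        with open(path, "wb") as f:
            f.write(b"template contents")
        paths.append(path)
    time.sleep(delay)  # give the monitor a moment to baseline the files
    for path in paths:
        with open(path, "ab") as f:
            f.write(b" - modified")
    return paths
```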

Conclusion
Just like in the real world, network viruses come in all shapes and sizes – only limited by technology and the imagination of the cyber-evildoers.

I hope that this article gave you some insight into CryptoLocker and a good way to guard against it. As always, make sure that your company has the following in place:

  • Email Attachment scanning
  • Working, tested backups
  • User education
  • AntiSpyware software

With those in place, one should be able to keep future infections to a minimum.

Stay safe & decrypted.

Mobile Alerts: Pushing EventSentry alerts directly to your mobile devices with Prowl and NMA

When it comes to mobile alerts, email seems to still be the prevalent method of choice for many IT pros. There are many good reasons why network alerts delivered via email are convenient:

  • easy to configure
  • uses existing infrastructure
  • every smartphone and tablet supports email
  • supports attachments (e.g. performance charts from EventSentry’s performance alerts)
  • integrates into your existing environment – everybody already uses email!

What’s not to like? Well, of course it turns out that some of these advantages can also be a disadvantage:

  • emails are not real time
  • problems on the email server often don’t surface immediately
  • important alerts can be overlooked in the inbox jungle
  • you cannot be alerted about email problems via email (duh!)

Viable Email Alternatives
Thankfully, there are a number of alternatives that can be used as an email substitute or addition for mobile alerts. In this article I’ll focus on two affordable services: Prowl (for the iOS platform) and Notify My Android (you may have guessed it – for the Android platform) – subsequently referred to as “NMA”. Both of these services consist of apps for their respective platform and a web-based back end which will push the notifications to your device(s) in near real time.

Mobile Alert on iPhone 5 with Prowl

EventSentry event sent to Prowl on an iPhone

Both services offer an HTTP API which we can connect to with EventSentry’s HTTP action. If you have never used the HTTP action in EventSentry, then here is some background: the HTTP action allows you to POST any event (whether it be an event from the security event log or a heartbeat alert for example) to either Prowl’s or NMA’s web service. These notifications are then pushed to one or more mobile devices.

Neither service currently requires a monthly subscription, but both require a purchase (Prowl costs USD 2.99 whereas NMA costs USD 4.99) if you want to send unlimited notifications to your mobile device(s). NMA is a little more welcoming to strangers – it supports up to 5 notifications per day at no charge.

Prowl: Getting Started
The iOS app costs USD 2.99, and supports up to 1000 API calls (=notifications) per hour. To get started:

  1. Purchase the Prowl: Growl Client from the Apple App Store and install it
  2. Register for free at prowlapp.com
  3. Login & create an API key

Notify My Android (NMA): Getting Started
The Notify My Android app is free in the Google Play store and supports up to 5 API calls (=notifications) per day. Upgrading to a premium account will allow up to 800 API calls per hour. To get started:

  1. Download the Notify My Android app from the Google Play store
  2. Register for free at www.notifymyandroid.com
  3. Login & create an API key
  4. Optional: Upgrade to a Premium account to allow for unlimited notifications

Once you have the mobile app installed and the API key in hand, you can start setting up an HTTP action in EventSentry. Notifications essentially consist of three fields: the application name, a subject, and a message field, all of which can be customized. As such, it’s up to you to configure which part of an event log alert goes into the subject and which part goes into the actual message. In our example, we generate a dynamic notification subject with the host name, event id, event source and event category. The notification body will simply consist of the event message text, though this can be customized as well.

Setting up an HTTP action
Right-click the “Actions” container (or, in v3.0, use the “Add” button in the Ribbon) and create a new HTTP action. The HTTP action requires a URL at minimum, optional credentials and the actual data fields to submit in the HTTP POST. Conveniently, both Prowl & NMA use the same field names. I suspect that they are adhering to some sort of standard, though I couldn’t find any references.

The first parameter to configure is the URL, which depends on the service you use:

Prowl: https://api.prowlapp.com/publicapi/add
NMA: https://www.notifymyandroid.com/publicapi/notify

Configuring authentication credentials is not necessary since you are essentially authenticating with the API key you generated. The last step is configuring the form fields. The bold field on the left is the name of the form element, and the text to the right is the value; a description follows in italics on the next line:

apikey: abdef123123abababafefefe
The API key you received

application: EventSentry
This will be displayed as part of the notification. I use “EventSentry” here, but this can really be anything – it could also be the host name, for example ($EVENTCOMPUTER)
Max Length: 256

event: $EVENTCOMPUTER-$EVENTID-$EVENTSOURCE-$EVENTCATEGORY
This is the subject of the notification. You can use any variables, including insertion strings $STR1, $STR2 etc.
Max Length: 1024 [Prowl], 1000 [NMA]

description: $EVENTMESSAGE
This is the main message body of the notification, the $EVENTMESSAGE seems like a pretty good candidate for this field
Max Length: 10000

priority: 0 (possible values are -2, -1, 0, 1 and 2)
This field is optional and doesn’t do anything with NMA. With Prowl however, the priority can be set to “2” (indicating an “emergency”), which may then override quiet hours in the Prowl mobile app (if quiet hours are configured).
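Outside of EventSentry, the same POST can be assembled with a few lines of Python. The field names and URL below are the ones from this post; the API key and event values are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request

PROWL_URL = "https://api.prowlapp.com/publicapi/add"

def build_notification(apikey, event, description,
                       application="EventSentry", priority=0):
    """Assemble the HTTP POST request for a Prowl push notification."""
    fields = {
        "apikey": apikey,
        "application": application,  # shown as the notification source
        "event": event,              # notification subject
        "description": description,  # notification body
        "priority": priority,        # -2..2; 2 may override quiet hours
    }
    data = urlencode(fields).encode("ascii")
    return Request(PROWL_URL, data=data, method="POST")
```

Passing the returned request to urllib.request.urlopen would send it; the “Test” button in EventSentry does the equivalent.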

Take a look at the respective API documentation for Prowl and NMA as well; each shows additional fields which can be configured, as well as additional information that you may or may not find useful.

The screenshot below shows a fully configured HTTP action for Prowl:

EventSentry HTTP Action for Prowl

EventSentry HTTP Action for Prowl

The HTTP action for NMA would look identical, with the exception of the URL. You can click the “Test” button, which will submit the configured data to the specified URL and should, when configured correctly, immediately generate a notification on the mobile device. Please note that event variables will not be resolved when testing.

Threshold
To make sure that your mobile device doesn’t get flooded with alerts (some applications have the tendency to generate not one but hundreds of events in a short period of time), I highly recommend that you set up a threshold on the action or on the event log filter referencing the action (I personally prefer the former). You can also set up a schedule so that notifications are only sent on certain days and/or during certain hours.

The last step would be to configure an event log filter to forward select events to the mobile device, something that is beyond the scope of this article; see our tutorials for more information.

Reliability
It seems only natural to wonder whether alerts sent through these services can be used for mission-critical systems. I’ve been using mostly Prowl as I’m an iPhone user, and have been very happy with its fast response times (which are almost instant), the service reliability and the stability of the iOS app. Nevertheless, both prowlapp.com and NMA state that you should not solely rely on their systems for critical alerts and should instead set up multiple channels for mission-critical alerts. This sounds nice in theory, but I suspect that most sysadmins will not want to dismiss alerts on more than one device – something that can get old if you get a fair number of alerts. Switching to a commercial system like PagerDuty with guaranteed uptime may be preferable in that case. I will talk more about PagerDuty in an upcoming post.

My experience with the NMA Android app wasn’t as good during the limited testing I performed. While it worked great when it worked, the app did crash on me a couple of times.

Conclusion
If you’re looking for a way to push alerts to your mobile device from EventSentry without using email and without spending big bucks, Prowl and NMA are worth looking into. They’re affordable, responsive and easy to configure.

Monitoring Windows Updates

Automatic Windows Updates are a wonderful thing when they are working as expected, and many organizations employ WSUS or patch management software to keep their infrastructure up to date with the latest Microsoft hot fixes and service packs.

While this works for many, not everybody can afford patch management software, and, while free, managing the disk-hungry WSUS can be a daunting task as well. This leaves some sysadmins to use the old-fashioned Windows Updates to install all the regular and out-of-band patches Microsoft releases.

If you don’t feel comfortable installing patches automatically however, configuring Windows Update to “download updates for later manual installation” is often safer and more predictable. But, if you’re not logging on to the server(s), you won’t know whether one or more updates are ready for installation or not. Even if you’re just managing one server, checking in on a regular basis can be a waste of time.

updates_are_ready.png

This is where EventSentry and its log file monitoring feature come in. It turns out that Windows, like a diligent ship captain, logs all activity to a log file. And by all, I really do mean ALL. The file I’m talking about is WindowsUpdate.log, and it tells you just about everything that’s going on with Windows Update. In a few steps that don’t take longer than 5 minutes, you can set up real-time monitoring of the WindowsUpdate.log file and be notified when updates are about to be downloaded to a monitored computer.

The screenshot below shows what such an email from EventSentry would look like:

email_approved_updates.png

From then on, you can either get email notifications when patches are downloaded, or use the web-based reporting to view a report from all of your monitored hosts. At a high level, the configuration works like this:

  1. Set up a log file (%systemroot%\windowsupdate.log in this case)
  2. Create & assign a new log file package
  3. Define a log file filter (this tells EventSentry what to look for in the file, and where to send it)
  4. Set up an email action (this is usually already set up)
  5. Optionally set up an event log filter to forward alerts to email (the default filter setup should automatically forward warnings)

The WindowsUpdate.log is useful for troubleshooting as well, and you can consolidate the content of this file from all of your servers in the central EventSentry database. This makes searching for text and/or comparing the log from multiple servers a breeze. Having the log file accessible through the web reports is also useful when a patch caused problems and the server is offline. You can view the most recent activity from the log file through the web-based reporting even when the server is unavailable.

So how do you set this up? Assuming you have EventSentry v2.93 installed (any edition will do, including the free “light” version), you can follow the steps outlined below. Note that all steps will need to be performed in the EventSentry management console.


1. Set up a file definition.
This tells EventSentry which file you want to monitor, and sets up a logical representation of that file in the EventSentry configuration. In the “Tools” menu, click “Log Files and File Types” and then click the “Add” button.

menu_tools.png

add_file_definition.png

tree_logfile_package.png

2. Create a package and add a filter. Right-click the “Log File Packages” container, select “Add Package” and choose a descriptive name. Since new packages are unassigned by default, right-click the newly created package, select “Assign” and assign the package to the host(s) on which you want to monitor the WindowsUpdate log file.

3. Set up a log file filter. This tells EventSentry which content of the monitored file you are interested in. In every log file filter you can configure a database filter as well as an event log filter.

Right-click the previously created package and select “Add File”. From the list, select the log file definition created in step 1, “WindowsUpdateLog”, then click the newly added log file object.

The database tab determines which content goes to the database (in most cases you will write all file contents to the database), while the event log tab determines which log file contents are written to the event log. For this project, we are interested in the following wildcard matches:

*AU*# Approved Updates =*

*DnldMgr*Updates to download =*

The first wildcard match tells you the total number of updates that have been approved and will be downloaded, whereas the second fires once for every individual update that is downloaded. In most cases the first match is sufficient and the second can be skipped.
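Wildcard matches of this kind behave like shell globbing, so a quick way to sanity-check a pattern before deploying it is Python's fnmatch module. The log lines below are made up purely for illustration:

```python
from fnmatch import fnmatch

# The two patterns from the log file filter above
PATTERNS = ["*AU*# Approved Updates =*", "*DnldMgr*Updates to download =*"]

# Hypothetical WindowsUpdate.log lines, invented for this demo
lines = [
    "2012-01-05 09:10:11:123  936 8b4 AU  # Approved Updates = 5",
    "2012-01-05 09:10:12:456  936 8b4 DnldMgr  Updates to download = 5",
    "2012-01-05 09:10:13:789  936 8b4 Agent  * Found 12 updates",
]

for line in lines:
    # A line is interesting if it matches at least one pattern
    print(any(fnmatch(line, p) for p in PATTERNS))  # -> True, True, False
```

Only the first two lines match, which is exactly what the filter above would forward.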

file_filter_eventlog.png

That’s it! With this setup, you will immediately get notified when patches are ready to be installed. The only thing I didn’t mention here is how to set up an email action and corresponding event log filter, since both of these are usually already set up by default. If you need help with this, please check out our documentation and/or tutorials.

Please note that the full and evaluation versions of EventSentry can inventory installed software and patches. This enables you to use the web interface for viewing/searching installed patches, and to get (email) alerts when a patch has been successfully installed.

As always, happy monitoring!

How to make the Windows Software RAID log to the event log

Windows Server has long provided admins the ability to create a software RAID, enabling redundant disks without a (potentially expensive) hardware RAID controller. If you are already using Windows Server 2008’s software RAID capabilities, and think that Windows will somehow notify you when a disk in an array fails, then you can skip to “Just Say Something Please!” below.

Background
A RAID can be created entirely from the Disk Management view in the Computer Management console, without any scripting or command-line tools.

Software RAIDs are not as powerful and fast as their hardware counterparts, but are nevertheless a good way to enable disk redundancy. Software RAIDs make sense in a variety of scenarios:

•    When you are on a budget and don’t want to spend a few hundred $$ on a hardware RAID controller
•    When you need to enable redundancy on a server that wasn’t originally designed with redundancy in mind (as if that would ever happen!)
•    When you need to add redundancy to a server without reinstalling the OS or restoring from backup

Windows Server lets you do all this, and it’s included with the OS – so why not take advantage of it? I think the last point is often overlooked – you can literally just add a hard disk to any non-redundant Windows server and create a mirror – with fewer than a dozen clicks!

Since this article is starting to sound like a software RAID promotion, and for the sake of completeness, here are SOME of the advantages of a hardware RAID as well:

•    Faster performance due to dedicated hardware, including cache
•    More RAID levels than most software RAIDs
•    Hot-plug replacement of failed disks
•    Ability to select a hot spare disk
•    Better monitoring capabilities (though this article will alleviate this somewhat)

But despite being far from perfect, software RAIDs do have their time and place.

Just Say Something Please!
Unfortunately, despite all the positive things about software RAID, there is a major pitfall on Windows 2008: The OS will not tell you when the RAID has failed. If the RAID is in a degraded state (usually because a hard disk is dead), you will not know unless you navigate to the Disk Management view. The event logs are quiet, there are no notifications (e.g. in the system tray), and even WMI is silent as a grave. Nothing. Nada. Nix.

What’s peculiar is that this is a step back from Windows 2003, where RAID problems were actually logged to the System event log with the dmboot and dmio event sources. What gives?

Even though a discussion on why that is (or is not) seems justified, I will focus on the solution instead.

The Solution

Fortunately, there is a way to be notified when a RAID is “broken”, thanks in part to the diskpart.exe tool (which is part of Windows) and EventSentry. With a few small steps we’ll be able to log an event to the event log when a drive in a software RAID fails, and send an alert via email or other notification methods.

Diskpart is pretty much the command-line interface to the Disk Management MMC snap-in, and it allows you to do everything the MMC snap-in does – and much more! One of the things you can do with the tool is review the status of all (logical) drives. Since we’re interested in whether a particular RAID-enabled logical drive is “Healthy”, we’ll be looking at logical drives.

So how can we turn diskpart’s output into an email (or other) alert? Simple: We use EventSentry’s application scheduler to run diskpart.exe on a regular basis (and since the tool doesn’t stress the system it can be run as often as every minute) and generate an alert. The sequence looks like this:

•    EventSentry runs our VBScript (which in turn runs diskpart) and captures the output
•    When a problem is detected, EventSentry logs error event 10200 to the application event log, including the captured output from the script
•    An event log filter looks for the 10200 error event, possibly examining the event message as well (for custom routing)

Diskpart
Diskpart’s output is pretty straightforward. If you just run diskpart and execute the “list volume” command, you will see output similar to this:


Volume ###  Ltr  Label        Fs     Type        Size     Status     Info   
----------  ---  -----------  -----  ----------  -------  ---------  --------
Volume 0         System Rese  NTFS   Simple       100 MB  Healthy    System  
Volume 1     C                NTFS   Mirror       141 GB  Healthy    Boot    
Volume 2     D   System Rese  NTFS   Simple       100 MB  Healthy           



disk_management_3_cropped.png

Notice the “Status” column, which indicates that our “BOOT” volume is feeling dandy right now.  However, when a disk fails, the status is updated and reads “Failed Rd” instead:


Volume ###  Ltr  Label        Fs     Type        Size     Status     Info   
----------  ---  -----------  -----  ----------  -------  ---------  --------
Volume 0         System Rese  NTFS   Simple       100 MB  Healthy    System  
Volume 1     C                NTFS   Mirror       141 GB  Failed Rd  Boot    
Volume 2     D   System Rese  NTFS   Simple       100 MB  Healthy           

Technically, scripting diskpart is a bit cumbersome, as the creators of the tool want you to specify any commands to pass to diskpart in a text file, and in turn specify that text file with the /s parameter. This makes sense, since diskpart can automate partitioning, which can certainly result in a dozen or so commands.

For our purposes however that’s overkill, so we can get by with piping a single command to diskpart:

echo list volume | diskpart

which will yield the same results as above, without the need for an “instruction” file.

The easy way out
The quickest way (though, per usual, not the most elegant) to get RAID notifications is to create a batch file (e.g. list_raid.cmd) with the content shown earlier

echo list volume | diskpart

and execute the script on a regular basis (e.g. every minute) with EventSentry’s application scheduler, which will result in the output of the diskpart command being logged to the event log as event 10200.

Then, you can create an include filter in an event log package, which will look for the following string:

*DISKPART*Status*Failed Rd*

If your EventSentry configuration is already set up to email you all new errors, then you don’t even have to set up an event log filter – just creating the script and scheduling it will do the trick.

But surely you will want to know how this can be accomplished in a more elegant fashion? Yes? Excellent, here it is.

A Better Solution

One problem with the “easy way out” is that it will not detect all non-OK RAID statuses, such as:

•    At Risk
•    Rebuild

disk_management_resync.png

Furthermore, the output can be rather verbose, and will include every logical drive, including CD-ROMs, removable disks and others.

It is for this reason we have created a VBScript, which will parse the output of the diskpart command with a regular expression, and provide the following:

•    Filtered output, showing only drives in a software RAID
•    Formatted output, showing only the relevant drive parameters
•    Detection of any non-Healthy RAID status

An example output of the script looks like this:

Status of Mirror C: (Boot) is "Healthy"

Much nicer, isn’t it? If a problem is detected, then the output will be more verbose:

**WARNING** Status of Mirror C: (Boot) is "Failed Rd"

   
WARNING: One or more redundant drives are not in a "Healthy" state!

The VBScript looks at the actual “Status” column and reports any status that is not “Healthy” – a more accurate way to verify the state of the RAID.

Since the script has a dynamic ERRORLEVEL, it’s not necessary to evaluate the script output – simply evaluating the return code is sufficient.
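The underlying technique is simple enough to illustrate outside of VBScript as well. The following Python sketch is not part of EventSentry – it is a simplified re-implementation for illustration only – but it applies the same idea to captured diskpart output: keep only mirrored/RAID-5 volumes and flag any status other than “Healthy”, returning a non-zero code if something is wrong:

```python
import re

# Simplified pattern: volume number, drive letter, RAID type and status column.
# Looser than the VBScript's regex; intended only to demonstrate the approach.
VOLUME_RE = re.compile(
    r"Volume\s+(\d+)\s+([A-Z])\s+.*?(Mirror|RAID-5)\s+\d+\s+\w+\s+(\w[\w ]*?)\s*(?:Boot|System)?\s*$"
)

def check_raid(diskpart_output):
    """Return (exit_code, report) for captured 'list volume' output."""
    exit_code, report = 0, []
    for line in diskpart_output.splitlines():
        m = VOLUME_RE.search(line)
        if not m:
            continue  # not a redundant volume (Simple volumes, headers, etc.)
        number, letter, raid_type, status = m.groups()
        status = status.strip()
        prefix = ""
        if status != "Healthy":
            exit_code, prefix = 1, "**WARNING** "
        report.append(f'{prefix}Status of {raid_type} {letter}: is "{status}"')
    return exit_code, report

# Sample output mimicking the diskpart listing shown above
sample = '''
  Volume 0         System Rese  NTFS   Simple       100 MB  Healthy    System
  Volume 1     C                NTFS   Mirror       141 GB  Failed Rd  Boot
'''
code, report = check_raid(sample)
```

Here `code` ends up as 1 and `report` contains only the warning line for the degraded mirror, mirroring how the VBScript's ERRORLEVEL and filtered output behave.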

Implementation
Let’s leave the theory behind us and implement the solution, which requires only three steps:

1.    Create an embedded script (we will include this script with v2.93 by default) through the Tools -> Embedded Scripts option, based on the VBScript below. Select “cscript.exe” as the interpreter. Embedded scripts are elegant because they are automatically included in the EventSentry configuration – no need to manage the scripts outside EventSentry.

monitor_raid_embedded_script.png

2.    Create a new System Health package and add the “Application Scheduler” object to it. Alternatively, you can add the Application Scheduler object to an existing system health package. Either way, schedule the script with a recurring schedule.

application_scheduler_monitor_raid.png

Note that commands starting with the @ symbol are embedded scripts. The “Log application return code 0 to event log …” option is not selected here, since the script runs every minute and would generate 1440 entries per day. You may want to enable this option at first to ensure that your configuration is working, or leave it on if you don’t mind having that many entries in your application log. It’s mainly a matter of preference.

3.    This step is optional if you already have a filter in place which forwards Errors to a notification. Otherwise, create an event log filter which looks for the following properties:

  Log: Application
  Severity: Error
  Source: EventSentry
  ID: 10200
  Text (optional): “WARNING: One or more redundant drives*”


The VBScript


' Lists all logical drives on the local computer which are configured for
' software RAID. Returns an %ERRORLEVEL% of 1 if any redundant drive is
' not in a "Healthy" state. Returns 0 otherwise.

' Supports Windows Vista/7, Windows 2008/R2

Option Explicit

Dim WshShell, oExec
Dim RegexParse
Dim hasError : hasError = 0

Set WshShell = WScript.CreateObject("WScript.Shell")
Set RegexParse = New RegExp

' Execute diskpart
Set oExec = WshShell.Exec("%comspec% /c echo list volume | diskpart.exe")

RegexParse.Pattern = "\s\s(Volume\s\d)\s+([A-Z])\s+(.*)\s\s(NTFS|FAT)\s+(Mirror|RAID-5)\s+(\d+)\s+(..)\s\s([A-Za-z]*\s?[A-Za-z]*)(\s\s)*.*"

While Not oExec.StdOut.AtEndOfStream
    Dim regexMatches
    Dim Volume, Drive, Description, Redundancy, RaidStatus
    Dim CurrentLine : CurrentLine = oExec.StdOut.ReadLine

    Set regexMatches = RegexParse.Execute(CurrentLine)
    If (regexMatches.Count > 0) Then
        Dim match
        Set match = regexMatches(0)

        If match.SubMatches.Count >= 8 Then
            Volume      = match.SubMatches(0)
            Drive       = match.SubMatches(1)
            Description = Trim(match.SubMatches(2))
            Redundancy  = match.SubMatches(4)
            RaidStatus  = Trim(match.SubMatches(7))
        End If

        If RaidStatus <> "Healthy" Then
            hasError = 1
            WScript.StdOut.Write "**WARNING** "
        End If

        WScript.StdOut.WriteLine "Status of " & Redundancy & " " & Drive & ": (" & Description & ") is """ & RaidStatus & """"
    End If
Wend

If (hasError) Then
    WScript.StdOut.WriteLine ""
    WScript.StdOut.WriteLine "WARNING: One or more redundant drives are not in a ""Healthy"" state!"
End If

WScript.Quit(hasError)



How to dynamically toggle your Wireless adapter with EventSentry

Most of the time I work on a Lenovo laptop running Windows 7, and I’m overall quite happy with the laptop (especially after the mainboard was replaced and it stopped randomly rebooting). However, a minor annoyance had been bugging me for a while: If I plugged my computer into a LAN (I have a docking station at work and at home) while a wireless signal was also available (and configured on the laptop), Windows 7 would keep both connections active.

1. The Problem

So I’d have my laptop in the docking station, connected to a 1Gb Ethernet network, and yet the laptop would also be connected to a WiFi network. While not a big deal per se, automatically disabling the WiFi connection when already connected to Ethernet does have a few advantages:

  • Avoid potential connectivity issues
  • Increase security by not transmitting data via Wifi when not necessary
  • Increase battery life when connected to a LAN
  • Because you can!

Now, Lenovo equips most (if not all) laptops with software called “Access Connections”, a pretty nifty and free tool! One of the things it can do is disable a wireless adapter when the computer is connected to Ethernet. However, it never re-enables it when you disconnect from the wired network (at least I haven’t found a way), and besides, not everybody has a Lenovo laptop.

So how could I disable the WiFi connection automatically when I connect the laptop to an Ethernet network, yet automatically re-enable it when there is no Ethernet connection?

2. The Research

After some intense brainstorming, I remembered two things:

  1. Most Ethernet NIC drivers log events to the System event log when a network port is connected/disconnected.
  2. A while back, I used the netsh command to configure DNS servers from the command line. Maybe one could toggle the state of network adapters with this tool as well?

3. Evidence Gathering

The first one was easy; a quick look at the system event log revealed the following event:

e1kexpress_event.png

A similar event is logged when the “network link” has connected. The event shown here is specific to the driver of my laptop’s network card (an Intel(R) 82577LM adapter), but most newer drivers will log events when a cable is disconnected or the link is otherwise lost. If you are already running EventSentry with its hardware inventory feature enabled, then you can obtain the name of the network adapter from any monitored host on the network through the hardware inventory page, an example is shown below.

all_nics.png

Coming up with a way to enable and disable a particular network connection with netsh.exe was a bit more challenging, but I eventually cracked the cryptic command line parameters of netsh.exe.

Enable a connection
netsh interface set interface "Wireless Network Connection" ENABLED

Disable a connection

netsh interface set interface "Wireless Network Connection" DISABLED

And yes, you do need to specify the word “interface” twice. If you do find yourself wanting to automate network adapter settings with scripts and/or the command line frequently, then you should check out this link.

4. The Solution

So now that we have all the ingredients, let’s take a look at the recipe. In order to accomplish the automatic interface toggling, we need to create:

  • an embedded script called wifi_enable.cmd, using the command line from above
  • an embedded script called wifi_disable.cmd, again using the command line from above
  • a process action “Wifi Enable”, referencing the above wifi_enable.cmd embedded script
  • a process action “Wifi Disable”, referencing the above wifi_disable.cmd embedded script
  • an event log filter for event source “e1kexpress” and event id 27, triggering the “Wifi Enable” action
  • an event log filter for event source “e1kexpress” and event id 32, triggering the “Wifi Disable” action

A couple of clarifications: First, you do not need to use embedded scripts – you can create the scripts in the file system too and then point the process action to those files. I prefer embedded scripts since I don’t have to worry about maintaining the script, as it gets distributed to remote hosts automatically when needed. Second, the event source and event id will depend on the network card installed in your machine; the above example will only work with Lenovo T410 laptops.
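The recipe boils down to a small mapping from (event source, event id) to an action. Here is a Python sketch of that dispatch logic – purely illustrative, since EventSentry implements this through event log filters, and the source/IDs shown are specific to my Intel driver:

```python
# Illustrative only: maps NIC driver events to the process action that should run.
# Event source and IDs are specific to the Intel 82577LM driver ("e1kexpress").
ACTIONS = {
    ("e1kexpress", 27): "wifi_enable.cmd",   # link lost  -> re-enable WiFi
    ("e1kexpress", 32): "wifi_disable.cmd",  # link up    -> disable WiFi
}

def action_for(source, event_id):
    """Return the script an event should trigger, or None if no filter matches."""
    return ACTIONS.get((source, event_id))

print(action_for("e1kexpress", 32))  # -> wifi_disable.cmd
```

An event that matches neither filter simply triggers nothing, which is exactly how the two event log filters behave.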

So what happens is pretty straightforward: When I connect my laptop to a LAN, the Intel NIC driver writes event id 32 with the event source e1kexpress to the system event log. EventSentry intercepts the event and triggers the Wifi Disable action, which in turn runs the netsh.exe process, disabling the WiFi connection.

5. Setting it up in the management console

Embedded Scripts

You can manage embedded scripts via Tools -> Embedded Scripts. Click “New”, specify a descriptive name (e.g. wifi_enable.cmd) and paste the command line netsh interface set interface "Wireless Network Connection" ENABLED into the script content window. Then, do the same for the wifi_disable.cmd script, but this time use the netsh interface set interface "Wireless Network Connection" DISABLED command line. You can leave the interpreter empty as long as you give the filename the .cmd extension.

embedded_script.png

Actions

Create two process actions, one pointing to wifi_enable.cmd, and one pointing to wifi_disable.cmd. You can access these embedded scripts by clicking the pull-down – you should see the embedded script(s) you created in step one – each prefixed with the @ symbol. The resulting dialog should look like this:

process_wifi_enable.png

I recommend enabling both “Event Log Options”, as this will help with troubleshooting. Now we just need the event log filters, and we are all set.

Like I mentioned earlier, you can also reference any external process or .cmd file with the process action, if you’d rather not use embedded scripts.

Event Log Filters
Since we’ll need two filters, we’ll create a new event log package called “Toggle Wifi” by right-clicking the “Event Log Packages” container and selecting “Add Package”. Inside the package we can then add the two filters: One to trigger the “Wifi Enable” action when the NIC driver logs its event indicating that the network cable was unplugged, and one to trigger the “Wifi Disable” action when the NIC driver logs that the network cable was plugged in. The filters will look similar to this, but note that the event source as well as the event id will depend on the network card and driver.

filter_wifi_enable.png

That’s pretty much it. If you enabled the event log options in the process action earlier, then you can see the output from the process action in the event log, as shown below:

event_action.png

Here are some links to the official EventSentry documentation regarding the features used:

Do not trust thee RAID alone

I’m assuming that most readers are familiar with what RAID, the “Redundant Array of Inexpensive Disks”, is. Using RAID for disk redundancy has been around for a long time, apparently first mentioned in 1987 at the University of California, Berkeley (see also: The Story So Far: The History of RAID). I’m honestly not sure why they chose the term “inexpensive” back in 1987 (I suppose “RAD” isn’t as catchy of a name), but regardless of the wording, a RAID is a fairly easy way to protect yourself against hard drive failure. Presumably, any production server will have a RAID these days, especially with hard drives being as inexpensive as they are today (unless you purchase them list price from major hardware vendors, that is). Another reason why RAID is popular, is of course the fact that hard drives are probably the most common component to break in a computer. You can’t really blame them either, they do have to spin an awful lot.

burnt_server.jpg

Lesson #1: Don’t neglect your backups because you are using RAID arrays
That being said, we recently had an unpleasant and unexpected issue in our office with a self-built server. While it is a production server, it is not a very critical one, and as such a downtime of 1-2 days with a machine like that is acceptable (albeit not necessarily desired). Unlike the majority of our “brand-name” servers, which are under active support contracts, this machine was using standard PC components (it’s one of our older machines), including an onboard RAID that we utilized for both the OS drive as well as the data drive (four disks total, with each drive being a RAID 1 mirror). Naturally, the machine is monitored through EventSentry.

Well, one gray night it happened – one of the hard drives failed and a bunch of events (see myeventlog.com for an example) were logged to the event log, and immediately emailed to us. After the disappointing review of the emails, the anticipated procedure was straightforward:

1) Obtain replacement hard drive
2) Shut down server
3) Replace failed hard drive
4) Boot server
5) Watch RAID rebuilding while sipping caffeinated beverage

The first 2 steps went smoothly, but that’s unfortunately how far our IT team got. The first challenge was to identify the failed hard drive. Since the drives weren’t in a hot-swappable enclosure, and the events didn’t indicate which drive had failed, we chose to go the safe route and test each of them with the vendor’s hard drive test utility. I say safe, because a failed hard drive might work again for a short period of time after a reboot, so without testing the drives you could potentially hook the wrong drive back up. So, it’s usually a good idea to spend a little bit of extra time in that case to determine which one is the culprit.

Eventually, the failed hard drive was identified, replaced with a new (exact and identical) drive, connected, and the machine was booted again. Now normally, when connecting an empty hard drive, the RAID controller initiates a rebuild, and all is well. In this case however, the built-in NVidia RAID controller would not recognize the RAID array anymore. Instead, it congratulated us on having installed two new disks. Ugh. Apparently, the RAID was no more – it was gone – pretty much any IT guy’s nightmare.

No matter what we tried – including different combinations, re-creating the original setup with the failed disks, and trying the mirrored drive by itself – the RAID was simply a goner. I can’t retell all the things that were tried, but we ultimately had to re-create the RAID (resulting in an empty drive) and restore from backup.

We never did find out why the RAID 1 mirror that was originally setup was not recognized anymore, and we suspect that a bug in the controller firmware caused the RAID configuration to be lost. But regardless of what was ultimately the cause, it shows that even entire RAID arrays may fail. Don’t relax your backup policy just because you have a RAID configured on a server.

Lesson #2: Use highly reliable RAID levels, or configure a hot spare
Now I’ll admit, the majority of you are running your production servers on brand-name machines, probably with a RAID1 or RAID5, presumably under maintenance contracts that ship replacement drives within 24 hours or less. And while that does sound good and gives you comfort, it might actually not be enough for critical machines.

Once a drive in a RAID5 or RAID1 fails, the RAID array is in a degraded state and you’re starting to walk on very thin ice. At this point, of course, any further disk failure will require a restore from backup. And that’s usually not something you want.

So how could a RAID 5 not be sufficiently safe? Please, please: Let me explain.

Remember that the RAID array won’t be fully fault tolerant again until it has been rebuilt – which might be many hours AFTER you plug in the replacement disk, depending on size, speed and so forth. And it is during the rebuild period that the functional disks have to work harder than usual, since the parity or mirror has to be re-created from scratch, based on the existing data.

Is a subsequent disk failure really likely though? It’s already pretty unlikely that a disk fails in the first place – I mean, disks don’t usually fail every other week. It is, however, much more likely than you’d think, depending somewhat on whether the disks are related to each other. What I mean by related is whether they come from the same batch. If there was a problem in the production process – resulting in a faulty batch – then it’s actually quite likely that another one bites the dust sooner rather than later. It has happened to a lot of people – trust me.

But even if the disks are not related, they probably still have the same age and wear and, as such, are likely to fail in a similar time frame. And, like mentioned before, the RAID array rebuild process will put a lot of strain on the existing disks. If any disk is already on its last leg, then a failure will be that much more likely during the RAID array rebuild process.

raid6.png

RAID 6, if supported by your controller, is usually preferable to RAID 5, as it includes two parity blocks, allowing up to two drives to fail. RAID 10 is also a better option with potentially better performance, as it too continues to operate when two disks fail (as long as the two failed disks are not part of the same mirrored pair). You can also add a hot spare disk, a stand-by disk that automatically replaces a failed disk.

If you’re not 100% familiar with the difference between RAID 0, 1, 5, 6, 10 etc. then you should check out this Wikipedia article: It outlines all RAID levels pretty well.

Of course, a RAID level that provides higher availability is usually less efficient in regards to storage. As such, a common counterargument against using a more reliable RAID level is the additional cost associated with it. But when designing your next RAID, ask yourself whether the savings of an additional hard drive are worth the additional risk, and the potential of having to restore from a backup. I’m pretty sure that in most cases, they’re not.

Lesson #3: Ensure you receive notifications when a RAID array is degraded
Being in the monitoring business, I need to bring up another extremely important point: Do you know when a drive has failed? It doesn’t help much to have a RAID when you don’t know when one or more drives have failed.

Most server management software can notify you via email, SNMP and such – assuming it’s configured. Since critical events like this almost always trigger event log alerts as well though, a monitoring solution like EventSentry can simplify the notification process. Since EventSentry monitors event logs, syslog as well as SNMP traps, you can take a uniform approach to notifications. EventSentry can notify you of RAID failures regardless of the hardware vendor you use – you just need to make sure the controller logs the error to the event log.

Lesson #4+5: Test Backups, and store backups off-site
Of course one can’t discuss reliability and backups without preaching the usual. Test your backups, and store (at least the most critical ones) off-site.

Yes, testing backups is a pain, and quite often it’s difficult as well and requires a substantial time commitment. Is testing backups overkill, something only pessimistic paranoids do? I’m not sure. But we learned our lesson the hard way when all of our 2008 backups were essentially incomplete, due to a missing command-line switch that should have recorded the system state (but in our case did not). We discovered this after, well, we could NOT restore a server from a backup. Trust me: Having to restore a failed server and having only an incomplete, out-of-date or broken backup, is not a situation you want to find yourself in.

My last recommendation is off-site storage. Yes, you have a sprinkler system, building security and feel comfortably safe. But look at the picture on top. Are you prepared for that? If not, then you should probably look into off-site backups.

So, let me recap:

1. Don’t neglect your backups because you are using RAID arrays.
2. Use highly reliable RAID levels, or configure a hot spare.
3. Ensure you receive notifications when a RAID array is degraded.
4. Test your backups regularly, but at the very least test them once to ensure they work.
5. Store your backups, or at least the most critical, off-site.

Stay redundant,
Ingmar.


Creating your very own event message DLL

If you’ve ever written code that logs to the Windows event log (e.g. through Perl, Python, …), then you might have run into a problem similar to the one I described in an earlier post: Either the events don’t look correct in the event log, you are restricted to a small range of event ids (as is the case with eventcreate.exe), or you cannot utilize insertion strings.

In this blog post I’ll show you how to build a custom event message DLL, from beginning to end. We’ll start with creating the DLL using Visual Studio (Express) and finish up with some example scripts, including Perl of course, to utilize the DLL and log elegantly to the event log.

Let’s say you are running custom scripts on a regular basis in your network – maybe with Perl, Python, Ruby etc. Your tasks, binary as they are, usually do one of two things: They run successfully, or they fail. To make troubleshooting easier, you want to log any results to the event log – in a clean manner. Maybe you even have sysadmins in other countries and want to give them the ability to translate standard error messages. Logging to the event log has a number of benefits: It gives you a centralized record of your tasks, allows for translation, and gives you the ability to respond to errors immediately (well, I’m of course assuming you are using an event log monitoring solution such as EventSentry). Sounds interesting? Read on!

Yes, you can do all this, and impress your peers, by creating your own event message file. And what’s even better is that you can do so using all free tools. Once you have your very own event message file, you can utilize it from any application that logs to the event log, be it a perl/python/… script or a C/C++/C#/… application.

To create an event message file, you need two applications:

•    Visual Studio Express
•    The Windows Platform SDK

The reason you need the platform SDK is that Visual Studio Express does not ship with the Message Compiler, mc.exe, for some reason. The message compiler is essential – without it, there will be no event message file. When installing the platform SDK, you can deselect all options except for “Developer Tools -> Windows Development Tools -> Win32 Development Tools” if you want to conserve space. This is the only component of the SDK that’s needed.

An event message file is essentially a specific type of resource that can be embedded in either a DLL file or executable. In EventSentry, we originally embedded the message file resources in a separate DLL, but eventually moved it into the executable, mostly for cleaner and easier deployment. We’ll probably go back to a separate message DLL again in the future, mostly because processes (e.g. the Windows Event Viewer) can lock the event message file (the executable in our case), making it difficult to update the file.

Since embedding an event message file in a DLL is more flexible and significantly easier to accomplish, I’ll be covering this scenario here. The DLL won’t actually contain any executable code, it will simply serve as a container for the event definitions that will be stored inside the .dll file. While it may sound a little bit involved to build a DLL just for the purpose of having an event message file (especially to non-developers), you will see that it is actually surprisingly easy. There is absolutely no C/C++ coding required, and I also made a sample project available for download, which has everything setup and ready to go.

In a nutshell, the basic steps of creating an event message file are as follows:

1. Create a message file (e.g. messagefile.mc)
2. Convert the message file into a DLL, using mc.exe, rc.exe and link.exe
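The conversion step chains the three SDK tools together. The small Python sketch below just spells out that pipeline (the file name myapp_msgfile matches the example used later; it assumes mc.exe, rc.exe and link.exe are on the PATH of a machine with the SDK installed):

```python
import subprocess

BASE = "myapp_msgfile"  # matches the .mc file created below

def build_steps(base=BASE):
    """The three commands that turn an .mc file into a resource-only DLL."""
    return [
        ["mc.exe", f"{base}.mc"],                  # generates {base}.rc, a header and MSG*.bin files
        ["rc.exe", "/r", f"{base}.rc"],            # compiles the resource script into {base}.res
        ["link.exe", "/dll", "/noentry",           # /noentry: no code, resources only
         f"/out:{base}.dll", f"{base}.res"],
    ]

def build(base=BASE):
    """Run the pipeline (requires the SDK tools; not executed here)."""
    for step in build_steps(base):
        subprocess.run(step, check=True)

for step in build_steps():
    print(" ".join(step))
```

The /noentry linker switch is what makes this a pure resource DLL: it contains no executable code, only the compiled message resources.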

Once we have the message file, we will also need to register the event message file in the registry, and associate it with an event source. Keep in mind that the event source is not hard-coded into the message file itself, and in theory a single event message file could be associated with multiple event sources (as is the case with many event sources from Windows).

So let’s start by creating a working folder for the project, and I will call it “myapp_msgfile”. Inside that directory we’ll create the message file, let’s call it myapp_msgfile.mc. This file is a simple text file, and you can edit it with your favorite text editor (such as Ultraedit, Notepad2 or Notepad++).

The file with the .mc extension is the main message file that we’ll be editing – here we define our event ids, categories and so forth. Below is an example, based on the scenario from before. Explanations are shown inline.


MessageIdTypedef=WORD

LanguageNames=(
    English=0x409:MSG00409
    German=0x407:MSG00407
)

Here we define which languages we support, and by which files these languages will be backed. You will have to look up the language id for other languages if you plan on supporting more, and you can remove German if you only plan on supporting English.


MessageId=1
SymbolicName=MYTOOL_CATEGORY_GENERAL
Language=English
Tasks
.
Language=German
Jobs
.

Our first message id, #1, will be used for a category. Categories work in exactly the same way as event ids: when we log an event to the event log and want to include a category, we only log the number, 1 in this case.


MessageId=100
SymbolicName=TASK_OK
Language=English
Task %1 (%2) completed successfully.
.
Language=German
Job %1 (%2) war erfolgreich.
.

This is the first event description. The "MessageId" field specifies the event id, and the symbolic name is a descriptive, unique name for the event. The language specifies one of the supported languages, followed by the event message text. You end the event description with a single period; that period has to be the only character on its line.


MessageId=101
SymbolicName=TASK_ERROR
Language=English
Task %1 (%2) failed to complete due to error "%3".
.
Language=German
Job %1 (%2) konnte wegen Fehler "%3" nicht abgeschlossen werden.
.


MessageId=102
SymbolicName=TASK_INFO
Language=English
Task Information: %1
.
Language=German
Job Information: %1
.

Since we're creating events for a custom task engine, we need both success and failure events here. And voilà, our event message file now defines events 100 through 102, plus an id for a category.

So now that we have our events defined, we need to convert that into a DLL. The first step now is to use the message compiler, mc.exe, to create a .rc file as well as the .bin files. The message compiler will create a .bin file for every language that is defined in the mc file. Open the “Visual Studio Command Prompt (2010)” in order for the following commands to work:


mc.exe myapp_msgfile.mc

will create (for the .mc file depicted above):


myapp_msgfile.rc
msg00407.bin
msg00409.bin

With those files created, we can now create a .res (resource) file with the resource compiler rc.exe:


rc.exe /r myapp_msgfile.rc

which will create the


myapp_msgfile.res

file. The "/r" option instructs the resource compiler to emit a .res file. Now we're almost done; we'll let the linker do the rest of the work for us:


link -dll -noentry -out:myapp_msgfile.dll myapp_msgfile.res

The myapp_msgfile.res file is the only input to the linker; normally one would supply object (.obj) files to the linker to create a binary file. The "-noentry" option tells the linker that the DLL does not have an entry point, meaning that we do not need to supply a DllMain() function, and thus the linker is satisfied even without any object files. This is of course desired, since we're not looking to create a DLL that contains any code or logic.

After running link.exe, we’ll end up with the long awaited myapp_msgfile.dll file.

The end. Well, almost. Our message file is at this point just a lone accumulation of zeros and ones, so we need to tell Windows that this is actually a message file for a particular event log and source. That’s done through the registry, as follows:

Open the registry editor regedit.exe. Be extremely careful here, the registry editor is a powerful tool, and needs to be used responsibly :-).

All event message files are registered under the following key:


HKLM\System\CurrentControlSet\Services\eventlog

Under this key, you will find a subkey for every event log, and below those a subkey for every registered event source. So in essence, the path to an event source looks like this:


HKLM\System\CurrentControlSet\Services\eventlog\EVENTLOG\EVENTSOURCE

I’m going to assume here that we are going to be logging to the application event log, so we’d need to create the following key:


HKLM\System\CurrentControlSet\Services\eventlog\Application\MyApp

In this key, we need the following values:


TypesSupported (REG_DWORD)
EventMessageFile (REG_EXPAND_SZ)

TypesSupported is usually 7, indicating that the application will log either Information, Warning or Error events (you get 7 if you OR 1[error], 2[warning] and 4[information] together).

EventMessageFile is the path to your message DLL. Since the type is REG_EXPAND_SZ, the path may contain environment variables.

If you plan on utilizing categories as well, which I highly recommend (and for which our message file is already setup), then you need two additional values:


CategoryCount (REG_DWORD)
CategoryMessageFile (REG_EXPAND_SZ)

CategoryCount simply contains the total number of categories in your message file (1, in our case), and the CategoryMessageFile points to our message DLL. Make sure that your message file does not contain any sequence gaps, so if your CategoryCount is set to 10, then you need to have an entry for every id from 1 to 10 in the message file.

We could create separate message files for messages and categories, but that would be overkill for a small project like this.
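Since editing the registry by hand is error-prone, the registry values above can also be captured in a .reg file. Below is a minimal Python sketch that generates one; the source name and DLL path are just the ones from our example, so adjust as needed. Note that a quoted string value in a .reg file creates a REG_SZ rather than a REG_EXPAND_SZ, which works as long as the path contains no environment variables.

```python
# Sketch: generate a .reg file that registers an event source with
# a message DLL. Source name and DLL path are example values.

def build_reg_file(source, dll_path, category_count=1):
    # TypesSupported = 7 (1=Error | 2=Warning | 4=Information)
    types_supported = 1 | 2 | 4
    key = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
           r"\Services\eventlog\Application" + "\\" + source)
    # Backslashes must be doubled inside .reg string values
    escaped = dll_path.replace("\\", "\\\\")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[%s]" % key,
        '"TypesSupported"=dword:%08x' % types_supported,
        '"EventMessageFile"="%s"' % escaped,
        '"CategoryCount"=dword:%08x' % category_count,
        '"CategoryMessageFile"="%s"' % escaped,
        "",
    ]
    return "\r\n".join(lines)

print(build_reg_file("MyApp", r"C:\myapp_msgfile\myapp_msgfile.dll"))
```

Double-clicking the generated file (or running `regedit /s myapp.reg`) imports the values on the target machine.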

Now that we have that fancy message DLL ready to go, we need to start logging. Below are some examples of how you can log to the event log with a scripting language. I'll be covering Perl, KiXtart, and Python. Being an old Perl fan and veteran, I'll cover that first.

PERL
The nice thing about Perl is that you can take full advantage of insertion strings, so it can support event definitions containing more than one insertion string.


use strict;
use Win32::EventLog;


# Call this function to log an event


sub logMessage
{
    my ($eventID, $eventType, @eventDetails) = @_;

    # The source name has to match the registry key created earlier
    my $evtHandle = Win32::EventLog->new("MyApp");

    my %eventProperties;

    # Category is optional, specify only if the message file
    # contains entries for categories
    $eventProperties{Category}  = 0;
    $eventProperties{EventID}   = $eventID;
    $eventProperties{EventType} = $eventType;
    $eventProperties{Strings}   = join("\0", @eventDetails);

    $evtHandle->Report(\%eventProperties);

    $evtHandle->Close;
}


# This is what you would use in your scripts to log to the event log. The insertion
# strings are passed as an array, so even if you only have one string, you need to
# pass it in parentheses ("This is my message") as the last parameter

logMessage(100, EVENTLOG_INFORMATION_TYPE, ("Database Backup", "Monitoring Database"));
logMessage(102, EVENTLOG_INFORMATION_TYPE, ("Step 1/3 Complete"));


PYTHON

Python supports event logging very well too, including multiple insertion strings. See the sample code below:


import win32evtlogutil
import win32evtlog


# Here we define our event source and category, which we consider static throughout
# the application. You can change this if the category is different


eventDetails = {'Source': 'MyApp',    # the event source registered earlier
                'Category': 1}        # the category id from the message file


# Call this function to log an event


def logMessage(eventID, eventType, message, eventDetails):
    if isinstance(message, str):
        message = (message,)
    win32evtlogutil.ReportEvent(eventDetails['Source'], eventID,
                                eventDetails['Category'], eventType, tuple(message))

logMessage(100, win32evtlog.EVENTLOG_INFORMATION_TYPE, ("Database Backup", "Monitoring Database"), eventDetails)
logMessage(102, win32evtlog.EVENTLOG_INFORMATION_TYPE, ("Step 1/3 complete"), eventDetails)


KIXTART
The pro: Logging to the event log using KiXtart is so easy it’s almost scary. The con: It only supports message files that use one insertion string.


LOGEVENT(4, 102, "Database Backup", "", "MyApp")

UNICODE – ONE code to rule them all

If you live in an English-speaking country like the United States, United Kingdom or Australia, then you are in the lucky position where every character in your language can be represented by the ASCII table. Many other languages unfortunately aren't as lucky, which is no surprise given that over 1000 written languages exist. Most of these languages cannot be represented by ASCII, most notably Asian and Arabic languages.

Take the text below for example, ASCII would be struggling with this a bit (to say the least):

النمسا

Understanding UNICODE is no easy feat however – just the mere abbreviations out there can be mind-boggling: UTF-7, 8, 16, 32, UCS-2, BOM, BMP, code points, Big-Endian, Little-Endian and so forth. UNICODE support is particularly interesting when dealing with different platforms, such as Windows, Unix and OS X.

It’s not all that bad though, and once the dust settles it can all make sense. No, really. As such, the purpose of this article is to give you a basic understanding of UNICODE, enough so that the mention of the word UNICODE doesn’t give you cold shivers down your back.

Unicode is essentially one large character set that includes all characters of written languages, including special characters like symbols and so forth. The goal – and this goal is reality today – is to have one character set for all languages.

Back in 1963, when the first draft of ASCII was published, internationalization was probably not at the top of the committee members' minds. Understandable, considering that not too many people were using computers back then. Things have changed since then, as computers are turning up in pretty much every electrical device (maybe with the exception of stoves and blenders).

The easiest way to start is, of course, with ASCII (American Standard Code for Information Interchange). Gosh were things simple back in the 60s. If you wanted to represent a character digitally, you would simply map it to a number between 0 and 127. Voila, all set. Time to drive home in your Chevrolet, and listen to a Bob Dylan, Beach Boys or Beatles record. I won't go into the details now, but for the sake of completeness I will include the ASCII representation of the words "Bob Dylan":


String:      B    o    b         D    y    l    a    n
Decimal:     66   111  98   32   68   121  108  97   110
Hexadecimal: 0x42 0x6F 0x62 0x20 0x44 0x79 0x6C 0x61 0x6E
Binary:      01000010 01101111 01100010 00100000
             01000100 01111001 01101100 01100001 01101110

Computers, plain and simple as they are, store everything as numbers of course, and as such we need a way to map numbers to letters, and vice versa. This is of course the purpose of the ASCII table, which tells our computers to display a “B” instead of 66.

Since the 7-bit ASCII table has a maximum of 128 characters, any ASCII character can be represented using 7 bits (though they usually consume 8 bits now). This makes calculations, such as determining the length of a string, quite easy. In C programs, for example, ASCII characters are represented using chars, which use 1 byte (= 8 bits) of storage. Here is an example in C:


char author[] = "The Beatles";
int authorLen = strlen(author);        // authorLen = 11
size_t authorSize = sizeof(author);    // authorSize = 12

The only reason the two values differ is that C automatically appends a 0x0 character at the end of a string (to indicate where it terminates), so the size will always be one char(acter) larger than the length.
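For comparison, here is the same idea sketched in Python. Python strings carry no terminating 0x0, so the character count and the ASCII-encoded byte count match exactly:

```python
# Each ASCII character fits in one byte, so the encoded length
# equals the character count (no trailing 0x0 in Python strings).
author = "The Beatles"
encoded = author.encode("ascii")

print(len(author))    # 11 characters
print(len(encoded))   # 11 bytes, one byte per ASCII character
print(encoded[0])     # 84, the ASCII code for 'T'
```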

So, this is all fine and well if we only deal with “simple” languages like English. Once we try to represent a more complex language, Japanese for example, things start to get more challenging. The biggest problem is the sheer number of characters – there are simply more than 127 characters in the world’s written languages. ASCII was extended to 8-bit (primarily to accommodate European languages), but this still only scratches the surface when you consider Asian and Arabic languages.

Hence, a big problem with ASCII is that it is essentially a fixed-length, 8-bit encoding, which makes it impossible to represent complex languages. This is where the Unicode standard comes in: It gives each character a unique code point (number), and includes variable-length encodings as well as 2-byte (or more) encodings.

But before we go too deep into Unicode, let's blatantly pretend that Unicode doesn't exist and think of a different way to store Japanese text. Yes! Let us enter a world where every language uses a different encoding! No matter what they want to make you believe – having countless encodings around is fun and exciting. Well, actually it's not, but let's take a look at why.

The ASCII characters end at 127, leaving another 128 values for other languages. Even though I'm not a linguist, I know that there are more than 128 characters in the rest of the world. Additionally, many Asian languages have significantly more than 255 characters, making a multi-byte encoding necessary (since you cannot represent every character with a single byte).

This is where encodings come in (or better, “came” in before Unicode was established), which are basically like stencils. Let’s use Japanese for our code page example. I don’t speak Japanese unfortunately, but let’s take a look at this word, which means “Farewell” in Japanese (you are probably familiar with pronunciation – “sayōnara”):

さようなら

The ASCII table obviously has no representation for these characters, so we would need a new table. As it turns out, there are two main encodings for Japanese: Shift-JIS and EUC-JP. Yes, as if it’s not bad enough to have one encoding per language!

So code pages serve the same purpose as the ASCII table: they map numbers to letters. The problem with code pages – as opposed to Unicode – is that both the author and the reader need to view the text in the same code page. Otherwise, the text will just be garbled. This is what "sayōnara" looks like in the aforementioned encodings:

EUC-JP
0xA4 B5 A4 E8 A4 A6 A4 CA A4 E9

Shift_JIS
0x82 B3 82 E6 82 A4 82 C8 82 E7

Their numerical representations in EUC-JP and Shift_JIS are, as is to be expected, completely different, so knowing the encoding is vital. If the encodings don't match, then the text will be meaningless. And meaningless text is useless.

You can imagine that things can get out of hand when one party (party can be an Operating System, Email client, etc.) uses EUC-JP, and the other Shift_JIS for example. They both represent Japanese characters, but in a completely different way.
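You can reproduce the byte sequences above with a few lines of Python, since its codec library still ships both legacy Japanese encodings:

```python
# The same five characters produce entirely different bytes in the
# two legacy Japanese encodings, so reader and writer must agree.
text = "さようなら"

print(text.encode("euc_jp").hex(" "))     # a4 b5 a4 e8 a4 a6 a4 ca a4 e9
print(text.encode("shift_jis").hex(" "))  # 82 b3 82 e6 82 a4 82 c8 82 e7
```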

Encodings can either (to a certain degree) be auto-detected, or specified as some sort of meta information. Below is a HTML page with the same Japanese word, Shift_JIS encoded:


<HTML>
    <HEAD>
        <TITLE>Shift_JIS Encoded Page</TITLE>
        <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Shift_JIS">
    </HEAD>
    <BODY>
        さようなら
    </BODY>
</HTML>

You can paste this into an editor, save it as a .html file, and then view it in your favorite browser. Try changing "Shift_JIS" to "EUC-JP", fun things await you.

But I am getting carried away, after all this post is about Unicode, not encodings. So, Unicode solves these problems by giving every character from every language a unique code point. No more “Shift_JIS”, no more “EUC-JP” (not even to mention all the other encodings out there), just UNICODE.

Once a document is encoded in Unicode, specifying a code page is no longer necessary – as long as the client (reader) supports the particular Unicode encoding (e.g. UTF-8) the text is encoded with.

The five major Unicode encodings are:

UTF-8
UCS-2
UTF-16 (an extension of UCS-2)
UTF-32
UTF-7

All of these encodings are Unicode, and represent Unicode characters. That is, UTF-8 is just as capable as UTF-16 or UTF-32. The number in a UTF encoding's name represents the minimum number of bits required to store a single Unicode code point (for UCS-2, it is the number of bytes). As such, UTF-32 can require four times as much storage as UTF-8, depending on the text being encoded. I will be ignoring UTF-7 going forward, as its use is not recommended and it is no longer widely used.

The biggest difference between UTF-8 and UCS-2/UTF-16/UTF-32 is that UTF-8 is a variable-length encoding, as opposed to the others being fixed-length encodings. OK, that was a lie. UCS-2, the predecessor of UTF-16, is indeed a fixed-length encoding, whereas UTF-16 is a variable-length encoding. In most use cases, however, UTF-16 uses 2 bytes and behaves essentially like a fixed-length encoding. UTF-32 on the other hand, and that is not a lie, is a fixed-length encoding that always uses 4 bytes to store a character.

Let's look at this table, which lists the four major encodings and some of their properties:

Encoding   Variable/Fixed   Min Bytes   Max Bytes
UTF-8      variable         1           4
UCS-2      fixed            2           2
UTF-16     variable         2           4
UTF-32     fixed            4           4



What this means is that in order to represent a Unicode character (e.g. さ), a variable-length encoding might require more than 1 byte, and in UTF-8's case up to 4 bytes. UTF-8 potentially needs more bytes, since it maintains backward compatibility with ASCII and as such loses 7 bits.
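A quick Python sketch illustrates the table: encoding an ASCII letter, a kana character and a character from outside the BMP shows how the byte counts diverge per encoding.

```python
# Byte counts per character: an ASCII letter, a kana character and an
# emoji (outside the BMP) in three of the encodings from the table.
for ch in ("A", "さ", "\U0001F600"):
    print("U+%04X" % ord(ch),
          len(ch.encode("utf-8")),      # 1, 3, 4 bytes
          len(ch.encode("utf-16-be")),  # 2, 2, 4 bytes
          len(ch.encode("utf-32-be")))  # always 4 bytes
```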

Windows uses UTF-16 to store strings internally, as do most Unicode frameworks, such as ICU and Qt's QString. Most Unixes on the other hand use UTF-8, and it is also the most common encoding on the web. Mac OS X is a bit of a different beast: because of its BSD underpinnings, all BSD system functions use UTF-8, whereas Apple's Cocoa framework uses UTF-16.

UCS-2 or UTF-16
I already mentioned that UTF-16 is an extension of UCS-2, so how and why does it extend it?

You see, Unicode is now so comprehensive that it encompasses more than what you can store in 2 bytes. All characters (code points) from 0x0000 to 0xFFFF are in the "BMP", the "Basic Multilingual Plane". This is the plane that contains most of the character assignments, but additional planes exist. Here is a list of all the planes:

•    The "BMP", "Basic Multilingual Plane", 0x0000 -> 0xFFFF
•    The "SMP", "Supplementary Multilingual Plane", 0x10000 -> 0x1FFFF
•    The "SIP", "Supplementary Ideographic Plane", 0x20000 -> 0x2FFFF
•    The "SSP", "Supplementary Special-purpose Plane", 0xE0000 -> 0xEFFFF

So technically, having 2 bytes available is no longer enough to cover all the available code points; you can only cover the BMP. And this is the main difference between UCS-2 and UTF-16: UCS-2 only supports code points in the BMP, whereas UTF-16 also supports code points in the supplementary planes, through something called "surrogate pairs".
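A short Python sketch makes the surrogate-pair mechanics visible, using U+1F600 (an emoji from the SMP) as an example:

```python
# U+1F600 lies outside the BMP, so UTF-16 must split it into a
# surrogate pair: a high surrogate (D800-DBFF) followed by a
# low surrogate (DC00-DFFF).
ch = "\U0001F600"

print(hex(ord(ch)))                  # 0x1f600, beyond 0xFFFF
print(ch.encode("utf-16-be").hex())  # d83dde00, two 16-bit units

# The pair can also be computed by hand from the code point:
v = ord(ch) - 0x10000
high = 0xD800 + (v >> 10)
low = 0xDC00 + (v & 0x3FF)
print(hex(high), hex(low))           # 0xd83d 0xde00
```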

Representation in Unicode
So let's look at the above sample text in Unicode, shall we? Sayonara Shift_JIS & EUC-JP! The site http://rishida.net/tools/conversion/ has some great online tools for Unicode, one of which is called "Uniview". It shows us the actual Unicode code points, the symbol itself and the official description:

The official Unicode notation (U+hex) for the above characters uses the U+ syntax, so for the above letters we would write:


U+3055 U+3088 U+3046 U+306A U+3089

With this information, we can now apply one of the UTF encodings to see the difference:


UTF-8

E3 81 95 E3 82 88 E3 81 86 E3 81 AA E3 82 89

UTF-16
30 55 30 88 30 46 30 6A 30 89

UTF-32
00 00 30 55 00 00 30 88 00 00 30 46 00 00 30 6A 00 00 30 89

So UTF-8 uses 5 more bytes than UCS-2/UTF-16 to represent the exact same characters. Remember that UCS-2 and UTF-16 are identical for this text, since all characters are in the BMP. UTF-32 uses yet another 5 bytes more than UTF-8 and would require the most storage space, as is to be expected.

What you can also see here, is that UTF-16 essentially mirrors the U+ notation.
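All three byte sequences can be verified in a couple of lines of Python:

```python
# The same five code points (U+3055 U+3088 U+3046 U+306A U+3089)
# in three encodings; UTF-16 big endian mirrors the U+ notation.
text = "さようなら"

print(text.encode("utf-8").hex(" "))      # e3 81 95 e3 82 88 ...
print(text.encode("utf-16-be").hex(" "))  # 30 55 30 88 30 46 30 6a 30 89
print(text.encode("utf-32-be").hex(" "))  # 00 00 30 55 00 00 30 88 ...
```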

Fixed Length or Variable Length?
Both encoding types have their advantages and disadvantages, and I will be comparing the most popular UTF encodings, UTF-8 and UCS-2, here:

Variable Length UTF-8:
•    ASCII-compatible
•    Uses potentially less space, especially when storing ASCII
•    String analysis/manipulation (e.g. length calculation) is more CPU-intensive

Fixed Length UCS-2:
•    Potentially wastes space, since it always uses fixed amount of storage
•    String analysis/manipulation is usually less CPU intensive

Which encoding to use will depend on the application. If you are creating a web site, then you should probably choose UTF-8. If you are storing data in a database however, then it will depend on the type of strings that will be stored. For example, if you are only storing languages that cannot be represented through ASCII, then it is probably better to use UCS-2. If you are storing both ASCII and languages that require Unicode, then UTF-8 is probably a better choice. An extreme example would be storing English-Only text in a UCS-2 database – it would essentially use twice as much storage as an ASCII version, without any tangible benefits.

One of the strongest suits of UTF-8, at least in my opinion, is its backward compatibility with ASCII. UTF-8 never uses byte values below 128 (0x80) in its multi-byte sequences; those values are reserved for the original ASCII characters. This means that all ASCII text is automatically valid UTF-8, since any UTF-8 parser will recognize those bytes as ASCII and render them appropriately.
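This compatibility is easy to demonstrate in Python:

```python
# Pure ASCII text is byte-for-byte identical in UTF-8, which is why
# any ASCII document is automatically valid UTF-8.
text = "Bob Dylan"

print(text.encode("ascii") == text.encode("utf-8"))  # True

# Multi-byte UTF-8 sequences only use byte values 0x80 and above,
# so a 7-bit ASCII byte can never appear inside one:
print(all(b >= 0x80 for b in "さ".encode("utf-8")))  # True
```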

The BOM
And this brings us to the next topic, the BOM. BOM stands for "Byte Order Mark", and is usually a 2-4 byte header at the beginning of a Unicode text stream, e.g. a text file. If a text editor does not recognize the BOM header, it will usually display it as the þÿ or ÿþ characters.

The purpose of the BOM header is to describe the Unicode encoding, including the endianess, of the document. Note that a BOM is usually not used for UTF-8.

Let’s revisit the example from earlier, the UTF-16 encoding looked like this:


30 55 30 88 30 46 30 6A 30 89

If we wanted to store this text in a file, including a BOM header, then it could look like this:


FF FE 55 30 88 30 46 30 6A 30 89 30

“FF FE” is the BOM header, and in this case indicates that a UTF-16 Little Endian encoding is used. The same text in UTF-16 Big Endian would look like this:


FE FF 30 55 30 88 30 46 30 6A 30 89

The BOM header is generally only useful when Unicode-encoded documents are exchanged between systems that use different Unicode encodings, but given the minimal overhead it certainly doesn't hurt to add it to any UTF-16 encoded document. As such, Windows always adds a 2-byte BOM header to all Unicode text documents. It is the responsibility of the text reader (e.g. an editor) to interpret the BOM header correctly. Linux on the other hand, being a UTF-8 fan and all, does not need to use a BOM header, and does not add one by default.
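In Python, the BOM constants from the codecs module let us reconstruct both variants shown above:

```python
# Prepending the appropriate BOM to explicitly little/big endian
# UTF-16 reproduces the two file layouts from the example.
import codecs

le = codecs.BOM_UTF16_LE + "さようなら".encode("utf-16-le")
be = codecs.BOM_UTF16_BE + "さようなら".encode("utf-16-be")

print(le.hex(" "))  # ff fe 55 30 88 30 46 30 6a 30 89 30
print(be.hex(" "))  # fe ff 30 55 30 88 30 46 30 6a 30 89
```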

Tools & Resources
There are a variety of resources and tools available to help with Unicode authoring, conversions, and so forth.

I personally like Ultraedit, which lets me convert documents to and from UTF-8 and UTF-16, and also supports the BOM headers. GEdit on Linux is also very capable, and supports different code pages (if you ever need to use those) as well. Babelpad is an editor designed specifically for Unicode, and seems to support every possible encoding. I have not actually used this editor though.

A nifty online converter that I already mentioned earlier can be found at http://rishida.net/tools/conversion/, and also check out UniView: http://rishida.net/scripts/uniview/.

The official Unicode website is of course a great resource too, though potentially overwhelming to mere mortals that only have to deal with Unicode occasionally. The best place to start is probably their basic FAQ: http://www.unicode.org/faq/basic_q.html.

I hope this provides some clarification for those who know that Unicode exists, but are not entirely comfortable with the details.

さようなら!