Processing the input - the big picture

This section explains how a log message is transformed into an incident, what the delay and pending queues are for, how logpecker suppresses duplicate problem reports, and how logpecker works in detail.

First Stage: Message Parsing and Matching

As soon as logpecker has finished its initialization, it starts watching its input files. When a new line arrives, it is parsed and matched against the rules you have defined. Each rule that matches creates a so-called incident from the input line. If several rules "fire", you get several incidents.

An incident has certain properties:

severity: How important this incident is. Can be one of ignore, info, notice, warn, error, crit, or the special "ok" (which indicates a resolved problem). Each rule you define includes the severity.
name: a symbolic name like "NFS.server.servername.unreachable". If several incidents have the same name, logpecker considers them to refer to the same problem, so it is important to include all relevant information in the name. The name is assigned by matching the input line to your rule definition.
time, host, syslog facility and priority, message-string: these are parsed from the input line. You will see this information in the reports.

If no rule matches an input line, logpecker creates a special incident of type "unknown".
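The matching step can be sketched roughly as follows. This is only an illustration in Python, not logpecker's rule syntax: the `RULES` table, the `Incident` class, and the `match_line` function are hypothetical names, and representing the "unknown" incident with the placeholder severity "unknown" is an assumption. Time, host, and syslog facility are left out to keep the sketch short.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    severity: str
    name: str
    message: str


# Hypothetical rules: (pattern, severity, name template).
# Named groups from the pattern are substituted into the name, so that
# all relevant information ends up in the incident name.
RULES = [
    (re.compile(r"NFS server (?P<server>\S+) not responding"),
     "error", "NFS.server.{server}.unreachable"),
    (re.compile(r"NFS server (?P<server>\S+) OK"),
     "ok", "NFS.server.{server}.unreachable"),
    (re.compile(r"not responding"),
     "warn", "generic.not_responding"),
]


def match_line(line):
    """Return one incident per matching rule, or a single "unknown" incident."""
    incidents = []
    for pattern, severity, template in RULES:
        m = pattern.search(line)
        if m:
            name = template.format(**m.groupdict())
            incidents.append(Incident(severity, name, line))
    if not incidents:
        # Assumption: the real representation of "unknown" incidents may differ.
        incidents.append(Incident("unknown", "unknown", line))
    return incidents
```

With these example rules, a line such as "NFS server fs1 not responding" fires two rules and therefore yields two incidents, while the "not responding" and "OK" messages for the same server share one incident name.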

Now, when an input line has been parsed and transformed into one or several incidents, the following can happen to them:

If an incident has the severity "ignore", it is silently ignored. That's probably what you would have expected.
If another incident with the same name already exists, the further processing depends on the stage this incident has reached. This is explained below.
If the incident has severity "ok" (i.e. problem resolved) and there is no other incident with the same name, it is also ignored.
Otherwise, the incident is put into the "delay queue" for a certain, configurable period (20 seconds by default).
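The four cases above amount to a small decision function. The following Python sketch is an illustration only; the function name `route`, its return values, and the `active_names` parameter are made up for this example and do not come from logpecker itself.

```python
def route(severity, name, active_names):
    """Decide what happens to a freshly created incident.

    active_names is the set of incident names that are currently held
    in the delay queue or the pending queue (hypothetical bookkeeping).
    """
    if severity == "ignore":
        return "drop"
    if name in active_names:
        # Handling depends on the stage the existing incident has reached.
        return "handle-by-stage"
    if severity == "ok":
        # A "problem resolved" incident with nothing to resolve.
        return "drop"
    return "delay-queue"
```

Note the order of the checks: a "problem resolved" incident is only dropped if no incident with the same name exists.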

Second Stage: Delay Queue

Now we have created an incident and put it in the delay queue. This queue holds incidents for a short period in order to catch message storms and to see whether the problem is cancelled by a "problem resolved" incident that follows directly.

During the delay period, the following can happen:

Another incident with the same name arrives.
To deal with message storms, where the same message is repeated over and over for a short period, all the repeating ones will be silently dropped.
A "problem resolved" incident with the same name arrives.
To accommodate message pairs like the notorious "NFS server does not respond" / "NFS server OK", you can define special rules that create "problem resolved" incidents. If such an incident arrives, all other incidents with the same name are silently removed from the delay queue.
The incident times out.
If the incident is still in the queue after the delay period (20 seconds by default), it is reported as an "initial occurrence" through your configured reports and moved to the "pending queue". For details on the reports, please see the separate report reference section.
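The behaviour of the delay queue can be sketched as below. This is a simplified model under stated assumptions, not logpecker's implementation: the class `DelayQueue` and its methods are hypothetical, and the current time is passed in explicitly so that the example is deterministic.

```python
class DelayQueue:
    def __init__(self, delay=20.0):
        self.delay = delay          # delay period in seconds
        self.entries = {}           # name -> (arrival time, message)

    def add(self, name, severity, message, now):
        """Handle an incident arriving at time `now`."""
        if severity == "ok":
            # "Problem resolved": silently remove the waiting incident.
            if self.entries.pop(name, None) is not None:
                return "removed"
            return "ignored"
        if name in self.entries:
            # Message storm: repeats are dropped, the timer is not restarted.
            return "dropped"
        self.entries[name] = (now, message)
        return "queued"

    def expire(self, now):
        """Return (name, message) pairs whose delay period has elapsed.

        These would be reported as initial occurrences and moved on to
        the pending queue.
        """
        due = [(n, msg) for n, (t, msg) in self.entries.items()
               if t + self.delay <= now]
        for name, _ in due:
            del self.entries[name]
        return due
```

A caller would invoke `expire` periodically and hand the returned incidents to the reporting and pending-queue stage.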

Third Stage: Pending Queue

After the incident has been reported, it is held in the pending queue, typically for 6 hours. This queue holds all "active" incidents and makes it possible to identify recurring problems.

During this time, the following can happen to it:

Another incident with the same name arrives: This incident is then reported as a "follow-up occurrence". For details, please take a look at the report reference section. The timeout (6 hours) is restarted for this incident.
A "problem resolved" incident with the same name arrives: This is reported as "problem solved" (see report reference), and the incident is removed from the pending queue.
Incidents are defined in groups. If the number of incidents in a group that have the same severity exceeds a configurable constant (default is 30), a "group overflow" is reported (see report reference) and further incidents of this group with the same or lower severity are ignored for a certain period. This mechanism is a kind of self-protection against masses of messages and keeps the memory usage low under all circumstances. (Well, I have cheated in the previous section: the same mechanism exists in the delay queue, too.)
Eventually, if nothing removes the incident, the pending period times out. The incident is then removed from the pending queue.
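The pending queue can be modelled in the same style. Again, this is a hedged sketch rather than logpecker's actual code: `PendingQueue`, `insert`, and `arrive` are hypothetical names, the group is simply passed in by the caller, and the period during which an overflowing group is ignored is not modelled (the overflow check is just repeated on every insert).

```python
class PendingQueue:
    def __init__(self, timeout=6 * 3600, group_limit=30):
        self.timeout = timeout          # pending period in seconds
        self.group_limit = group_limit  # max incidents per group and severity
        self.active = {}                # name -> {"group", "severity", "deadline"}

    def expire(self, now):
        """Remove incidents whose pending period has timed out."""
        for name in [n for n, e in self.active.items() if e["deadline"] <= now]:
            del self.active[name]

    def insert(self, name, group, severity, now):
        """Called when an incident leaves the delay queue."""
        self.expire(now)
        same = sum(1 for e in self.active.values()
                   if e["group"] == group and e["severity"] == severity)
        if same >= self.group_limit:
            # Adding this one would exceed the limit: self-protection.
            return "group overflow"
        self.active[name] = {"group": group, "severity": severity,
                             "deadline": now + self.timeout}
        return "initial occurrence"

    def arrive(self, name, severity, now):
        """Called when a new incident matches the name of an active one."""
        self.expire(now)
        if name not in self.active:
            return None
        if severity == "ok":
            del self.active[name]
            return "problem solved"
        # Follow-up: restart the timeout for this incident.
        self.active[name]["deadline"] = now + self.timeout
        return "follow-up occurrence"
```

Because every follow-up restarts the timeout, an incident that keeps recurring stays in the pending queue indefinitely and is never reported as a new initial occurrence.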
