Imagine we have a small network with one NFS server, called zappa, and several clients, named beethoven, bach and bruch. We want to use logpecker to process messages like the following:
    Sept 19 20:13:15 bruch nfs: server zappa not responding
    Sept 19 20:13:17 bruch nfs: server zappa OK
These are the rules needed to teach logpecker to recognize the messages; they have to go into the file "quicktour" in your rule search path.
    group nfs.server

    new crit
        tag nfs
        prio kern.*
        match server $server not responding
        name $server.no-response

    new ok
        tag nfs
        prio kern.notice
        match server $server OK
        name $server.no-response

This defines two rules that match these messages. The first creates a "critical" incident; the second tells logpecker that the other incident is now resolved. The following stripped-down configuration tells logpecker to use these rules to create a "ticker report":
    report quicktour-ticker {
        type ticker;
        file /var/log/quicktour-ticker;
    };

    process {
        rules { quicktour; };
        reports { quicktour-ticker; };
    };
Now here is an example log (the leading number on each line is a line number, added just for reference):
    01 Sept 19 20:13:15 bruch nfs: server zappa not responding
    02 Sept 19 20:13:17 bruch nfs: server zappa OK
    03 Sept 19 20:18:00 bach nfs: server zappa not responding
    04 Sept 19 20:18:01 bruch nfs: server zappa not responding
    05 Sept 19 20:18:03 bruch nfs: server zappa OK
    06 Sept 19 20:20:00 bach nfs: server zappa not responding
    07 Sept 19 20:20:00 bruch nfs: server zappa not responding
    08 Sept 19 20:20:03 beethoven nfs: server zappa not responding
    ... many more of these from varying clients ...
    09 Sept 19 21:15:00 bruch nfs: server zappa OK
    10 Sept 19 21:15:01 beethoven nfs: server zappa OK
    11 Sept 19 21:17:03 bach nfs: server zappa is OK

In line 1, an incident with the name "nfs.server.zappa.no-response" is created. Please note that the client's name is not part of the incident name, since it is irrelevant here. The clause "name $server.from.&host.no-response" would have created the name "nfs.server.zappa.from.bruch.no-response" instead. You get the idea?
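To make the naming step concrete, here is a minimal Python sketch of how a match clause with a $variable and a name clause could work together. It is not logpecker's implementation: the helper names compile_match and incident_name are hypothetical, and the assumptions that a variable matches a single whitespace-free word and that the group name is prefixed with a dot are inferred from the example above.

```python
import re

def compile_match(template):
    """Turn a match template such as 'server $server not responding'
    into a regex with one named group per $variable.
    Assumption: a variable matches one whitespace-free word."""
    # With one capture group, re.split alternates literal text and variable names.
    parts = re.split(r"\$(\w+)", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)           # literal text
        else:
            pattern += r"(?P<%s>\S+)" % part     # $variable
    return re.compile(pattern + r"$")

def incident_name(group, name_template, variables):
    """Expand $variables in the name clause and prefix the group name."""
    expanded = re.sub(r"\$(\w+)", lambda m: variables[m.group(1)], name_template)
    return group + "." + expanded

rule = compile_match("server $server not responding")
m = rule.match("server zappa not responding")
name = incident_name("nfs.server", "$server.no-response", m.groupdict())
print(name)
```

Because the client's host name never enters the expansion, messages from bruch, bach and beethoven about the same server all end up with the same incident name.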
logpecker always waits for a short period (20 seconds by default; this is configurable) before an incident is actually reported. This means that in line 2 it recognizes that the "critical" condition is over before it has even considered telling anybody about it. So the incident is silently dropped.
In line 3, another incident with the same name is created. The message in line 4 would create yet another incident with the same name, so it is "lumped together" with the existing one. After line 5, there is no incident left. Again, nothing has been reported.
At line 6, something really bad has happened. All clients log error messages. Logpecker again lumps these together and waits until 20:20:20; then the grace period is over and the incident is reported; at 21:15:00 (line 9), the incident is over.
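The behaviour described for lines 1 to 10 can be replayed with a small Python simulation. This is only a sketch under stated assumptions, not logpecker's actual algorithm: the function simulate and its event tuples are hypothetical, the grace period is taken to start at the first message of an incident, and an "ok" message for an incident that is not open is simply ignored. Line 11 is left out because it matches no rule.

```python
GRACE = 20  # seconds; the text says this period is configurable

def hms(t):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(x) for x in t.split(":"))
    return h * 3600 + m * 60 + s

def simulate(events, grace=GRACE):
    """events: (time, host, severity, incident name) tuples in time order.
    Returns the report as a list of (time, severity, name) tuples."""
    open_incidents = {}  # name -> {"start": time string, "reported": bool}
    report = []

    def flush(now):
        # Report every open incident whose grace period has expired by `now`.
        for name, inc in open_incidents.items():
            if not inc["reported"] and hms(inc["start"]) + grace <= now:
                inc["reported"] = True
                report.append((inc["start"], "crit", name))

    for time, host, severity, name in events:
        flush(hms(time))
        if severity == "crit":
            # A message with an already open name is lumped into that incident;
            # the host is ignored because it is not part of the name.
            open_incidents.setdefault(name, {"start": time, "reported": False})
        elif severity == "ok" and name in open_incidents:
            inc = open_incidents.pop(name)
            if inc["reported"]:
                report.append((time, "ok", name))
            # Otherwise the incident ended within the grace period: drop it silently.
    return report

NAME = "nfs.server.zappa.no-response"
events = [
    ("20:13:15", "bruch", "crit", NAME),      # line 1
    ("20:13:17", "bruch", "ok", NAME),        # line 2
    ("20:18:00", "bach", "crit", NAME),       # line 3
    ("20:18:01", "bruch", "crit", NAME),      # line 4
    ("20:18:03", "bruch", "ok", NAME),        # line 5
    ("20:20:00", "bach", "crit", NAME),       # line 6
    ("20:20:00", "bruch", "crit", NAME),      # line 7
    ("20:20:03", "beethoven", "crit", NAME),  # line 8
    ("21:15:00", "bruch", "ok", NAME),        # line 9
    ("21:15:01", "beethoven", "ok", NAME),    # line 10
]
for line in simulate(events):
    print(*line)
```

Only two entries survive: the incident that started at 20:20:00 and its resolution at 21:15:00. The two short outages before it end within the grace period and never reach the report.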
Line 11 contains a message that no rule is configured for (note the innocent-looking word "is"!); assume it is logged with facility/priority kern.notice. Software is like that: sometimes a log message just looks a little bit different for no apparent reason.
    Sept 19 20:20:00 crit nfs.server.zappa.no-response bach nfs: server zappa not responding
    Sept 19 21:15:00 ok nfs.server.zappa.no-response bruch nfs: server zappa OK
    Sept 19 21:17:03 notice unknown.notice.0.342324 bach nfs: server zappa is OK

This is a report of type ticker. I like to have root-tail throw this on my screen. There are other types, too:
This is probably the best thing to read next; it will tell you in more detail how logpecker really digs through the messages. Please be prepared: this documentation tends to get more and more boring the further you progress.
Also available upon popular demand (but better read the above first):
Or: