Mole: A flexible operational log analyzer.

Mole is a log analyzer that parses your log files (any kind of log) using the definitions you specify (usually regular expressions) and automatically interprets some fields (numbers, dates, ...). Mole provides a set of functions to analyze that data.

Installation

Install it as you would any Python package:

pip install mole

Getting started

In this example we will use an access log file generated by Apache (or any other HTTP server). Let’s suppose that this file is located at /var/log/apache/access.log.

Note

Don’t worry about log rotation; mole can handle it.

1. Configure mole

Edit /etc/mole/input.conf and add:

[apache_log]
type   = tail
source = /var/log/apache/access.log

This defines a new input called apache_log of type tail (meaning that mole reads new lines as they are written to the file and copes with rotated logs), pointing to our log file at /var/log/apache/access.log.
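To make the idea of a tail input concrete, here is a minimal Python sketch of following a file across a rotation. It is not mole's implementation: the `Tail` class and its `poll` method are hypothetical names, and rotation is assumed to mean "the file was renamed and a new one created at the same path", detected by a change of inode on a POSIX system. Truncation in place is not handled.

```python
import os


class Tail:
    """Follow a file and reopen it after a rename-style rotation (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self._fh = None
        self._inode = None

    def _open(self):
        self._fh = open(self.path, "rb")
        self._inode = os.fstat(self._fh.fileno()).st_ino

    def _read_available(self):
        """Read every complete line that is currently available."""
        lines = []
        while True:
            pos = self._fh.tell()
            raw = self._fh.readline()
            if not raw.endswith(b"\n"):
                self._fh.seek(pos)  # partial or no line: retry on the next poll
                break
            lines.append(raw[:-1].decode("utf-8", errors="replace"))
        return lines

    def poll(self):
        """Return the complete lines written since the previous call."""
        if self._fh is None:
            try:
                self._open()
            except FileNotFoundError:
                return []
        lines = self._read_available()  # drain the file we already hold
        try:
            rotated = os.stat(self.path).st_ino != self._inode
        except FileNotFoundError:
            rotated = False  # old file moved away, new one not created yet
        if rotated:
            self._fh.close()
            self._open()
            lines += self._read_available()
        return lines
```

A caller would invoke `poll()` periodically; each call returns only the lines that were completed since the previous one, including lines from the new file after a rotation.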

Edit /etc/mole/index.conf and add:

[apache_log]
path = /var/db/mole/apache_log

This defines a new index. The index is the mole database where logs are stored in a format suited to faster searches.
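Both files use INI-style sections. Assuming they are compatible with Python's standard configparser module (an assumption, since the exact format mole accepts is not stated here), the following sketch reads the two fragments back, which can help catch typos before starting the daemons. The `load` helper is a hypothetical name.

```python
import configparser

# The same fragments as above, embedded as strings for the example.
INPUT_CONF = """
[apache_log]
type   = tail
source = /var/log/apache/access.log
"""

INDEX_CONF = """
[apache_log]
path = /var/db/mole/apache_log
"""


def load(text):
    """Parse INI text into a {section: {option: value}} dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


inputs = load(INPUT_CONF)
indexes = load(INDEX_CONF)

print(inputs["apache_log"]["type"], inputs["apache_log"]["source"])
print(indexes["apache_log"]["path"])
```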

2. Start daemons

$ mole-indexer -C /etc/mole
$ mole-seeker -C /etc/mole

3. Enjoy some searches

For example, get the IP addresses that requested the most traffic:

$ mole 'input apache_log | sum bytes by src_ip | top'
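To show what this query computes, here is a rough Python equivalent over a handful of made-up records. The functions `sum_by` and `top` are hypothetical helpers and not mole's API, and the default of ten results for `top` is an assumption.

```python
from collections import Counter

# Made-up parsed records; real ones would come from the apache_log input.
records = [
    {"src_ip": "127.0.0.1", "bytes": 512},
    {"src_ip": "192.168.0.21", "bytes": 2048},
    {"src_ip": "127.0.0.1", "bytes": 256},
    {"src_ip": "192.168.0.21", "bytes": 4096},
    {"src_ip": "10.0.0.5", "bytes": 100},
]


def sum_by(records, field, key):
    """Add up `field` for each distinct value of `key`."""
    totals = Counter()
    for record in records:
        totals[record[key]] += record[field]
    return totals


def top(totals, n=10):
    """Return the n largest totals, biggest first (n=10 is an assumed default)."""
    return totals.most_common(n)


result = top(sum_by(records, "bytes", "src_ip"))
for src_ip, total in result:
    print(f"src_ip={src_ip} sum(bytes)={total}")
```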

Understanding Mole Components

The mole pipeline is responsible for reading log items from a source, processing them (and transforming them if required) and, finally, returning an output. If the output is not explicitly defined, mole uses the best output format for the current context (serialization over the network, a plain printf on the console).

http://yuml.me/diagram/scruffy;/class/[element]++-0..*%3E[input],%20[element]++-0..*%3E[index],%20[element]++-0..*%3E[parser],%20[index]-%3E[schema]

There are a few components worth knowing:

input: The input is responsible for reading the log source. Sources can be of different kinds, such as normal files, network streams, index files and so on.

plotter: The main function of the plotter is to split the source into logical lines. In a normal log file each line is usually a new log entry, but some logs use several lines for the same logical entry (e.g. Java exceptions usually span a number of lines).

parser: Once you have a logical line, you need to know the meaning of each field. The parser assigns names to fields, using regular expressions to do so.

actions: The actions are transformations, filters and, in general, any other operation performed on the log dataset.

output: The output encapsulates the results of the actions in a human (or machine) readable form. You can think of the output as a kind of serialization.

So, the final pipeline in mole looks something like this:

<input> | <plotter> | <parser> | <action> | <action> ... | <output>
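The same chain can be sketched with Python generators, one function per stage. This only illustrates the shape of the pipeline and is not how mole implements it: every function name, the `where` and `total` actions, and the three-field log format are assumptions made up for the example.

```python
import re

# Hypothetical, simplified log format: "<src_ip> <path> <bytes>"
LINE_RE = re.compile(r"(?P<src_ip>\S+) (?P<path>\S+) (?P<bytes>\d+)")


def read_input(text):
    """input: yield raw physical lines from the source."""
    yield from text.splitlines()


def plot(lines):
    """plotter: here one non-empty physical line is one logical line."""
    return (line for line in lines if line.strip())


def parse(entries):
    """parser: name the fields with a regular expression, skip bad lines."""
    for entry in entries:
        match = LINE_RE.fullmatch(entry)
        if match:
            record = match.groupdict()
            record["bytes"] = int(record["bytes"])
            yield record


def where(records, field, value):
    """action: keep only the records whose field equals value."""
    return (r for r in records if r[field] == value)


def total(records, field):
    """action: reduce the stream to a single summary record."""
    yield {f"sum({field})": sum(r[field] for r in records)}


def output(records):
    """output: serialize each record as key=value pairs."""
    return [" ".join(f"{k}={v}" for k, v in r.items()) for r in records]


LOG = "127.0.0.1 /login 120\n192.168.0.21 / 300\n\n127.0.0.1 /logout 80\n"

result = output(
    total(where(parse(plot(read_input(LOG))), "src_ip", "127.0.0.1"), "bytes")
)
print(result)
```

Because every stage consumes and produces an iterator, records flow through one at a time and stages can be added or removed without touching the others.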

Daemons

Mole is composed of three programs (for now), two daemons and a client:

mole-indexer: the daemon responsible for reading the log files and indexing them, using an index back-end (only whoosh right now).

mole-seeker: the daemon responsible for looking things up in the index, receiving queries over a TCP port.

mole: the client that queries mole-seeker.

Running

To start mole, you need to configure the server. There is an example in the configuration directory of the source code. The configuration directory contains one file per mole component.

Once your server is configured, start both mole-indexer and mole-seeker.

Finally perform your query using mole.

Configuration

In the configuration directory you can find a separate file for each mole component, e.g.:

input.conf to configure inputs. An input is a reader over a file, a network stream or anything else that can be used to retrieve data to be analyzed.

index.conf to set up indexes. The indexes are special stpra

Examples

Count the lines of an input (in this case the input is the access log of an Apache server):

$ mole 'input apache_log | count *'
count(*)=3445

Perform the same query, but grouping by source IP:

$ mole 'input apache_log | count * by src_ip'
src_ip=127.0.0.1 count=121
src_ip=192.168.0.21 count=1203

Calculate the average transfer size in the Apache log, grouped by URL path, and keep only the top three:

$ mole 'input apache_log | avg bytes by path | top 3'
path=/ avg(bytes)=12343
path=/login avg(bytes)=6737
path=/logout avg(bytes)=2128

Search for an expression and count occurrences:

$ mole 'input apache_log | search path=*login* | count *'
count(*)=3838
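For readers who want to check their understanding of these queries, the sketch below reproduces the search, count and average operations in plain Python over a few made-up records. The helper names are hypothetical, and treating `*` as a shell-style wildcard (via fnmatch) is an assumption about how mole matches search patterns.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Made-up parsed records standing in for the apache_log input.
records = [
    {"src_ip": "127.0.0.1", "path": "/login", "bytes": 6000},
    {"src_ip": "192.168.0.21", "path": "/", "bytes": 12000},
    {"src_ip": "192.168.0.21", "path": "/login", "bytes": 7000},
    {"src_ip": "192.168.0.21", "path": "/logout", "bytes": 2000},
]


def search(records, field, pattern):
    """Keep records whose field matches a shell-style pattern (assumed semantics)."""
    return [r for r in records if fnmatchcase(str(r[field]), pattern)]


def count_by(records, key):
    """Count records for each distinct value of key."""
    counts = defaultdict(int)
    for r in records:
        counts[r[key]] += 1
    return dict(counts)


def avg_by(records, field, key):
    """Average field for each distinct value of key."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[field])
    return {k: sum(v) / len(v) for k, v in groups.items()}


def top(mapping, n):
    """Return the n entries with the largest values, biggest first."""
    return sorted(mapping.items(), key=lambda kv: kv[1], reverse=True)[:n]


print(len(search(records, "path", "*login*")))
print(count_by(records, "src_ip"))
print(top(avg_by(records, "bytes", "path"), 3))
```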

Development

The Mole code is hosted on GitHub, and you can download it with git as usual:

$ git clone https://github.com/ajdiaz/mole

Design

The basic design of mole is a linear pipeline which includes the following components:

  • The input, which is responsible for reading the data source byte by byte (or line by line; it is agnostic to the format).
  • The plotter, which breaks the input into logical lines. A logical line can be a text line, a number of text lines or a binary block.
  • The parser, which is responsible for extracting fields from the lines, for example using a regular expression or a comma-separated pattern.
  • The actions, which are a number of transformations over the fields.
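The split between plotter and parser is easiest to see with multi-line entries. The sketch below is an illustration and not mole's code: it assumes that a line starting with whitespace continues the previous entry (as Java stack-trace lines do) and uses a made-up "LEVEL CODE message" format. The names `plot`, `parse` and `ENTRY_RE` are hypothetical.

```python
import re


def plot(lines):
    """Group physical lines into logical entries.

    Assumption: a line that starts with whitespace continues the
    previous entry; any other line starts a new one.
    """
    entry = []
    for line in lines:
        if entry and line[:1].isspace():
            entry.append(line)
        else:
            if entry:
                yield "\n".join(entry)
            entry = [line]
    if entry:
        yield "\n".join(entry)


# Made-up format: "<LEVEL> <numeric code> <message, possibly multi-line>"
ENTRY_RE = re.compile(r"(?P<level>[A-Z]+) (?P<code>\d+) (?P<message>.*)", re.DOTALL)


def parse(entry):
    """Name the fields of a logical entry and convert the numeric one."""
    match = ENTRY_RE.match(entry)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    return fields


raw_lines = [
    "ERROR 500 java.lang.IllegalStateException: boom",
    "\tat com.example.App.run(App.java:10)",
    "\tat com.example.App.main(App.java:5)",
    "INFO 200 request served",
]

entries = list(plot(raw_lines))
parsed = [parse(e) for e in entries]
print(len(entries), parsed[1])
```

Keeping the two stages separate means the parser never has to know whether an entry arrived as one physical line or several.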

Inputs can be normal files (or tails of files) or special files called “indexes”. An index contains the raw data plus a time pointer.

Bugs, feedback, comments and spam

To report bugs or propose enhancements, please use the GitHub issue tracker. If you have any suggestions, do not hesitate to contact me.

Feedback

Feedback is greatly appreciated. If you have any questions, comments, random praise, or anonymous threats, shoot me an email.
