Mole is a log analyzer which parses your log files (any kind of log) using specified definitions (usually regular expressions) and magically interprets some fields (numbers, dates, ...). Mole provides a set of functions to analyze that data.
As usual for any Python package:
pip install mole
In this example we will use an access log file generated by Apache (or any other HTTP server). Let’s suppose that this file is located at /var/log/apache/access.log.
Don’t worry about log rotation; Mole can handle it.
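Internally, a rotation-aware tail usually works by watching the file’s inode and size: when logrotate moves the file aside or truncates it, the reader re-opens it. A minimal Python sketch of that detection idea (an illustration, not Mole’s actual implementation):

```python
import os

def detect_rotation(path, last_inode, last_size):
    """Return True if the file at `path` appears to have been rotated.

    A tail-style reader typically re-opens the file when the inode
    changes (logrotate moved the old file aside and a new one was
    created) or when the file shrank (it was truncated in place).
    """
    st = os.stat(path)
    return st.st_ino != last_inode or st.st_size < last_size
```

A tail loop would call this periodically and, on True, close the old file descriptor and open the (new) file from the beginning.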
Edit /etc/mole/input.conf, adding:
[apache_log]
type = tail
source = /var/log/apache/access.log
We are defining a new input called apache_log, of type tail (meaning that we read new lines from the file as they are written, and handle rotated logs), pointing to our log file at /var/log/apache/access.log.
Edit /etc/mole/index.conf, adding:
[apache_log]
path = /var/db/mole/apache_log
We are defining a new index. The index is the Mole database where logs are stored in a proper format, so we can perform faster searches.
$ mole-indexer -C /etc/mole
$ mole-seeker -C /etc/mole
For example, get the top IP addresses which requested the most traffic:
$ mole 'input apache_log | sum bytes by src_ip | top'
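Conceptually, the `sum bytes by src_ip | top` pipeline aggregates a value per group and then ranks the groups. A plain-Python sketch of that idea (not Mole code; the record layout is assumed):

```python
from collections import defaultdict

def sum_by(records, value_field, group_field):
    """Sum `value_field` for each distinct value of `group_field`."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[group_field]] += rec[value_field]
    return totals

def top(totals, n=10):
    """Return the n groups with the largest totals, biggest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Example records, as a parser might produce them:
records = [
    {"src_ip": "127.0.0.1", "bytes": 512},
    {"src_ip": "192.168.0.21", "bytes": 2048},
    {"src_ip": "192.168.0.21", "bytes": 1024},
]
print(top(sum_by(records, "bytes", "src_ip")))
# [('192.168.0.21', 3072), ('127.0.0.1', 512)]
```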
The Mole pipeline is responsible for reading log items from a source, processing them (and transforming them if required) and, finally, returning an output. If no output is explicitly defined, the best output format for the current console is used (serialized over a network, just printed on a console).
There are a few components which are interesting to know:
input: The input is responsible for reading the log source; sources can be of different kinds, such as normal files, network streams, index files and so on.
plotter: The plotter’s main function is to split the source into logical lines. In a normal log file each physical line is usually a new log entry, but some logs may use several lines for a single logical entry (e.g. Java exceptions usually span multiple lines).
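For instance, a plotter for Java-style stack traces might treat indented lines as continuations of the previous entry. A hedged sketch of the idea in plain Python (not Mole’s actual plotter interface):

```python
def logical_lines(lines):
    """Group physical lines into logical entries: a line starting with
    whitespace is treated as a continuation of the previous entry
    (as in a Java stack trace)."""
    entry = []
    for line in lines:
        if line[:1].isspace() and entry:
            entry.append(line)          # continuation line
        else:
            if entry:
                yield "".join(entry)
            entry = [line]
    if entry:
        yield "".join(entry)

raw = [
    'Exception in thread "main" java.lang.NullPointerException\n',
    "    at com.example.Main.run(Main.java:12)\n",
    "INFO server started\n",
]
print(list(logical_lines(raw)))
# the exception and its "at ..." line become one entry; the INFO line is another
```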
parser: Once a logical line is obtained, you need to know the meaning of each field. The parser assigns names to fields, using regular expressions to do so.
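In plain Python terms, “assigning names to fields with a regular expression” is exactly what named capture groups do. A minimal sketch for a common Apache log line (the field names src_ip, path and bytes match the queries used elsewhere in this document, but this exact pattern is an illustration, not Mole’s shipped definition):

```python
import re

# Simplified Apache common-log pattern; real definitions live in Mole's
# configuration, this regex is just an illustration.
LOG_RE = re.compile(
    r'(?P<src_ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d+) (?P<bytes>\d+)'
)

line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /login HTTP/1.1" 200 6737'
fields = LOG_RE.match(line).groupdict()
print(fields["src_ip"], fields["path"], int(fields["bytes"]))
# 127.0.0.1 /login 6737
```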
actions: Actions are transformations, filters and, in general, any other operations to perform over the log dataset.
output: The output encapsulates the results of the actions in a human- (or machine-) readable form. You can think of the output as a kind of serialization.
So, the final pipeline in Mole looks like this:
<input> | <plotter> | <parser> | <action> | <action> ... | <output>
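One way to picture this pipeline is as chained generators, each stage consuming the previous one. This is a conceptual sketch of the data flow, not Mole’s internals (the stage names and record layout are assumptions):

```python
def input_stage(lines):
    # input: read raw lines from the source
    yield from lines

def parser_stage(lines):
    # parser: name the fields (here a trivial space-split)
    for line in lines:
        ip, path, size = line.split()
        yield {"src_ip": ip, "path": path, "bytes": int(size)}

def filter_action(records, path):
    # action: keep only matching records
    return (r for r in records if r["path"] == path)

def output_stage(records):
    # output: serialize each record for the console
    return ["%(src_ip)s %(bytes)d" % r for r in records]

raw = ["1.2.3.4 /login 100", "5.6.7.8 /home 200"]
print(output_stage(filter_action(parser_stage(input_stage(raw)), "/login")))
# ['1.2.3.4 100']
```

Because every stage is lazy, records stream through the pipeline one at a time instead of being loaded all at once.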
Mole is composed of three different daemons (for now):
mole-indexer: the daemon which reads the inputs and builds the indexes.
mole-seeker: the daemon which answers search queries over the indexed data.
mole: the client, which can query the mole-seeker.
To start Mole, you need to configure the server. There is an example in the configuration directory of the source code. The configuration directory contains one file per Mole component.
Once your server is configured, start both mole-indexer and mole-seeker.
Finally, perform your query using mole.
Inside the configuration directory, you can find a different file for each Mole component, i.e.:
index.conf for setting up indexes. The indexes are the special storage where indexed logs are kept.
Count the lines of an input (in this case the input is the access log of an Apache server):
$ mole 'input apache_log | count *'
count(*)=3445
Perform the same query, but grouping by source IP:
$ mole 'input apache_log | count * by src_ip'
src_ip=127.0.0.1 count=121
src_ip=192.168.0.21 count=1203
Calculate the average transfer size in the Apache log, grouped by URL, and get only the top three:
$ mole 'input apache_log | avg bytes by path | top 3'
path=/ avg(bytes)=12343
path=/login avg(bytes)=6737
path=/logout avg(bytes)=2128
Search for an expression and count occurrences:
$ mole 'input apache_log | search path=*login* | count *'
count(*)=3838
The Mole code is hosted on GitHub, and you can download it using git, as usual:
$ git clone git://github.com/ajdiaz/mole
The basic design of Mole is a linear pipeline which includes the following components:
Inputs can be normal files (or tails of files) or special files called “indexes”. An index contains the raw data plus a time pointer.
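An index entry can therefore be thought of as a raw log line paired with a timestamp pointer, which makes time-range lookups fast. A toy sketch of that idea (the record layout is an assumption, not Mole’s on-disk format):

```python
import bisect
from collections import namedtuple

# Hypothetical index record: timestamp first, so records sort by time.
Entry = namedtuple("Entry", ["ts", "raw"])

index = sorted([
    Entry(1000, "line at t=1000"),
    Entry(2000, "line at t=2000"),
    Entry(3000, "line at t=3000"),
])

def time_range(index, start, end):
    """Binary-search the time pointers to slice out a time window."""
    ts = [e.ts for e in index]
    lo = bisect.bisect_left(ts, start)
    hi = bisect.bisect_right(ts, end)
    return index[lo:hi]

print([e.raw for e in time_range(index, 1500, 3000)])
# ['line at t=2000', 'line at t=3000']
```

Keeping the entries sorted by the time pointer is what turns a time-window query into two binary searches instead of a full scan.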