2013-11-19 16:17:09 UTC
Ron

When writing system scripts, sometimes you need a bit more power than normal; why not use Haskell to help out? I wanted to count how many unique IP addresses have accessed latermuse.com according to my access_logs, so I wrote a quick Haskell program to do it. I will take you through the program and explain how it works in detail.

Let's start by popping out our imports and naming our module Main.

> module Main where
> import qualified Data.Set as S
> import Control.Applicative

We know that each line in our input will start with an IP address, followed by a space. This makes things easy. We can just run the 'words' function over the line and take the 'head' of the result to extract the IP address.

> parseIP = head . words 
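
One thing to keep in mind is that 'head' fails on an empty list, so 'parseIP' would crash on a blank line. The sketch below shows a total variant, assuming you would rather skip such lines than abort. The name 'parseIPSafe' and the sample lines are made up for illustration and are not part of the original program.

```haskell
module Main where

import Data.Maybe (mapMaybe)

-- Hypothetical total variant of parseIP: returns Nothing for a line
-- with no words instead of crashing the way 'head' does on [].
parseIPSafe :: String -> Maybe String
parseIPSafe line = case words line of
  (ip : _) -> Just ip
  []       -> Nothing

main :: IO ()
main = do
  -- Made-up sample input: one log-like line, one empty line, one blank line.
  let sample =
        [ "203.0.113.7 - - \"GET / HTTP/1.1\" 200 512"
        , ""
        , "   "
        ]
  -- mapMaybe keeps only the lines that produced an address.
  print (mapMaybe parseIPSafe sample)  -- prints ["203.0.113.7"]
```

Swapping 'map parseIP' for 'mapMaybe parseIPSafe' would be one way to use it in the rest of the pipeline.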

'getContents' lazily reads from stdin and continues reading until EOF. We can use 'calculate' to split that content into lines, then parse each line for the IP address. After grabbing the IP addresses, 'S.fromList' throws them into a Set. A Set holds each member only once, so grabbing the size of our set will give us the total number of unique IP addresses.

> calculate = S.size . S.fromList . map parseIP . lines
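
To see the deduplication at work, here is a small self-contained sketch that runs 'calculate' over a hard-coded string instead of stdin. The type signatures and the 'sampleLog' value are additions for illustration; the log lines are invented and only loosely resemble a real access log.

```haskell
module Main where

import qualified Data.Set as S

-- Same definitions as in the post, with type signatures added.
parseIP :: String -> String
parseIP = head . words

calculate :: String -> Int
calculate = S.size . S.fromList . map parseIP . lines

-- Made-up log excerpt: three requests from two distinct addresses.
sampleLog :: String
sampleLog = unlines
  [ "198.51.100.4 - - \"GET / HTTP/1.1\" 200"
  , "203.0.113.7 - - \"GET /about HTTP/1.1\" 200"
  , "198.51.100.4 - - \"GET /feed HTTP/1.1\" 304"
  ]

main :: IO ()
main = print (calculate sampleLog)  -- prints 2
```

The repeated address collapses into a single Set member, which is why the result is 2 rather than 3.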

Using Control.Applicative's (<$>), an infix synonym for 'fmap', we can apply 'calculate' to the input that 'getContents' produces. We use (=<<) to bind the output from 'calculate' into 'print'. Because (<$>) binds more tightly than (=<<), the line below parses as print =<< (calculate <$> getContents). 'print' will write the final output of our program to stdout, displaying it in the terminal.

> main = print =<< calculate <$> getContents
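
If the operator-heavy one-liner is hard to read, the same pipeline can be spelled out in other ways. The sketch below shows two spellings that should behave the same; the names 'mainDo' and 'mainInteract' are made up for this example. No import of Control.Applicative is needed here because neither version uses (<$>), which the Prelude has also exported since GHC 7.10.

```haskell
module Main where

import qualified Data.Set as S

parseIP :: String -> String
parseIP = head . words

calculate :: String -> Int
calculate = S.size . S.fromList . map parseIP . lines

-- do-notation spelling of: print =<< calculate <$> getContents
mainDo :: IO ()
mainDo = do
  input <- getContents      -- lazily read all of stdin
  print (calculate input)   -- write the count to stdout

-- 'interact' spelling: a pure String -> String function over stdin/stdout.
mainInteract :: IO ()
mainInteract = interact (\input -> show (calculate input) ++ "\n")

-- Only one of the two may run, since each one consumes stdin.
main :: IO ()
main = mainDo
```

Which spelling to pick is a matter of taste; the point-free original is the shortest, while the do-notation makes the order of the steps explicit.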

Now we just run a few commands in the terminal to count how many unique IP addresses have been logged as visitors to our website.

[latermuse httpd]# ls
    access_log    access_log.3  error_log.1  error_log.4
    access_log.1  access_log.4  error_log.2  analyzelogs.hs
    access_log.2  error_log     error_log.3

[latermuse httpd]# wc -l access*
    303787 access_log
    754845 access_log.1
    750058 access_log.2
    810321 access_log.3
    837264 access_log.4
    3456275 total

[latermuse httpd]# grep "latermuse.com" access* | runhaskell analyzelogs.hs
209

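So there you have it. According to my access_logs, there were 209 unique IP addresses that have accessed latermuse.com. Strictly speaking, that figure counts an address once per log file it appears in: when grep is given several files, it prefixes each match with the file name, so 'parseIP' sees something like 'access_log.1:' glued to the front of the address. Running GNU grep with '-h' suppresses the prefix and gives a true per-address count.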

You can grab the source code used in this post by clicking here.
