Haskell Do-Its: Fetch email from IMAP

This is, what I hope, will be the first in a line of posts about building things with Haskell. The plan is to try and write a small program that does something using Haskell and Haskell libraries. This post will specifically be about fetching messages from IMAP and parsing them. Since I’m just a neophyte Haskell hobbiest myself, I’ll be learning as I go. Keep that in mind if you plan of using any of this in your own code.

This post is written in Literate Haskell, which means the source file is actually an executable Haskell program. I’m simply annotating it with all of blog post fluff.

Let’s start off with some language pragmas. This just tells Haskell to use some language extensions that aren’t part of the default.

{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE OverloadedStrings #-}

Then comes our module declaration. Since this is going to be executable, This module will be named Main. I could leave the module declaration off.

module Main where

Next we have our imports. I’m going to be very specific about what I import, if only to make it easier for me to learn where functions live.

import System.Environment (getEnv)
import Control.Monad (forever)
import Control.Concurrent (threadDelay)
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8)
import Data.List (find)
import Data.Maybe
import Network.HaskellNet.IMAP.Types (UID)
import Network.HaskellNet.IMAP ( SearchQuery (ALLs)
                               , login
                               , select
                               , search
                               , logout
                               , fetch
                               )
import Network.HaskellNet.IMAP.Connection (IMAPConnection)
import Network.HaskellNet.SSL (Settings (..))
import Network.HaskellNet.IMAP.SSL ( connectIMAPSSLWithSettings
                                   , defaultSettingsIMAPSSL
                                   )
import Codec.MIME.Type  (MIMEValue (..), MIMEParam (..))
import Codec.MIME.Parse (parseMIMEMessage)

import qualified Data.Text.IO as TIO
import qualified Data.Text as T

This is the signature of the main function (which drives our program). Basically, all Haskell programs boil down to a function doing IO.

main :: IO ()

Since our program will poll an IMAP server for email, we want it to run forever. Conveniently, Haskell has a function for this.

main = forever $ do

Next we establish our connection to the server. This example uses Gmail, so we need to connect over SSL.

  conn <- connectIMAPSSLWithSettings imapServer imapCfg

Then we authenticate to the server. The username and password functions are pulling our credentials from the environment, which is in IO, so we need to unwrap those values before we can pass them to login, which is only expection Strings.

  user <- username
  pass <- password
  login conn user pass

Now that we’ve established a connection, we are going to grab a list of messages from the INBOX. We select the INBOX, and then we use an IMAP query to fetch a list of UIDs. These are ids that uniquely identify messages for our current imap session. We’ll use the UIDs shortly to fetch the actual message content.

  select conn "INBOX"
  uids <- search conn [ALLs]

This next line of code is just mapping over the uids. It’s structure is simply map _some_function_ uids. The some_function in this case is a composition of three functions. The function composition does quite a few things, but from the names, its pretty easy to see what’s going on; reading the compostion “backwards”, it is fetching the message over the imap connections, grabbing the message id out of the message, and then puting the message to standard out.

  mapM_ (putMessageID . getMessageID . fetchMessage conn) uids

When we are done with our work, we logoff this connection.

  logout conn

And print out a message so we know when things are happening (maybe we don’t have a high volume INBOX).

  putStrLn "Fetch complete"

This last line puts our program thread to sleep for a minute. When it wakes up, it will poll the IMAP server again.

  threadDelay (10^6 * 60)

fetchMessage grabs the entire message from the server and converts it to Text, assuming the bytestring is UTF-8 encoded. The imap library actually has functions for pulling down just headers, or subsets or headers, or whatever, but let’s just assume that we have some grand plan that requires the entire message.

fetchMessage :: IMAPConnection -> UID -> IO Text
fetchMessage conn uid = do
  content <- fetch conn uid
  return $ decodeUtf8 content

Every email message should have a Message-ID header that uniquely identifies that message in the universe of all email. getMessageID parses the message content and then hands it to messageID, which extracts the Message-ID content from the headers.

I use a compostion trick later with >>=, but I couldn’t figure out how to make that work with the types here. That’s why I’m just using the ‘do’ syntax to unwrap the content from IO. ¯_(ツ)_/¯

getMessageID :: IO Text -> IO Text
getMessageID raw = do
  content <- raw
  return $ pluckMessageID (parseMIMEMessage content)

pluckMessageID looks through the parsed email message for the Message-ID header. When it finds it, it returns it. If there isn’t a Message-ID header, it returns a dummy value message. This is what the Maybe type provides; a data type for representing a computation that may not return a value. It is actually implemented in terms of the more general pluckHeaderValue.

pluckMessageID :: MIMEValue -> Text
pluckMessageID = pluckHeaderValue messageIDHeader

messageIDHeader :: Text
messageIDHeader = "message-id"

pluckHeaderValue :: Text -> MIMEValue -> Text
pluckHeaderValue headerName MIMEValue{..} =
  valueOrDefault $ find headerMatch mime_val_headers
  where
    headerMatch :: MIMEParam -> Bool
    headerMatch (MIMEParam headerName' _) = headerName' == headerName

    valueOrDefault :: Maybe MIMEParam -> Text
    valueOrDefault Nothing = T.concat ["No ", headerName]
    valueOrDefault (Just (MIMEParam _ value)) = value

putMessageID just spits the message out to standard out. It has to be called in such a way that it can handle the fact that our Text was the result of an IO operation. That’s what the >>= is all about; it basically unwraps the text from the IO (someone will HATE that description)

putMessageID :: IO Text -> IO ()
putMessageID msgID = msgID >>= TIO.putStrLn

Most of the hard work is done. The rest of these functions are simply for binding our configuration to names we can use in the program.

Here we are hard coding our imap server, but you could certainly pull this from the environment or from the command line arguments if you prefer.

imapServer :: String
imapServer = "imap.gmail.com"

I was having trouble connecting to Gmail until I copied this configuration from the example code. We’re using this for our configuration, rather then just the default settings.

imapCfg :: Settings
imapCfg = defaultSettingsIMAPSSL { sslMaxLineLength = 100000 }

As noted earlier, we are pulling username and password from the environment, if these environment varaibles are missing the program will fail with an exception. We’re notedealing with exceptions in this post, so we’re ok with that for now.

Note the method signature; since these values come from the environment, they are IO String values. That’s why we need to process them as part of IO monad before we can pass the String values on. There are probably more Haskell-y ways to do this, but this was most clear to me.

username :: IO String
username = getEnv "IMAP_USER"

password :: IO String
password = getEnv "IMAP_PASS"

There we have it! A small Haskell program that fetches email over IMAP and then parses the content.

There are plenty of ways that we can make this program better. For one thing, I’m sure my Haskell code could be much better. Also, this program isn’t very reliable. If our IMAP connection goes away, for example, the program will crash with a broken pipe exception. Writing a more reliable version of this program would be a good topic for another post, so I’ll leave this code where it is for now.