eDiscovery Update: Email file types and how to handle them

By Peter Coons

The Daily Record Newswire

ROCHESTER, NY - OST, PST, MBOX, EML, what the heck!

What do all these acronyms mean? If you are a litigation support professional or an attorney dealing with eDiscovery, you have likely encountered email of various types. You may know that a PST file is an Outlook Exchange mail file, but how does it relate to an MSG and an OST? What about some other common mail files with extensions such as EML and MBOX? How are they handled? Do they have to be converted prior to processing and loading to a hosting platform?

Not all email files are created equal, and it is important to understand the differences among the types so they can be handled properly during discovery. This article should help to alleviate some of the confusion.

Outlook and Exchange

Let's first tackle the most common email platform in eDiscovery, Microsoft Exchange.

MS Exchange database (EDB) files hold the individual mailboxes for custodians, which are typically exported as PST files. One EDB file may contain one mailbox or thousands. Typically, a litigation support professional handles individual PST files rather than the EDB file itself. A collection or forensic specialist usually works with an organization's IT department to extract the PST files for the custodians who are key players in the litigation. However, it is possible to receive an entire EDB and its corresponding files, which must be loaded into specialized software that lets an operator extract only the selected custodians' PSTs.

PST Files

Once the PST files are extracted, they are usually ready to be processed by the ESI processing application. Messages within PSTs can be de-duplicated on a global or custodian basis. A PST file may contain anywhere from one message to hundreds of thousands. It is possible for someone to create PST files that comprise messages for multiple custodians, although this is not typical. A PST file is a container file for multiple messages for one or more custodians or sources. That is the important point.
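To make the global-versus-custodian distinction concrete, here is a minimal Python sketch. It assumes the messages have already been extracted from the PSTs and parsed into standard email objects. The function names (dedup_key, deduplicate) and the choice of hashed fields are illustrative assumptions, not the recipe any particular processing application uses.

```python
import hashlib
from email import message_from_bytes
from email.message import Message


def dedup_key(msg: Message) -> str:
    """Fingerprint a message from a few header fields plus its body.

    Assumption: real processing tools use their own (often richer) recipes.
    """
    digest = hashlib.sha256()
    for name in ("From", "To", "Subject", "Date"):
        value = str(msg.get(name, ""))
        digest.update(value.strip().lower().encode("utf-8"))
        digest.update(b"\x00")  # field separator
    # get_payload(decode=True) returns None for multipart messages.
    digest.update(msg.get_payload(decode=True) or b"")
    return digest.hexdigest()


def deduplicate(records, scope="global"):
    """Keep the first copy of each message.

    records: iterable of (custodian, Message) pairs.
    scope="global": one copy across all custodians.
    scope="custodian": one copy per custodian.
    """
    if scope not in ("global", "custodian"):
        raise ValueError("scope must be 'global' or 'custodian'")
    seen = set()
    kept = []
    for custodian, msg in records:
        key = dedup_key(msg)
        if scope == "custodian":
            key = (custodian, key)
        if key not in seen:
            seen.add(key)
            kept.append((custodian, msg))
    return kept
```

With global de-duplication, a message held by three custodians is kept once; with custodian-level de-duplication, it is kept once for each custodian who holds it.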

An archived PST is one that was likely created by the user for organizing messages or reducing the size of the mail file on the Exchange server. For example, a user may create a PST file called MyEmail_2014.PST and move all of their messages from 2014 to this archive. A user can have multiple archived PST files open at one time in Outlook and they can be stored on local drives, USB drives or server shares for backup purposes.

MSG Files

An MSG file is an individual message that was likely extracted from a PST file. It is one record from a larger database email store (PST). In eDiscovery some parties may deliver MSG files instead of an entire PST. This may occur when the selection or culling was done prior to delivery to the vendor.


EML Files

An EML file is, well, think of this as a stripped down MSG. It was used by Microsoft Outlook Express and other clients. EML files contain the body of the message and the To and From fields, and they can also contain the attachments. Most eDiscovery processing applications can handle EML files natively, which negates the need for conversion or special handling.
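Because an EML file is a plain-text message, the Python standard library can read one directly. The sketch below is only an illustration of what lives inside the file; summarize_eml is a hypothetical helper name, and a real processing application extracts far more than this.

```python
from email import policy
from email.parser import BytesParser


def summarize_eml(raw: bytes) -> dict:
    """Pull the basic fields out of the raw bytes of an EML file."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    def header(name):
        value = msg[name]
        return str(value) if value is not None else None

    # Prefer the plain-text body, fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "from": header("From"),
        "to": header("To"),
        "subject": header("Subject"),
        "date": header("Date"),
        "body": body.get_content() if body is not None else "",
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }
```

To try it on a file from disk, read the file in binary mode and pass the bytes in, for example summarize_eml(open("message.eml", "rb").read()), where message.eml is a placeholder name.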

OST Files

An OST file is similar to a PST file. It contains messages and metadata. Many ESI processing applications can ingest and process an OST just like a PST. OST (offline storage) files are typically found on laptops and allow a user to work in Outlook without being connected to the Exchange server. When the user goes back online, the OST and the Exchange server sync and any changes are recorded. It is possible that an OST contains different information from the corresponding mailbox on the Exchange server if the user has failed to sync the two.


MBOX Files

MBOX stands for mail box, which makes a lot of sense to me. It is a single file that contains multiple mail messages in a linear text sequence, with each message prefaced by a separator line and concluding with an empty line. Basically, if you opened an mbox file in WordPad, you would see messages from top to bottom, each divided by a delimiter. This is different from a PST file, which would be difficult if not impossible to read in a text editor.

Another difference between the two formats is that mbox has separate files for each email folder, which are stored in a single mail directory. For example, one might stumble across inbox.mbox, calendar.mbox or sent.mbox in a folder called "Pete's Mail". In contrast, a PST file contains all the folders, calendar entries, tasks and messages in a single file. Applications that can access mbox files, such as Mozilla Thunderbird, would be pointed at the "Pete's Mail" folder and display all the disparate mbox files seamlessly to the end user.
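Since mbox is plain text with a separator line in front of each message, it is easy to walk through one with Python's standard mailbox module. This is a minimal sketch; list_mbox_messages is a hypothetical helper name, and the path you pass in would be one of the per-folder files described above.

```python
import mailbox


def list_mbox_messages(path):
    """Return (separator_info, subject) for every message in an mbox file.

    separator_info is the text of the "From " separator line that
    precedes each message (sender and delivery date).
    """
    box = mailbox.mbox(path, create=False)  # do not create a missing file
    try:
        return [(msg.get_from(), msg.get("Subject", "")) for msg in box]
    finally:
        box.close()
```

Pointing this at each file in a mail directory, one folder file at a time, mirrors what a client such as Thunderbird does when it presents the separate mbox files as a single mailbox.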

How widespread is mbox? It is the most common format for storing messages on Unix-like operating systems (e.g., Linux). But mbox is also found on both Mac and Windows systems, so chances are you have reviewed items from an MBOX mail store. However, it may have been converted to a PST prior to processing and review. While converting mbox files is standard, the possibility exists that some metadata or header information may be omitted or altered.

Mac Mail

I could write an entire article on email found on Mac computers. So, here is my best attempt in as few words as possible.

It is possible to use Outlook on a Mac. However, there is a big difference between Outlook on a Mac and Outlook on a Windows computer. Outlook for Mac messages are not stored in PST files. Rather, they are stored as individual messages with the extension "OLK14message". These messages are stored in a documents folder under the user's profile, and attachments are stored separately.

The native email application on Mac operating systems is called "Mail". Messages are stored in mbox folders similar to the manner described above (e.g., inbox.mbox), but instead of a single mail file there are individual messages in the mbox folders. These files have an EMLX extension, and attachments are stored separately as well.
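For the curious, here is a rough Python sketch of reading a single EMLX file. It rests on an assumption about the layout that you should verify against your own data: the first line holds a byte count, that many bytes of ordinary message text follow, and an Apple property list with extra flags trails the message. The name parse_emlx is hypothetical.

```python
from email import policy
from email.parser import BytesParser


def parse_emlx(raw: bytes):
    """Split raw EMLX bytes into (parsed message, trailing property list).

    Assumed layout: "<byte count>\n" + message bytes + XML plist.
    """
    first_line, _, rest = raw.partition(b"\n")
    length = int(first_line.strip())  # raises ValueError if not a number
    message_bytes = rest[:length]
    trailer = rest[length:]
    msg = BytesParser(policy=policy.default).parsebytes(message_bytes)
    return msg, trailer
```

The message part can then be treated like any EML file, which is roughly what a conversion step does before review.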

Conversion is typically, but not always, necessary when processing Mac messages for review.

My advice, if you are dealing with a user who has email stored on a Mac, is to ask a lot of questions and find out which client(s) they are using.
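Questions come first, but a quick inventory of a collected folder can back up the answers. The Python sketch below tallies files by the extensions discussed in this article. The MAIL_FORMATS table and the inventory_mail_files name are my own illustrative assumptions, and matching on extension alone is only a hint: some clients store mbox files with no extension at all.

```python
from collections import Counter
from pathlib import Path

# Extension-to-description table; an illustrative assumption, not exhaustive.
MAIL_FORMATS = {
    ".edb": "Exchange database",
    ".pst": "Outlook PST container",
    ".ost": "Outlook OST container",
    ".msg": "Outlook message (MSG)",
    ".eml": "Single message (EML)",
    ".mbox": "mbox container",
    ".emlx": "Apple Mail message (EMLX)",
    ".olk14message": "Outlook for Mac message",
}


def inventory_mail_files(root):
    """Count mail files under root, grouped by format description."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue  # e.g. an Apple Mail "inbox.mbox" folder
        label = MAIL_FORMATS.get(path.suffix.lower())
        if label:
            counts[label] += 1
    return counts
```

A folder full of EMLX files points to Apple Mail, while OLK14message files point to Outlook for Mac, which tells you what kind of conversion to plan for.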


Metadata

All mail files, regardless of format, contain metadata related to the message: To, From, CC, BCC, Subject, Date/Time sent, Body, Attachments, Internet header information, internal message ID, etc.

Those metadata elements are contained within the email files, so one can copy or move these files without altering potentially important metadata elements. Typically, the Windows file system dates and times associated with the files (the MAC dates: modified, accessed, and created) are not germane to litigation. Remember, the important metadata is inside the file! Another important fact: copying or moving an MSG or PST may change the MAC dates, but it should not alter the hash value.
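That last point is easy to demonstrate. The Python sketch below copies a file and compares the hash and the modified date of the original and the copy. The helper names (sha256_of, copy_and_verify) are hypothetical, and SHA-256 is simply my choice for the example; your tools may report MD5 or SHA-1 instead.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file's contents in chunks so large PSTs fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then report whether hashes match and both mtimes."""
    shutil.copyfile(src, dst)  # copies contents only, not timestamps
    return {
        "hash_match": sha256_of(src) == sha256_of(dst),
        "src_mtime": os.stat(src).st_mtime,
        "dst_mtime": os.stat(dst).st_mtime,
    }
```

The copy normally carries a new modified date, yet the hash matches because the hash is computed only from the bytes inside the file.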


Summary

An EDB contains multiple mailboxes, each of which can be exported as a PST, and a PST contains multiple messages (MSGs). EML is similar to MSG.

An OST file is an offline PST.

MBOX is a common mail storage file format with messages and other items stored in a single file. MBOX is often converted to PST prior to processing and review.

Mac mail? Ask a lot of questions and find out the client. Really, that goes for Windows users too.

All email files contain metadata related to the message: To, From, CC, BCC, Subject, Date/Time sent, Body, Attachments, Internet header information, internal message ID, etc.

Outlook is an email client used to access data stored on an Exchange server or a Mac. However, there are dozens of email clients that can open and utilize various formats.

Outlook is NOT a review tool! I had to throw that in.


Peter Coons is a senior vice president at D4, providing eDiscovery and digital forensics consulting services to clients. He is a Certified Information Systems Security Professional, an EnCase Certified Examiner, an AccessData Certified Examiner, and a Certified Computer Examiner. He belongs to various digital investigation and information security-based organizations.

Published: Wed, Jul 06, 2016