InsideTech » July 2008

E-Discovery

Technology

Dispelling Doubts about De-Duplication

The practical implications of de-duplication can have a significant impact on your litigation costs.

No one likes to duplicate work. In the e-discovery world, the daunting-sounding term “de-duplication” simply means removing duplicate documents from a collection of data before it is reviewed. Paying someone to review five exact copies of the same e-mail or document would be a waste of time and money.

While the concept sounds simple enough, the practical implications of de-duplication can have a significant impact on your litigation costs.


De-Duplication Basics

The term “de-duplication” is not new. IT departments have used the concept for many years as a convenient way to conserve storage space.

Consider this example: You receive a memo attached to an e-mail from your outside law firm. Before responding, you forward the memo to 4 or 5 co-workers for their feedback. Each of them now has a copy of that memo in their e-mail. Rather than backing up 4 or 5 identical copies, your IT department likely uses a process that identifies the duplicates, backs up only one copy of the file, and keeps an index of every location where that file was found. The index allows IT to restore the file for each user if that ever becomes necessary, for example in response to a production request.
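To make the backup example concrete, here is a minimal Python sketch of this kind of single-copy storage. It assumes SHA-256 as the content hash (the article does not say which algorithm an IT department would use), and the class `SingleInstanceStore`, its methods, and the user names are hypothetical. Each unique file is stored once, keyed by its hash, while an index records every (user, path) location where it was found so a copy can be restored for any user.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical backup store that keeps one copy of each unique file."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}            # content hash -> file contents, stored once
        self.index: dict[tuple[str, str], str] = {}  # (user, path) -> content hash

    def back_up(self, user: str, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the contents only the first time this hash is seen.
        self.blobs.setdefault(digest, data)
        # Always record where the file was found, even when it is a duplicate.
        self.index[(user, path)] = digest
        return digest

    def restore(self, user: str, path: str) -> bytes:
        # Follow the index entry for this user's file to the single stored copy.
        return self.blobs[self.index[(user, path)]]


memo = b"Memo from outside counsel re: document retention"
store = SingleInstanceStore()
for user in ["you", "alice", "bob", "carol", "dave"]:
    store.back_up(user, "Inbox/memo.doc", memo)

print(len(store.blobs))  # 1 stored copy
print(len(store.index))  # 5 indexed locations
print(store.restore("carol", "Inbox/memo.doc") == memo)  # True
```

The index is what makes the savings safe: the duplicates are not lost, only stored once.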

To compare two (or more) documents, each file is assigned an identifier calculated from its content. This identifier is called a “hash” value, perhaps because it looks like an indecipherable jumble of letters and numbers. The hash value is generated by a precise mathematical algorithm, so in practice only two completely identical files will have the same hash value. If a single comma or space is added to one document, the hash value of that file will be completely different. (For more information on hash values, see the e-Discovery Team blog.)
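A quick way to see this behavior is Python's standard `hashlib` module. The sketch below assumes MD5 by default (SHA-1 is also shown); which algorithm a particular vendor actually uses is not stated in the article, and the helper `fingerprint` is hypothetical. Two identical texts produce the same value, while adding one comma produces an entirely different one.

```python
import hashlib


def fingerprint(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of the content using the named hash algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


original = b"Please review the attached memo and send comments by Friday."
exact_copy = b"Please review the attached memo and send comments by Friday."
one_comma = b"Please review the attached memo, and send comments by Friday."

print(fingerprint(original) == fingerprint(exact_copy))  # True: identical content
print(fingerprint(original) == fingerprint(one_comma))   # False: one comma changes the hash
print(len(fingerprint(original)))                        # 32 hex characters for MD5
print(len(fingerprint(original, "sha1")))                # 40 hex characters for SHA-1
```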

E-discovery vendors apply this technology when they process electronically stored information in preparation for review. De-duplication can be invaluable for culling a large document collection, and it directly reduces the time spent on the actual review. The degree to which it shrinks the universe of documents to be reviewed can be surprising.
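The culling step can be sketched as grouping documents by hash and sending only the first copy of each to review. This is a minimal illustration under assumed inputs (a list of document IDs paired with their contents); the function `deduplicate`, the document IDs, and the sample contents are hypothetical, and real processing platforms handle far more (metadata, attachments) than this sketch does.

```python
import hashlib


def deduplicate(documents: list[tuple[str, bytes]]) -> tuple[list[str], dict[str, list[str]]]:
    """Return the IDs to review plus a map from each kept ID to the duplicate IDs it suppressed."""
    first_seen: dict[str, str] = {}        # content hash -> ID of first document with that content
    suppressed: dict[str, list[str]] = {}  # kept ID -> duplicate IDs removed from review
    to_review: list[str] = []
    for doc_id, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in first_seen:
            first_seen[digest] = doc_id
            suppressed[doc_id] = []
            to_review.append(doc_id)
        else:
            suppressed[first_seen[digest]].append(doc_id)
    return to_review, suppressed


collection = [
    ("DOC-001", b"Q3 forecast"),
    ("DOC-002", b"Q3 forecast"),
    ("DOC-003", b"Board minutes"),
    ("DOC-004", b"Q3 forecast"),
    ("DOC-005", b"Q3 forecast."),  # differs by one period, so it is NOT a duplicate
]
review_set, duplicates = deduplicate(collection)
print(review_set)             # ['DOC-001', 'DOC-003', 'DOC-005']
print(duplicates["DOC-001"])  # ['DOC-002', 'DOC-004']
reduction = 1 - len(review_set) / len(collection)
print(f"{reduction:.0%} fewer documents to review")  # 40% fewer documents to review
```

Keeping the map of suppressed IDs mirrors the backup index: reviewers see one copy, but the record of where every duplicate came from is preserved.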


Duplicates on the Horizon

De-duplication is usually applied in two different scenarios. The first is “vertical” de-duplication, sometimes called “custodian de-duplication.” In this scenario, de-duplication is applied only within a single custodian’s data collection. It would catch two duplicate documents on that person’s computer hard drive, but it would not eliminate copies of the same document on other people’s computers.

Vertical de-duplication is commonly used with backup tapes. If an individual kept an e-mail or a document on their system for 6 months, and each of 6 monthly backup tapes captured it, then de-duplication would eliminate the 5 redundant copies and keep just one.
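The difference in scope is easy to show in code: vertical de-duplication keys on the pair (custodian, hash) rather than on the hash alone. In this minimal sketch the custodian names, tape labels, and the `vertical_dedupe` helper are hypothetical, and SHA-256 is assumed as the hash. One custodian's e-mail appears on 6 monthly backup tapes and is reduced to one copy, while a second custodian's identical copy survives because it belongs to a different custodian.

```python
import hashlib


def vertical_dedupe(items: list[tuple[str, str, bytes]]) -> list[tuple[str, str]]:
    """Keep the first copy of each document per custodian.

    items: (custodian, source, content) tuples, where source is e.g. a backup tape label.
    Returns the (custodian, source) of every copy kept for review.
    """
    seen: set[tuple[str, str]] = set()  # (custodian, content hash) pairs already kept
    kept: list[tuple[str, str]] = []
    for custodian, source, content in items:
        key = (custodian, hashlib.sha256(content).hexdigest())
        if key not in seen:  # duplicates are only detected within the same custodian
            seen.add(key)
            kept.append((custodian, source))
    return kept


email = b"Re: supplier contract renewal"
# The same e-mail sits in Smith's mailbox on six monthly backup tapes...
items = [("Smith", f"tape-month-{m}", email) for m in range(1, 7)]
# ...and Jones has an identical copy on the first tape.
items.append(("Jones", "tape-month-1", email))

kept = vertical_dedupe(items)
print(kept)  # [('Smith', 'tape-month-1'), ('Jones', 'tape-month-1')]
print(len(items) - len(kept), "redundant copies removed")  # 5 redundant copies removed
```

Keying on the hash alone, without the custodian, would also remove Jones's copy, which is exactly what vertical de-duplication does not do.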
