  • Disclaimer:

    The information in this weblog is provided "AS IS" with no warranties, and confers no rights. The opinions and interests expressed on this employee blog are the employees' own and don't necessarily represent EMC's positions, strategies or views. Inappropriate comments will be deleted at the authors' discretion.

Big Data: Big Value or Big Trouble?

Like “the Cloud”, the term “Big Data” has many different definitions.  But no matter how you define it, Big Data is not a fad.

Some use the term to denote the incredible variety, velocity and volume of data that we are creating and using every day.  (Here is a very interesting infographic on that point).

Others use the term to represent huge data sets from which we can intelligently extract useful trends and business information.  In fact, the promise of Big Data extends beyond mining data for sales purposes to gauging customer and employee sentiment, and even to the idea of “predictive compliance”.

Regardless, as with the Cloud, there is enormous potential value in Big Data — but there are also costs and risks that need to be weighed in the process.  Among these are the eDiscovery and security risks associated with keeping a significant amount of data past its (normally) useful life.  Our friend Barclay Blair has published some interesting thoughts on Big Data, the law and eDiscovery.

As in so many other areas, business will drive the need for big data initiatives; but compliance and legal need a voice in the process to adequately cover potential risks and issues.


eDiscovery and Sharepoint

I am consistently surprised that the eDiscovery of Microsoft Sharepoint repositories does not strike more fear into organizations.  Sharepoint is complex, contains different types of documents/objects, can have rich metadata and is a key repository for business content.  Yet most organizations that we talk with state that they are not concerned with their ability to handle eDiscovery work on Sharepoint sites.

There are several potential reasons for this hands-off attitude:

– There are no significant reported cases where a party was sanctioned for failing to properly preserve or collect content from Sharepoint.  I did some of my own research in a few eDiscovery caselaw databases, and none of my searches located the word “sharepoint” in connection with a sanctions motion;

– Few litigants seem to be asking for Sharepoint content during discovery.  (Of course this is not a valid reason for organizations to ignore it.  The duty to preserve and produce ESI is not tied to whether the other party asks for the content.  But in reality, if both sides bury their heads in the Sharepoint sand, then no one knows whether relevant content is being ignored).

– Most organizations lack the tools and capabilities to discover from Sharepoint, at least beyond basic Office documents that might be stored in a site.  Whether Legal is aware that IT is not undertaking discovery of Sharepoint sites is a good question to ask.

What makes Sharepoint more complex than a fileshare, at least in eDiscovery?  Many different types of content can be stored in a site:  documents, email messages, OneNote files, webpages, community posts, microblogs, Lync IMs, and more.  Not all of this content is readily accessible, so eDiscovery teams may have difficulty in locating relevant content.  Even when found, the preservation and collection of that content can be difficult.

Metadata in eDiscovery is often a misunderstood issue, and Sharepoint has a lot of metadata.  For example, each user can define a set of metadata tags for use with documents.  This information is arguably not relevant in many cases, but it may be useful or important in locating relevant documents.  And since one cannot rule out relevancy before a case even begins, organizations need a plan to capture this information when necessary.

A more advanced but still important concern is with authentication and admissibility of the Sharepoint content.  The creator of a document can often be difficult to determine, even on a fileshare where the “owner” of that document may be clear (based on the directory structure).  In Sharepoint, the situation can be far murkier due to its collaboration capabilities.  For example, multiple parties may have contributed to a document but the identified owner and creator may not be part of that group.  (For some great background on these issues, download The Sedona Conference Commentary On ESI Evidence & Admissibility).

What can you do?

– Legal and IT should get together to discuss the organization’s Sharepoint deployment and determine whether it is (or should be) on the Data Map; and if so, how content can best be located, preserved and collected when necessary.  Microsoft has added some eDiscovery capabilities to Sharepoint 2013 but whether those features are sufficient, and how to handle prior versions of Sharepoint, remain a concern;

– The organization should consider (now!) policies relating to the retention of Sharepoint content.  This is a great step to take before the situation becomes too difficult to handle because Sharepoint adoption tends to grow very rapidly.

Machine Learning For Document Review: The Numbers Don’t Lie

Jim Shook

In light of Magistrate Judge Andrew Peck’s recent decision in Da Silva Moore v. Publicis, much has been written and discussed about the idea of using machine learning techniques to automatically classify documents during review, a process sometimes known as “predictive coding” or even “computer assisted review”. (Although these terms may actually imply different technologies and processes, this article adopts Judge Peck’s umbrella use of the term “predictive coding”.) This article explores some of the key issues around this promising intersection of law and technology.

What Is Predictive Coding? How is It Used?

At a simple level, predictive coding is just a technological “lever” that allows a (relatively) small amount of review work – usually by humans – to be leveraged across a much larger set of documents.

Getting Legal to Support Your Email Management Project

Electronic Archives are one of the least understood – and yet one of the best – technologies available to the enterprise for improved operations, compliance and eDiscovery.  Yet while most IT professionals are familiar with the benefits of Email Archiving, many see only the operational improvements that an archive can bring.  So when they need to enlist in-house counsel’s assistance to approve the policies for the archive, they often miss the benefits to the legal department, making it more difficult to convince legal to help.  In fact, discussing the benefits of the archive is a critical step.  Many lawyers still misunderstand the purpose of Email Archiving, incorrectly viewing it as a tool to save everything forever – something they are almost always against.

If you’re having difficulty getting legal on board with your archiving project (or if you’re a lawyer and want to better understand how archives can help you), here are three significant areas that are improved with a good email archive deployed as part of an overall Email Management initiative.

Electronic Discovery

Electronic discovery is the process of identifying, holding, collecting, analyzing and producing electronically stored information (“ESI”) to meet the requirements of litigation, investigation or open records / FOIA requests.  Email messages are the most frequent – and arguably the most important – locations for ESI.  Email is also one of the most expensive and risky sources of ESI because most companies do not effectively manage their email.  This often forces enterprises, under the risk of sanctions for deleting data that is relevant to a lawsuit (an offense known as “spoliation”), to search in virtually unlimited locations for email and then to process and review huge volumes of messages.  Some enterprises have no established process for eDiscovery and are forced to retain backup tapes of email servers and fileshares as a stopgap measure, at enormous risk and expense.  Worse still – some companies simply pretend to meet their obligations through a quick search of a few mailboxes on the email server, knowing that email is stored in other locations they cannot efficiently search, and then cross their fingers to hope for the best.

Almost all of this difficult and risky process can be avoided with an effective Email Archive operating as part of an overall Email Management initiative.  With a good archive, the enterprise’s eDiscovery team can quickly search through just one location for all email, substantially reducing cost and risk.  An effective Email Management program can further cut downstream cost and risk by enabling the defensible deletion of email messages that do not need to be retained.  (If you have not already begun, you will also want to consider how you handle other ESI repositories for eDiscovery).
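To make the idea of "defensible deletion" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the retention periods, the message fields and the `eligible_for_deletion` helper are hypothetical and do not come from any particular archive product. The only point is the order of the checks: a legal hold always wins over the retention schedule.

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, per message category.
RETENTION_DAYS = {"general": 365, "contract": 7 * 365}


def eligible_for_deletion(message, custodians_on_hold, today):
    """Return True only if retention has expired and no legal hold applies."""
    if message["custodian"] in custodians_on_hold:
        return False  # a hold always overrides the retention schedule
    age = today - message["received"]
    return age > timedelta(days=RETENTION_DAYS[message["category"]])


# Hypothetical archive contents.
archive = [
    {"id": "m1", "custodian": "alice", "category": "general", "received": date(2009, 1, 5)},
    {"id": "m2", "custodian": "bob", "category": "general", "received": date(2009, 1, 5)},
    {"id": "m3", "custodian": "alice", "category": "contract", "received": date(2009, 1, 5)},
]
holds = {"bob"}  # custodians currently subject to a litigation hold

deletable = [m["id"] for m in archive if eligible_for_deletion(m, holds, date(2011, 1, 1))]
print(deletable)
```

In this sketch only `m1` qualifies: `m2` belongs to a custodian on hold, and `m3` is still inside its (assumed) seven-year retention period.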


Compliance

Good compliance programs today include processes related to the company’s electronic information, especially email.  Companies of any size or reach are subject to anti-corruption legislation such as the Foreign Corrupt Practices Act (FCPA) in the US and the UK Bribery Act.  Regulators also place demands on ESI retention and review, in addition to normal records retention requirements.  And regardless of whether we like it, email is a location where many of our records are received and maintained.

A centralized archive for email – with the ability to enforce company mandated retention policies – can be a big win for compliance. An effective email sampling process can help to ensure compliance with high-risk requirements like the FCPA, UK Bribery Act and even Sarbanes-Oxley.  An archive with user-directed archiving capabilities enables a strong foundation for complying with records management and regulatory retention requirements.  Similarly, with these controls in place, the enterprise can feel more comfortable with deleting expired content, knowing that it’s not subject to any further retention requirements.  And for concerns on privacy and sensitive data, an issue that grows each day, an archive can help to ensure that sensitive data does not leave the company’s firewalls without being encrypted.


Operations

Companies without archives often retain all of their email on their server, and the archive will drive savings through reducing top-tier storage requirements, shrinking backup windows and sizes, and substantially improving the efficiency of the email servers.  Companies with mailbox size quotas have different issues – and with an archive they can quickly move to eliminate local email caches (typically PSTs or NSFs) that are unmanaged, insecure and can lead to disaster in eDiscovery matters.  Users can have virtually unlimited mailbox sizes with no noticeable impact on their day-to-day work – even when working remotely or on an airplane.

Although operational improvements may not be your legal department’s main focus, your lawyers want the company to be more efficient, and helping them to understand these improvements is also an important step.

What’s Next

If your enterprise does not yet have an Email Management initiative, get legal and IT together to talk about the benefits.  You will need help from legal in drafting and authorizing appropriate policies.  (In determining best practices for policies and archiving, your legal counsel might be interested in The Sedona Conference’s guidance on Email Management).  If you have already started, check to make sure that legal fully understands the benefits, has provided appropriate retention policies, is actively part of an efficient eDiscovery process, and that someone is verifying that users are maintaining information subject to regulatory frameworks.

Adversary Case Assessment: Putting Your ESI To Good Use

In eDiscovery, we tend to focus most of our attention internally, on our own electronically stored information (ESI).  This makes sense because the data is under our control, and if we cannot get this work done properly, we significantly raise the risk (and cost) of handling eDiscovery.

But what about the other side – what should we do when the other parties in litigation produce their ESI to us?  This is an issue that seems to be discussed very little.  Most companies just have their outside litigation counsel handle this data – but that’s what most of us did just a few years ago with our own ESI.  For companies using an eDiscovery solution for in-house collection and early case assessment, shouldn’t there be a matching process for the data received from other parties?

ACA – Adversary Case Assessment

There’s a lot of value that can be derived from analyzing the other side’s ESI, especially when it is juxtaposed against our own data.  If you plan ahead in your eDiscovery process, you can ensure that you’re able to “view” the data in a few different groupings – your data; their data (by party if there’s more than one) and together.  Let’s look at some of the leverage that we can get from using our in-house solution in this manner.

File types.  How many different ESI file types did the other side produce?  In most cases, you should expect a good mix of email, “productivity” files such as Microsoft Word, Excel and PowerPoint, image files (e.g. jpg/gif) and maybe even various log files, possibly in text form (.txt, .log, etc.).  You might probe a little more deeply:  did they produce any NSF or PST files (the local caches of email that many users keep on their desktop or fileshares)?

If you didn’t receive at least a few items representing these file types – why not?  There may be good reasons – you may have agreed to limit eDiscovery, maybe none of those file types contained relevant information, etc.  But ask the question – first of yourself, and then, if necessary, of the other side.  Parties frequently focus on email, largely ignoring laptops, fileshares and other repositories of relevant information.  Also, because these files are frequently produced as attachments to emails, it may give the appearance that these repositories were searched.  Thus, run another filter check: are the non-email items just attachments to emails, or were they produced on their own?
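If your review platform can export a manifest of the production, the attachment-versus-standalone check is easy to sketch. The Python below is only an illustration under assumptions: the manifest layout, the `parent_id` field and the sample filenames are all hypothetical, and a real load file will look different.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical production manifest: one record per produced item.
# "parent_id" is None for items produced on their own, and holds the
# id of the parent email for attachments.
produced_items = [
    {"id": 1, "filename": "re_pricing.msg", "parent_id": None},
    {"id": 2, "filename": "forecast.xlsx", "parent_id": 1},
    {"id": 3, "filename": "contract_v2.docx", "parent_id": 1},
    {"id": 4, "filename": "kickoff.msg", "parent_id": None},
    {"id": 5, "filename": "roadmap.pptx", "parent_id": None},
]


def extension(item):
    """Return the lower-cased file extension, e.g. '.xlsx'."""
    return PurePath(item["filename"]).suffix.lower()


def file_type_summary(items):
    """Count items per extension, split into standalone items and attachments."""
    standalone = Counter()
    attached = Counter()
    for item in items:
        target = standalone if item["parent_id"] is None else attached
        target[extension(item)] += 1
    return standalone, attached


standalone, attached = file_type_summary(produced_items)

# File types that show up ONLY as email attachments, never on their own.
attachment_only = sorted(set(attached) - set(standalone))
print(attachment_only)
```

File types that land in `attachment_only` are the ones worth asking about: they suggest that the fileshares or laptops where such files normally live may not have been searched.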

Volume.   Overall, does it seem like a fair amount of ESI that’s been produced, i.e. does the number of items seem right?  Again, this will vary greatly from case-to-case but you should have a good idea of how much “stuff” you are receiving.  Back in the paper days, we might question the other side if we produced a warehouse of boxes and they sent us a slim manila folder.  How does their production compare to your production?  Better yet – start to filter the produced ESI by custodian.  Is there a significant amount of information produced from key players?   How does it compare to your key people?  Interactive charts and graphs can go a long way here in helping you to understand what you’re seeing.
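A rough custodian comparison can be sketched the same way. In the Python below, the custodian names, the lists and the `flag_thin_custodians` helper are hypothetical; the threshold is something you would pick case by case.

```python
from collections import Counter

# Hypothetical custodian labels, one entry per produced item.
our_production = ["alice", "alice", "alice", "bob", "bob"]
their_production = ["carol", "dave", "dave", "dave", "dave", "dave"]


def volume_by_custodian(custodians):
    """Return (custodian, item count) pairs, largest volume first."""
    return Counter(custodians).most_common()


def flag_thin_custodians(custodians, key_players, minimum):
    """Return key players with fewer than `minimum` produced items."""
    counts = Counter(custodians)  # missing custodians count as zero
    return sorted(p for p in key_players if counts[p] < minimum)


print(volume_by_custodian(their_production))
print(flag_thin_custodians(their_production, ["carol", "dave", "erin"], minimum=2))
```

Here `carol` and `erin` are flagged: one produced a single item and the other nothing at all, which is exactly the kind of gap to raise with the other side.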

Date ranges.   Take a look at the date of the information and see how the volume of information varies over time.  Email will normally be grouped by its date, but files could be grouped by date of creation, modification or last access date.  Is there a high volume of information during the time that you would expect to be most relevant?  What items, in each file type category, are the oldest and most recent by date – and does that fit announced data retention policies and the scope of eDiscovery?  Do the dates and volumes fit with your understanding of the case?   Do this work first by using filters to exclude your data, and then include your own for a second review.  How much does that change the picture, if at all?  Does the other side seem to think that a different range of dates is more important than you did?
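For the date analysis, a simple monthly tally is often enough to spot gaps. This Python sketch assumes you have already extracted one date per item (sent date for email, or whichever file date you chose); the sample dates and helper names are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical item dates taken from the other side's production.
item_dates = [date(2010, 1, 15), date(2010, 1, 20), date(2010, 3, 2), date(2010, 4, 9)]


def monthly_volume(dates):
    """Return ((year, month), count) pairs in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return sorted(counts.items())


def months_without_items(dates, start, end):
    """List the (year, month) pairs from start to end (inclusive) with no items."""
    present = {(d.year, d.month) for d in dates}
    gaps = []
    year, month = start
    while (year, month) <= end:
        if (year, month) not in present:
            gaps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


print(monthly_volume(item_dates))
print(months_without_items(item_dates, start=(2010, 1), end=(2010, 5)))
```

Run it once on their data alone and once on the combined set, as described above, and compare the two pictures.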

Email Domains.   Look at all of the email domains (e.g. emc.com, cnn.com, espn.com) that are represented in the production as either senders or recipients.  Are there any “new” companies of interest?  Maybe there’s a third party shown in email that could have important information available by subpoena.  Did the other side include any information sent to or from their law firm?  If not, was every item really privileged – and did they produce a privilege log?
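Pulling the domains out of sender and recipient fields is a small job for Python's standard `email.utils.getaddresses`. The sketch below is illustrative only: the message records, the `.example` domains and the `known_domains` set are made up.

```python
from collections import Counter
from email.utils import getaddresses

# Hypothetical header values from produced messages.
messages = [
    {"from": "Alice <alice@ourco.example>",
     "to": "bob@theirco.example, Carol <carol@vendor.example>"},
    {"from": "dave@theirco.example",
     "to": "alice@ourco.example, counsel@lawfirm.example"},
]


def domain_counts(msgs):
    """Count how often each domain appears as a sender or recipient."""
    counts = Counter()
    for msg in msgs:
        for _name, address in getaddresses([msg["from"], msg["to"]]):
            if "@" in address:
                counts[address.rsplit("@", 1)[1].lower()] += 1
    return counts


# Domains you already expected to see (the parties themselves).
known_domains = {"ourco.example", "theirco.example"}

new_domains = sorted(set(domain_counts(messages)) - known_domains)
print(new_domains)
```

Anything left in `new_domains` is a candidate third party, or a law firm whose messages may belong on a privilege log.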

Email Threading.  Because of its nature, email can be “threaded” into conversations so that you can view a nicely ordered chain of emails that has gone back and forth between parties.  Even one- or two-message “side conversations” become very noticeable when a group of emails has been properly threaded.  Using your own key email messages as a starting point, thread the messages to include the other side’s production.  Are there new “back channel” or side conversations that the other side held internally, which you never saw?  Were key messages re-forwarded well after the fact – say weeks or months later as “reasonable anticipation of litigation” began to occur?  Did you receive another copy of emails representing conversations with the other party (which you already produced) – or did they not produce those messages (and if not – why not?).
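Most review tools do the threading for you, but the underlying idea can be sketched in a few lines. The Python below is a simplified illustration, not how any particular product works: it links messages only through a reply pointer (in real email, the `Message-ID` and `In-Reply-To` headers), and the records, the `source` field and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical messages from both productions. "in_reply_to" points at the
# parent message id; "source" records whose production the item came from.
messages = [
    {"id": "<1>", "in_reply_to": None, "source": "ours"},
    {"id": "<2>", "in_reply_to": "<1>", "source": "ours"},
    {"id": "<3>", "in_reply_to": "<2>", "source": "theirs"},  # a reply we never saw
    {"id": "<4>", "in_reply_to": None, "source": "theirs"},   # unrelated thread
]


def thread_roots(msgs):
    """Map each message id to the id at the top of its reply chain."""
    parent = {m["id"]: m["in_reply_to"] for m in msgs}

    def root(message_id):
        seen = set()
        # Walk up the chain; the `seen` set guards against malformed loops.
        while parent.get(message_id) is not None and message_id not in seen:
            seen.add(message_id)
            message_id = parent[message_id]
        return message_id

    return {m["id"]: root(m["id"]) for m in msgs}


def side_conversations(msgs):
    """Return ids of their messages that sit inside threads we also hold."""
    roots = thread_roots(msgs)
    threads = defaultdict(list)
    for m in msgs:
        threads[roots[m["id"]]].append(m)
    hits = []
    for members in threads.values():
        if {m["source"] for m in members} == {"ours", "theirs"}:
            hits.extend(m["id"] for m in members if m["source"] == "theirs")
    return sorted(hits)


print(side_conversations(messages))
```

Message `<3>` is the interesting one: it hangs off a conversation that started in your own data but appears only in the other side's production.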

Wrapping Up

These are just a few very basic ideas of how you can begin to evaluate the other side’s ESI production.  Leveraged properly, in-house eDiscovery solutions can be another powerful tool for corporate (and law firm) counsel to rapidly get their arms around a case and begin to evaluate the other side’s production, too.  Happy ACA-ing!

Are you moving your data smartly?

Bryant Bell, eDiscovery Expert, EMC Information Intelligence Group

In my last posting I wrote about what you can do to protect your company assets if you decide to move your ESI (electronically stored information) into the cloud. I pointed out that you should be sure that your cloud provider adheres to, or is at least aware of, the US-EU Safe Harbor framework. This has been a topic of concern for multinational or at least transatlantic corporations. But now with the advent of the cloud your data could be stored in Dublin, Ireland or Stuttgart, Germany even though you may be a medium-sized business in Laredo, TX. The cloud will now essentially force you to start thinking about your data as if you were a multinational even if your business doesn’t extend past Texas. This is because you have now tossed your ESI into the cloud and it will reside in any country your provider sees fit. So as you take that “Journey to the Cloud” I want to share some suggestions from Greg Buckles of eDiscovery Journal: http://ediscoveryjournal.com/2011/06/moving-your-esi-to-the-cloud/

You need to understand the basic infrastructure and data flow process that your ESI will experience, and ask your cloud provider these questions:

  • How is it transferred to the cloud?
  • Where does it physically reside?
  • Is it transformed for storage?
  • How is it kept separate from other customers?
  • Does the company own all the infrastructure outright?
  • What is the disaster recovery or co-location arrangement?
  • What are your guarantees on uptime, accessibility and Service Level Agreements (SLAs) for issues?
  • What are the company policies on data privacy, subpoenas and security?
  • How can your ESI be accessed, searched and retrieved?
  • What are reasonable restoration rates for retrievals?
  • Is there an established migration/transfer mechanism in case you want to change providers?

From a regulatory, internal investigation and litigation perspective, the points to pay particular attention to are: where your data resides; company policies on data privacy, subpoenas and security; and how your ESI can be accessed, searched and retrieved.

Moving to the cloud may be inevitable but just make sure you have a plan and are taking safeguards.

The Ghosts of eDiscovery Past, Present and Future

This is the time of year when many make predictions for 2011.  But while we try to look forward, the reality is that as an industry, we have not yet conquered our eDiscovery challenges from 2010 – or even 2009 or earlier!  In the spirit of the season and with a nod to Charles Dickens’ A Christmas Carol, I decided to take a Scrooge-based approach to eDiscovery.  Without further ado, I present the ghosts of eDiscovery Past, Present and Future.

eDiscovery Past

In the early days of eDiscovery, even before the amendments to the FRCP in December 2006, we all made plenty of mistakes as we learned about this challenging new area.  Many of our problems resulted from collecting and preserving electronically stored information (ESI) from backup tapes; artificially segmenting the eDiscovery process into three stages known informally as “collect stuff”, “throw stuff over the wall” and “review stuff”; and pretending that eDiscovery either was a passing fad, or just could not be as difficult as we had heard.

While the list of mistakes and challenges from the past is virtually limitless (see Ralph Losey’s recent blog entry on this issue), many of these mistakes really boiled down to a few fundamental issues:  a lack of coordination and communication between Legal and IT (and Records Management or “RM”); and a lack of basic knowledge on IT systems from people working in legal roles.

If these ghosts of eDiscovery past continue to plague you, next year resolve to:

  • Have your legal team learn at least the basics about your IT infrastructure;
  • Ensure that Legal, IT (and RM) coordinate, communicate and interact on a regular basis; and
  • Have a basic plan, prepared in advance, for what to do when eDiscovery hits.

eDiscovery Present

Over the last year, we continued to struggle with the concept of when sanctions should be awarded for eDiscovery blunders, and how we should determine the severity of those sanctions.  In fact, these are such difficult issues that there is currently disagreement even within the same jurisdiction (compare Pension Committee of the University of Montreal Pension Plan, et al. v. Banc of America Securities, et al., 2010 WL 184312 (S.D.N.Y. Jan. 15, 2010) (Amended Order) with Orbit One Commc’ns, Inc. v. Numerex Corp., 2010 WL 4615547 (S.D.N.Y. Oct. 26, 2010)).

But there were several other trends that rang through loud and clear.  One of the clearest trends is that there is significant risk in relying upon employees to preserve and collect their own data for eDiscovery.  (See our “Weekend At Bernie’s” post).  While there is still no absolute prohibition, the problem with “custodian-based eDiscovery” is that employees can be self-interested or uninterested in a case, making it risky to assume that they will do what they are asked.  Even for those who are sufficiently motivated, many will still fail because they are under-educated on both legal and IT issues.  This makes it exceptionally difficult for them to determine what ESI should be retained as relevant to a case, and how to properly find and preserve that ESI.

Another clear trend is that unintentional – and even seemingly minor and understandable – eDiscovery blunders can cascade into prejudicing a case and result in severe sanctions.  (See Harkabi v. Sandisk Corp., 08 Civ. 8203 (WHP) (S.D.N.Y. Aug. 23, 2010)).

A trend that has been around for a while, but seems to finally be gaining momentum, is enforcing the point that litigation holds do not begin upon receipt of the first Request For Production of Documents, or even upon being served with a Complaint.  Instead, the hold duty attaches when one can reasonably anticipate litigation, which typically occurs before the date of service (and for plaintiffs, will certainly occur before filing the Complaint).  Courts are beginning to take a closer look at when a party’s preservation process actually began, so companies need to get legal informed about litigation threats so that decisions on holds can be made at the right time.

If these ghosts have the chance of haunting you, next year resolve to:

  • Rely more upon your eDiscovery team of investigators and counsel, and arm them with useful technologies to complete their work.  Merely hoping that your employees are handling the preservation and collection of critical ESI is no longer a viable option;
  • Review your eDiscovery processes to ensure that litigation holds are integrated into your business processes.  This will ensure that holds can be recognized at the appropriate time and not just after litigation has already commenced.

eDiscovery Future

There are two main roads that the ghost of eDiscovery Future can take.  The first is the obvious road of emerging and future technologies.  For 2011, emerging issues will clearly include the Cloud and social media technologies such as Facebook and Twitter, and we will certainly see some new technologies that we have not yet even worried about.

The second road in the future is more sinister, and relates to issues that we should already be aware of but have failed to adequately address because they have not yet risen to the right level.  These issues are actually riskier because we should be prepared, and mistakes with these technologies may not be viewed in a forgiving light because we should know better.  As a few examples, this group would include legal issues around international data privacy, data stored in Sharepoint repositories, and structured databases.

It is difficult to predict what you should do about the ghosts of eDiscovery Future, but consider a few possible resolutions for the new year:

  • At minimum, update your ESI Map to include basic information about data that may be outside your firewall (such as outsourced Email and other Cloud technologies, Facebook, Twitter, etc.);
  • If you transact business outside the U.S., understand the basics of privacy law and determine whether and how they may impact you in normal litigation matters; and
  • Subscribe to a publication that will keep you updated on the latest legal and technology developments (Law Technology News and its Daily Alert are terrific, free resources).

Good luck in 2011!