Big Data: Big Value or Big Trouble?

Like “the Cloud”, the term “Big Data” has many different definitions.  But no matter how you define it, Big Data is not a fad.

Some use the term to denote the incredible variety, velocity and volume of data that we are creating and using every day.  (Here is a very interesting infographic on that point).

Others use the term to represent huge data sets from which we can intelligently extract useful trends and business information.  In fact, the promise of Big Data is not just the ability to mine data for sales purposes, but also for customer and employee sentiment, and even the idea of “predictive compliance”.

Regardless, as with the Cloud, there is enormous potential value in Big Data — but there are also costs and risks that need to be weighed in the process.  Among these are the eDiscovery and security risks associated with keeping a significant amount of data past its (normally) useful life.  Our friend Barclay Blair has published some interesting thoughts on Big Data, the law and eDiscovery.

As in so many other areas, business will drive the need for big data initiatives; but compliance and legal need a voice in the process to adequately cover potential risks and issues.

EMC SourceOne 7 Archiving and eDiscovery: A Key Pillar of any Data Protection Strategy

As you may have already read in this morning’s EMC Data Domain and SourceOne blog, EMC is taking data protection to the next level: version 7 of EMC SourceOne is now available. SourceOne 7 is the next-generation archiving platform for email, file systems and Microsoft SharePoint content, and it includes SourceOne Discovery Manager 7; together, they enhance an organization’s ability to protect and discover its data. In fact, in ESG’s recent “Data Protection Matters” video, Steve Duplessie, ESG’s founder and Senior Analyst, stated that “backup and archive are different but complementary functions,” that both are “key pillars of a data protection strategy,” and that “backup without archive is incomplete.” I find this to be in perfect alignment with EMC’s strategy for data protection, and with that in mind, I’d like to review some of the great features of EMC SourceOne 7.

Let’s take file system data as an example. Have you ever needed to locate file system data in your infrastructure without a purpose-built archive to assist? Perhaps searching data for business reuse, an eDiscovery request, an audit or an investigation? How did that work for you? Often it’s a time-consuming exercise in futility, or at best an incomplete exercise with non-defensible results. With the latest release of SourceOne for File Systems, a quick search can produce an accurate result set of all files that meet your search criteria, and you never had to physically archive that content. That’s because this release offers “Index in Place,” which enables organizations to index the terabytes (or petabytes!) of data that exist “in the wild” without having to move that data to the archive. How cool is that? Users and applications continue to transparently access that data as needed, yet sitting on top is a layer of corporate compliance. Now you can apply retention and disposition policies to this data, discover information when required, and place only the data that needs to be put on legal hold into the physical archive.
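Conceptually, indexing in place means building a searchable catalog over data that never leaves its source location. The sketch below is hypothetical Python, not SourceOne’s actual implementation; it simply illustrates the idea of recording each file’s metadata for search and policy decisions while the file itself stays put.

```python
import os

def index_in_place(root):
    """Build a searchable metadata index over files without moving them.

    A conceptual sketch only: a real archive would also extract and
    full-text index file contents, deduplicate, and persist the index.
    """
    index = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            index.append({
                "path": path,               # the file stays in place
                "name": name,
                "size": stat.st_size,
                "modified": stat.st_mtime,  # can drive retention decisions
            })
    return index

def search(index, keyword):
    """Return index entries whose file name matches the search term."""
    return [e for e in index if keyword.lower() in e["name"].lower()]
```

The point of the sketch is the last assertion one would make about it: after a search, the matching files are still sitting exactly where users and applications expect them.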

SourceOne 7 uniquely addresses each form of content. Because our next-generation archiving family was built from the platform level up, all content types are managed cohesively, yet each type of content is archived in a way that complements the content itself. For instance, when archiving MS SharePoint you can:

  • Externalize active data to:
    • Save on licensing and storage costs
    • Increase  SharePoint’s performance
    • Provide transparent access to the content
  • Archive inactive content to:
    • Further decrease storage and licensing requirements
    • Make data available for eDiscovery and compliance
    • Set consistent retention and disposition policies
    • Provide users with easy search and recall from the MS SharePoint Interface

When it comes to the IT administrator, there are plenty of advantages to SourceOne as well. The entire archive is managed from a single console: all email, MS SharePoint and file system data is captured into one archive, which eases the administrative burden and decreases the margin of error when creating and executing policies against all types of content. The IT admin can also monitor and manage the overall health of the archive server using existing monitoring tools such as MS SCOM. Improved ROI on monitoring tools, a marginal learning curve and IT efficiency are all part of SourceOne 7. And of course, for the IT admin there’s the comfort of knowing that the data is protected while remaining transparently available to end users.

EMC SourceOne

The shift in IT infrastructure certainly encompasses virtualization, and most organizations take advantage of our ability to virtualize SourceOne. This next-generation architecture allows worker servers to be “snapped on” (either virtually or physically) with no disruption to the processes running on the existing archive servers, allowing for expansion and contraction of servers and services as necessary. And with all the new auditing and reporting capabilities in SourceOne 7, it’s a breeze to determine when you may need to add or remove servers or virtual machines to handle the workload, to examine trends, and to ensure compliance.

Every good archive deserves its own discovery tool, and with SourceOne Discovery Manager 7 you’ll find just that: an easy-to-use, intuitive interface that allows for discovery of all email, SharePoint and file system content within the archive.

SourceOne User Interface

With Discovery Manager you can:

  • Collect archived data (even that “indexed in place” data) to be managed as part of a matter
  • Place into hold folders
  • Perform further review, culling, and tagging
  • Export to industry-standard formats such as EDRM XML 1.0/1.2 and others
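To give a feel for what an EDRM XML export looks like, here is a minimal sketch. This is hypothetical code; the element and attribute names loosely follow the EDRM XML layout (Root > Batch > Documents > Document > Tags), and the real 1.0/1.2 schemas are considerably richer, so consult the EDRM specification for the authoritative structure.

```python
import xml.etree.ElementTree as ET

def export_edrm(documents):
    """Serialize a list of documents to a minimal EDRM-XML-style export.

    Each document is a dict with an 'id' and a 'tags' mapping of
    metadata field names to values. A real export would also include
    a Files section referencing native and text content.
    """
    root = ET.Element("Root", DataInterchangeType="Update")
    batch = ET.SubElement(root, "Batch")
    docs = ET.SubElement(batch, "Documents")
    for doc in documents:
        d = ET.SubElement(docs, "Document", DocID=doc["id"], DocType="Message")
        tags = ET.SubElement(d, "Tags")
        for name, value in doc["tags"].items():
            ET.SubElement(tags, "Tag", TagName=name,
                          TagValue=value, TagDataType="Text")
    return ET.tostring(root, encoding="unicode")
```

Because the output is plain XML, any downstream review platform that speaks the interchange format can load it without a proprietary connector, which is the whole appeal of exporting to an industry standard.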

Data protection requirements, driven by growth and recovery needs, are changing, and “one size fits all” recovery is no longer a viable option; addressing all the data protection challenges takes a holistic approach to managing this business-critical information. EMC solutions, which include SourceOne 7 for archiving and eDiscovery in conjunction with our best-of-breed backup and hardware platforms, make this happen. On that note, please make sure to read about the new Data Domain 5.3 capabilities for backup and archive in the supporting blog, here. To find more information on EMC SourceOne, please visit our EMC.COM SourceOne Family and Archiving websites.

Archiving: The Secret Sauce to IT Transformation (Part 2)

Lady Backup asserts that there is a key enabler in IT transformation that EMC hasn’t paid enough attention to: archiving.
To understand why, let’s look at the 3 key benefits of archiving:
Benefit 1: Archiving increases operational efficiency.
How old are the emails stored in your email system? How frequently are files older than a year accessed on your file servers? How many sites are sitting untouched in SharePoint?
Archiving allows you to be smart about how you retain content by storing aged content outside of your production environment. First, this reduces the storage capacity required. A lean production environment also improves backup and recovery, increases application performance, and eases application maintenance and upgrades.
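The “store aged content outside production” decision reduces, at its simplest, to an age test. A minimal sketch (hypothetical; a real product would also weigh last-access times, content type, and legal holds):

```python
import os
import time

ONE_YEAR = 365 * 24 * 3600  # seconds

def select_for_archive(paths, now=None, max_age=ONE_YEAR):
    """Split files into keep-in-production vs move-to-archive by age.

    Uses modification time as a stand-in for "aged content"; returns
    (production, archive) lists of paths.
    """
    now = time.time() if now is None else now
    production, archive = [], []
    for path in paths:
        age = now - os.path.getmtime(path)
        (archive if age > max_age else production).append(path)
    return production, archive
```

Running this kind of sweep on a schedule is what keeps the production tier lean: everything older than the cutoff migrates out, and backups only have to cover what remains.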
Benefit 2: Archiving improves end user productivity.
Data growth is not just a challenge for the infrastructure – it is also a challenge for end users to find content.
Take this scenario: you are trying to find a Word document created a year ago. Was it sent by email? Did you save it to your PC hard drive? Did you store it on a network drive? Or maybe it was uploaded to a SharePoint site? Where do you look first?
Your archive can be the first stop for users to do granular searches for content, saving time hunting around for the file or worse, recreating it because it can’t be found.
Benefit 3: Archiving consistently manages retention policies.
Retention management not only keeps your data volumes under control; from a corporate governance perspective, it also lets you enforce retention policies consistently.
Archiving allows you to consistently and automatically execute rules that implement your company’s policies and/or your regulatory requirements.
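At bottom, a retention policy of this kind is a mapping from content category to retention period, applied the same way every time. A small sketch with a hypothetical policy table (the categories and periods here are illustrative, not any company’s real schedule):

```python
from datetime import date, timedelta

# Hypothetical policy table: content category -> retention period in days
RETENTION_POLICY = {
    "email": 365 * 3,     # e.g. keep email for three years
    "contract": 365 * 7,  # contracts retained much longer
    "scratch": 90,        # working files kept briefly
}

def disposition_date(category, created, policy=RETENTION_POLICY):
    """Return the date a record becomes eligible for disposition."""
    return created + timedelta(days=policy[category])

def is_disposable(category, created, today, policy=RETENTION_POLICY):
    """True once the record's retention period has fully elapsed."""
    return today >= disposition_date(category, created, policy)
```

Because the schedule lives in one table rather than in each administrator's head, the same record always gets the same answer, which is exactly the consistency a governance program needs to demonstrate.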
Let’s face it – data volumes are challenging a “keep everything forever” mentality.
Next week, we’ll look at considerations for an archiving solution. LB

Archiving: The Secret Sauce to IT Transformation (Part 1)

Lady Backup is making her debut on EMC SourceOne Insider. But don’t let my name fool you.

I have many years of archiving experience dating back to EmailXtender. Fortunately, EMC had the wherewithal to invest in a next-generation architecture that resulted in EMC SourceOne, whereas our competitors are still stuck on first-generation architectures.

And if my own experiences weren’t enough, I also married Mr. Archive last year. This union in fact sets the foundation for future discussions we’ll have about the intersection of backup and archiving.

Notice I said “intersection.”

I continue to champion that a backup is NOT an archive. But the underlying architecture that EMC is developing allows for the consolidation of both backup and archive, positioning us uniquely in the market.

But that’s a point for a future conversation.

Let’s talk about archiving.

Given my history, I think EMC so far has missed the opportunity to include archiving as a key enabler to the IT transformation discussion.

Don’t get me wrong – we need to transform our IT Infrastructure from a static, physical model to one that is dynamic, agile and infinitely scalable. But the question in my mind is whether you are transforming your infrastructure to store content that is outdated, no longer of value, or potentially damaging to your organization.

The way I see it, we need to transform information management as part of IT transformation. Archiving is an enabler to manage the volume of data that is collecting in your production environments – allowing you to systematically manage what you are storing, where and for how long.

You will find that my blogs are short and sweet. Next week I’ll give you my views on the three benefits of archiving. LB

Activating Your Information Management Shield

We talk with companies every day about how they can be better at managing their enterprise information. Good policies, with technology to enable and enforce them, can help ensure that records and compliance information are retained for the right amount of time, while also enabling the deletion of stale and useless information that has outlived its retention period. Good information management processes ensure that protected information is stored in the right place, operational efficiencies are enhanced by focusing on useful information, and the eDiscovery process is easier and more efficient.

Many organizations know that they should implement information management initiatives, but often have difficulty in providing concrete reasons to the business.  If your organization is looking for more reasons why good information management is valuable, two recent cases provide some great reasons:

  • If you have an information governance policy, it may help you to defeat a claim for sanctions even if data has been deleted; and
  • If you don’t have an information governance policy, and you delete data that was subject to compliance requirements, the lack of a policy can help to establish the bad faith necessary to award sanctions.

Diligence As A Shield

In Danny Lynn Electrical & Plumbing, LLC v. Veolia Es Solid Waste Southeast, Inc., 2012 U.S. Dist. LEXIS 62510 (M.D. Ala. May 4, 2012), the plaintiff requested sanctions for the defendants’ alleged failure to properly implement a litigation hold.  Specifically, the plaintiff claimed that defendants had deleted nine email accounts and kept in place an auto-delete function which removed email from the trash after 10 days.  They also alleged that the defendants improperly sent notifications to employees on legal hold that they should continue to delete email messages to comply with email account size limitations.
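The failure alleged here, an auto-delete job left running against custodians who are on hold, is exactly what hold-aware purge logic is meant to prevent: the deletion routine must check hold status before removing anything. A minimal sketch (hypothetical, not any specific product’s logic):

```python
from datetime import datetime, timedelta

def purge_trash(messages, holds, now, max_age_days=10):
    """Delete trash items older than max_age_days, skipping any
    custodian subject to a legal hold.

    `messages`: dicts with 'custodian' and 'deleted_at' (datetime).
    `holds`: set of custodian names under litigation hold.
    Returns (kept, purged) lists.
    """
    cutoff = now - timedelta(days=max_age_days)
    kept, purged = [], []
    for msg in messages:
        on_hold = msg["custodian"] in holds
        expired = msg["deleted_at"] < cutoff
        (purged if expired and not on_hold else kept).append(msg)
    return kept, purged
```

The design point is that the hold check sits inside the purge routine itself, so turning litigation hold “on” for a custodian automatically suspends auto-delete for that custodian rather than relying on employees to change their behavior.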

The court found it significant that the defendants had deployed an email archive to capture all of their email messages. (Interestingly, the court did not discuss or make any findings about how the archive had been set up, configured or managed.) In addition, in finding that there was no bad faith (a requirement in the 11th Circuit), the court found it important that the defendants “began using a software system that archives all emails”:

The court’s impression is that the defendants have expended great effort to insure that the plaintiffs receive information from both their live and archived email system by providing document review technology and allowing access to its database.  All of these factors added up to the court finding that no sanctions were warranted.

Lack of Diligence Can Be A Final Straw

The flip side to the protection offered by information management can be found in FDIC v. Malik, 2012 U.S. Dist. LEXIS 41178 (E.D.N.Y. Mar. 26, 2012) where the court also considered a spoliation motion for the deletion of emails.  The email messages related to a law firm’s prior representation of a mortgage company.

In determining whether bad faith was present to enable sanctions, the court noted that the subject email messages were required to have been preserved not initially for litigation hold, but under compliance requirements — professional responsibility and ethical rules.  The court found that retention under the compliance requirement was especially important to this case:

A regulation requiring retention of certain documents can establish the preservation obligation necessary for an adverse inference instruction where the party seeking the instruction is ‘a member of the general class of persons that the regulatory agency sought to protect in promulgating the rule.’ The court held off on a final decision pending an evidentiary hearing.

Being Proactive With Information Management

We all know that litigation holds are difficult to implement and are almost never perfect. Sometimes something bad actually does occur: a custodian is inadvertently omitted, or a handful of emails is lost. But more often, nothing bad happens at all. Even in those cases, though, it can be difficult (and time-consuming and expensive) to fight off the other side’s claim that something “must have been lost.” A good information management policy, with tools and education to enable it, can go a long way toward showing good faith and protecting your organization from harm.

Open Records and FOIA – Pushing Government Technology into the 21st Century

At a recent conference for compliance and IT professionals working in the state government sector, it quickly became evident that one of their main concerns was the tremendous increase in the number of open records requests they have to deal with. Both the federal and state governments pay much lip service to the theory of transparency, but few have made the changes necessary to properly handle the onslaught of requests that arrive almost daily. The administration of Wisconsin’s governor, Scott Walker, has already produced 60,586 pages of open records in response to 222 requests in 13 months. Compare that to 312 requests filled during the previous governor’s first 4 years[1]. And it’s not just Wisconsin that is dealing with an explosion of open records and FOIA requests: the U.S. Department of Defense received 67,434 requests in 2009 compared to 74,573 in 2010, and the National Archives and Records Administration received 14,075 in 2008 compared to 18,129 in 2011[2]. Most government entities handle open records requests the same way they handle eDiscovery for litigation: manually and on an ad hoc basis. Unfortunately for government agencies, the required turnaround for a response is much quicker than for litigation. Federal agencies have a statutory requirement to respond to requests within 20 business days[3], and state agencies have time limits ranging from 10 to 30 days or within “a reasonable time.” For this reason, IT departments are struggling to keep up, and there is a substantial backlog at most agencies.
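A 20-business-day clock is easy to state but easy to miscount by hand, because weekends do not count. A small sketch of the deadline arithmetic (agency holidays are ignored for simplicity, so a real calculation would land later):

```python
from datetime import date, timedelta

def response_deadline(received, business_days=20):
    """Count forward the statutory number of business days from the
    date a request was received, skipping Saturdays and Sundays."""
    deadline = received
    remaining = business_days
    while remaining > 0:
        deadline += timedelta(days=1)
        if deadline.weekday() < 5:  # Monday (0) through Friday (4)
            remaining -= 1
    return deadline
```

Twenty business days is exactly four calendar weeks when no holidays intervene, so a request received on a Monday comes due on a Monday four weeks later.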

Machine Learning For Document Review: The Numbers Don’t Lie

Jim Shook

In light of Magistrate Judge Andrew Peck’s recent decision in Da Silva Moore v. Publicis, much has been written and discussed about the idea of using machine learning techniques to automatically classify documents during review, a process sometimes known as “predictive coding” or even “computer assisted review”. (Although these terms may actually imply different technologies and processes this article adopts Judge Peck’s umbrella use of the term “predictive coding”). This article explores some of the key issues around this promising intersection of law and technology.

What Is Predictive Coding? How is It Used?

At a simple level, predictive coding is just a technological “lever” that allows a (relatively) small amount of review work, usually by humans, to be leveraged across a much larger set of documents. Let’s say …
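The “lever” can be illustrated with a toy classifier: a handful of human-reviewed seed documents trains a model, and the model then scores the remaining, unreviewed documents. The sketch below uses a from-scratch Naive Bayes purely as a stand-in; commercial predictive-coding engines use far more sophisticated techniques and workflows.

```python
import math
from collections import Counter

def train(labeled_docs):
    """Fit a tiny multinomial Naive Bayes model from a small set of
    human-reviewed (text, label) pairs, e.g. label in {"relevant", "not"}.
    """
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts

def classify(model, text):
    """Score an unreviewed document against each class; return the best."""
    word_counts, doc_counts = model
    total_docs = sum(doc_counts.values())
    vocab = set().union(*word_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace smoothing so unseen words don't zero out a class
            score += math.log((word_counts[label][word] + 1)
                              / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The leverage is in the ratio: attorneys review a small seed set, and the model's scores then prioritize or classify the rest of the collection, with sampling used afterward to validate the results.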

[Infographic] A Proactive Approach to eDiscovery

2011 eDiscovery Year End Wrap-up

It has certainly been a banner year in eDiscovery. Judge Scheindlin kicked things off with a bang with her decision in National Day Laborer Organizing Network v. U.S. Immigration and Customs Enf. Agency[1], holding that the federal government must include metadata in Freedom of Information Act (FOIA) productions because certain key metadata fields are an integral part of public records. This ruling struck fear into every government agency and would have required massive changes to the way they kept and produced records. However, Judge Scheindlin withdrew the opinion in June, explaining that, “as subsequent submissions have shown, that decision was not based on a full and developed record.” She further stated that “[b]y withdrawing the [previous] decision, it is the intent of this Court that the decision shall have no precedential value in this lawsuit or any other lawsuit.” I guess we are left to draw our own conclusions from that statement.

2011 also saw the rise in importance of machine-based classification and coding. This was emphasized by the keynote speech given by Judge Andrew Peck at the Carmel Valley eDiscovery Retreat in July.

Finding Key Players in Legal Hold Notification, Preservation and Collection

Ted O'Neil

As a practitioner, I have had many conversations and discussions recently on leading practices and trends related to litigation hold notifications and preservation orders. Organizations routinely need to manage preservation effectively for litigation, internal investigations and varying regulatory purposes.

Since the amendments to the Federal Rules of Civil Procedure, there has been much discussion on this topic but few practical solutions to the problem. The Pension Committee decision has made effectively notifying and managing custodians and “key players” a core requirement for most legal departments.

The challenge with the legal hold notification, preservation and collection processes for most organizations is the “ad hoc” nature of defining systems of record, ESI and custodians, and of executing preservation in a defensible manner. Notifying custodians in a timely manner and keeping an audit trail to defend …