For records management, archival and e-discovery purposes, text is a beautiful thing. Whether it comes in the form of a Word document, Excel spreadsheet or an e-mail, it can be cataloged, classified, retained and searched easily using most conventional archival and retrieval solutions.

Even if the file name itself doesn’t point explicitly to the contents of the file, it’s still relatively easy to glean the context of text-based files based on keywords. This enables text files to fit neatly within the confines of electronic records retention systems for automatic archival based on established retention rules. As a result, it’s fairly easy to separate the wheat from the chaff: the business record that must be retained from a personal note of the “Honey, can you pick up milk on your way home?” variety.
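To make the idea of keyword-based retention rules concrete, here is a minimal sketch. It is not how any particular archiving product works; the rule table, the keywords, the retention periods and the function name `retention_years` are all assumptions made up for illustration.

```python
import re

# Hypothetical retention rules: keyword -> retention period in years.
# Real rules would come from a company's own retention policy.
RETENTION_RULES = {
    "contract": 7,
    "invoice": 7,
    "purchase order": 5,
}


def retention_years(text):
    """Return the longest retention period whose keyword appears in the text.

    Returns None when no rule matches, meaning the text is not
    flagged as a business record by these rules.
    """
    lowered = text.lower()
    matches = [
        years
        for keyword, years in RETENTION_RULES.items()
        # Match whole words or phrases only, so "contractor" does not hit "contract".
        if re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
    ]
    return max(matches) if matches else None


print(retention_years("Please find the signed contract attached."))
print(retention_years("Honey, can you pick up milk on your way home?"))
```

The first message matches a rule and would be retained; the second matches nothing and could be discarded. Nothing comparable can be done with a video or image file until its content has been turned into something searchable.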

The ability to distinguish retention-worthy files from miscellaneous junk also makes text-based records retention much more efficient and economical. Dumping that which is of no value makes searching faster and prevents wasted dollars spent on excessive capacity to store unnecessary junk.

However, as digital technology and mobile hardware have permeated the business world and continue to grow at staggering rates, a new challenge has emerged for legal, records management and compliance professionals: the multimedia file. Audio, video, image and app files have become as much a part of the corporate technology landscape as the text-based files we love so much.

The problem is, multimedia files are far more difficult to recover from archives within the conventional parameters of records retention. The issue is becoming a major challenge that is only going to get worse as mobile devices and multimedia files continue to infiltrate the corporate environment. The industry is thirsty for a solution that overcomes the major barriers of multimedia content.

The Deluge of Content

Audio, video, photo and app/gaming files are becoming increasingly prevalent in the workplace as companies take greater advantage of modern technology. Examples include sales presentations that mix PowerPoint slides with audio/video clips, video blogs and photo streams, and productivity apps that integrate with customer management software.

The sheer volume of multimedia content is increasing in large part due to the rapid adoption of mobile devices in the workplace. In January 2011, Apple confirmed to investors that more than 80 percent of the Fortune 100 were using or testing the iPad, up from 65 percent just three months prior. This is likely only the beginning: Forrester Research has predicted that tens of millions of tablets will be in the workplace by 2015.

The fact that these devices make multimedia content readily available and easy to access and download is contributing substantially to the “pile up” of data. Compounding the problem is the fact that multimedia files are typically quite large (multiple gigabytes isn’t uncommon for video files), so they consume valuable, costly storage space. In addition, mobile devices are often used for both business and personal activities, making it virtually impossible to distinguish business files from personal files. Is that a corporate training video or the latest Hollywood blockbuster the employee downloaded to watch while on a flight? Are those .jpg images from the most recent product photo shoot or the marketing manager’s recent beach vacation?

Lack of Metadata

The deluge of content would be considerably more manageable if it weren’t for the single biggest limiting factor of multimedia files: the lack of reliable metadata. In some cases, a file name can hint at the actual content of the file, but file names can be changed so easily that relying on them to determine the business value of a file is futile. Serial-number file names, of course, offer no clue about content at all.

Time and date stamps can indicate when the file was downloaded and last modified, but the instant that image or audio file is forwarded to another user, the file is essentially cloned with new metadata, making it impossible to determine that the two are duplicates. Absent that determination, most companies retain both, wasting valuable bandwidth and storage capacity to archive untold numbers of duplicate files with varying metadata.

When in Doubt, Keep it All?

In short, multimedia files take up more space and are accumulating faster than their text-based counterparts, and it’s nearly impossible to distinguish a legitimate business file that must be archived from one that is not. At this point, the most commonly accepted solution is to err on the side of caution and archive everything. This approach is certainly the most costly, with exorbitant sums being spent on server capacity to store it all.

It’s also incredibly inefficient. Searching through terabytes of content can take an eternity, and just about the only way to comb through multimedia files is to do so manually — opening and viewing/listening to each file until the needle finally emerges from the haystack. There are some technologies that can convert audio files to text files phonetically, but this assumes you already know that the file has value as a business record — it does nothing to help you determine whether it should actually be retained in the first place.

Keeping everything can also be riskier than you might think. What if that movie file labeled “SexualHarassmentTraining” is actually an adult video downloaded by an employee to his or her iPad? This clever trick by the employee to fly under the radar could be a disaster for your company should it be discovered.

The Wish List

Now that we’ve identified the challenges with multimedia records management, what might the solution look like? First, it would likely use metadata as a starting point, but it must also be able to deduplicate the retention database based on some parameter that remains constant across every copy of the file.
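One candidate for such a constant parameter is a cryptographic hash of the file’s bytes, which does not change when a copy is renamed or picks up a new time stamp. The sketch below is one possible approach rather than an established industry solution; the function names `content_digest` and `group_duplicates` are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def content_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's bytes.

    The file is read in chunks so multi-gigabyte videos
    do not have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Group files by content digest, ignoring names and time stamps.

    Returns a list of groups; each group holds two or more paths
    whose bytes are identical.
    """
    groups = {}
    for path in paths:
        groups.setdefault(content_digest(path), []).append(Path(path))
    return [files for files in groups.values() if len(files) > 1]
```

Each group returned by `group_duplicates` could be archived once, with the other copies recorded as references. This only catches byte-for-byte copies: a video that has been re-encoded, or an image whose embedded tags were rewritten, would hash differently and would still need the kind of content analysis described next.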

Next, content analysis must be incorporated to provide an automated, software-based litmus test to confidently determine what qualifies for retention and what does not. Voice recognition as well as phonetic interpretation and analysis would likely come into play. File compression is a vital component to help mitigate the massive storage capacity such an archival system would require. Finally, a dynamic search engine that can penetrate the compression protocol and understand the multidimensional file data would enable timely and efficient e-discovery across these new archives.

Until such a solution is available, most companies have no choice but to hoard every photo and every video and hope they’ll be able to unearth the right nugget from the digital heap should the need arise. Data centers will certainly reap the benefits of ever-growing capacity needs as the multimedia phenomenon rolls on. Meanwhile, you should know that the industry is aware of the problem and working on a solution. When it arrives, you can bet there will be a YouTube video explaining it in great detail.

This feature originally appeared at Information Management.

Linda G. Sharp is associate general counsel at ZL Technologies. An experienced attorney and ediscovery consultant, she is a frequent presenter on the topics of records management, compliance and ediscovery. Ms. Sharp has her MBA and is licensed to practice law in the state of California.

 
