John,

It depends on a lot of things. How many documents are you dealing with: hundreds, thousands, millions, hundreds of millions? Is this a static set, or is it added to daily or on some other schedule? Where do the documents originate (MS Word, for example, or SAP invoices)? And what formats are they in? There was a mention of storing files as TIFF, but a spreadsheet stored as a TIFF image loses its formulas, so it would be useless if the formulas matter. There are many cases where embedded macros, hyperlinks, or programming make storing a picture of the record impractical. For long-term retention there are a few practical concerns you want to address, and many depend on the actual retention period. If you are retaining these documents for less than 15 years, there is a lot less to worry about than if you are keeping them longer, so you can use 15 years as the dividing line for “long-term” preservation. At a minimum you want to ensure readability and accessibility.

This means you want to make sure that you have a program and an operating system that can open and view the records when you access them 10 years from now. Note that certain well-known formats (.doc) are being deprecated over time. Many of us would be challenged to open a .WK1 or a WordPerfect 4.2 file today, and if the .WK1 file contained macros that depended on a third-party program (e.g. Symphony), the spreadsheet would more than likely be useless. On top of that, if it was stored on a 5 ¼” 360K floppy, you’ll have to resort to eBay to find a floppy disk drive to read it. The answer to these problems for long-term preservation is virtual and physical migration, that is, moving records to current formats and to current media. Yes, over the very long haul electronic data needs care and feeding in order to survive. In very large archive stores (petabyte scale and up) this is automated and nearly continual, and it also ensures the authenticity of the bits. The worst thing that can happen to an archive is to try to access a file only to find that the bits are corrupt and no longer readable.
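To make the "authenticity of the bits" point concrete, here is a minimal sketch of a periodic checksum check, assuming the archive is simply a directory tree on disk. The function names (`build_manifest`, `verify_manifest`) and the choice of SHA-256 are my own illustration, not something taken from a particular archive product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return a list of problems: files that are missing or whose bits changed."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(target) != expected:
            problems.append(f"corrupt: {rel}")
    return problems
```

You would build the manifest once when the files go into the archive, keep it somewhere other than the archive media itself, and rerun the verify step on a schedule and after every migration. An empty list back means every recorded file is still present and bit-for-bit unchanged.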

So although this may sound like “bigger than a bread box” stuff, it really applies even if you are storing only a few files. If you burn your files to a few CDs/DVDs, store them in a cabinet, and then try to open them 10 years from now, first, CD/DVD technology may be significantly different. Next, what you created on Vista using PowerPoint 2007 may not be renderable on whatever versions of Windows and PowerPoint are current by then, so you’ll have the bits of the electronic file sitting on your media, but you can’t open the file.

Some of the suggestions about storing files in the cloud are very practical today, since it is low cost and easy, and you generally get redundancy and DR built in, so you address the worst-case scenario of the office burning down. If you’re under 15 years and much smaller than the proverbial bread box, use what you have at your disposal, store your files with a cloud provider like Amazon S3 for redundancy, and call it a night. If you’re much bigger, it’s a different story.

Lastly, if you ever have to search the text of these documents in the event of litigation, or prove them authentic by showing that they have never been altered or tampered with, you have a completely different set of requirements altogether.

Good Luck,
Peter

September 2010

http://www.csi1000.com
Links:

* http://www.csi1000.com/docs/100YrATF_Archive-Requirements-Survey_20070619.p...
* http://www.csi1000.com/docs/SNIADMFArticleNov08.pdf
* http://www.csi1000.com/docs/SNIA-DMF_Building-a-Terminology-Bridge.pdf

