Photo Archiving: In the Digital Age, Longevity Is No Sure Thing

April 20, 2015

By Greg Scoblete

Wiki Commons

 Inside the Mivitec datacenter in Munich, Germany. To keep digital files safe, preservation experts advise treading cautiously with the cloud and instead trusting images to open formats and hard disk storage.

For photographers with an archive brimming with slides, contact sheets, negatives and prints, the future of that work looks bright. With basic defenses against the corrosive agents of time and a dollop of good fortune, those photos have an excellent chance of surviving for a hundred years or more.

Digital photographs? Not so much.

“We’re living in an unprecedented time,” says Henry Wilhelm, co-founder of Wilhelm Imaging Research and a photo permanence expert. For all of human history, we’ve created “human-readable” records, like your archive of prints, that require nothing more than a working set of eyes to be accessed and understood. Now we are in an age of “machine-readable” records. In such an age, the question is whether future software and devices will be able to access, read and display the bits of digital information that represent your images. As anyone who’s held an 8-inch floppy disk knows, that’s no sure thing.

To shepherd your digital photos through generations, three major factors have to be considered.

The Security of Your Files

The ideal solution, says stock photographer and image preservation expert David Riecks, is known as the “3-2-1 approach.” That’s three copies of a file, stored in two different places, with one of those locations off-site. Wilhelm prefers hard drives to the cloud, since services like Dropbox are good archival solutions only as long as they’re in business. As the history of technological disruption attests, today’s well-capitalized empires may collapse into tomorrow’s business school case studies.
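As a rough illustration, the 3-2-1 rule as described above can be written down as a simple check. This is only a sketch: the ArchiveCopy class, the meets_3_2_1 function and the location labels are hypothetical names invented for the example, not part of any tool mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveCopy:
    """One stored copy of a file (hypothetical structure for illustration)."""
    location: str   # free-form label, e.g. "studio" or "relative's house"
    off_site: bool  # True if the copy lives away from the main workplace


def meets_3_2_1(copies: list[ArchiveCopy]) -> bool:
    """Return True if the copies satisfy the rule as the article states it:
    three copies, two different places, at least one place off-site."""
    distinct_places = {copy.location for copy in copies}
    return (
        len(copies) >= 3
        and len(distinct_places) >= 2
        and any(copy.off_site for copy in copies)
    )


if __name__ == "__main__":
    plan = [
        ArchiveCopy("studio", off_site=False),
        ArchiveCopy("studio", off_site=False),
        ArchiveCopy("relative's house", off_site=True),
    ]
    print(meets_3_2_1(plan))
```

Two drives in the studio plus one kept elsewhere pass the check; three drives on the same shelf do not.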

The life of any given hard drive is variable, but a study from the cloud provider Backblaze, based on a comprehensive examination of drives used in its servers, found the average to be six to seven years. Migrating data to new drives every five years may provide extra insurance.

Moving large amounts of data back and forth introduces plenty of opportunities for file corruption. Fortunately, you can run the equivalent of a “DNA test” on files that are being transferred from one medium to another to ensure they arrive intact. Riecks uses a hashing program that creates a checksum file, a record of fingerprints computed from the bit structure of all his archived files. If any errors were introduced during the transfer process, a comparison against the checksum file will flag them. AV Preserve offers a variety of file-checking tools, some of them free, to verify the integrity of image collections across storage platforms.
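The idea behind such a check can be sketched in a few lines of Python. This is a minimal illustration, not the program Riecks uses: the choice of SHA-256 as the hash, and the function names file_sha256, build_manifest and verify_copy, are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large image files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = file_sha256(path)
    return manifest


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Compare a copied archive against the manifest and list any problems."""
    problems = []
    for relative_name, expected in manifest.items():
        candidate = copy_root / relative_name
        if not candidate.is_file():
            problems.append(f"missing: {relative_name}")
        elif file_sha256(candidate) != expected:
            problems.append(f"corrupted: {relative_name}")
    return problems
```

In practice, one would call build_manifest on the original archive before a migration, then run verify_copy against the new drive afterward. An empty list means every file arrived bit-for-bit intact.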

The recently introduced Mylio software is another very effective way to keep files synchronized across multiple platforms, including offline devices, Wilhelm says. While Mylio is new, it is “cloud agnostic” and preserves the file structure in a Mac or Windows environment, so that in the event it is shut down, a user would still be able to easily locate images on stored devices, says company founder David Vaskevitch. Unlike other image organizers, “Mylio has protection baked in,” since it also adheres to the 3-2-1 approach and offers a visual guide alongside each image to alert users to its protection status, Vaskevitch says.

One shouldn’t overlook the value of printing as an archival method either, Wilhelm says. Properly produced and cared for, printed images can last for hundreds of years—far longer than any contemporary digital storage media. What’s more, a signed and dated print is the currency of the fine art photographic market, Wilhelm says. “You don’t see people spending $5.6 million for a file.”

The Readability of Your Files

Second, and more challenging, is choosing an image file format that has the best chance of being machine-readable in the future. “A primary characteristic for an archival format is that the structure and nature of the format be openly documented and understandable by any reasonable software engineer in the imaging field,” says Tom Hogarty, Director of Product Management at Adobe.

JPEGs and the Adobe-championed DNG format fall under this umbrella. “Support for JPEG codecs will remain part of photographic software years from now,” says Peter Schelkens, PR chair for the JPEG Committee. “I do not see main photo software vendors discontinuing support in the near and not so near future. Moreover, the fact that JPEG is an official ISO/IEC and ITU-T standard guarantees that it is well documented and generally accepted. Such standard specifications are well archived and should in principle allow you to decode these images even thousands of years from now, assuming of course we did not move out of the digital era.” 

A “plain vanilla TIFF” file, while proprietary, is also very well documented and is Riecks’ archival format of choice. For Riecks, “plain vanilla” means an 8-bit TIFF file in the AdobeRGB color space.

The Library of Congress uses bitmaps in a TIFF wrapper to preserve its digital photo files, says Carl Fleischhauer, project manager in the Office of Strategic Initiatives at the Library of Congress. This approach prioritizes long-term readability over the ability to make future edits to a file, but photographers may want the emphasis on the latter. In that case, formats like DNG make sense, Fleischhauer says.

Our experts were less confident that proprietary, camera-specific RAW files would offer the same degree of longevity and readability.

Preserving the Metadata

Ensuring a photo has accurate and thorough metadata is critical to digital photo preservation, argues Riecks, because it enables future programs to find and organize a photographic collection. It also ensures critical copyright data travels with the images as they migrate from old storage solutions (like hard drives) to new ones that haven’t even been conceived of yet. It also protects images that are discovered online from being deemed “orphaned” works.

A first step to preserving metadata is to make sure it’s input early in your workflow; otherwise you’re “throwing images into a black hole,” Riecks says. Given the sheer number of images you’re likely managing, it makes sense to discriminate. Riecks will batch process all photos with a minimum amount of metadata to ensure basics like location, copyright, time and date, and captioning details are represented. For images he cares more about, he’ll input even more information such as tags, ratings and keywords. Riecks offers a series of detailed keywording guidelines on his site, Controlled Vocabulary.

Like image files, metadata files come in a variety of flavors and not all will be viable as a long-term preserve of valuable image information. Adobe, for instance, backs the open-source XMP (for Extensible Metadata Platform) format. News and photo agencies use the IPTC Photo Metadata Standard, which is built off of Adobe’s XMP and an older IPTC standard to ensure long-term interoperability (it also comes with its own manifesto). Another virtue of using widely available image file formats as opposed to proprietary RAW files is that they support embedding metadata into the image file.

Whatever method you adopt, it’s clear that for images to survive for decades in a machine-readable age, image-makers have to keep preservation at the forefront of their workflow.

“There’s a paradigm shift in that preservation used to be the end of the process,” Fleischhauer says. “Now, preservation actions have to start at the very beginning of your workflow.”
