Digital Formats Research




794: Internship in Library and Information Science


Created July 2011 by Jennifer Thompson for use within the University of South Carolina's Scholar Commons
Standard Digital File Formats for use within Scholar Commons

I. Text

A. PDF: Standing for Portable Document Format, PDF is the “global standard for capturing and reviewing rich information from almost any application on any computer system and sharing it with virtually anyone” (“Adobe PDF History,” para 1). The PDF format is an excellent choice for digitally storing text within an institutional repository because it is an open standard, ISO (International Organization for Standardization) 32000. ISO 32000 “...specifies a digital form for representing electronic documents to enable users to exchange and view electronic documents independent of the environment in which they were created...viewed or printed” (“ISO 32000-1:2008,” para 1). ISO 32000 continues (and will continue) to be maintained to ensure the longevity and reliability of the PDF. The format offers cross-platform access on Windows and Mac OS as well as searchable text. PDFs can also be used with assistive technology, making information easier to reach for those with disabilities.

B. PDF/A: There are currently two standard archival PDF formats approved by the ISO: PDF/A-1, in 2005, and PDF/A-2, in 2011. ISO 19005-1 (PDF/A-1) “...identifies a 'profile' for electronic documents that ensures the documents can be reproduced in years to come” (Reeves & Barfuss, para 7). PDF/A strives to be completely self-contained: it is not allowed to rely on any externally sourced information, such as images or fonts that are not embedded within the PDF/A itself (Reeves & Barfuss, para 8). PDF/A-2, approved in 2011, is a recent addition to the archival PDF formats and has therefore not yet been widely adopted. PDF/A-2 does, however, allow “...JPEG2000 compression, supports transparency effects and layers, embedding of Open Type fonts, and digital signatures” (“PDF/A-2 Standard Published..,” n.d.). Benefits of JPEG2000 compression include the same image resolution at a smaller file size and support for regions of interest, which store different areas of a single image at varying degrees of quality.

II. Audio

A. WAV: Standing for Waveform Audio File Format, this file format is commonly used for first-generation archival storage because it is a lossless format. WAV is compatible with both PCs and Macs. While WAV files are larger than files in other formats, the size guards against loss of sound quality. WAVs can be tagged with metadata because they derive from the Resource Interchange File Format (RIFF), which stores data in “chunks.” More specifically, metadata can be stored in the INFO chunk (“WAVE Audio File Format,” 2010). The preferred file format at the University of South Carolina’s Music Library is lossless WAV at 44.1 kHz/16 bit, which is equivalent to CD quality. Some institutions aim to digitize to lossless WAV at 96 kHz/24 bit, believing that this will eventually become the standard, but as of now the norm is a sampling rate of at least 40 kHz (N. Homenda, personal communication, July 27, 2011). The 40 kHz figure stems from the Nyquist-Shannon sampling theorem, under which a signal must be sampled at no less than twice its highest frequency. Since the human ear can detect sound in a range from 20 Hz to 20 kHz, audio should be sampled at 40 kHz or more “…in order for the reconstructed sound signal to be acceptable to the human ear” (“Practical Issues,” para 4).
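To make the sampling-rate and “chunk” ideas concrete, here is a minimal Python sketch. It is an illustration only, not a Scholar Commons or Music Library tool. It uses the standard library's wave module to build a one-second silent file at 44.1 kHz/16 bit, reads the settings back, and lists the file's top-level RIFF chunks. The function names (make_silent_wav, describe_wav, list_riff_chunks) and the stereo setting are assumptions made for this example.

```python
import io
import struct
import wave


def make_silent_wav(sample_rate=44100, bits=16, channels=2, seconds=1):
    """Build a short silent PCM WAV in memory (illustrative test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(bits // 8)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00" * (sample_rate * seconds * channels * (bits // 8)))
    return buf.getvalue()


def describe_wav(data):
    """Report the sampling settings and the highest frequency they can capture."""
    with wave.open(io.BytesIO(data), "rb") as w:
        rate = w.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": w.getsampwidth() * 8,
            "channels": w.getnchannels(),
            # Nyquist-Shannon: only frequencies up to half the sampling rate survive.
            "nyquist_hz": rate / 2,
        }


def list_riff_chunks(data):
    """Return (chunk_id, size_in_bytes) for each top-level chunk in a WAV file."""
    riff, _total_size, form = struct.unpack("<4sI4s", data[:12])
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        chunks.append((chunk_id.decode("ascii"), size))
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return chunks


sample = make_silent_wav()
print(describe_wav(sample))
print(list_riff_chunks(sample))
```

At 44.1 kHz the reported Nyquist frequency is 22,050 Hz, which sits just above the 20 kHz upper limit of human hearing. The generated file contains only a “fmt ” chunk and a “data” chunk. A file tagged with descriptive metadata would show additional chunks (such as a LIST chunk holding INFO data) in the same listing.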

B. AIFF: Audio Interchange File Format is most commonly used on Macs, but it is also compatible with PCs. The AIFF file format allows for high-quality storage comparable to that of WAV. Like WAV, AIFF can store metadata within chunks, such as name, author, copyright and annotation (“AIFF (Audio Interchange File Format),” 2010). AIFF, though, is proprietary, making it an undesirable format within an institutional repository.

C. BWF: Uncompressed Broadcast Wave Format is another file format deemed acceptable by the National Archives (“Digital Audio,” n.d.). An extension of the popular WAV file, the BWF file format allows for more embedded data at the beginning of the file. “This header file contains its own descriptive metadata …and digital file format metadata potentially useful for preservation purposes” (Young, et al., pg. 26). Because it shares WAV's underlying structure, which records sizes in 32-bit fields, BWF has the same file size limitation as a WAV file: 2 to 4 GB.
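As a rough worked example of what the 2 to 4 GB ceiling means in practice, the sketch below estimates how much uncompressed audio fits under a 4 GB limit. The function name max_duration_seconds is hypothetical, stereo (two channels) is an assumption, and header and metadata overhead is ignored, so the figures are approximate.

```python
def max_duration_seconds(limit_bytes, sample_rate_hz, bit_depth, channels):
    """Seconds of uncompressed PCM audio that fit in limit_bytes (headers ignored)."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return limit_bytes / bytes_per_second


FOUR_GB = 2 ** 32  # upper bound of the 2 to 4 GB range

# Assumed stereo recordings at the two quality levels discussed above.
cd_quality = max_duration_seconds(FOUR_GB, 44100, 16, 2)
high_resolution = max_duration_seconds(FOUR_GB, 96000, 24, 2)

print(f"44.1 kHz/16 bit: about {cd_quality / 3600:.1f} hours")
print(f"96 kHz/24 bit:   about {high_resolution / 3600:.1f} hours")
```

Under these assumptions, a single file holds roughly 6.8 hours of audio at 44.1 kHz/16 bit but only about 2.1 hours at 96 kHz/24 bit. Very long recordings digitized at the higher setting may therefore need to be split across files.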

III. Still Images

A. TIFF: Tagged Image File Format is the preferred archival file format within the University of South Carolina’s Digital Collections (K. Boyd, personal communication, July 29, 2011) and the Library of Congress’s American Memory Project. Within the American Memory Project, TIFFs are the master format for papers, photographs and negatives. This non-compressed file format is widely adopted because it is supported by most image manipulation software, making it an industry standard among photographers and graphic artists. TIFFs are viewed as an excellent format for images with high spatial resolution, which helps to determine an image’s clarity (“TIFF, Revision 6.0,” 2011). “[M]etadata is an essential component of the TIFF format; many TIFF images dating back to the 1980s still can be displayed in modern TIFF readers through the interpretation of the tagged metadata set” (“Guidelines for TIFF Metadata,” 2009). It is because of this metadata set that the TIFF file format proves to be an excellent choice within a digital library. While TIFFs serve as the archival format, JPEGs are useful for viewing online because they are compressed and have a smaller file size.
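The “tagged metadata set” can be illustrated with a small Python sketch that reads the tag directory at the start of a TIFF file. This is a simplified illustration, not a full TIFF reader: it decodes only single SHORT and LONG values from the first directory, and the sample bytes built by build_sample_tiff are a header-only fragment made up for this example (there is no image data, so it would not open in an image viewer). Both function names are hypothetical.

```python
import struct

# A few baseline tag numbers from the TIFF 6.0 specification.
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength",
             258: "BitsPerSample", 259: "Compression"}


def build_sample_tiff():
    """Build a tiny little-endian TIFF header plus one directory (no pixels)."""
    header = b"II" + struct.pack("<HI", 42, 8)  # byte order, magic 42, directory offset
    entries = [
        struct.pack("<HHIHH", 256, 3, 1, 640, 0),  # ImageWidth, one SHORT value
        struct.pack("<HHIHH", 257, 3, 1, 480, 0),  # ImageLength, one SHORT value
    ]
    directory = (struct.pack("<H", len(entries)) + b"".join(entries)
                 + struct.pack("<I", 0))  # 0 means no further directory
    return header + directory


def read_tiff_tags(data):
    """Return {tag name: value} for simple tags in the first directory."""
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(byte_order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (entry_count,) = struct.unpack(byte_order + "H", data[offset:offset + 2])
    tags = {}
    for i in range(entry_count):
        start = offset + 2 + 12 * i  # each directory entry is 12 bytes long
        tag, field_type, count, raw = struct.unpack(
            byte_order + "HHI4s", data[start:start + 12])
        if field_type == 3 and count == 1:      # SHORT stored inline
            value = struct.unpack(byte_order + "H", raw[:2])[0]
        elif field_type == 4 and count == 1:    # LONG stored inline
            value = struct.unpack(byte_order + "I", raw)[0]
        else:
            value = None                        # other types are not decoded here
        tags[TAG_NAMES.get(tag, f"Tag{tag}")] = value
    return tags


print(read_tiff_tags(build_sample_tiff()))
```

Because every entry carries its own tag number, type and count, a reader can skip tags it does not recognize and still interpret the rest. This is one reason older TIFF files generally remain readable in modern software.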

IV. Video

When deciding which file formats are best for storing digital video long term, it is important to take the following criteria into consideration (taken from Larry Jordan’s Monthly Final Cut Newsletter, July 2011):

1). The video format must be lossless.

2). The video format must be relatively well-adopted.

3). The video format must have a wide variety of migration tools for file conversion.

As with still images, video is made available using a tiered approach, with an archival format to house the file and a smaller, compressed file format for online viewing (M. Cooper, personal communication, July 26, 2011).

Archival Format:

A. DPX: Digital Moving-Picture Exchange is used as a preservation master for archiving film-based sources. DPX files contain a series of headers (generic and industry specific), followed by a block of user-defined data and then the image data itself (“File Organization,” para. 1). This file format is designed for storage and for providing the input needed to put digital images back onto film for projection (“Digital Moving-Picture Exchange (DPX), Version 2.0,” 2007). Because of its size and quality, it is not intended for playback in PC or Mac applications.

B. Motion JPEG2000: (Joint Photographic Experts Group) MJ2 is another format widely used for archiving because it is considered “truly” lossless, having a “...reversible, mathematically-lossless mode...” (Pearson & Gill, pg. 237). MJ2 also offers more flexible placement of metadata than DPX, which some prefer. Criticism of MJ2, however, centers on its relative newness, especially in comparison to DPX, which has been around since the 1990s and stems from the Kodak Cineon format (“Digital Moving-Picture Exchange (DPX), Version 2.0,” 2007).

Access Copies:

A. H.264/MPEG-4 AVC: (Moving Picture Experts Group Advanced Video Coding) This file format offers high-resolution viewing at a smaller file size, making it a convenient way to stream videos online. H.264 can also be wrapped as a Flash file, and this easy accessibility has made it an industry standard (M. Cooper, personal communication, July 26, 2011). The format is also widely used for online streaming at sites such as YouTube and Vimeo.

V. Helpful Resources

A. Open Vault: The Moving Image Research Collections is currently looking to adopt this form. This is a guide from the ground up.

B. The Association of Moving Image Archivists (AMIA): Recommended (especially the LISTserv) by Dr. Mark Cooper and Ben Singleton of MIRC. And me as well!

C. “File or Copy Type” from The National Archives

D. South Carolina Digital Library Guidelines

E. Papers: “Sound Directions: Best Practices for Audio Preservation” by Mike Casey and Bruce Gordon.

VI. References

Adobe PDF History. (2011). Adobe Acrobat (par. 1). Retrieved July 12, 2011, from Adobe Systems Incorporated website: http://www.adobe.com/products/acrobat/adobepdf.html

AIFF (Audio Interchange File Format). (2010, April 5). Retrieved July 20, 2011, from Library of Congress website.

Digital Audio. (n.d.). Frequently Asked Questions (FAQ) About Digital Audio and Video Records. Retrieved July 22, 2011, from U.S. National Archives and Records Administration website.

Digital Moving-Picture Exchange (DPX), Version 2.0. (2007, July 3). Retrieved July 31, 2011, from Library of Congress website.

File Organization. (n.d.). DPX File Format Summary (par. 1). Retrieved July 30, 2011, from File Format website.

Guidelines for TIFF Metadata Recommended Elements and Format Version 1.0. (2009, February 10). Retrieved July 29, 2011, from Federal Agencies Digitization Guidelines Initiative website: http://www.digitizationguidelines.gov/guidelines/TIFF_Metadata_Final.pdf

ISO 32000-1:2008. (2011). TC 171 Document management applications (par. 1). Retrieved July 12, 2011, from International Organization for Standardization website: []? csnumber=51502

Jordan, L. (2011, July). Picking the right format for archiving video. In Larry Jordan’s Monthly Final Cut Newsletter, July 2011. Retrieved July 31, 2011, from Larry Jordan and Associates website: http://www.larryjordan.biz/larry-jordans-monthly-final-cut-newsletter-july-2011/#archive

Pearson, G., & Gill, M. (2005). An Evaluation of Motion JPEG 2000 for Video Archiving. Proc. Archiving 2005 (April 26-29, Washington, D.C.), IS&T (www.imaging.org), pp. 237-243.

PDF/A-2 Standard Published by ISO! The New Standard Includes Great Technical Enhancements. Free Informative Webinar to Be Scheduled! (2011). Retrieved July 13, 2011, from PDF/A Competence Center website.

Practical Issues. (2011). Sampling Theorem (par. 4). Retrieved July 27, 2011, from eFunda website: http://www.efunda.com/designstandards/sensors/methods/dsp_nyquist.cfm

Reeves, R., & Barfuss, H. (n.d.). PDF/A - A new Standard for Long-Term Archiving. Retrieved July 13, 2011, from PDF/A Competence Center website: http://www.pdfa.org/doku.php?id=pdfa:en:pdfa_whitepaper

TIFF, Revision 6.0. (2011, July 1). Retrieved July 29, 2011, from Library of Congress website.

WAVE Audio File Format. (2010, September 8). Retrieved July 20, 2011, from Library of Congress website.

Young, A., Olivieri, B., Eckler, K., & Gerontakos, T. (2010, November). Building Digital Audio Infrastructure and Workflows. Computers in Libraries, 30(9), 24-28.