
Review-12

by Ananth Rao last modified 2007-05-25 06:37

NASA DAAC

Review of HDF5 operational readiness:

  1. NASA's Earth Science Data Systems Standards Process Group (SPG) is considering HDF5 for adoption as a community standard. This is the second review of HDF5, and it focuses on the format's readiness for operational use. The questions below are provided to guide feedback from data systems, application providers, instrument teams and others. You only need to answer the questions that apply to you. Please send comments to spg-rfc-007@lists.nasa.gov.

    My experience with HDF5 has been in two areas. First, I worked for Research Systems, Inc. (now ITT Visual Information Solutions), incorporating support for HDF5 into its IDL package. Second, as a Science Data Specialist at the NASA Langley Atmospheric Science Data Center, I write visualization tools for satellite data stored in HDF5 and other formats.

  2. Do you currently use or plan to use HDF5 in a production setting? What types of applications do you use with HDF5? Is HDF5 applicable to your applications (e.g., does it work well with the data types and data manipulations in your application)?

    As a developer of ITT VIS's IDL, I implemented HDF5 file access in this commercial software product, so in a sense this was a production setting. The complexity and flexibility of HDF5 made it challenging to incorporate into the product's framework, especially in the area of data types.

  3. Why do you choose to use HDF5 over other data formats for your applications?

    N/A. I am not a data producer.

  4. Have you or your users encountered any difficulty when using data access or visualization tools (e.g., IDL, GrADS) on HDF5 data files? If so, please provide a brief description of your experience.

    I have not experienced difficulties using IDL with HDF5. I have encountered one instance in which the HDF Group's HDFView, with the HDF-EOS plug-in, failed to handle a data element that has multiple dimensions, two of which use the same dimension definition.

  5. Does the performance of HDF5 you have experienced meet your requirements? (e.g., Can it handle the data types in your applications? Does it take a long time to read and write HDF5 files?)

    Fortunately, the data I've worked with has stuck to the simpler native data types, so HDF5 has been easy to deal with. I don't have experience writing HDF5 files, but reading them seems fairly quick and is noticeably faster than reading HDF4.

  6. What operational challenges or limitations does HDF5 present? (e.g., Does it take a long time to learn how to use it? Does it require advanced processing power, large amounts of memory, complex configuration, etc.?)

    HDF5 is somewhat complex in its design and available functionality, especially in the area of data typing. However, this complexity gives it the flexibility to implement a variety of solutions to match data needs. The complexity affects data providers, who must decide how best to use the format, and software package developers, who must support all aspects of the format, more than it affects data users.

  7. What benefits does HDF5 present? Do the benefits of HDF5 outweigh the challenges? (e.g., Does it offer the flexibility you want to package the data types in your applications? Does it facilitate interdisciplinary studies?)

    Getting up to speed on a data format, both understanding it and being able to read the data, is the biggest hurdle for users, no matter what the format. HDF5 is at a bit of a disadvantage at this point because it is relatively new, so there are fewer tools and less collective experience with it. The Aura Data Systems Working Group has tried to ease this learning curve by developing standard data storage mechanisms across the Aura instruments. In this case, using a common format, which for Aura is HDF5, should facilitate inter-instrument studies among the instruments on that platform. As for facilitating interdisciplinary studies, the sheer diversity of formats in use, together with data structures shaped by each instrument's particular characteristics, will always keep interdisciplinary studies challenging.

  8. How much data do/will you provide or archive in HDF5? (number of distinct data products or data sets, total data volume, number of files.)

    Our data center is the archive for data sets from the TES instrument on the EOS Aura satellite, which was launched in July 2004 and is presently operating. TES takes Global Survey (GS) measurements for 16 consecutive orbits on an every-other-day basis. The current and planned products in HDF5 or HDF-EOS5 format include the Level 1B data (1 granule, comprising 4 files, per orbit during GS), Level 2 data (currently 8-11 files per GS), and three Level 3 globally averaged data products (7-10 GS products, an 8-day product, and TBD monthly products).

    Data volume totals:

    Level 1B: 600 MB per orbit

    Level 2: 5 GB per 16-orbit Global Survey

    Level 3: TBD, but relatively small (probably < 20 MB per Global Survey)
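    As a rough cross-check of the totals above, the per-Global-Survey volume for the products with known sizes can be rolled up as below. This is only a sketch: it assumes decimal units (1 GB = 1000 MB) and the 16-orbit survey described above, leaves out Level 3 because its size is still TBD, and the constant and function names are illustrative, not part of any TES software.

```python
# Rough per-Global-Survey volume for the TES HDF5 products listed above.
# Assumptions (illustrative only): decimal units, 1 GB = 1000 MB;
# Level 3 is omitted because its size is still TBD.
ORBITS_PER_GLOBAL_SURVEY = 16
LEVEL_1B_MB_PER_ORBIT = 600
LEVEL_2_MB_PER_GLOBAL_SURVEY = 5 * 1000


def volume_per_global_survey_mb() -> dict:
    """Return per-survey volume in MB for Level 1B, Level 2, and their sum."""
    level_1b = ORBITS_PER_GLOBAL_SURVEY * LEVEL_1B_MB_PER_ORBIT
    level_2 = LEVEL_2_MB_PER_GLOBAL_SURVEY
    return {"level_1b": level_1b, "level_2": level_2, "total": level_1b + level_2}


if __name__ == "__main__":
    for name, mb in volume_per_global_survey_mb().items():
        print(f"{name}: {mb / 1000:.1f} GB")
```

    With these figures, one Global Survey comes to roughly 9.6 GB of Level 1B plus 5 GB of Level 2, or about 14.6 GB in total, every other day while the survey schedule holds.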

  9. How many users do you have or expect to have for data in HDF5, and what is your expected user community?

    There will likely be several hundred users, especially once studies using data from multiple A-Train satellite instruments become more prevalent, since TES on Aura flies in that constellation. The user communities span many areas of atmospheric and climate science, including atmospheric chemistry, air quality, aerosols, and climate. Our TES user base includes government, university and commercial users.

 

NASA - National Aeronautics and Space Administration. Curator: Jody Gibson
NASA Official: Richard Ullman