
Review-4

by Ananth Rao last modified 2007-09-20 11:28

University Research Organization

Review of operational suitability of HDF-EOS5:

NASA's Earth Science Data Systems Standards Process Group (SPG) is considering the HDF-EOS5 for adoption as a community standard. This is the second review of HDF-EOS5, this one focusing on its readiness for operational use. The questions below are provided to guide feedback from data systems, application providers, instrument teams and others. You only need to answer questions applicable to you. Please send comments to spg-rfc-008@lists.nasa.gov.

  1. Describe in a sentence or two your overall experience related to HDF-EOS5 (e.g., science data provider, science data systems, software tools developer, and science data user, etc).

    I am a science data provider as well as a science data user of HDF-EOS5 data.

  2. Do you currently use or plan to use HDF-EOS5 in a production setting? What types of applications do you use with HDF-EOS5? Is HDF-EOS5 applicable to your applications (e.g., Does it work well with the data types and data manipulations in your application?)

    Yes, my team is using HDF-EOS5 in both Fortran and C++ production code. HDF-EOS5, with the underlying HDF5, is a great format for storing and distributing our science data. We have been able to store all of the data in the organizational structure that makes sense for it, without making any concessions on the ideal structure. We have also found HDF-EOS5 easy to use in environments such as IDL and MATLAB, using the tools they provide to access the HDF5 library.

  3. Why do you choose to use HDF-EOS5 over other data formats for your applications?

    We use HDF-EOS5 mainly because of heritage and the early NASA directives to use HDF-EOS2. The HDF-EOS API allowed us to migrate easily from HDF4-based files to HDF5-based files. This migration would have been much more difficult without the HDF-EOS API hiding most of the HDF4-to-HDF5 API changes from us.

  4. Have you or your users encountered any difficulty when using some of the data access or visualization tools (e.g., IDL, GrADS, ..) on HDF-EOS5 data files? If you have, please provide a brief description of your experience.

    Yes, we once had a problem using IDL to read HDF-EOS5 files because of an issue in the underlying HDF5 library. It occurred when an HDF5 bug fix changed the internal file format, so that older versions of HDF5 could not read newly created files. Because IDL bundled an older version of the HDF5 library, it could not read files created with the more recent version. The only solution was to wait for IDL to issue a new release containing the then-current library. Since HDF5 does not usually have backward compatibility problems, this version lag within IDL is usually not an issue.

  5. Does the performance of HDF-EOS5 you have experienced meet your requirements? (e.g., Can it handle the data types in your applications? Does it take a long time to read and write HDF-EOS5 files?)

    Yes, HDF-EOS5 meets our requirements both in its ability to handle our data types and in its performance. We can write our data files using HDF5 internal compression without a significant impact on processing speed.

    That said, it is easy to create a file whose I/O performance is unacceptable. This can occur when data is written in small pieces, and performance degrades further if compression is used. First-time users can fall into this trap fairly easily. One of my first files had this problem, and a quick consultation with the HDF Group via the help desk led to the discovery and solution of the problem.
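    The effect described above can be illustrated with a toy model. The sketch below is not HDF5 or HDF-EOS5 code, and it is not a reconstruction of the actual problem or its fix. It assumes a simplified storage scheme in which a chunk is held in compressed form and must be fully decompressed and recompressed on every partial write, with no caching in between. The class `CompressedChunk`, the helper `write_in_pieces`, and the sizes used are hypothetical names and values chosen for illustration only.

```python
import zlib


class CompressedChunk:
    """Toy stand-in for one compressed chunk on disk (not the HDF5 API)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.blob = zlib.compress(bytes(size))  # chunk starts zero-filled
        self.codec_passes = 0  # decompress + recompress round trips so far

    def write(self, offset: int, data: bytes) -> None:
        """Overwrite part of the chunk; the whole chunk is re-encoded."""
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write falls outside the chunk")
        raw = bytearray(zlib.decompress(self.blob))
        raw[offset:offset + len(data)] = data
        self.blob = zlib.compress(bytes(raw))
        self.codec_passes += 1

    def read(self) -> bytes:
        """Return the full uncompressed contents of the chunk."""
        return zlib.decompress(self.blob)


def write_in_pieces(chunk: CompressedChunk, payload: bytes, piece: int) -> None:
    """Write payload into the chunk, `piece` bytes per call."""
    for start in range(0, len(payload), piece):
        chunk.write(start, payload[start:start + piece])


# Hypothetical 4096-byte payload, written two different ways.
payload = bytes(i % 251 for i in range(4096))

small = CompressedChunk(len(payload))
write_in_pieces(small, payload, piece=16)  # many tiny writes

bulk = CompressedChunk(len(payload))
write_in_pieces(bulk, payload, piece=len(payload))  # one large write

print(small.codec_passes, bulk.codec_passes)  # prints "256 1"
```

    Both approaches store identical data, but in this model the piecewise version pays for 256 compression round trips where the bulk version pays for one. Real libraries typically soften this with caching, so the model only shows the direction of the effect, not its actual size.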

  6. What operational challenges or limitations does HDF-EOS5 present? (e.g., Does it take a long time to learn how to use it? Does it require advanced processing power, large amounts of memory, complex configuration, etc).

    HDF-EOS5 does take some time to learn, and it requires several calls to write out even the simplest data. It is more complicated than saying "write" or "print" in a Fortran or C program. Data providers can make reading easier by supplying sample code that reads their data.

    One concern I have is whether support for HDF-EOS5 will continue, especially given its role as a data standard. When I first started using HDF-EOS, I had extensive email correspondence with the HDF-EOS help desk. New users need the same level of support, and as far as I am aware, Abe Taaheri is the only person left on the project both to answer questions and to keep the software current.

    Another minor challenge is that the HDF-EOS5 package actually consists of two libraries (HDF-EOS5 and HDF5) maintained by two different organizations. When one encounters a problem or has a question, it is not always clear which organization needs to be contacted.

  7. What benefits does HDF-EOS5 present? Do the benefits of HDF-EOS5 outweigh the challenges? (e.g., Does it offer the flexibility you want to package the data types in your applications? Does it facilitate interdisciplinary studies?)

    As I stated above, the HDF-EOS5 API successfully hid most of the changes to the underlying HDF library. While this is for the most part a benefit, it can also present a challenge when a new feature is added to HDF5 that is not readily supported through the current HDF-EOS5 interface. As HDF5 continues to be actively developed, it is important that HDF-EOS5 be maintained just as actively.

  8. How much data do/will you provide or archive in HDF-EOS5? (number of distinct data products or data sets, total data volume, number of files.)

    Our entire HIRDLS data product set from our NASA atmospheric satellite mission is being stored and distributed in HDF-EOS5. The delivered and archived data product is currently split into two data files. Each file can contain information for up to 12 different chemical species and also includes information useful in cloud and gravity wave studies. The two files we are currently archiving can total up to 500 MB per day. There will be one file of each type, each day, for the length of the mission. The HIRDLS mission extends from late January 2005 to date. We expect to reprocess the entire mission a number of times.

  9. How many users do you have or expect to have for data in HDF-EOS5, and what is your expected user community?

    Researchers who are interested in atmospheric chemistry data and related topics will be the users of our data. If you need to know how many people may access our data products, I would suggest contacting the Goddard DISC (previously known as the DAAC). They are the main distributors of our data in the US. RAL in the UK will be distributing the data overseas. I would assume both organizations have projections on this.

 

NASA - National Aeronautics and Space Administration
Curator: Jody Gibson
NASA Official: Richard Ullman