Everything just broke: HDF5 woes on Ubuntu 18.04
I wrote a system for capturing data and placing it into tables using HDF5. The system ran for six years or so, almost every single night, through various updates and upgrades. Then one day, I opportunistically updated the production systems because I had some time on my hands. (I know, I know. It’s terrible dev-ops, but I’m a rag-tag operation here. Just look at this blog.) Two weeks ago, with no code changes, my system started generating .h5 files that were unusable. I only detected the problem three hours ago, when a batch job produced some bad calculations.
I don’t know the exact reason (yet) why things broke, but my guess is either an HDF5 1.8 to 1.10 issue or a change in some library dependency. Either way, I don’t care that much. I just want access to my old data back.
Here are the symptoms and when I knew I was in trouble:
HDF5 1.10 on Ubuntu could no longer read my read-only archives, files that had been read and used daily for years. h5ls failed. h5dump failed. hdfview failed. Both h5ls and h5dump simply complained that they were unable to open the older files.
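When h5ls and h5dump both refuse a file, it helps to know whether the file itself is damaged or whether the tools simply can’t cope with it. A rough triage, sketched here, is to check for the eight-byte HDF5 signature at the start of the file. The function name is made up for illustration, and the check assumes the file has no user block (with one, the signature sits at a later offset such as 512 or 1024). A file that passes can still be corrupt further in, so treat this as a hint only.

```shell
#!/usr/bin/env bash
# Triage sketch: does the file still start with the HDF5 signature?
# Assumption: no user block, so the signature sits at byte offset 0.

# The eight signature bytes (\x89 H D F \r \n \x1a \n), hex-encoded.
HDF5_MAGIC="894844460d0a1a0a"

# has_hdf5_signature FILE
# Returns 0 if FILE begins with the HDF5 signature, non-zero otherwise
# (including when FILE is missing or shorter than eight bytes).
has_hdf5_signature() {
    local file="$1"
    local head_hex
    # od -An -tx1 prints the bytes as hex with no offset column;
    # tr strips the spaces and newline so the result is one hex string.
    head_hex="$(head -c 8 -- "$file" 2>/dev/null | od -An -tx1 | tr -d ' \n')"
    [ "$head_hex" = "$HDF5_MAGIC" ]
}

# Example use (not run here):
#   for f in *.h5; do
#       has_hdf5_signature "$f" || echo "no HDF5 signature: $f"
#   done
```

If the signature is there and the tools still fail, a library-side problem looks more likely than a trashed file.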
If you’re panicking about data being inaccessible (like I was two hours ago), I have a temporary work-around. (I will update this post if anything changes.) You need to drop back to the older 1.8 series of HDF5, and you need to build it from source.
git clone https://bitbucket.hdfgroup.org/scm/hdffv/hdf5.git
Then, once you’re in the hdf5 directory from the git clone:
git fetch origin hdf5_1_8
git checkout hdf5_1_8
mkdir build
cd build
cmake -DHDF5_ENABLE_Z_LIB_SUPPORT=on -DZLIB_USE_EXTERNAL=off ..
Then build with ‘make’. Once it’s built, you can ‘sudo make install’, and the default configuration will put the installation files under /usr/local/HDF_Group. There you’ll find builds of tools like h5ls, along with development headers and libraries you can link your own utilities against.
If you set your linker and include paths properly and rebuild your utilities against this library, you’ll at least be able to get back into your data. This will give you some time with your data before you have to ponder future storage needs. In my case, I’ll be using the older library to re-encode the last two weeks of data.
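For what it’s worth, here is a minimal sketch of what “setting your paths” can look like in a shell. Both function names are made up for illustration. The sketch assumes the install landed somewhere under /usr/local/HDF_Group with the usual bin/, lib/ and include/ directories, and that you compile with GCC, which reads CPATH and LIBRARY_PATH. Adjust to taste if your setup differs.

```shell
#!/usr/bin/env bash
# Sketch: locate the locally built HDF5 1.8 tree and point a shell at it.
# Assumption: the install prefix contains bin/h5ls, lib/ and include/.

# find_hdf5_prefix [ROOT]
# Prints the install prefix of the first h5ls found under ROOT
# (default /usr/local/HDF_Group). Returns non-zero if none is found.
find_hdf5_prefix() {
    local root="${1:-/usr/local/HDF_Group}"
    local tool
    tool="$(find "$root" -name h5ls -path '*/bin/*' 2>/dev/null | sort | head -n 1)"
    [ -n "$tool" ] || return 1
    # Strip the trailing /bin/h5ls to get the prefix.
    dirname "$(dirname "$tool")"
}

# use_hdf5_prefix PREFIX
# Puts the tools, headers and libraries under PREFIX first in line
# for this shell session.
use_hdf5_prefix() {
    local prefix="$1"
    export PATH="$prefix/bin:$PATH"                                           # h5ls, h5dump
    export CPATH="$prefix/include${CPATH:+:$CPATH}"                           # GCC header search
    export LIBRARY_PATH="$prefix/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"          # GCC link-time search
    export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" # run-time loader
}

# Example use (not run here; my_tool.c is a placeholder for your own utility):
#   prefix="$(find_hdf5_prefix)" && use_hdf5_prefix "$prefix"
#   h5ls -r old_archive.h5
#   gcc my_tool.c -lhdf5 -o my_tool
```

Keeping this in a throwaway shell session, rather than in a login profile, means the system-wide 1.10 packages stay untouched while you rescue data.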
Perhaps it’s time to try Kudu or Parquet? Who knows? In any case, I hope this helps random people who might be flipping out over data loss.