Is there a need for a research data management specialism?

Fire damaged chemical lab Hiroshima; from Wikimedia; public domain


There is an interesting difference between how risks are approached in a research lab where a lot of data is handled and in a chemical lab. Many people working with data regularly run into problems such as not being able to locate data quickly or not being able to reproduce results exactly. They often think these problems are an inherent consequence of working with large amounts of data, and do not recognize that they stem from poor data management practice and preparation. The equivalent in a chemical lab would be researchers thinking that daily fires and explosions naturally belong to working with chemical compounds, rather than recognizing these as a consequence of bad lab practice and bad preparation.

There is also resistance to the uptake of a data management specialism because many researchers think that data management is relatively easy. Everybody has a computer at home, and many maintain photo libraries. However, this experience does not directly translate into work with large amounts of data in the lab:

  • Data in the lab is often 1-3 orders of magnitude larger than a photo library at home. A maintenance job that takes an hour for a photo library would translate into more than 6 months of work in a large data-intensive project. Because of this, different approaches are really needed.
  • Data in a photo library consists of JPG files and maybe RAW files, and these files have simple one-to-one relationships. In the lab there are many more kinds of data, and the relationships are much more complex.
  • A photo library is usually maintained by a single person. In the lab, the same data is worked on by different people, and they must each be aware of everything that is done by the others.

And in fact, even in a photo library at home one cannot always quickly find what one is looking for.
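As a rough check on the scaling argument above, the sketch below multiplies a one-hour job by one to three orders of magnitude and converts the result to full working weeks. The 40-hour working week and the assumption that effort grows linearly with data size are mine, not taken from the text.

```python
# Assumed working-time conversion; illustrative only.
HOURS_PER_WEEK = 40

def scaled_effort_weeks(base_hours, orders_of_magnitude):
    """Working weeks needed if effort grows linearly with data size."""
    total_hours = base_hours * 10 ** orders_of_magnitude
    return total_hours / HOURS_PER_WEEK

for k in (1, 2, 3):
    weeks = scaled_effort_weeks(1, k)
    print(f"10^{k} times more data: {weeks} working weeks")
```

At three orders of magnitude this gives 25 full working weeks, which is in line with the half-year figure once holidays and other duties are counted.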

No Solar Power

On December 27, 2014, during the night, a nasty layer of snow started to come down on warm ground. At some point the snow stayed, but it stayed wet and icy, and it managed to cover part of our solar panels in a thick layer of snow and ice. Since the cover was incomplete, the inverter refused to start and gave repeated errors. We ended the day with 0 Wh of power, a first for our installation. During the night, pieces of ice and snow kept coming down, causing scary cracks and crashes on the roof.

On December 28 I went to the roof around 13:00 to clean off the last bits with a broom. This was a dangerous operation from below: patches of ice and snow weighing up to 50 kg crashed down on the grass and the driveway. The result of this action is quite apparent in the graph of solar power on that day!

Unstable operations

Incomplete snow cover

Please do ask questions at a lecture, except...

Via Twitter, I saw a very cynical remark about asking questions after a scientific lecture, with a flow diagram discouraging most people from asking anything at all. This does not at all correspond to my experience organizing symposia and conferences. Most of the time, questions are very welcome, and people are far too shy to share their views. I therefore made a rebuttal in the form of the following flow diagram, which I think is a better representation of the line of thought to follow.

Rotterdam CS renewed

New hall

Over the course of the last 4 years, Rotterdam Central Station has been completely renovated. Many times I have taken pictures (with my phone's camera, sorry), focusing on the area around tracks 1-3, where the action started and ended. For a record of four years of change, look at the pictures in my Flickr account.

Reptile Zoo

Python

A few weeks ago I visited two reptile zoos together with Maxim: one in Breda and one in Tilburg. Click on the picture to see a sample of the pictures I took that day. I've been experimenting with hand-held HDR pictures (three exposures each) at high ISO. All pictures have been post-processed with DXO Optics Pro 9; the HDR images have been aligned using Hugin and tone-mapped using Luminance HDR.

Five star rating your own photos

Have you ever wondered how to use the five stars in your photo catalog? I’ve heard people say that there are only two kinds of pictures: pictures you could show to someone, and pictures you wouldn’t show to anyone. Isn’t choosing between zero and one star enough?


Mac OS X: Shrink PDF files in the Finder

Today I finally got around to using Automator to make one of the command line scripts I have been running for ages a little easier to use.

The problem I have been trying to solve is that the PDF files a Mac creates are normally of very high quality and hence large. If you just want to send someone a document for reading on screen, a much smaller PDF file would do. The open source package “Ghostscript” has a tool called ps2pdf that can be (ab)used to adjust the size of the components of a PDF file. I installed this in /opt/local/bin using the “MacPorts” software.
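The script itself is not shown in this excerpt, so the following is only a minimal sketch of how such a call could look, written in Python rather than as the original command line script. It assumes ps2pdf lives in /opt/local/bin as described and uses Ghostscript's `-dPDFSETTINGS=/screen` preset, which targets low-resolution on-screen reading; the function names and file names are hypothetical.

```python
import subprocess

# Location taken from the text above; adjust if Ghostscript is installed elsewhere.
PS2PDF = "/opt/local/bin/ps2pdf"

def build_shrink_command(src, dst, preset="/screen"):
    """Build the ps2pdf command line that rewrites src into a smaller dst."""
    # -dPDFSETTINGS selects a Ghostscript quality preset such as /screen or /ebook.
    return [PS2PDF, f"-dPDFSETTINGS={preset}", src, dst]

def shrink_pdf(src, dst, preset="/screen"):
    """Run ps2pdf; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_shrink_command(src, dst, preset), check=True)

if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(build_shrink_command("report.pdf", "report-small.pdf")))
```

Keeping the command construction separate from the call that runs it makes it easy to inspect the exact command before letting it touch any files.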


Very odd ratio

I was reading some news when I noticed a mathematical curiosity. The article on a physics result mentioned a chance of “one in ten to the minus 7”. Of course this is a mistake: a small chance is either “one in ten to the 7” or “ten to the minus 7”. The combination of “one in” and “minus” is nonsense. Interestingly enough, this mistake is really common…
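To see why the combination is nonsense, the three phrasings can be worked out with exact fractions. This is a small sketch; the helper name `one_in` is just an illustration of reading "one in n" as the probability 1/n.

```python
from fractions import Fraction

def one_in(n):
    """Probability expressed as 'one in n', i.e. 1/n."""
    return 1 / Fraction(n)

ten_to_the_7 = Fraction(10) ** 7
ten_to_the_minus_7 = Fraction(10) ** -7

print(one_in(ten_to_the_7))        # 1/10000000
print(ten_to_the_minus_7)          # 1/10000000
print(one_in(ten_to_the_minus_7))  # 10000000
```

The first two phrasings agree, while the third gives a "probability" of ten million, far above the maximum possible value of 1.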

#math #oops