Tell me what it is, not how you use it!
- Published: Friday, 03 May 2019 12:29
Regarding research data management I have been telling people the importance of describing a data set as "what it is" as opposed to "how you use it".
An early example that was given to me by a biobanking expert in The Netherlands was in the description of a chest X-ray: most likely such an image can be used both to study the bone in the spine, and to study the state of the major arteries (e.g. the aorta). If such an X-ray is acquired, it is likely that it is for one purpose only, but that does not exclude a re-use in another field. To optimize the re-usability of the data (see the FAIR principles), a chest X-ray should be labeled "chest X-ray" and not "X-ray of the spine" even if it was acquired for that specific goal.
I think this is similar to a "book cupboard". Most "book cupboards" are actually "book-size shelves". Those shelves can be used to store books, but can also used for other purposes. To optimize the Findability of the right storage solution in the shop, it would be useful if the label would not express a single use.
Recently I heard yet another very good example of the same principle in an episode of the SE-radio podcast: Function names when writing computer code. It really improves the human readability of computer software (and hence the maintainability) if each function is named after what it does, rather than how it is used. The example from the podcast: do not name the function "reformat_email", but name it "remove_double_newlines" if that is what it does.
I've heard someone say "Researchers are the worst judges on the possibilities for re-use of their own data". It is true: a researcher studying the aorta will not even see the spine on their own X-ray images, let alone think about ways in which the data can be reused by bone researchers. I think labeling a data set with how it is used is a consequence of this. A trained librarian/archivist, with training in classification systems, will quickly see through such a mistake and suggest better naming.