The other day, skimming through my folder of “to-read” material, I stumbled upon this report coming from Europeana, titled “Report and Recommendations from the Task Force on Metadata Quality”. As I have taken part in a project that actually contributed content to Europeana (Natural Europe, 2010-2013), I have been concerned with the quality of metadata in this context, but also in the context of my PhD studies. You see, one of the experiments that I used to validate my Metadata Quality Assurance Certification Process (MQACP) involved Europeana collections.
After that, I also got to work a bit on collections coming from Greek institutions (museums & libraries) that are part of Europeana. To say the least, the quality of the metadata was not nearly as good as one would expect. So you can imagine how eager I was to get down to reading this report. Having finished it and taken my notes, I decided to reflect on what I’ve read and share it, as always, openly and with no offense meant whatsoever.
First things first. Europeana allows you to browse through a staggering 53 million (!) artworks, artefacts, books and videos from cultural collections around Europe. In its infancy, Europeana set a goal to provide access to more than 12 million resources, which it has achieved, and it has gone well beyond that, as you can see. One thing that has been worrying all those working on and around it has been the metadata quality of the resources hosted within.
Reading this report, you will get the same picture. I am not sure if it’s worse or not, but many of the problems that were around in 2010 are definitely still there. Having completed my PhD on metadata quality, I must admit that it feels really good reading reports like this one, which outline how essential my field of study still is. Although “feeling nice” about a problem is not that appropriate, I have to admit I do. And I am also intrigued to look into it in more depth. The report starts by setting some criteria that make up high-quality metadata:
- Resulting from a series of trusted processes
- Meaningful to Audiences
- Clear on re-use
Well, looking at the criteria, I have to say that I am all for them. They make perfect sense. They do. I would add two criteria to the list: I would want the metadata to be (a) measurable in terms of quality and (b) scalable. Maybe this is the million-dollar question, but again, it’s the one that has to be tackled if any progress is to be made on the metadata quality front.
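To give a flavour of what “measurable” could mean in practice, here is a minimal sketch of a weighted completeness metric over a few Dublin Core-style fields. The field list and weights are my own illustrative assumptions, not Europeana’s actual model or anything proposed in the report:

```python
# A minimal sketch of a "measurable" quality criterion: weighted field
# completeness over a handful of Dublin Core-style elements.
# NOTE: the field list and weights below are illustrative assumptions.

CORE_FIELDS = {"title": 3, "creator": 2, "subject": 2, "description": 2, "rights": 1}

def completeness(record: dict) -> float:
    """Weighted share of core fields that carry a non-empty value."""
    total = sum(CORE_FIELDS.values())
    filled = sum(w for f, w in CORE_FIELDS.items() if record.get(f, "").strip())
    return filled / total

record = {"title": "Vase, red-figure", "creator": "", "subject": "pottery"}
print(round(completeness(record), 2))  # title (3) + subject (2) of 10 -> 0.5
```

The point is not the specific numbers, but that a score like this can be computed for every record in a collection, which is exactly what makes quality comparable across providers.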
Facts (coming from my reading of the report):
- Metadata records in Europeana come from processes that are not transparent and can also be as biased as they come,
- Case studies on metadata adoption, enrichment, etc., are not disseminated as much as they should be,
- The minimum quality criteria that apply for each collection/item are still too low for the Europeana envisaged services on the content,
- Smaller collections are easier to manage and usually bring more quality to the table than bigger collections,
- IPR issues with metadata records are still impeding the development of Europeana, as metadata cost money to produce and are therefore valuable,
- Fixing broken metadata is more difficult than addressing the metadata creation process from scratch,
- Any effort to improve metadata quality passes through content providers, who lack either the incentives or the budget to provide better metadata
My experience with metadata quality in a variety of collections, from educational to cultural and scientific, showed that metadata quality cannot be considered in a vacuum. It’s always a matter of context. There’s no golden ratio (φ = 1.618) of metadata quality. Metadata is meaningful for a specific audience in a specific scenario of use. To serve diverse communities, you usually have to add value to the content itself (which also comes from adding value to its metadata). Despite that, there are some basics that have to be there in order to be able to re-use content, and its metadata for that matter. If you don’t have those, then it’s like flying a plane with no steering wheel. I think that this is what we should be after here. And then there’s this statement below, which could not be more accurate:
“To undervalue the importance of metadata is to devalue the work of the data creator, the cultural heritage object, the institution and the audience for our cultural heritage”.
Trust is certainly a vital component of the metadata creation process, especially in Europeana but also in similar projects. I need to trust the processes that produce the metadata I harvest, just as I need to trust the processes on the side of the portal that serves my content to the public. It’s like producing tomatoes and selling them directly to a mini-market. The mini-market needs to know that I produce my tomatoes in a reliable and proper manner, and I need to know that they will store them properly so that they reach the final consumer as they should, ensuring that the mini-market sells as much as possible but also that I cultivate more and remain profitable. That makes sense. But there’s also a limit to how much I am willing to divulge to the mini-market owner about my practices. And then, how much does the mini-market owner need to know to take my tomatoes? I mean, shouldn’t there be a minimum set of farming practices (as with metadata practices) that we trust the farmer to handle correctly? It would be nice, but it’s not the reality of it, unfortunately, especially when such processes are somewhat of a competitive advantage in some cases. How can you share the one thing that gets the food through the door? You can’t and you won’t. On the other hand, there are ways to force quality in metadata once you incorporate feedback from the users’ side. What happens if you let the “X museum” know that their materials are the least accessed in all of Europeana?
The document does offer some useful guidelines and suggestions, some of which have been around for quite some time in the literature on digital libraries, and on digital repositories in general. It’s nice to have metadata crosswalks, metadata guides and metadata templates as supporting material for annotators. It’s nice to use automated methods and linked open data to try and incorporate additional metadata into resources (though we may well ruin them while trying to do so). We know that metadata should be standardized, and also useful and meaningful to users. We fully understand the need for open licences for metadata as well, to make sure that content is used across platforms and value is added to it. But where’s the epiphany in all this? I think that the findings are a nice first step, and that the epiphany will surely follow!
We have already figured out that “manual” is something you can’t do without when it comes to metadata. I mean, you don’t create content automatically, so why should the metadata that describe the content you create be 100% automated? The need for automation-supported manual processes for metadata quality assurance is clear. It’s nice to see more and more people accepting metadata as an integral part of each resource and not a “by-product” of a digitization/preservation process (I really loved that concept introduced in the report). In fact, capturing the by-products of a resource’s use may be a key to better quality rather than the cause of the problem.
The other statement that caught my eye was in the conclusion of the report: “Having first established what metadata quality is in this context, we can now go further, investigating how to improve or measure metadata quality in the future”. It seems to me that you can define metadata quality as a starting point, but you can never completely define it unless you manage to define measures and metrics. And then you can validate your claims about quality once you use those metrics and measures to adequately assess it, and maybe improve it. Having said that, I am really looking forward to the next outcomes of the Task Force to see their ideas on metadata quality assessment.
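For what it’s worth, here is a rough sketch of what such measures and metrics might look like once you try to combine them into a single score. Both metrics, the field names and the weights are hypothetical, my own illustration rather than anything the Task Force proposes:

```python
# A hypothetical sketch of combining two quality metrics into one score:
# field completeness plus conformance to a controlled vocabulary.
# All fields, vocabularies and weights here are illustrative assumptions.

def completeness(record: dict, fields: list) -> float:
    """Fraction of expected fields that carry a non-empty value."""
    return sum(1 for f in fields if record.get(f)) / len(fields)

def vocabulary_conformance(record: dict, field: str, vocab: set) -> float:
    """Fraction of a multi-valued field's values drawn from a vocabulary."""
    values = record.get(field, [])
    if not values:
        return 0.0
    return sum(1 for v in values if v in vocab) / len(values)

def quality_score(record, fields, vocab_field, vocab, w_comp=0.6, w_vocab=0.4):
    """Weighted combination of the two metrics into a single 0..1 score."""
    return (w_comp * completeness(record, fields)
            + w_vocab * vocabulary_conformance(record, vocab_field, vocab))

rec = {"title": "Amphora", "subject": ["pottery", "ancient-greece"], "rights": ""}
score = quality_score(rec, ["title", "subject", "description", "rights"],
                      "subject", {"pottery", "sculpture", "painting"})
```

Once you have something like this, “improving quality” stops being an abstract goal: you can rank collections, set thresholds and track a provider’s score over time, which is exactly the kind of assessment I hope the Task Force tackles next.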