Yesterday, I came across some really interesting research on Metadata Quality Metrics. It is the kind of research that feels directly relevant, as if someone had picked up from the last sentence of my PhD and continued from the point where I stopped.
During my PhD research I had the opportunity to examine, among other things, what constitutes a high-quality metadata record, by reviewing the related literature. Then, based on my findings, I tried to measure this quality in a large number of metadata records. I introduced processes and methods to improve metadata quality and then measured again, to see whether any improvements had been made (and they had).
I feel that the time has come to look at metadata quality metrics again, as the field has evolved quite a lot since I wrote my last chapter in November 2013. The metrics themselves are still well defined, but the ways in which they are measured, and the resulting quantitative measurements, have come a long way.
In my research, I reviewed the literature and produced the following table, which outlines the metrics that the most prominent studies used to assess metadata quality.
Based on these outcomes, and after looking at the definition of each metric and merging those that were similar, I decided to use the following metrics in my thesis:
For all of the above, the goal was to make sure that:
- They can be normalized, so that they are comparable;
- They possess an interval scale, so that differences in measurements are consistent throughout the value range;
- They can be interpreted accurately by users, which means they should be defined in a simple and straightforward way;
- They are adaptable to various contexts, so that they can express metadata quality for different types of repositories;
- They are feasible, meaning that each metric is based on input parameters that are determinable, scalable, and economically affordable.
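To make these criteria concrete, here is a minimal sketch (my own illustration, not code from the thesis) of a completeness metric that satisfies them: it is normalized to [0, 1], interval-scaled, and simple to interpret as "the fraction of expected fields that are filled in". The field names are hypothetical.

```python
# Illustrative completeness metric: normalized, interval-scaled,
# and straightforward to interpret. Not taken from the thesis.

def completeness(record: dict, expected_fields: list[str]) -> float:
    """Return the fraction of expected fields with a non-empty value."""
    if not expected_fields:
        return 0.0
    filled = sum(
        1 for field in expected_fields
        if record.get(field) not in (None, "", [], {})
    )
    return filled / len(expected_fields)

# Example: a small record with one empty field out of four.
record = {"title": "Intro to Metadata", "creator": "A. Author",
          "description": "", "language": "en"}
fields = ["title", "creator", "description", "language"]
print(completeness(record, fields))  # 0.75
```

Because the score is a simple ratio, a change from 0.25 to 0.50 means the same thing as a change from 0.50 to 0.75, which is exactly the interval-scale property asked for above.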
And then, based on the work of Clements & Pawlowski (2012) and Wilson & Town (2006), we proposed the Metadata Quality Maturity Model (MQMM), which offers five levels of metadata quality maturity for all quality assurance processes:
- Level 1 (Initial): The metadata quality process is ad hoc, and occasionally even chaotic. Few methods are defined for ensuring metadata quality and their success depends on individual effort.
- Level 2 (Repeatable): Basic metadata quality methods are established. A basic process is in place to repeat earlier quality levels.
- Level 3 (Defined): The metadata quality process is documented and standardized.
- Level 4 (Managed): Detailed measures of the application of the quality process are collected. The quality process is quantitatively understood and controlled.
- Level 5 (Optimizing): Continuous quality improvement is enabled by quantitative feedback and by piloting innovative ideas.
Even back then, I had already read the amazing work by Xavier Ochoa and the late Erik Duval on Automatic Evaluation of Metadata Quality in Digital Repositories. The metrics they introduced were really interesting to me, but during my PhD I tried to keep a balance between manual and automatic approaches, so I did not explore theirs in great depth.
I remembered Erik and Xavier again the other day, when I came across some interesting research from Peter Kiraly that builds upon the existing work on automatic evaluation of metadata records. It even takes things more than a step further, which is what research is supposed to do, I guess, but rarely manages! Some "hard evidence" on this research can be found here, and a first implementation is waiting for the enthusiasts, here.
For all of us who LOVE metadata, the volume of records analyzed and presented in this work is breathtaking, to say the least. For me personally, this was a wake-up call to get back (again) to what Erik had been preaching all along and take a closer look at it. The quality metrics in Peter Kiraly's work are not too far from the ones that I have used:
- Completeness
- Accuracy
- Provenance
- Conformance to Expectations
- Logical Consistency & Coherence
- Timeliness
- Accessibility
“Completeness”, “Accuracy” and “Consistency” are there. “Conformance to Expectations” is what I termed “Appropriateness”. In some cases, “Provenance” can also mean “Correctness” and “Objectiveness” in my approach, whereas “Accessibility” and “Timeliness” are two metrics that also make a lot of sense.
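The rough correspondence described above can be summarized as a simple lookup table (an informal sketch; the pairings reflect my own reading of the two vocabularies, not an official crosswalk):

```python
# Rough mapping from metric names in Peter Kiraly's work to the terms
# used in my thesis, as described in the paragraph above. Metrics with
# no renamed counterpart map to themselves.
METRIC_CROSSWALK = {
    "Completeness": "Completeness",
    "Accuracy": "Accuracy",
    "Logical Consistency & Coherence": "Consistency",
    "Conformance to Expectations": "Appropriateness",
    "Provenance": "Correctness / Objectiveness",  # only in some cases
    "Accessibility": "Accessibility",
    "Timeliness": "Timeliness",
}

print(METRIC_CROSSWALK["Conformance to Expectations"])  # Appropriateness
```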
I think I’ll stop here for the time being and come back once I have done some additional reading and studying. From a professional point of view, my interest lies in how we can translate the outcomes of similar analyses into concrete, actionable intelligence for metadata annotators, so that they can support high-quality metadata generation for specific types of digital content (in education, research, culture, etc.).