! ! ! WARNING ! ! !
Open, honest and probably naive content ahead.
Proceed with caution!
As I said the other day, over the last few weeks I have been catching up on some reading, or rather revisiting some influential literature on metadata quality metrics. I have been looking at the discussions and research findings, and I am really trying to wrap my head around the use of specific metadata quality metrics, both manual and automated. My gut feeling is still that the best solution lies somewhere in the middle of the two approaches. I think it’s also apparent from the literature that there’s no all-encompassing solution to the problem of low metadata quality.
“Manual” is good in its own way. It’s high-quality most of the time, it’s reliable and “intelligent” in a way that automated means are not (for the time being at least). In the same sense, “manual” is a value-adding activity for digital resources. On the other hand, it’s costly and it does not scale. And to be 100% honest, some of the problematic metadata that keeps us awake at night came from “manual” processes.
On the other hand, “automated” has its virtues. It takes over hard and time-consuming processes and allows us to focus on what’s really important. It’s reliable, but sometimes it can cause more data problems than it solves (at least if not used wisely). Although in the past “automated” took a lot of blame, research on information quality has shown that the far-fetched dreams we used to have are not that far-fetched anymore. Having said that, it also remains a fact that some of the problematic metadata was created by automated processes that meant well but did not deliver.
Having the habit of using seemingly unrelated imagery in my posts, I googled “automated versus manual” and suddenly lots of images came up that were mostly car-related, like the one above. And then I started thinking about manual transmission versus automatic. The metaphor is not difficult to spot, but in many ways transmission is like metadata. It seems that if you have a smooth road ahead, a long stretch with no curves, no bumps and no steep climbs, automatic will serve you nicely. You can focus on more important stuff like chatting in the car, looking at the scenery, etc., improving your overall driving experience. On the other hand, when the road is tricky, when you need to change gears to make sure the engine does not stall, and when you need to keep your speed entering a steep, closed curve, then manual is the way to go. Of course it burns more fuel than the automatic (another analogy there), but it’s a cost you are willing to pay if you want the full driving experience (with less scenery-gazing and fewer conversations in the car).
Come to think of it, on a certain level, isn’t this a matter of how much control you are willing to relinquish to your car in order to enjoy your trip? With manual, you get your hands dirty, and maybe then you give everyone in the car a faster ride and possibly, if the road is treacherous, a safer one. On the other hand, with automatic, you don’t worry about travel times and you make the trip, being fuel-efficient and all. Then again, how about being able to switch between manual and automatic? Sounds like the sweetest deal, right? Maybe so, but then again, metadata is not to resources what transmission is to a car, no matter how much I love metaphors. So let’s get down to the details and look at metadata quality metrics a bit, or at least some of them.
Completeness: Automatic is the way to go on completeness, every day of the week. It makes perfect sense. The only thing that kind of confuses me is the weighting of the metadata elements. It sounds right, and it can be customized based on the importance that each context assigns to different metadata elements, but treating some elements as more important than others feels like it steps into the territory of another metric, “Conformance to Expectations”. On the other hand, as a metric that is subject to interpretation, I guess that having a completeness score for each element, when we look at the numbers, we all go “OK, completeness in element X is low but it doesn’t really make a difference“. Sounds like a thought you’ve had in the past, right?
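To make the weighting idea a bit more concrete, here is a minimal sketch of how a weighted completeness score could be computed, roughly along the lines of the Ochoa & Duval formulation; the field names and weights are invented for illustration, not taken from any real application profile.

```python
# Minimal sketch of weighted completeness: each element i contributes
# P(i) = 1 if it has a value and 0 if it is empty, scaled by a
# context-specific weight alpha_i. Field names and weights below are
# hypothetical, purely for illustration.

def weighted_completeness(record, weights):
    """Return a 0..1 score: sum(alpha_i * P(i)) / sum(alpha_i)."""
    total_weight = sum(weights.values())
    filled = sum(
        alpha for field, alpha in weights.items()
        if record.get(field)  # a non-empty value counts as "filled"
    )
    return filled / total_weight if total_weight else 0.0

# Hypothetical LOM-ish record and weights (illustrative only).
weights = {"title": 1.0, "description": 0.8, "keyword": 0.6,
           "educational.description": 0.4}
record = {"title": "Photosynthesis 101", "description": "", "keyword": "biology"}

print(weighted_completeness(record, weights))  # ~0.57
```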
Accuracy: Well, accuracy is about the correctness of a record, and although this is black and white for absolute elements (file size, etc.), it’s not the same for textual ones. For text, (as Ochoa & Duval point out) we usually build a vector for the text contained in the original resource and a vector for the text in the metadata record, and then use a vector distance metric to estimate the semantic distance. This can go a step further with methods that also look for synonyms or words with a close semantic relation, so that we can avoid mistakes. But again, this method cannot work for images that are annotated, e.g. with educational metadata, so that they can be used in education.
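For the textual case, the vector-distance idea can be sketched in a few lines; the snippet below uses scikit-learn’s TfidfVectorizer and cosine similarity as one possible (and admittedly crude) stand-in for the distance metric, with placeholder strings for the resource and the record.

```python
# Rough sketch of the vector-distance idea for textual accuracy:
# build TF-IDF vectors for the resource text and the metadata text,
# then use cosine similarity as a crude proxy for semantic closeness.
# The sample strings are placeholders; a real pipeline would also need
# stop-word removal, stemming, synonym handling, etc.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

resource_text = "Photosynthesis converts light energy into chemical energy in plants."
metadata_text = "An introduction to how plants turn sunlight into chemical energy."

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([resource_text, metadata_text])

similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]
distance = 1.0 - similarity  # smaller distance ~ more "accurate" description
print(f"similarity={similarity:.2f}, distance={distance:.2f}")
```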
Conformance to Expectations: Well, this is about the degree to which a metadata instance fulfills the requirements of a given community of users for a specific task. This is quite a tricky one, in the sense that the requirements of a community are diverse. In some cases, it is also affected and driven by completeness. For example, a community of teachers will definitely find the “Educational Description” element in LOM useful. Is completeness enough to say that the field conforms to the expectations of users? I think not. But I do think that looking at “expectations” and translating them is key to the entire process.
Entropy has emerged as a nice metric here, measuring (loosely defined) the uniqueness of some terms in the metadata record, terms that make the resource easier to find in a sea of other resources with similar descriptions and no word standing out. Again, this can work, but what about resources on similar topics that are all described with high-quality metadata? How would we distinguish between them? Or maybe entropy, when looking at a small sample of similar resources in a narrow collection on the same topic, can be used to identify the worst metadata records rather than the best, i.e. the ones that deviate from a pool of nicely annotated ones… For text fields, TF-IDF can be used, although it works best when you have access to the resources themselves. And of course, it doesn’t work that well with pictures (except if you compare the words in the metadata of a picture with all the metadata records of similar pictures).
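As a toy illustration of the “standing-out terms” idea, the sketch below scores each record in a tiny invented collection by the information content of its terms (minus log2 of their document frequency); it is a loose approximation of the entropy-style metrics, not the exact formula from the literature.

```python
# Toy sketch of the information-content idea: terms that appear in
# almost every record in the collection carry little information,
# rare ("standing-out") terms carry more. The records are invented.
import math
from collections import Counter

collection = [
    "introductory biology lesson on cells",
    "introductory biology lesson on photosynthesis",
    "introductory biology lesson on mitosis and cell division",
]

def term_information(collection):
    """-log2(document frequency / N) for each term in the collection."""
    n = len(collection)
    doc_freq = Counter()
    for record in collection:
        doc_freq.update(set(record.split()))
    return {t: -math.log2(df / n) for t, df in doc_freq.items()}

info = term_information(collection)
for record in collection:
    score = sum(info[t] for t in set(record.split()))
    print(f"{score:5.2f}  {record}")
# Terms shared by every description contribute zero; the record with the
# most distinctive wording gets the highest score.
```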
The interesting thing I see about entropy or TF-IDF here (again, the work of X. Ochoa and E. Duval is my reading material) is being able to deduce an optimal number of words for each field, one that results in a nice entropy or Qtinfo score. It seems that as the number of words in a description increases, this metric does not increase at the same rate. Maybe this is how we find the “sweet spot” of optimal words in a description field.
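And as a toy illustration of that diminishing-returns intuition (again, not the actual Qtinfo formula), the sketch below accumulates the same kind of information-content weights word by word over a made-up description; repeated and very common words add almost nothing, so the running score flattens out.

```python
# Toy illustration of diminishing returns: as words are appended to a
# description, repeated and very common terms add little new information,
# so the cumulative score levels off. Background records are invented.
import math
from collections import Counter

background = [
    "lesson about plant biology for secondary school",
    "lesson about animal biology for secondary school",
    "lesson about chemistry for secondary school",
]
n = len(background)
doc_freq = Counter()
for rec in background:
    doc_freq.update(set(rec.split()))

def info(term):
    # Unseen terms get the maximum weight; seen terms get -log2(df/N).
    df = doc_freq.get(term, 0)
    return math.log2(n) if df == 0 else -math.log2(df / n)

description = ("lesson about photosynthesis plant biology "
               "for secondary school biology lesson").split()

seen, cumulative = set(), 0.0
for i, word in enumerate(description, start=1):
    if word not in seen:       # repeated words add nothing new
        cumulative += info(word)
        seen.add(word)
    print(f"{i:2d} words -> score {cumulative:.2f}")
```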
I see that the post is becoming too long for one sitting, so I’ll stop here and come back with some thoughts on the rest of the metrics identified in relevant work. Feel free to drop a line and share your thoughts or any developments that may be relevant. I am partially justifying and partially brainstorming here, so the warning at the beginning of the blog post still applies! So please don’t shoot the pianist, just grab the mic and sing your song in the comments section! 😉