Though I have been involved in a variety of projects that entailed metadata creation – that is, annotation of digital content with data related to its educational use – the cost of creating this metadata was not always analysed down to the cent.
I have some evidence here and there, but metadata cost is also something that is not as extensively researched in the literature as one would hope. Thinking about this today, I decided to share some numbers and figures from my own experience.
To begin with, our typical project comprised the following phases. We usually examined the collection of content that was destined for metadata annotation, checking whether or not there was existing metadata, but also looking at the content itself.
If the content had metadata, we mapped them to our schema and then harvested them into our target repository. Once harvested, the metadata were examined again, validating the values of each element. Where needed, a transformation could follow, making sure that the metadata values matched what our system expected.
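As a rough illustration of the mapping step, here is a minimal sketch in Python. The source element names follow Dublin Core for the sake of the example, and the target element names are hypothetical – neither is taken from any particular project schema:

```python
# Map harvested element names to our target schema's element names.
# The names on both sides are illustrative assumptions, not a real schema.
FIELD_MAP = {
    "dc:title": "title",
    "dc:description": "description",
    "dc:date": "date",
}

def map_record(source_record: dict) -> dict:
    """Translate a harvested record into the target schema, dropping unmapped fields."""
    return {FIELD_MAP[k]: v for k, v in source_record.items() if k in FIELD_MAP}

harvested = {"dc:title": "Intro to Fractions", "dc:rights": "CC-BY"}
print(map_record(harvested))  # -> {'title': 'Intro to Fractions'}
```

In practice the dropped fields would be logged rather than silently discarded, but the dictionary-lookup pattern is the core of the mapping step.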
After that, with the records already in our system and compliant with our metadata schema, it was time for metadata enrichment. The records were examined against the actual resource and additional metadata were provided for each one. Once the process was completed, the final records were imported into the “live” system and then published.
In all of the aforementioned steps there’s significant work for metadata experts, technical experts and subject-matter experts alike. All of this contributes to the total cost of metadata (TCM). The TCM is tricky and rather ad hoc to calculate, and I am not going into it right now.
This post will focus only on validation, transformation and enrichment. More specifically, it will reflect on the manual aspect of these processes, as parts of them can be supported automatically – for example, the automated transformation of a field containing dates from one format to another.
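A minimal sketch of such an automated date transformation might look like the following. The list of candidate input formats is an assumption for the example – in a real project it would be driven by what actually appears in the harvested records:

```python
from datetime import datetime

def normalise_date(value: str) -> str:
    """Convert a date from a few assumed legacy formats to ISO 8601 (YYYY-MM-DD)."""
    candidate_formats = ["%d/%m/%Y", "%d-%m-%Y", "%Y/%m/%d", "%B %d, %Y"]
    for fmt in candidate_formats:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognised date format: {value!r}")

print(normalise_date("25/12/2012"))  # -> 2012-12-25
```

Anything the function cannot parse is raised as an error rather than guessed at, which flags the record for the manual validation pass instead of silently corrupting it.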
Before crunching the numbers, there’s one more hypothesis to make. Validating, transforming and enriching is the same process regardless of content, but the time invested in each step varies with the kind of content at hand.
So, someone who takes hold of a collection of learning resources with some limited metadata already filled in won’t need the same amount of time as with a collection in which the only metadata is a title and a description. With this in mind, the table I am sharing takes into consideration the pre-existing quality of the metadata.
The following table breaks down the main tasks we usually carry out on the metadata records for each different metadata quality level. My best guess was that an experienced annotator would need approximately 40 minutes to enrich a poor metadata record with a set of 10-12 metadata elements.
In an effort to see whether our times were reasonable, I retrieved some papers and articles discussing similar efforts (source 1, source 2, source 3, source 4, source 5). In these papers, the average time for annotation from scratch was around 30-31 minutes per resource, whereas the average time for correcting/enriching metadata was around 39-40 minutes (it seems that sometimes it’s more difficult to correct something that’s wrong than to do it from scratch).
Taking all of this into account, it seems that 40 minutes per resource is a fairly good estimate of the time an annotator needs to familiarise themselves with a resource and enrich its description.
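To make the ballpark concrete, here is a hypothetical back-of-the-envelope calculation. The collection size and hourly rate below are made-up figures for illustration, not from any real project; only the 40 minutes per record comes from the estimate above:

```python
MINUTES_PER_RECORD = 40   # per-resource enrichment time estimated above
COLLECTION_SIZE = 1000    # hypothetical number of learning resources
HOURLY_RATE_EUR = 25.0    # hypothetical annotator rate

total_hours = COLLECTION_SIZE * MINUTES_PER_RECORD / 60
total_cost = total_hours * HOURLY_RATE_EUR
print(f"{total_hours:.0f} hours, ~{total_cost:.0f} EUR for manual enrichment")
```

Even a modest collection of a thousand resources works out to hundreds of person-hours of enrichment, which is exactly why the pre-existing quality of the metadata matters so much.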
The cost of an entire metadata record, though, is another discussion, but I think this ballpark number can come in handy on more than one occasion. Thoughts and comments, as always, are more than welcome! Similar experiences on the topic are also welcome!