For quite some time over the past year, I have taken advantage of small projects at work to update my blog with posts about them. Each time I set out to do something, even if I have done it a thousand times before, I always search a bit online to get up to date with current practices. I hate taking anything for granted.
In doing so, I like to write a blog post that tells the short story of the project and lists what I consider useful sources for anyone out there tackling a similar problem or challenge. If you’re an avid reader of this blog, you may have stumbled upon some of the “greatest hits” so far… 😉
- “Making sense of ISO/IEC 19788”
- “Standards & Technologies for Digital Learning”
- “e-Learning Standards: MLR or LRMI?”
- “Metadata Quality Metrics: Define”
- “Thoughts on Metadata Quality Metrics”
I will be doing the same thing for metadata mapping and crosswalks. Since I needed to define a metadata mapping process, I decided to document my work, creating a useful overview for all you metadata people out there. I would be thrilled to read your comments or suggested material in the comments section.
So, what is metadata mapping, right? Well, put simply, it’s the process of translating metadata fields from one application profile (or schema, or standard) to the fields of another. Sometimes it can be as simple as matching the field “Title” with the field “Resource Title”, and sometimes it can be nerve-racking! Especially when a field includes more information than the field it’s being mapped to, or when the values of the respective fields are inconsistent. Some other metadata operations are also involved in this process, but we will go into them in more detail in a minute.
Reading through the material I could find online, I really liked the work carried out by CARARE in the context of Europeana. I found their description of the process to be one of the most comprehensive out there, especially if you take a look at their public deliverables (particularly D2.3, D3.3 and D3.4 here).
A metadata crosswalk is actually something as simple as a table that relates elements of the native schema to elements of the target schema. Suppose that you want to harvest some metadata records (let’s assume they are in Dublin Core) into your repository (which uses a LOM Application Profile). A metadata crosswalk would be a table declaring that the field “dc.title” of the native schema is related to the field “general.title” of the target schema.
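Since a crosswalk is essentially a lookup table, a minimal sketch in Python can be just a dictionary plus a function that applies it to a flat record. Only the “dc.title” to “general.title” row comes from the example above; the other rows, the sample record and the function name `apply_crosswalk` are my own illustrative assumptions, not an official Dublin Core to LOM crosswalk.

```python
# A crosswalk as a simple lookup table: native (source) field -> target field.
# Only "dc.title" -> "general.title" comes from the text; the other rows are
# illustrative assumptions, not an official DC-to-LOM crosswalk.
CROSSWALK = {
    "dc.title": "general.title",
    "dc.description": "general.description",
    "dc.language": "general.language",
    "dc.subject": "general.keyword",
}


def apply_crosswalk(record, crosswalk):
    """Translate a flat source record into target fields.

    Returns the mapped record plus the list of source fields that had no
    entry in the crosswalk, so that nothing is dropped silently.
    """
    mapped = {}
    unmapped = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)
        else:
            mapped[target] = value
    return mapped, unmapped


# Hypothetical harvested record, used only for illustration.
dc_record = {
    "dc.title": "Roman amphitheatre survey",
    "dc.language": "en",
    "dc.rights": "CC BY 4.0",
}
lom_record, leftovers = apply_crosswalk(dc_record, CROSSWALK)
print(lom_record)  # {'general.title': 'Roman amphitheatre survey', 'general.language': 'en'}
print(leftovers)   # ['dc.rights']
```

Returning the unmapped fields instead of silently discarding them makes the gaps in the table visible, which is usually where the real mapping work starts.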
You can then be really strict about this table and match only elements that are 100% equivalent, which gives you “Absolute Crosswalking”, or you can be less strict and go for “Relative Crosswalking”, where every element of the source schema is mapped to at least one element of the target schema. Reading through webpages, reports, etc., I found my favorite quote for today, by Hillmann and Westbrooks (source):
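To make the difference between the two approaches concrete, here is a small sketch that records, for each source element, its candidate target elements and whether the match is exact or only partial. The element names, the “exact”/“partial” labels and both function names are hypothetical choices of mine; the point is only that an absolute crosswalk keeps the exact matches, while a relative one requires every source element to land somewhere.

```python
# Each source element lists candidate target elements and how closely they
# match. Element names and the "exact"/"partial" labels are hypothetical.
CANDIDATES = {
    "dc.title": [("general.title", "exact")],
    "dc.language": [("general.language", "exact")],
    "dc.coverage": [("general.coverage", "partial")],
    "dc.type": [("educational.learningResourceType", "partial")],
}


def absolute_crosswalk(candidates):
    """Keep only matches declared 100% equivalent; other elements drop out."""
    return {
        source: [target for target, match in targets if match == "exact"]
        for source, targets in candidates.items()
        if any(match == "exact" for _, match in targets)
    }


def relative_crosswalk(candidates):
    """Map every source element to at least one target, exact or not."""
    missing = [source for source, targets in candidates.items() if not targets]
    if missing:
        raise ValueError(f"No target element for: {missing}")
    return {
        source: [target for target, _ in targets]
        for source, targets in candidates.items()
    }


print(sorted(absolute_crosswalk(CANDIDATES)))  # ['dc.language', 'dc.title']
print(len(relative_crosswalk(CANDIDATES)))     # 4
```

The strict version loses “dc.coverage” and “dc.type” entirely, while the relative version keeps them at the cost of imprecise matches, which is exactly the kind of compromise the quote below talks about.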
“The more metadata experience we have, the more it becomes clear that metadata perfection is not attainable, and anyone who attempts it will be sorely disappointed. The key to a successful metadata crosswalk is intelligent flexibility. It is essential to focus on the important goals and be willing to compromise in order to reach a practical conclusion…”
So, why do we map metadata? Why do we create crosswalks? Perhaps this question belonged at the beginning, but a late question and answer is better than none at all…! Well, to allow for greater interoperability among systems, of course: to let them exchange data with minimal loss of content or functionality.
Metadata crosswalks are a way to enhance interoperability at the schema level of the repositories (or systems in general) involved. According to Chan & Zeng, crosswalks are one of several ways to achieve greater interoperability, alongside derivation, application profiling, switching across (a personal favorite), metadata frameworks and metadata registries.
And then there is mapping through crosswalks, which is so much more than a mix-and-match exercise. It’s a complex problem with semantic depth and lots of technical parameters. It’s not simple (of course). And that’s why the Gordian knot and Ariadne’s thread appeared before… Why? Well, for one, you can have elements with the same or similar names that contain different information. A field such as “coverage” may contain anything from geographical coverage to thematic coverage or time-period coverage.
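One way to deal with an overloaded “coverage” field is to sort its values into separate buckets before mapping them. The sketch below is deliberately naive and entirely hypothetical: it treats a year or year range as temporal coverage, looks place names up in a tiny made-up gazetteer, and labels everything else as thematic. A real project would rely on proper controlled vocabularies rather than these rules.

```python
import re

# Hypothetical heuristics: a year or year range counts as temporal coverage, a
# name found in a tiny illustrative gazetteer counts as spatial, and anything
# else is treated as thematic.
YEAR_RANGE = re.compile(r"\d{3,4}(?:\s*-\s*\d{3,4})?")
GAZETTEER = {"Athens", "Crete", "Europe"}


def split_coverage(values):
    """Sort the values of a mixed 'coverage' field into three buckets."""
    buckets = {"spatial": [], "temporal": [], "thematic": []}
    for value in values:
        value = value.strip()
        if YEAR_RANGE.fullmatch(value):
            buckets["temporal"].append(value)
        elif value in GAZETTEER:
            buckets["spatial"].append(value)
        else:
            buckets["thematic"].append(value)
    return buckets


print(split_coverage(["Crete", "1450-1600", "Venetian architecture"]))
# {'spatial': ['Crete'], 'temporal': ['1450-1600'], 'thematic': ['Venetian architecture']}
```

Each bucket can then be mapped to its own, more specific target element instead of forcing a mixed field onto a single one.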
These are the cases where some serious work needs to be carried out on the data. And when we are dealing not only with textual but also with numeric data, the real fun begins: the data need advanced processing and transformations before they become uniform. This was actually one of my reasons for dusting off my R and (soon) Python skills these days.
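As a small taste of what “making numeric data uniform” can involve, here is a sketch that normalises a hypothetical “dimensions” field, recorded by different providers with mixed units and decimal separators, to centimetres. The field, the accepted units and the function name `to_centimetres` are assumptions for illustration; values that cannot be parsed return `None` so they can be flagged for manual review rather than guessed.

```python
import re

# Hypothetical "dimensions" values recorded with mixed units by different
# providers, normalised to centimetres. Units and format are assumptions.
TO_CM = {"mm": 0.1, "cm": 1.0, "m": 100.0}
MEASURE = re.compile(r"\s*(\d+(?:[.,]\d+)?)\s*(mm|cm|m)\s*")


def to_centimetres(raw):
    """Parse a value such as '0,45 m' or '120mm' and return it in cm.

    Returns None when the value cannot be parsed, so it can be flagged for
    manual review instead of being guessed.
    """
    match = MEASURE.fullmatch(raw)
    if match is None:
        return None
    number = float(match.group(1).replace(",", "."))  # accept decimal commas
    return round(number * TO_CM[match.group(2)], 2)


for raw in ["12 cm", "0,45 m", "120mm", "about a foot"]:
    print(raw, "->", to_centimetres(raw))
```

Rounding the result keeps floating-point noise (such as 0.45 × 100) out of the published records.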
On this topic, looking at some comprehensive metadata aggregation processes, I found the one by Europeana Sound (original slides here), which I recreated in PowerPoint just to get a higher-resolution version (although the colours are much worse in mine).
I especially loved the timeframe and the level of detail. Make sure that you visit the link in the image caption, because the original presentation is really worth your time! So, as it turns out, this post is not just about metadata mapping and crosswalks. It’s actually about the entire process that a collection has to go through in order to be ingested and ultimately published through a repository, portal, aggregator, etc.
With a renewed outlook on all of this, I will now delve into my project, looking at the entire aggregation process for collections and taking into account the specific context and unique challenges of my domain. I hope that this short post has given you some useful background or an incentive for further reading! With that, this post ends here, concluding with a short presentation that I compiled and that may come in handy.
As always, comments are more than welcome, especially if you have any online resources to share on the topic!