There has been spirited discussion in the library world recently about next-generation catalogs, but that discussion has centered heavily on systems rather than on the data that drives them. I’d argue that one needs both highly functional systems and good data in order to provide the sorts of access our users demand. How we get that good data is what I’ve been interested in recently. Having humans generate it the way libraries currently do is one part of a larger-scale solution, but given the current ratio of interesting resources to funding for humans to describe them, we must find other means to supplement our current approach.
So what might we do? Here are my thoughts:
- Tap into our users. There are a whole lot of people out there who know and care a lot more about our resources than Random J. Cataloger. Let’s harness the knowledge and passion of those users, and provide systems that let them quickly and easily share what they know with us and other users.
- Get more out of existing library data. As Lorcan Dempsey says, we should “make our data work harder.” Although MARC and other library descriptive traditions have many limitations in light of next-generation systems, they still represent a substantial corpus of data that we must use as a basis for future enhancements. Let’s use any and all techniques at our disposal to transform this data into data that can drive these next-generation systems.
- Look outside of libraries. Libraries do things differently than publishers, vendors, enthusiasts, and many other communities that create and use metadata. We should keep in mind the cliché, “Different is not necessarily better.” We need to both look at ways of mining existing metadata from other communities to meet our needs, and re-examine the way we structure our metadata with specific user functions in mind.
- Put more IR techniques into production. Information retrieval research provides a wide variety of techniques to better process metadata from libraries and other communities. Simple field-to-field mapping is only a portion of what we can make this existing data do for us. We must work with IR experts to push our existing data further. IR techniques can also be made to work not just on metadata but on the data itself. Document summarization, automatic metadata generation, and content-based searching of text, still images, audio, and video can all provide additional data points for our systems to operate upon.
- Develop better cooperative models. Libraries have a history of cooperative cataloging, yet this process is anything but streamlined. We simply must get away from models where every library hosts local copies of records, and each of those records receives individual attention: changing, enhancing, even removing (!) data for local purposes. Any edits or enhancements performed by one library should benefit all, and the current networked environment can support this approach much better than was possible when cooperative cataloging systems were first developed.
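To make the “field-to-field mapping is only a portion” point a bit more concrete, here is a minimal Python sketch. It is only an illustration, not a description of any real system: the crosswalk table, the function names `map_record` and `title_keywords`, the stopword list, and the sample record are all made up for this post. The MARC tags used (100 for the main personal name, 245 for the title statement, 650 for topical subjects) are standard, but a real record has indicators and subfields that this sketch ignores.

```python
import re

# Hypothetical crosswalk from MARC tags to simplified, Dublin Core-style elements.
CROSSWALK = {
    "100": "creator",   # main entry, personal name
    "245": "title",     # title statement
    "650": "subject",   # topical subject heading
}

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "in", "of", "the"}


def map_record(marc_fields):
    """Plain field-to-field mapping.

    marc_fields is a list of (tag, value) pairs, a deliberately
    simplified stand-in for a parsed MARC record.
    """
    mapped = {}
    for tag, value in marc_fields:
        element = CROSSWALK.get(tag)
        if element is None:
            continue  # tags without a mapping are silently dropped
        mapped.setdefault(element, []).append(value.strip())
    return mapped


def title_keywords(mapped):
    """One small step past mapping: derive keywords from the title."""
    keywords = []
    for title in mapped.get("title", []):
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if word not in STOPWORDS:
                keywords.append(word)
    return keywords


# Made-up sample record, for illustration only.
SAMPLE = [
    ("100", "Doe, Jane"),
    ("245", "An Example Title "),
    ("650", "Cataloging"),
    ("650", "Metadata"),
    ("999", "local note"),
]

mapped_sample = map_record(SAMPLE)
sample_keywords = title_keywords(mapped_sample)
```

The mapping step only moves values from one container to another. The keyword step, trivial as it is, creates a data point that was not an explicit field in the source record, and that is the direction the IR techniques above would take much further.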
My point is, we can’t plug our ears, sing a song, and keep doing things the way we always have. Let’s make use of the developments around us, contribute the expertise we have, and all benefit as a result.