By Jayne Kelly (Ebooks Administrator, Collections and Academic Liaison Department, Cambridge University Library) and Clara Panozzo (Latin American & Iberian Collections, Collections and Academic Liaison Department, Cambridge University Library)
During the COVID-19 pandemic, two colleagues from different areas within the Collections and Academic Liaison Department at Cambridge University Library tackled problems related to the metadata and accessibility of Open Access books. Here you will read about the particular case that sparked their conversations, and about the challenges that librarians encounter when dealing with Open Access books.
In spring 2020, lockdown began and our focus turned to electronic publications, so that we could still guarantee our readers access to relevant resources during this period. It became apparent to the Latin American and Iberian Collections team at Cambridge University Library that work had to be done on the bibliographic records pertaining to publications by CLACSO (Consejo Latinoamericano de Ciencias Sociales, a network of 700 research institutions from 52 countries with a rich Open Access ebooks catalogue). CLACSO is an excellent publisher and it is essential for us that their publications are available for our readers.
The CLACSO titles form part of the ‘Open Access books on JSTOR’ collection. To aid efficiency when working with Open Access ebooks, the Cambridge ebooks team activate the relevant electronic collections and associated metadata from the “Community Zone” of their current library management system (LMS), Alma. Rather disappointingly though, the bibliographic metadata for Open Access collections is often very poor: only partial title and publisher information, no subject headings, no author or editor details, not to mention content details.
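To make the gaps described above concrete, here is a minimal sketch of how a brief record could be checked for missing elements. It is an illustration only, not the workflow used at Cambridge: the record is modelled as a plain mapping from MARC 21 tag to field values, the groupings in `REQUIRED_ELEMENTS` are our own assumption about what counts as a usable record, and the sample title is invented.

```python
from typing import Dict, List

# Assumption: a record is modelled as a mapping from MARC 21 tag to field values.
Record = Dict[str, List[str]]

# Hypothetical checklist: an element counts as present if any of its tags
# holds a non-empty value.
REQUIRED_ELEMENTS = {
    "title": ("245",),
    "author or editor": ("100", "110", "111", "700", "710"),
    "publication details": ("260", "264"),
    "subject headings": ("600", "610", "650", "651"),
    "contents note": ("505",),
}


def missing_elements(record: Record) -> List[str]:
    """Return the labels of the checklist elements the record lacks."""
    missing = []
    for label, tags in REQUIRED_ELEMENTS.items():
        has_value = any(
            value.strip() for tag in tags for value in record.get(tag, [])
        )
        if not has_value:
            missing.append(label)
    return missing


# Invented example of a brief record with title and publisher only.
brief_record = {"245": ["Un título de ejemplo"], "264": ["CLACSO"]}
print(missing_elements(brief_record))
# prints ['author or editor', 'subject headings', 'contents note']
```

A report like this could help a small team decide which records in an activated collection most need enhancing, although it says nothing about the quality of the fields that are present.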
Poor-quality bibliographic records have several consequences. Firstly, they can confuse readers or make the books altogether undiscoverable. Secondly, and as a result, library staff might purchase print copies when they don’t need to. Thirdly, there are decolonising implications. The current pandemic has created an understandable push for libraries to make online resources available to their readers. Consequently, in many institutions, budgets for print material have been reduced to compensate, yet regions like Latin America still have only a small (if growing) presence in terms of ebooks and e-resources, with many resources available in print only. As a result of these trends, access to diversified material (i.e. extending beyond mainstream scholarship from the Global North in English) has been restricted in the past year. If, on top of that, those Open Access e-publications from the Global South that are available are not in fact discoverable, we, as librarians, would feel we are failing to provide fair access to research and knowledge.
Considering these concerns, work was undertaken at the library to update nearly 200 CLACSO records by adding data on authors, editors, subjects, series, contents, bibliography, etc., so that readers consulting Cambridge Libraries’ catalogue can find and access them through more complex and diverse searches.
We are aware that this is just a tiny drop in the ocean. However, this work has led us to discuss the broader issues and challenges that poor-quality metadata for Open Access books poses:
1. Many publishers are providing satisfactory or high-quality metadata, which is available both for librarians to retrieve and download into their own library catalogues (if they have the capacity) and for library management system suppliers to harvest. One of the issues is that when certain LMS suppliers take that data, to put it very simply, they appear to disassemble the records and use only part of the data to create new and often much briefer, basic records. These newly created records are then fed into library customers’ discovery layers, or library catalogues, where they have activated the associated collections. These bibliographic records often lack relevant information, as detailed above. In this way, there is a disconnect between the publisher-supplied metadata and the data that libraries end up including in their catalogues. Although libraries could choose to use the publisher-supplied records direct, this is inevitably a more time-consuming process (staff would have to keep an eye out for publisher updates, for example). The LMS route is theoretically the most efficient and, for most large academic libraries, the only feasible way to manage Open Access monograph catalogue records. The quality of those records, however, inevitably creates more work for library staff down the line and can result in libraries purchasing print copies when they don’t need to.
2. Enriching and enhancing records is labour- and time-intensive. This sort of homegrown work was only possible because of the situation created by the pandemic in March 2020, when library buildings closed and cataloguers were displaced from their physical collections. Ex Libris (the supplier for our LMS, Alma) periodically does enrich collections of basic records in their Community Zone, using data obtained from the original supplier/publisher. The regularity and schedule of enrichments are not transparent and, for those librarians awaiting better-quality records, the process seems at best to be sporadic. Most academic libraries rely completely on these MARC records; they don’t have large cataloguing teams who can systematically check and improve poor-quality records. The expectation is that the LMS should provide catalogue records of a decent quality in a sensible timeframe. Having to wait for months and months for higher-grade records to appear, and even to be told that their arriving at all may depend on criteria beyond our control, is not acceptable for librarians or their library users.
3. It is possible to contribute locally enhanced records back to the Alma Community Zone, where they might be shared by other libraries, but there are issues with this. These are to do with licence restrictions between different catalogue record suppliers (we often source our records from OCLC) and a fear of good work being overwritten further along the line. There is also a sense of a general lack of control over records in the Community Zone.
4. Owing to the dispersed way in which Open Access ebooks are created and hosted, the same book title will often appear in more than one OA collection. For example, a title published by a given publisher will appear in that publisher’s OA collection, as well as perhaps the OAPEN and DOAB collections. This leads to messiness and duplication in the library catalogue (see example below).
5. As a result of the problems with the quality of the ebook MARC records that arrive from the LMS, e-resources librarians and cataloguers are forced to employ different methods to manage their various ebook collection records. For example, for purchased ebook collections in Cambridge, we prefer to use publisher-sourced data where we can. These records are either retrieved from the publisher’s data feeds or emailed to us directly. They need editing to add local fields and then loading into the library catalogue. This process is repeated for each update, be it monthly or quarterly. For Open Access ebook records we rely on the LMS-created metadata, as there is no capacity to do otherwise, and the resulting problems are well documented above. This ends up being a complicated landscape to manage and to explain to colleagues and end users.
6. Alongside the quality issues that librarians can find when using LMS-supplied/created catalogue records for Open Access ebooks, there can also be coverage issues. Sometimes newly (and not so newly) published titles are not added to the collections in a timely way, thus creating coverage gaps. Librarians must submit customer-support cases to request that updates are made when these gaps are discovered. These updates can take months to be implemented, even though publishers’ metadata feeds are freely available for the LMS suppliers to harvest at any time, and librarians will often link to the relevant data feed in their requests. A certain lag in keeping collections updated with new titles is understandable, but it is not acceptable for that lag to extend to three months and often much longer than that.
7. Linking errors. Open Access ebook records are often raised in troubleshooting queries, for example when they link to the wrong book. For large collections such as Project Gutenberg and the Biodiversity Heritage Library, there seems to be a high proportion of titles that don’t link to the content described in the bibliographic record. The described title is normally not available within the Open Access collection to which it purports to belong. This causes frustration for our users and can often lead to the library having to purchase the ebook, as the errant records raise expectations. In each case the library must report the problem to the LMS provider and then wait for a correction to be applied.
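Points 4 and 6 above both come down to comparing identifiers across lists, which can be sketched in a few lines. This is a hedged illustration rather than anything Alma or our own team actually runs: the function names are hypothetical, the ISBNs are invented placeholders, and matching on ISBN alone is an assumption that will miss titles whose PDF, EPUB and print editions carry different ISBNs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalise_isbn(isbn: str) -> str:
    """Strip hyphens and spaces so the same ISBN compares equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def duplicated_titles(holdings: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Point 4: map each ISBN found in more than one collection to those collections.

    `holdings` is an iterable of (collection name, ISBN) pairs.
    """
    collections_by_isbn = defaultdict(set)
    for collection, isbn in holdings:
        collections_by_isbn[normalise_isbn(isbn)].add(collection)
    return {
        isbn: sorted(collections)
        for isbn, collections in collections_by_isbn.items()
        if len(collections) > 1
    }


def coverage_gaps(publisher_feed: Iterable[str], lms_collection: Iterable[str]) -> List[str]:
    """Point 6: ISBNs listed in the publisher's feed but absent from the LMS collection."""
    available = {normalise_isbn(isbn) for isbn in lms_collection}
    published = {normalise_isbn(isbn) for isbn in publisher_feed}
    return sorted(published - available)


# Invented placeholder data, not real catalogue records.
holdings = [
    ("Publisher OA collection", "978-0-00-000000-2"),
    ("OAPEN", "9780000000002"),
    ("DOAB", "978-0-00-000001-9"),
]
print(duplicated_titles(holdings))
print(coverage_gaps(["978-0-00-000000-2", "978-0-00-000001-9"], ["9780000000002"]))
```

Even a simple comparison like this depends on having both lists to hand, which is exactly where the freely available publisher feeds are useful.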
Open Access books are invaluable to libraries and their users, but poor-quality LMS-supplied metadata can inhibit their discoverability and cause unnecessary work for librarians. As this post explains, a book might be easily available via Open Access, but there is more to be done to make it easily discoverable for readers.
A postscript from the OABN: readers of this post might be interested in an OASPA webinar taking place at 3.15pm – 4.30pm GMT on 24 February 2021, ‘Open Book Metadata’. Details of the webinar are available here.
This work is licensed under a Creative Commons Attribution 4.0 International License.