Back in May last year I wrote a fairly technical post about database clean up. This post is a follow-up with some stats on the ongoing results of the work people have done to improve the quality of our database. We've been looking at the "health" of our bibliographic database, and I thought I'd share a few of the stats from this work along with some context for them.
We started building our consortium in May 2012, with the last library joining the consortium in September 2014. During this period we merged the bib records of 80 different LMS databases into the OneCard database. While we tried to do a "match and merge" when importing the records from each separate LMS, inevitably different local cataloguing conventions and other issues prevented us from having a "clean" database with perfect matches of all bib records. In fact the process resulted in a significant issue where we generated multiple records for the same title.
Over the last few years there have been several "blitzes" where staff from many libraries have contributed to merging records for the same title into a single bib record. These blitzes have tackled areas of the database where automated procedures cannot merge records. However, we have also worked with SirsiDynix to run various automated scripts, which have significantly reduced the number of duplicate records.
The result of all of this work is very positive, even if there is more work to do. Below are some statistics from 2014 until now:
We started with 1,154,576 bib records, and these have been reduced to 960,989 - a reduction of 193,587, or approximately 16.8%. It should be noted that during this period we kept adding new bib records for all of the new titles we purchased. This is reflected in the number of items in the database. We started with 3,909,921 items, and this has reduced ever so slightly to 3,887,175. So we've got about the same number of items, but considerably fewer bib records.
This change is reflected in another stat that the team uses to measure change: the average number of items attached to each bib record. This figure has increased from 3.39 to 4.04. This may not seem like a large increase, but over a database of our size it is a considerable achievement.
And finally - we've had an internal KPI of reducing the proportion of duplicate bib records to below 5% and keeping it there. This figure was set as a target when we identified that 13.7% of records were duplicates. All the work across the network has brought us close to reaching this first target. It currently sits at 5.4% - so a huge improvement over a relatively short period of time.
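For anyone who wants to check the arithmetic behind these figures, here is a minimal sketch in Python. It only uses the numbers quoted above; the variable names are my own for illustration and do not come from any LMS report or SirsiDynix tool.

```python
# Figures quoted in this post (2014 baseline vs. now).
# Variable names are illustrative only, not taken from any system report.
bibs_start, bibs_now = 1_154_576, 960_989
items_start, items_now = 3_909_921, 3_887_175
dup_start_pct, dup_now_pct, dup_target_pct = 13.7, 5.4, 5.0

# Reduction in bib records, absolute and as a percentage of the starting count.
bib_reduction = bibs_start - bibs_now
bib_reduction_pct = 100 * bib_reduction / bibs_start

# Change in item count as a percentage of the starting count.
item_reduction_pct = 100 * (items_start - items_now) / items_start

# Average number of items attached to each bib record.
items_per_bib_start = items_start / bibs_start
items_per_bib_now = items_now / bibs_now

# Duplicate rate: drop so far and remaining gap to the KPI, in percentage points.
dup_drop_points = dup_start_pct - dup_now_pct
dup_gap_points = dup_now_pct - dup_target_pct

print(f"Bib records removed: {bib_reduction:,} ({bib_reduction_pct:.1f}%)")
print(f"Item count change: -{item_reduction_pct:.2f}%")
print(f"Items per bib: {items_per_bib_start:.2f} -> {items_per_bib_now:.2f}")
print(f"Duplicate rate drop: {dup_drop_points:.1f} points")
print(f"Gap to 5% target: {dup_gap_points:.1f} points")
```

The items-per-bib figure is simply total items divided by total bib records, which is why it rises when duplicate bibs are merged while the item count stays roughly flat.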
All of these stats are great for measuring how much our database has improved, but in reality they're a means to two ends: the customer experience and efficient service provision. Reducing the number of bib records the customer has to wade through is important. It is also important that when a customer places a hold on a bib record they have access to all of the items in the network, because they're all attached to a single bib record. Likewise, from the libraries' point of view, we want to be shipping the first available copy, rather than choosing from an incomplete list of items attached to one of several bib records.
There is obviously more work to do in this area, and PLS will work with libraries to continue to improve the user experience as well as the efficiency of the system. We will keep you posted as we continue to progress these changes.