Thursday 25 August 2016

Big Data - What we're finding out about a State-wide collection

Now that the SA public library network has all of its library branches using a shared Library Management System we can begin to do some analysis on both the composition of the State's public library holdings, and customer use of this collection.

We have started running some reports which show us which communities are making the most use of the collections of other libraries, and which libraries are supplying the items to fulfil that demand. We're also looking at collection sizes & how they compare to the size of the communities they're meant to be serving.

We have also shared the dataset of our collection, to be used by students and others who're interested in analysing our collections and also in finding new ways to represent the data using interesting tools.

A university student, Keren Sutcliffe, was interested in looking at our Non-Fiction holdings & also in using some visualisation tools to show the data. While I haven't had time to dig deeply into the data, the ways in which it is presented here are fascinating.

Rather than looking at standard columns & rows in spreadsheets, or even the standard graphs produced by Excel, these representations are really engaging. They're also interactive. You can hold your cursor over a part of the display to get more data. While this isn't new in itself, it is interesting to see it applied to these data representations. For example, while it is easy to see that items with the Dewey number 641 are the largest single collection libraries hold, hovering over this square tells you that 641 is the Dewey number for food & drink, and that we hold 15,880 titles and 54,733 items with that Dewey number. And at the other end of the scale I can find that for the Dewey number 497 (North American native languages) we hold 3 titles and 4 items.

There is also a good, simple bar graph which shows how many titles and copies we have in each Dewey hundred group. This one shows us at a glance that our largest collection in this area is the 600s - Technology & Applied Sciences, where libraries hold 77,586 titles and 236,651 copies, at a ratio of almost exactly 3 copies per title. Our 800s - Literature & Rhetoric, while not the smallest group at 19,293 titles and 45,507 copies, has a lower ratio of 2.36 copies per title.
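For anyone who wants to check the copies-per-title figures, or repeat the sum for other Dewey groups, here is a minimal Python sketch. It uses only the title and copy counts quoted above; the dictionary layout and the function name `copies_per_title` are just illustrative choices, not part of any tool we use.

```python
# Title and copy counts as quoted in this post for two Dewey hundred groups.
# Each value is a (titles, copies) pair.
holdings = {
    "600s Technology & Applied Sciences": (77_586, 236_651),
    "800s Literature & Rhetoric": (19_293, 45_507),
}


def copies_per_title(titles, copies):
    """Average number of copies held for each title."""
    return copies / titles


# Round to two decimal places, matching the precision used in the post.
ratios = {
    group: round(copies_per_title(titles, copies), 2)
    for group, (titles, copies) in holdings.items()
}

for group, ratio in ratios.items():
    print(f"{group}: {ratio} copies per title")
```

Run as written, this gives 3.05 for the 600s and 2.36 for the 800s, which matches the figures above.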

This is all very interesting, but not overly useful at this stage. However, we were talking in the office about this yesterday & we're thinking that the next couple of datasets we could look at would be the lending patterns of the NF collection, so we can then see whether our collecting patterns reflect demand. We could then use these visualisation tools to show the "hardest working" parts of our collection.

However, it is always difficult to draw firm conclusions from such information. Are people borrowing items because they're there & on the shelf, or is there demand for more content in some subject areas where libraries don't have sufficient stock? We can't see unfulfilled demand.

The good news is that we have the data & the tools, as well as access to people with the skills to provide us with these sorts of representations.  The more complex bit will be both the analysis of the data, and then trying to see patterns and causes. 

So, if you're interested in some high-level collection overviews and you want to see the "ground floor" of our collection analysis journey, then this is really worth looking at.