Sierra API & Visualized Data Analysis
2017 SCIUG Conference at Chapman University (Oct. 10)
Presented by Seong Heon Lee, Systems & Technology Librarian
Hugh & Hazel Darling Law Library, Chapman University
Welcome everyone. Hi, my name is Seong Heon. I am the systems & technology librarian at the Chapman University Law Library.
First of all, thank you all for being my audience. Today, I want to talk about the Sierra SQL API and visualized data analysis.
This presentation is motivated by my summer project, which creates an automated, visualized report using the Sierra SQL API.
Even though the project is still in progress, I have found some useful things that I want to share with you.
When SCIUG requested presentation proposals, they also sent out a list of interests. I found that many members are
interested in alternative ways of doing library business analysis, and I could also see an interest in the Sierra API.
I thought those two interests could be combined into one subject, and this presentation is the result.
I hope that you learn some useful things from this presentation.
Final Product Preview
Circulation Transaction Report
Why don't we start by previewing the outcome that you should be able to build yourself after this presentation?
This will give us a chance to understand the goal of the presentation. Last summer, I developed
a circulation transaction report using the Sierra SQL API and Plotly. The basic idea is to extract data from
Sierra's PostgreSQL database (the circ_trans table) and to create a visualized report.
Let's take a look at a sample outcome.
Why (1)?
Sierra has Web Management Reports
How many of you are using WMR?
What is your user experience?
Decision Center
My first question is why we need a custom report program. As you know, Sierra provides
a built-in report program called Web Management Reports. How many of you are using WMR?
What is your user experience? Personally, I do not have much to say about it,
because I did not use it much. Preparing this presentation, I visited the report program to make
a quick evaluation. My first impression is that it has not been improved for a long time.
The UI is old and not intuitive (Millennium logo, complex parameter setup). I know that Innovative has
a new data analytics product, Decision Center. Again, I have nothing to say about Decision Center,
because I have no experience with it. But I am sure that not all libraries can afford to purchase the product.
Particularly for libraries running a small or medium-sized collection, I am not sure
how much the product will pay off, even if they can purchase it.
Library administrators try to find the right balance between spending and ROI, as other business operators do.
I think the Sierra APIs (SQL, REST) are great tools to bridge that gap. A local custom report program
might not do everything that Decision Center does. However, it can be sufficient to take care of our business.
Why (2)?
Who uses what?
Compare resource usages among libraries
Compare by other criteria:
locations
patron types
item types
transaction times
etc...
Have you ever been curious about how different libraries use Sierra resources? Chapman
University has three branches set up in Sierra: Leatherby, Law Library, and Brandman.
One day, our library director raised a genuine question. Leatherby and the Law Library equally
share the cost of Sierra. In terms of the size of collections and users, the main library, Leatherby,
should use more resources. That was a legitimate assumption. She wanted to see some factual data
to prove or disprove that assumption.
This is one example of how data analysis can be useful. However, our interests can stretch
beyond this. We may want to compare library resource usage not only by library but also
by location, patron type, item type, transaction time, and so on.
Why (3)?
Why not?
Plenty of data visualization tools
Sierra offers APIs (Direct SQL & REST API)
Customize a data report as you wish
Why not? There are many data visualization tools available, and many of them are open source.
For example, the D3 JavaScript library is very popular. You may have heard many buzzwords like big data,
AI, machine learning, etc. Our generation tries to understand the huge amount of data being produced
every single second through online services. Although libraries do not handle extremely large data as big
companies do, we librarians also have a genuine interest in user behaviors and interactions.
Thankfully, Innovative has advocated the Open Library Stacks (OLS) architecture. With the merger of Polaris
and VTLS, the OLS idea becomes more important. The main idea is to create a highly scalable cloud platform
that lets their different products communicate with each other. For this goal, APIs are critically important,
because APIs make data sharable with other products, whether another Innovative product, a third party's,
or your local program.
We humans are creative beings. We want to make reports in the format that we want. Not only data content
but also data presentation matters. I know that you all have experience tweaking an Excel file to create
a good report.
Exploring data is fun!
(Slide) The best motivation of all: FUN! We are curious beings.
Contents
Plotly (Online & Offline)
Sierra SQL API
Some Examples (Sierra & Plotly)
Jupyter & Data Analysis
Now let's move on to the main part of the presentation. First of all,
I will give you an idea of what I am going to cover. Here is the list. First,
I will start with the Plotly library. I will demo how to create a chart online and offline.
Next, I will go through how to connect to the Sierra SQL API and get data from the Sierra database.
After that, I will combine the knowledge of both Plotly and the Sierra SQL API. I will create
some example reports with Sierra data and Plotly charts. Finally, last but not least,
I will introduce Jupyter, a data analysis tool that is very popular among data scientists
for data analytics and communication.
What is Plotly?
A data analytics and visualization tool
Charts & Dashboard
Online and Offline
Open Source
D3.js (SVG) and stack.gl (WebGL) for web graphics
Great API Documentation & Examples
Plotly Online
Show Plotly examples, different types of charts.
(Slide)
Create a sample chart of library checkouts.
A: Leatherby, Law, Brandman, Science, Music
B: 530, 280, 74, 50, 38
Bar, Pie, Scatter
We saw that it is very easy to create a Plotly chart online. To create a chart, we needed
data, a plot type, and attributes (axes).
Plotly Offline (1)
Supports multiple languages (Python, R, MATLAB, JavaScript, etc.)
Python Library
Plotly Offline Setup
Make the "first" offline chart (as an HTML file)
Interactive: display modeBar
How to embed charts in HTML?
Now, I will demonstrate how to use Plotly offline. Plotly supports many languages, so users can choose what they want.
Plotly maintains separate web pages for each language. We will choose Python.
The Python library pages show many examples and API documentation. For example, if you go to Pie Charts,
you will see sample code to create a chart. Comparing the API documentation and those examples,
you can understand how to use the library. (Slide)
Before creating an actual offline chart, we need to learn a little bit about how to set up the offline library.
We will use the method plotly.offline.plot().
(Code, offline/first.py)
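Below is a minimal sketch of what an offline script like first.py might contain, using the sample checkout data from the online demo. The exact contents of offline/first.py may differ; the output filename here is an assumption.

# A minimal sketch of an offline Plotly bar chart (the actual demo file is
# offline/first.py; the data comes from the online demo, the filename is an assumption)
import plotly.offline as offline
import plotly.graph_objs as go

# Sample checkout counts from the online demo
libraries = ['Leatherby', 'Law', 'Brandman', 'Science', 'Music']
checkouts = [530, 280, 74, 50, 38]

fig = go.Figure(data=[go.Bar(x=libraries, y=checkouts)],
                layout=go.Layout(title='Library Checkouts'))

# plotly.offline.plot() writes a self-contained HTML file and opens it in the browser
offline.plot(fig, filename='first.html')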
Plotly Offline (2)
How to embed charts in HTML?
offline library: output_type option (file, div)
a template engine (Jinja2)
a template html
an embed example
In most cases, we may need embedded charts. By simply changing the
output_type option to div, we can create an embeddable chart.
We will also use a template engine and a template HTML file. Why? Plotly programmatically outputs a chart
as an HTML div element. We need to embed that 'div' output in a report document, as laid out
in the template HTML.
(Code, offline/embed.py) >
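Here is a rough sketch of how an embed script like embed.py could work. The template filename (report_template.html) and variable names are assumptions, and the template is assumed to load plotly.js itself (for example from the CDN).

# Sketch of embedding Plotly divs in an HTML report via Jinja2
# (the actual demo file is offline/embed.py; file and variable names are assumptions)
import plotly.offline as offline
import plotly.graph_objs as go
from jinja2 import Environment, FileSystemLoader

libraries = ['Leatherby', 'Law', 'Brandman']
checkouts = [530, 280, 74]

# output_type='div' returns each chart as an HTML <div> string
# instead of writing a standalone file
bar_chart = offline.plot({'data': [go.Bar(x=libraries, y=checkouts)]},
                         output_type='div', include_plotlyjs=False)
pie_chart = offline.plot({'data': [go.Pie(labels=libraries, values=checkouts)]},
                         output_type='div', include_plotlyjs=False)

# Jinja2 drops the divs into placeholders defined in the template HTML,
# which is assumed to load plotly.js from the CDN
env = Environment(loader=FileSystemLoader('.'))
template = env.get_template('report_template.html')
with open('report.html', 'w') as f:
    f.write(template.render(bar_chart=bar_chart, pie_chart=pie_chart))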
Plotly Offline (3)
The Plotly app outputs the charts (bar_chart, pie_chart) as divs, which are fed
into the template engine, Jinja2. The engine loads a template file and embeds the charts
in the specified HTML layout.
Plotly: Summary
Creating different types of charts
Interactive
Data, Plot, Attributes
Data from file, url, db connector
Online & Offline (output_type, offline.plot())
Template engine (Jinja2) and a template html
(Slide) >
Now we have covered most of the basics of Plotly Online and Offline.
Even though they are basic, they are enough of a foundation for the following section.
Let's move to the next topic, the Sierra database.
Sierra SQL API
PostgreSQL relational database
Sierra_view schema & 349 tables (read only)
Special permission for "Sierra SQL Access" required
Access to:
bibliographic data (bib, item, holding)
transactional data (circ_trans, fine, patron)
system parameters (location, custom codes, loanrule, properties)
By default, 5 concurrent connections per user
PGAdmin, SQL Client
Version 3 or 4
Connect to database
Run SQL queries
Show results in tables
Creating/testing SQL queries
The easiest way to work with the Sierra database is using a SQL client. PostgreSQL has
its own SQL client, called PGAdmin. You can download it with the PostgreSQL database on your local PC. (Slide)
(Slide) >
Setup PGAdmin
Setting up PGAdmin is straightforward. (Slide) >
Know Sierra Database
SierraDNA
Learn the structure of Sierra database
Category links and search box
Detailed table view
ERD (Entity Relationship Diagram) view
Example: Transactions > Circulation > checkout
PGAdmin displays all viewable tables (349 tables under the sierra_view schema).
To understand the Sierra database, you need to spend some time. You may browse through all the tables in PGAdmin.
However, oftentimes some tables or fields are not obvious to understand. Then you can refer to
SierraDNA, which is the most valuable resource for learning the Sierra database. It includes detailed descriptions
of tables and fields. To find the tables you are interested in, you can follow the category navigation links or
use the search box at the top right. SierraDNA shows not only a detailed table view but also an Entity Relationship
Diagram. The ERD is sometimes useful for creating join queries because it visualizes the relationships among tables.
Find Checkouts (PGAdmin Demo)
-- Search for recent checkouts
SELECT pv.home_library_code, pv.ptype_code, c.checkout_gmt, c.renewal_count, pv.barcode, c.item_record_id
FROM sierra_view.checkout AS c
JOIN sierra_view.patron_view AS pv ON pv.id = c.patron_record_id
WHERE NOW() - c.checkout_gmt <= interval '1 hours'
AND pv.home_library_code = 'lhome'
Seeing once is better than hearing a hundred times. Let me show PGAdmin briefly,
and I will run a sample query in it. Once we are connected to the Sierra database, you will see all the tables
and fields, as well as the query pane and the result pane. By right-clicking on a table, you can view the first 100 records.
The table view shows all the field names and data types.
Plotly & Sierra
How to use Sierra data in Plotly?
Re-use Plotly offline script
But, with Sierra data
Three things:
get Sierra data (connection and query)
transform Sierra data to plotly data
generate a Plotly chart
Now we have learned how to use Plotly and how to harvest Sierra data with a SQL query.
It is time to put them together. We will reuse the Plotly offline script, because the visualization logic is
the same. However, this time we will use actual Sierra data instead of our dummy data. Here, we should
keep three things in mind. (Slide) Because the visualization part of the Plotly script remains the same, we only need
to take care of the first two things: 1) getting the data and 2) transforming it to the Plotly data format.
To get the data, we need to connect to and run a query on the Sierra database. How do we do that in a Python program?
We will use a PostgreSQL database adapter called psycopg2.
(Code, sierra_chart.py)
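As a rough sketch of the data-gathering half of sierra_chart.py: the connection below uses placeholder host, port, database name, and credentials (check your own Sierra SQL Access settings), and the circ_trans column names should be verified against SierraDNA.

# Sketch of connecting to Sierra and pulling circ_trans rows with psycopg2
# (host, port, dbname, and credentials are placeholders; verify column names in SierraDNA)
import psycopg2

conn = psycopg2.connect(host='sierra-db.example.edu',  # your Sierra database server
                        port=1032,                     # port assigned for Sierra SQL access
                        dbname='iii',
                        user='sql_username',
                        password='sql_password')

query = """
    SELECT ptype_code, patron_home_library_code, item_location_code,
           itype_code_num, op_code, transaction_gmt
    FROM sierra_view.circ_trans
    WHERE transaction_gmt >= NOW() - interval '10 days'
"""

with conn.cursor() as cur:
    cur.execute(query)
    rows = cur.fetchall()   # list of tuples, one per circulation transaction
conn.close()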
SQL Output
You may feel a little puzzled about the transformation part. Why do we need to transform
the Sierra data before using it in Plotly? That's a good question.
It is not possible to feed the SQL output directly into a Plotly chart, because the data structures are different.
I will use an example of a query on the circulation transaction table (circ_trans). This is the SQL output. Each row
represents a circulation transaction.
Let's pick one column (patron_home_library_code). We are going to show how many
circulation transactions happened per patron home library (Leatherby, Law, Brandman).
The column contains the different patron_home_library_codes associated with each circulation transaction.
For our purpose, we need to count the transactions in that column for each distinct home_library_code.
Plotly X Y Axes
I have a chart_data function that takes care of this business. The function reads through
each row of the patron_home_library_code column and counts the transactions by patron_home_library_code.
Finally, the function returns a list that is ready to feed into a Plotly chart.
Look at the code again. (Code, sierra_chart.py)
The Plotly setup is exactly the same. We only changed the data part (x axis and y axis)
to the data that we retrieved from Sierra and transformed. Let's run the script, sierra_chart.py.
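A minimal sketch of what a chart_data-style function could look like is below; the actual sierra_chart.py may differ, and the dummy rows and column index are only there to keep the sketch self-contained.

# Sketch of the transformation step: count transactions per column value
# and feed the result to a Plotly bar chart (names and dummy data are illustrative)
from collections import Counter
import plotly.offline as offline
import plotly.graph_objs as go

def chart_data(rows, column_index):
    # Count how many transactions fall under each value of one column
    counts = Counter(row[column_index] for row in rows)
    return list(counts.keys()), list(counts.values())   # x labels, y counts

# rows would normally come from the psycopg2 query above;
# a tiny dummy sample keeps this sketch runnable on its own
rows = [('0', 'law'), ('1', 'lmain'), ('2', 'law'), ('3', 'law')]
x_values, y_values = chart_data(rows, 1)   # column 1: patron_home_library_code

fig = go.Figure(data=[go.Bar(x=x_values, y=y_values)],
                layout=go.Layout(title='Transactions by Patron Home Library'))
offline.plot(fig, filename='sierra_chart.html')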
Embed Sierra Charts
Can we embed multiple charts in one HTML page?
Use the same SQL output from circ_trans
Run data transformations on each column
Plotly option: output_type = div
Pass the chart outputs to the template engine
Prepare a template html that presents multiple charts
Can we present multiple charts in one HTML document? Yes, we can. First of all,
we already have the Sierra data (circulation transactions). This time, we run the data transformation
on each column (patron_home_library, ptype_code, time, op_code, itype_code, item_location). We can now feed
the transformed data of each column to Plotly. Let's set the Plotly option output_type=div,
so that Plotly will output the charts as divs. The chart outputs are passed to the template engine, and
the multiple charts are displayed as specified in the template HTML file.
Now let's run sierra_chart_embed.py. (Code, sierra_chart_embed.py)
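The sketch below shows one way sierra_chart_embed.py could be structured: one div per circ_trans column, all rendered through one Jinja2 template. The column names, template filename, and dummy rows are assumptions.

# Sketch of building one div per circ_trans column and rendering them
# through a single Jinja2 template (file names and dummy data are illustrative)
from collections import Counter
import plotly.offline as offline
import plotly.graph_objs as go
from jinja2 import Environment, FileSystemLoader

# columns of the circ_trans query, in SELECT order (assumed)
columns = ['ptype_code', 'patron_home_library_code', 'item_location_code',
           'itype_code', 'op_code']

def chart_div(rows, index, title):
    counts = Counter(row[index] for row in rows)
    fig = go.Figure(data=[go.Bar(x=list(counts.keys()), y=list(counts.values()))],
                    layout=go.Layout(title=title))
    return offline.plot(fig, output_type='div', include_plotlyjs=False)

# rows would normally come from the psycopg2 query; dummy data keeps the sketch runnable
rows = [('0', 'law', 'lref', '2', 'o'), ('1', 'lmain', 'lstk', '0', 'i')]
divs = [chart_div(rows, i, col) for i, col in enumerate(columns)]

env = Environment(loader=FileSystemLoader('.'))
template = env.get_template('report_template.html')  # template loads plotly.js itself
with open('sierra_report.html', 'w') as f:
    f.write(template.render(charts=divs))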
Plotly Offline (3)
The Plotly app outputs multiple charts as divs, which are passed
to the template engine, Jinja2. The engine uses a template file and embeds the charts
as specified in the template HTML.
Interactive Program
Can we input specific options?
Python sys.argv - reads user inputs from the terminal
User Inputs:
days (10 => 5)
transaction types (o i f r => o i)
output filename
python sierra_chart_embed.py 5 oi sierra_interactive.html
Now our program can create a circulation transaction report with multiple charts. Currently,
our SQL query finds four types of transactions (i, o, f, r) in the last 10 days. What if you want to find
only checkouts and checkins in the last 5 days? Of course, you can change the SQL query accordingly. However,
it is not a good idea to modify the SQL query inside the program every time we need a different search.
We have a better way. Psycopg2 (the PostgreSQL database adapter) can pass variables to SQL queries.
We will get user inputs as variables using Python's sys.argv, and we will pass them to the SQL query
through the psycopg2 adapter. In this way, we can create an interactive program.
I am going to demonstrate how it runs. I have three user inputs (days, transaction types, output filename).
(In the terminal, run the script sierra_chart_embed.py interactively.)
I will not go deeper into this subject, but this technique will be used in the next section, Jupyter.
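A sketch of the interactive part might look like the following, with the same placeholder connection details as before. Passing the user inputs as psycopg2 query parameters (rather than formatting them into the SQL string) keeps the query safe.

# Sketch of reading user inputs with sys.argv and passing them into the SQL
# query as psycopg2 parameters (connection details are placeholders)
import sys
import psycopg2

# e.g.  python sierra_chart_embed.py 5 oi sierra_interactive.html
days = int(sys.argv[1])          # how many days back to search
op_codes = tuple(sys.argv[2])    # 'oi' -> ('o', 'i')
output_file = sys.argv[3]        # name of the HTML report to write

query = """
    SELECT ptype_code, patron_home_library_code, op_code, transaction_gmt
    FROM sierra_view.circ_trans
    WHERE transaction_gmt >= NOW() - %s * interval '1 day'
      AND op_code IN %s
"""

conn = psycopg2.connect(host='sierra-db.example.edu', port=1032, dbname='iii',
                        user='sql_username', password='sql_password')
with conn.cursor() as cur:
    cur.execute(query, (days, op_codes))   # psycopg2 substitutes the %s placeholders safely
    rows = cur.fetchall()
conn.close()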
Sierra SQL API & Plotly: Summary
PGAdmin to build SQL commands
SierraDNA to learn Sierra Database
Plotly's Python offline with Sierra data
Database adapter to connect (psycopg2)
Data transformation & template
Interactive program
What is Jupyter?
Web application with its own host
Open source project that grew out of IPython
IPython: interactive Python shell
Data science and scientific computing
Interactive data visualization
Share code, data, plots, and explanation
Publish as PDF, HTML, slides, and more
Documentation
Jupyter is a web application that runs on its own host. It started as an open
source project that grew out of IPython. IPython is another project that enhances the Python shell for data science
and scientific computing (mathematics, physics, etc.). In the IPython shell, scientists can easily import
great tools like NumPy, Pandas, Matplotlib, and SciPy. They can test their experiments by running
programs interactively and seeing the results. Jupyter is a browser-based tool. Scientists
can do what they do with IPython "in the browser". The great benefit of using Jupyter is sharing thoughts.
Jupyter has wonderful features: running code (Python, R, and others), writing descriptions, presenting data,
and visualizing data with scientific plots.
Try Jupyter
The Jupyter interface looks like this. At the top, there are menus and buttons.
For writing, we can use Markdown. In cells, we can write and run Python code. After running the code,
the output appears right below.
Again, seeing once is better than hearing a hundred times.
How to Use Jupyter on Your PC
Install Anaconda 3.6 (Link )
Open the anaconda terminal
Type "jupyter notebook" and Enter
If you are interested, I strongly recommend trying it on your PC. As I mentioned,
it runs on its own host. When you install it on your PC, it will run on localhost:8888.
Use Anaconda for installation. Anaconda is a Python distribution that includes Python
and many scientific and data science packages (NumPy, SciPy, Jupyter).
(Slide)
Plotly Offline & Jupyter (1)
Re-use the offline Plotly code
Three tweaks:
plotly.offline.iplot
init_notebook_mode(connected=True)
No output filename
More Info
Let's talk about how to use Plotly in Jupyter. We created offline Plotly charts
with dummy data. Later, we created another offline Plotly chart using actual Sierra data.
We will reuse the same code. We can present the same charts in Jupyter this time with a few tweaks.
(Slide - three tweaks)
We use offline.iplot instead of offline.plot. Because we are working
on our local Jupyter server, we should turn on init_notebook_mode(connected=True). Finally, we
don't need to specify a filename, because the Plotly charts will show up in Jupyter, not in a file.
(Show Jupyter/plotly.ipynb)
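For illustration, the earlier offline chart adapted for a notebook cell might look like this (a sketch of the three tweaks, not the exact contents of Jupyter/plotly.ipynb).

# The offline chart adapted for a Jupyter notebook cell:
# iplot instead of plot, init_notebook_mode, and no output filename
import plotly.offline as offline
import plotly.graph_objs as go

offline.init_notebook_mode(connected=True)   # load plotly.js into the notebook

libraries = ['Leatherby', 'Law', 'Brandman']
checkouts = [530, 280, 74]

fig = go.Figure(data=[go.Bar(x=libraries, y=checkouts)],
                layout=go.Layout(title='Library Checkouts'))

offline.iplot(fig)   # renders inline in the notebook instead of writing an HTML file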
Plotly Offline & Jupyter (2)
Three ways to use Plotly on Jupyter
Write a script on a cell and run
Import a script as a module
Use a magic function %run with user inputs
I demonstrated three ways to use Plotly on Jupyter.
(Slide)
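As a quick sketch, the three ways could look like the notebook cell below. The %run line reuses the interactive example from earlier; the import assumes sierra_chart.py is on the notebook's path.

# Three ways to run the same Plotly code in a Jupyter notebook cell:

# 1) Paste the script directly into a cell and run it
#    (e.g. the code from sierra_chart.py)

# 2) Import the script as a module and call its functions
import sierra_chart              # assumes sierra_chart.py is on the notebook's path

# 3) Use the %run magic with user inputs, just like in the terminal
%run sierra_chart_embed.py 5 oi sierra_interactive.html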
Plotly, Sierra SQL, and Jupyter: Summary
Plotly: View
Sierra SQL: Data
Jupyter: Communication
You saw how we can utilize the three tools together. Plotly can be used to create
visualized charts. We can collect library data using the Sierra SQL API (circulation transactions,
patrons, overdues, bibs and items, acq/orders, system codes, etc.). And we can present the Plotly
charts and Sierra data in Jupyter dynamically. In Jupyter, a browser-based web application,
we can focus more on communicating ideas.
The SQL API is not writable. This means that we cannot update records with this API. However,
for creating a report, reading data is mostly enough. Using it in smart ways,
we can create library data analytics tools and handle our real-time data interactively.
The good news is that the Sierra APIs are "open" services, approachable anytime.
I would like to hear your feedback later on whether this combination makes
sense to you.
What kinds of business and collection analytics do we need?
This is my next question. Many times, we do not get to our point
because we cannot ask the right question. Is our point getting a fancy system that
can do a million things meticulously and omnipresently? I don't think so. Each local library can
build its own tools, suitable for its own purpose and in its own context.
Library Business & Data Analytics
Possible to create many analytics tools
Re-run or auto-run
Communicate on Jupyter
Examples:
Expired Patrons with Checkouts
Interlibrary Loan Map:
ILL lendings in the last 100 days
ILL partners
Using the techniques shown previously, we can create many analytics tools.
Once we have created the tools, we can re-run or auto-run them conveniently. In Jupyter, we can share
data and communicate what we find.
I will share two examples of analytics tools. (Slide)
(Show the code, jupyter/sierra_tools.ipynb)
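For a taste of the first example, a query for expired patrons with checkouts might look roughly like the sketch below. Field names such as expiration_date_gmt follow SierraDNA naming conventions and should be verified against your own sierra_view schema; connection details are placeholders as before.

# Sketch of an "Expired Patrons with Checkouts" tool
# (field names and connection details are assumptions to verify against SierraDNA)
import psycopg2

query = """
    SELECT pv.barcode, pv.ptype_code, pv.home_library_code,
           pv.expiration_date_gmt, COUNT(c.id) AS checkouts
    FROM sierra_view.patron_view AS pv
    JOIN sierra_view.checkout AS c ON c.patron_record_id = pv.id
    WHERE pv.expiration_date_gmt < NOW()
    GROUP BY pv.barcode, pv.ptype_code, pv.home_library_code, pv.expiration_date_gmt
    ORDER BY pv.expiration_date_gmt
"""

conn = psycopg2.connect(host='sierra-db.example.edu', port=1032, dbname='iii',
                        user='sql_username', password='sql_password')
with conn.cursor() as cur:
    cur.execute(query)
    expired_with_checkouts = cur.fetchall()   # one row per expired patron with items out
conn.close()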
Thoughts
More data from Sierra:
WebPAC usage (search, download)
Non-Sierra data:
EzProxy
LibGuides
Library building usage (gate counter)
Digital repository system
And so on...
People everywhere talk about data today. We librarians have always handled
data. We can ask where our data is located (for example, Slide).
We can ask, "Can we harvest certain data?" "Is it valuable or not?"
The ultimate goal of analyzing data is to learn about
users' interactions with information resources. We can increase the efficiency of library
services when we know how our users, and we ourselves, consume our resources.
What is next?
[ Library Data Group ]
Seong Heon Lee, selee@chapman.edu
Finishing the presentation slides last Friday, I asked myself, "What is next?"
I did not want to finish the presentation with a capital "THE END". Something
came to my mind: there is no end to our exploration of data. So, I decided to form a "Library Data Group".
This group will explore various possibilities for using library data and creating analytics tools.
So, if you are interested, feel free to send me an email. Basically, starting from what we learned today,
I think we can develop better ideas and tools together.
Thank You.
My 10-year-old daughter said that the last THANKS slide is
the most important and so must be pretty. Here it is. THANK YOU.