Visualizing Data:
Sierra SQL API and Plotly

2018 SCIUG Conference at Chapman University
(October 23)

Presented by Seong Heon Lee, Systems & Technology Librarian Hugh & Hazel Darling Law Library, Chapman University

Final Product Preview

Why?

Sierra has Web Management Reports

How many of you are using WMR?
What is your user exprience?
Decision Center

My first question is this. Why do we need a custom report program? (CLICK) As you know, Sierra provides with a built-in report prgoram, called Web Management Reports (WMR). How many of you are using WMR? What is your user exprience? Personally, I do not have many words to say about it, because I have not used it much. Preparing this presentation, I revisited the report program to make a quick evaluation. My first impression is that it has not been improved for a long time. UI is old and not intuitive.

I know that Innovative has a new data analytic product, Decision Center. Again, I have no word to say about Decision Center, because I have no experience of the product. But I am not sure how many libraries are affordable to purchase the product, particularly a library running a small or medium size of colletion. Would it be valuable enough for the money we invest? The answer may be "yes" for some libraries, or 'no' for others.

I think that Sierra APIs (SQL Direct, REST) are great tools to leverage the gap. A custom report program, which is built with the APIs, can provide a good solution for data analytics.

Why? (continued)

Who uses what?

Compare resource usages by libraries
Compare by the following criteria:
- locations
- patron types
- item types
- transaction times
- etc...

Particularly, for the shown example, I started with questioning who uses what? (CLICK) Chapman University have three branches in Sierra, Leatherby, Law Library, Brandman. Haven't you ever been curious about how different libraries use library resources? Our library director was curious about how Leatherby (the main library) and Law Library utilize resources. Because the size of collections and users, Leatherby should use more resources than Law Library does. That is a reasonable assumption. Can we prove that assumption with factual data.

This is one example of how data analytics can be useful. Going further, we can compare library resource usage not only by libraries but also by locations, patron types, item types, transaction times, etc.

Why? (continued)

Why not?

Plenty of data visualization tools (Open Source)
Open Library Stacks (Direct SQL & REST API)
Customize a data report as you wish

One of the best way to answer a question "Why?" is to ask "Why Not?" (CLICK) There are many data visualization tools available. Many of them are open sources. We are evolving into a data-driven society. We hear so many buzz words like big data, AI, machine learning, data analytics, data visualization, etc. With such a high demand and interest of data, there will be more tools being developed.

Innovative advocates Open Library Stacks (OLS) architecture. With the merger of Polaris and VTLS, OLS idea becomes more important. The main idea is to creat a highly scalable cloud platform to make their different products communicating each other. For this goal, APIs are critically important, because APIs make different products share and exchange data. Successful vendors implement APIs to create the ecosystem that their products lead in competative market. They outreach more developers and users.

We are creative beings. We like to customize reports as we wish. Data matters. But data presentaiton also matters. We all have experiences in tweaking an excel file to create a nice report. We can do the same with using APIs and visualization tools.

Exploring data
is
fun!

Plotly (Online & Offline)
Sierra SQL API
Actual Reports (Sierra & Plotly Offline)
Web App (Sierra & Plotly)

What is Plotly?

A data analytics and visualization tool
Charts & Dashboard
Online and Offline
Open Source
Support multiple languages (Python, JavaScript, R, MATLAB, etc)
D3.js(SVG) and stack.gl (WebGL) for web graphics
Great API Documentation & Examples

Plotly Online

Plotly examples
Plotly Create page

Import data: File Upload, URL Link, SQL DB Connector
Select a chart type (bar/pie/scatter) and attibutes (X, Y axes)
Adjust chart styles
Multiple traces in scatter plot (data)

Plotly Offline

Python Library
Plotly Offline Setup & Syntax

Import plotly library and chart objects
Use offline plot function
Feed data
Decorate layout
Pass optional arguments (filename, show_link)
Interactive dipslay modeBar

Plotly Offline (continued)

How to embed charts in a html?

Follow the same rules
Use div for output_type, instead of filename
Hold a chart in a variable
Pass the variable to a template (Jinja2 engine)
A template file will receive multiple chart variables

Plotly Offline (continued)

Plotly: Summary

Online & Offline
Various types of charts
Data, Layout, Other attributes
Interactive
Offline

plotly.offline.plot() function
output_type='div'
template engine (Jinja2) and a template html

Sierra SQL API

PostgreSQL relational database
Sierra_view schema & 348 tables (read only)
User permission to "Sierra SQL Access" required
Access to:
- bibliographic data (bib, item, holding)
- transactional data (circ_trans, fine, patron)
- system parameters (location, custom codes, loanrule, properties)
By default, 5 concurrent connections per user

PGAdmin, Destkop SQL Client

Version 4
Connect to database
Run SQL queries
Show results in tables
Creating/testing SQL queries

Setup PGAdmin

Know Sierra Database

SierraDNA
Learn the structure of Sierra database
Find a table in category and search
Detailed view of a table
ERD (Entity Relationship Diagram) view
Example: Transactions > Circulation > checkout

I just explained how to connect to Sierra Database. Now we need to learn database structure.

PGAdmin displays all 348 viewable tables (under sierra_view schema). To get familiar, you need to spend some times. You may browse through all tables in PGAdmin. However, often times, some tables or fields are not obvious to understand.

Then, you can go to SierraDNA, which is the msot valuable resource to learn Sierra database. [Show SierraDNA]

It includes detailed descriptions on tables and fields. To find a table, you can follow category or use the search box on the top right. SierraDNA shows not only detailed table view but also Entity Relationship Diagram. ERD is useful to create a join query because it visualizes relationships among tables.

For example, let's take a look at checkout table.

Find Checkouts

(PGAdmin Demo)


					-- Search for checkedout
					SELECT pv.home_library_code, pv.ptype_code, c.checkout_gmt, c.renewal_count, pv.barcode, c.item_record_id 
					FROM sierra_view.checkout AS c
					JOIN sierra_view.patron_view AS pv ON pv.id = c.patron_record_id
					WHERE NOW() - c.checkout_gmt <= interval '3 hours'
					AND pv.home_library_code = 'lhome'

Plotly & Sierra

Transactions by Patron Home Library

Reuse an offline script
Use Sierra data, not a dummy one
New things:
- Connection & query: psycopg2
- Transform Sierra data to plotly data

So far, we have learned how to create a chart with Plotly and how to query Sierra data. Now is a time to put them together. We will create Plotly charts that visualize transactions by Patron Home Library for the last 10 days. [CLICK]

We will reuse the same offline script that was used before in Plotly demo, because visualization logic is the same. However, in this time, we will use actual Sierra data, instead of dummy data.

There are some new processes involved. To get data, we need to connect and run a query on Sierra database. I showed a demo how to do this in PGAdmin. Now, how do we do that in a Python program? We will use a PostgreSQL database adapter, called psycopg2. With psycopg2, we can connect Sierra database and run a query in Python program.

Afte we receive data from Sirera, we need to manipuate the data into a right format that Plotly requires.

[code sierra_chart.py]
[Explain the code and run]

Sierra SQL Output

Plotly X Y Axes

Embed Sierra Charts

Can we embed multiple charts in one HTML page?

Use the same SQL output from circ_trans
Run data transformations on each column
Plotly option: output_type='div'
Pass chart outputs to the template engine
Multiple charts will be embedded as instructed in a template file

Can we present multiple charts in one html document? [CLICK] Yes, we can. We can store charts in div and pass the charts into a template using Jinja2 template engine.

For this time, we will use the same Sierra data that we used in the previous example. Because we create multiple charts, we will transform data for each column (patron_home_library, ptype_code, time, op_code, itype_code, item_location).

We will feed the transformed data of each column to Plotly. Then, plotly output charts into divs. The chart outputs will be passed to the template engine and then the multuple charts will be displayed in HTML as instructed in a template file.

[Code - sierra_chart_embed.py]
[Explain the code and run]

Interactive Program

Can we specify days and transaction types?

Python sys.argv - read user inputs from terminal
User Inputs:
- days
- transaction types(o i f r)
- ouput filename

python sierra_chart_embed.py 60 oi sierra_interactive.html

Our program created a transaction report with multiple charts. The SQL query retrieved data of four types transactions (i, o, f, r) and in the last 10 days. What about if you want to find only checkout (o) and checkin (i) and in the last 5 days? [CLICK] Of course, you can change the sql query accordingly. However, it is not a good idea to modify a sql query inside the program whenever we need a different serach.

We have a better way. Psycopg2 (PostgreSQL database adapter) can dynamically pass user inputs to a SQL query. We will get user inputs using Python sys.argv in terminal. And we will pass them to a sql query using psycopg2. In this way, we can create an interactive program.

I am going to demonstrate how it runs. I have three user inputs (days, transaction types, output filename).

(In teminal, run it interactively).

Sierra SQL API & Plotly: Summary

Plotly Python Offline
SierraDNA to learn Sierra database
PGAdmin to run test queries
Python database adapter (psycopg2)
Transform data for Plotly
Template engine Jinga2 to embed multiple charts
Interactive program with user inputs

More Ideas...

Create the reports of special purposes
Automatically run and deliver reports (scheduler, clone)
Explore more data from Sierra (collections, circulation, acquisition)
Collect non-Sierra data (counter, space usages, research/reference)
Data from other APIs (Googlesheet, LibGuides)
Plotly's different charts (map, bubble, heatmap)

Visualizing Data:
Sierra SQL API and Plotly

Suggested Proposal Topics

Final Product Preview

Why?

Why? (continued)

Why? (continued)

Exploring data
is
fun!

Contents

What is Plotly?

Plotly Online

Plotly Offline

Plotly Offline (continued)

Plotly Offline (continued)

Plotly: Summary

Sierra SQL API

PGAdmin, Destkop SQL Client

Setup PGAdmin

Know Sierra Database

Find Checkouts

(PGAdmin Demo)

Plotly & Sierra

Sierra SQL Output

Plotly X Y Axes

Embed Sierra Charts

Interactive Program

Sierra SQL API & Plotly: Summary

More Ideas...

What is next?

[ Library Data Group ]

Seong Heon Lee, selee@chapman.edu

THANK YOU.

Visualizing Data:Sierra SQL API and Plotly

Suggested Proposal Topics

Final Product Preview

Why?

Why? (continued)

Why? (continued)

Exploring data is fun!

Contents

What is Plotly?

Plotly Online

Plotly Offline

Plotly Offline (continued)

Plotly Offline (continued)

Plotly: Summary

Sierra SQL API

PGAdmin, Destkop SQL Client

Setup PGAdmin

Know Sierra Database

Find Checkouts

(PGAdmin Demo)

Plotly & Sierra

Sierra SQL Output

Plotly X Y Axes

Embed Sierra Charts

Interactive Program

Sierra SQL API & Plotly: Summary

More Ideas...

What is next?

[ Library Data Group ]

Seong Heon Lee, selee@chapman.edu

THANK YOU.

Visualizing Data:
Sierra SQL API and Plotly

Exploring data
is
fun!