Visualizing Data: Sierra SQL API and Plotly
2018 SCIUG Conference at Chapman University (October 23)
Presented by Seong Heon Lee, Systems & Technology Librarian
Hugh & Hazel Darling Law Library, Chapman University
Welcome everyone. Hi, my name is Seong Heon. I am a systems & technology librarian at Chapman University Law Library.
First of all, thank you all for joining this session.
Today, I will share about Sirera SQL API and Plotly.
Suggested Proposal Topics
III Customer Support / new functionality
Weeding
More Polaris presentations
MySQL workshop / SQL report-writing
3rd-party tools w/III products
Best practices
User experiences (migration)
ERMS
Encore Duet
Sierra APIs
Stats / create lists / other ways of reporting
Linked Data
Electronic ordering
Access/User Services issues
Interlibrary Loan topics (ex. ILL basics/update)
Technology / marketing ideas and the library
When SCIUG requsted presentation proposals, they sent out a list of interests.
(CLICK)
I find that there are some interests in SQL report-writing, Sierra APIs, and Stats and other ways of reporting.
I think that I can cover those interests in this presentation.
Final Product Preview
Circulation Transaction Report
Why don't we start with looking at an outcome that you are expected to build after this presentation.
This will give us an idea to understand our goal of the presentation.
I will show a circulation transaction report using Sierra SQL API and Plotly. The basic process is
to extract data from Sierra's PostgreSql database and to create a report using Plotly visualization tool.
Let's take a look at a sample outcome.
Why?
Sierra has Web Management Reports
How many of you are using WMR?
What is your user exprience?
Decision Center
My first question is this. Why do we need a custom report program? (CLICK) As you know, Sierra provides
with a built-in report prgoram, called Web Management Reports (WMR). How many of you are using WMR?
What is your user exprience? Personally, I do not have many words to say about it,
because I have not used it much. Preparing this presentation, I revisited the report program to make
a quick evaluation. My first impression is that it has not been improved for a long time.
UI is old and not intuitive.
I know that Innovative has a new data analytic product, Decision Center. Again, I have no word to say about Decision Center,
because I have no experience of the product. But I am not sure how many libraries are affordable to purchase
the product, particularly a library running a small or medium size of colletion.
Would it be valuable enough for the money we invest? The answer may be "yes" for some libraries, or 'no' for others.
I think that Sierra APIs (SQL Direct, REST) are great tools to leverage the gap. A custom report program, which
is built with the APIs, can provide a good solution for data analytics.
Why? (continued)
Who uses what?
Compare resource usages by libraries
Compare by the following criteria:
locations
patron types
item types
transaction times
etc...
Particularly, for the shown example, I started with questioning who uses what? (CLICK)
Chapman University have three branches in Sierra, Leatherby, Law Library, Brandman.
Haven't you ever been curious about how different libraries use library resources?
Our library director was curious about how Leatherby (the main library) and Law Library utilize resources.
Because the size of collections and users, Leatherby should use more resources than Law Library does.
That is a reasonable assumption. Can we prove that assumption with factual data.
This is one example of how data analytics can be useful. Going further, we can compare
library resource usage not only by libraries but also
by locations, patron types, item types, transaction times, etc.
Why? (continued)
Why not?
Plenty of data visualization tools (Open Source)
Open Library Stacks (Direct SQL & REST API)
Customize a data report as you wish
One of the best way to answer a question "Why?" is to ask "Why Not?" (CLICK)
There are many data visualization tools available. Many of them are open sources.
We are evolving into a data-driven society. We hear so many buzz words like big data,
AI, machine learning, data analytics, data visualization, etc.
With such a high demand and interest of data, there will be more tools being developed.
Innovative advocates Open Library Stacks (OLS) architecture. With the merger of Polaris
and VTLS, OLS idea becomes more important. The main idea is to creat a highly scalable cloud platform
to make their different products communicating each other. For this goal, APIs are critically important,
because APIs make different products share and exchange data. Successful vendors implement APIs
to create the ecosystem that their products lead in competative market.
They outreach more developers and users.
We are creative beings. We like to customize reports as we wish. Data matters. But
data presentaiton also matters. We all have experiences in tweaking an excel file to create
a nice report. We can do the same with using APIs and visualization tools.
Exploring data is fun!
The best motivation of all things: FUN! Fun comes from curiosity
and the discovery of things.
Contents
Plotly (Online & Offline)
Sierra SQL API
Actual Reports (Sierra & Plotly Offline)
Web App (Sierra & Plotly)
Now let's move on to the real part of presentation.
Here is the list of items that I will cover today. First,
We will start with Plotly library. We will learn how to create a chart online and offline.
Next, we will go through how to connect Sierra SQL API and get data from Sierra Database.
After that, combining the knowledge of both Plotly and Sierra SQL API, we will create
actual reports with Sierra and Plotly Offline. Finally, I will show a sample web app.
What is Plotly?
A data analytics and visualization tool
Charts & Dashboard
Online and Offline
Open Source
Support multiple languages (Python, JavaScript, R, MATLAB, etc)
D3.js(SVG) and stack.gl (WebGL) for web graphics
Great API Documentation & Examples
How many people have used Plotly? How many people have heard about Plotly?
[Read Slide]
Plotly Online
Plotly examples
Plotly Create page
Import data: File Upload, URL Link, SQL DB Connector
Select a chart type (bar/pie/scatter) and attibutes (X, Y axes)
Adjust chart styles
Multiple traces in scatter plot
(data)
Let's look at different types of charts first.
Create a sample chart of library checkouts.
A: Leatherby, Law, Brandman, Science, Music
B: 530, 280, 74, 50, 38
Bar, Pie, Scatter
As we saw, it is very easy to create a chart.
[Read Slide]
Plotly Offline
As I explained, Plotly supports many languages.
I will choose Python for Today's demo.
Python Library pages show examples and documentation (Full Reference).
Before creating an offline chart, Let's see how to setup offline plotting.
[Code - first.py]
Plotly Offline (continued)
How to embed charts in a html?
Follow the same rules
Use div for output_type, instead of filename
Hold a chart in a variable
Pass the variable to a template (Jinja2 engine)
A template file will receive multiple chart variables
In most cases, we may need embeded charts. This is how we do it.
[Read Slide]
[Code - embed.py]
Plotly Offline (continued)
Plotly outputs a chart (bar_chart, pie_chart) as div.
We hold the charts in variables, which will be feeded
into the template engine, Jinja2. The engine plug the template variable into
a designated template file. Finally, the engine outputs
the charts in HTML as specified in the template file.
Plotly: Summary
Online & Offline
Various types of charts
Data, Layout, Other attributes
Interactive
Offline
plotly.offline.plot() function
output_type='div'
template engine (Jinja2) and a template html
Now, we covered most basics of Plotly Online and Offline.
[Read Slide]
Now we have enough foundation to build charts with real data.
Let's move to the next topic, Sierra database.
Sierra SQL API
PostgreSQL relational database
Sierra_view schema & 348 tables (read only)
User permission to "Sierra SQL Access" required
Access to:
bibliographic data (bib, item, holding)
transactional data (circ_trans, fine, patron)
system parameters (location, custom codes, loanrule, properties)
By default, 5 concurrent connections per user
PGAdmin, Destkop SQL Client
Version 4
Connect to database
Run SQL queries
Show results in tables
Creating/testing SQL queries
The easiest way to work with Sierra database is using SQL Client. PostgreSQL has
its own SQL client, called PGAdmin. You can download it with PostgeSQL database on your local PC.
[Read Slide]
Setup PGAdmin
Setting PGAdmin is straightforward. [Explain Slide]
Know Sierra Database
SierraDNA
Learn the structure of Sierra database
Find a table in category and search
Detailed view of a table
ERD (Entity Relationship Diagram) view
Example: Transactions > Circulation > checkout
I just explained how to connect to Sierra Database. Now we need to learn database structure.
PGAdmin displays all 348 viewable tables (under sierra_view schema). To get familiar,
you need to spend some times. You may browse through all tables in PGAdmin.
However, often times, some tables or fields are not obvious to understand.
Then, you can go to
SierraDNA, which is the msot valuable resource to learn Sierra database.
[Show SierraDNA]
It includes detailed descriptions on tables and fields.
To find a table, you can follow category or use the search box on the top right.
SierraDNA shows not only detailed table view but also Entity Relationship
Diagram. ERD is useful to create a join query because it visualizes relationships among tables.
For example, let's take a look at checkout table.
Find Checkouts (PGAdmin Demo)
-- Search for checkedout
SELECT pv.home_library_code, pv.ptype_code, c.checkout_gmt, c.renewal_count, pv.barcode, c.item_record_id
FROM sierra_view.checkout AS c
JOIN sierra_view.patron_view AS pv ON pv.id = c.patron_record_id
WHERE NOW() - c.checkout_gmt <= interval '3 hours'
AND pv.home_library_code = 'lhome'
Now let me show you a real query in PGADMIN.
First, open PGADMIN. Once we are connected to Sierra database,
you will see all tables and fields.
To make a query, we click SQL button. We see query pane and result pane.
Let's copy and paste the left SQL query and run it.
Plotly & Sierra
Transactions by Patron Home Library
Reuse an offline script
Use Sierra data, not a dummy one
New things:
Connection & query: psycopg2
Transform Sierra data to plotly data
So far, we have learned how to create a chart with Plotly and how to query Sierra data.
Now is a time to put them together. We will create Plotly charts that visualize transactions
by Patron Home Library for the last 10 days. [CLICK]
We will reuse the same offline script that was used before in Plotly demo,
because visualization logic is the same. However, in this time, we will use actual Sierra
data, instead of dummy data.
There are some new processes involved. To get data, we need to connect and run a query on Sierra database.
I showed a demo how to do this in PGAdmin. Now, how do we do that in a Python program?
We will use a PostgreSQL database adapter, called psycopg2. With psycopg2,
we can connect Sierra database and run a query in Python program.
Afte we receive data from Sirera, we need to manipuate the data into a right format that Plotly requires.
[code sierra_chart.py]
[Explain the code and run]
Sierra SQL Output
You may feel a little puzzled on the data transformation part.
Why do we need to transform Sierra data before using it for Plotly?
That is because Plotly needs X and Y axes data in specific structure.
Plotly X Y Axes
I have chart_data function that takes care of this business.
The function reads through each row of the patron_home_library_code column
and count record by patron_home_library.
Finally, we see that data is arranged for Plotly x and y axes.
It is in two-demensional list format.
Embed Sierra Charts
Can we embed multiple charts in one HTML page?
Use the same SQL output from circ_trans
Run data transformations on each column
Plotly option: output_type='div'
Pass chart outputs to the template engine
Multiple charts will be embedded as instructed in a template file
Can we present multiple charts in one html document? [CLICK] Yes, we can.
We can store charts in div and pass the charts into a template using Jinja2 template engine.
For this time, we will use the same Sierra data that we used in the previous example.
Because we create multiple charts, we will transform data
for each column (patron_home_library, ptype_code, time, op_code, itype_code, item_location).
We will feed the transformed data of each column to Plotly.
Then, plotly output charts into divs.
The chart outputs will be passed to the template engine and then
the multuple charts will be displayed in HTML as instructed in a template file.
[Code - sierra_chart_embed.py]
[Explain the code and run]
Interactive Program
Can we specify days and transaction types?
Python sys.argv - read user inputs from terminal
User Inputs:
days
transaction types(o i f r)
ouput filename
python sierra_chart_embed.py 60 oi sierra_interactive.html
Our program created a transaction report with multiple charts. The SQL query retrieved
data of four types transactions (i, o, f, r) and in the last 10 days. What about if you want to find
only checkout (o) and checkin (i) and in the last 5 days? [CLICK] Of course, you can change the sql query accordingly.
However, it is not a good idea to modify a sql query inside the program whenever we need a different serach.
We have a better way. Psycopg2 (PostgreSQL database adapter) can dynamically pass user inputs to a SQL query.
We will get user inputs using Python sys.argv in terminal. And we will pass them to a sql query
using psycopg2. In this way, we can create an interactive program.
I am going to demonstrate how it runs. I have three user inputs (days, transaction types, output filename).
(In teminal, run it interactively).
Sierra SQL API & Plotly: Summary
Plotly Python Offline
SierraDNA to learn Sierra database
PGAdmin to run test queries
Python database adapter (psycopg2)
Transform data for Plotly
Template engine Jinga2 to embed multiple charts
Interactive program with user inputs
More Ideas...
Create the reports of special purposes
Automatically run and deliver reports (scheduler, clone)
Explore more data from Sierra (collections, circulation, acquisition)
Collect non-Sierra data (counter, space usages, research/reference)
Data from other APIs (Googlesheet, LibGuides)
Plotly's different charts (map, bubble, heatmap)
Data is everywhere.
We can ask, "Where is data?" "What can we do with it?" "How should we present it?"
[Read Slide]
[Web App Demo]
What is next?
[ Library Data Group ]
Seong Heon Lee, selee@chapman.edu
"What is next?" We may form "Library Data Group". We can explore various
ways of using library data and create analytic tools. So, if you are interested in,
please send me an email. I think we can grow learning each other.
THANK YOU.
My daughter said that the last slide is very important.
It must be pretty and meaningful.
Here we go. THANK YOU.