For the Implementation of Blazegraph and Other External Triplestores
We are happy to announce the release of an exciting new module to the Islandora community, the Trippi-Sail Triplestore Adapter. It helps Islandora repositories scale beyond a few million objects without negative impacts on performance and enables faster querying of the triplestore.
Our work on this module started in 2015 with the University of Pittsburgh. They started their Islandora repository project with 150,000 objects in 2012. By 2015 the repository had grown to three million objects, a twentyfold growth, or almost a 2,000 percent increase over three years. They plan to continue to grow their Islandora repository to 5 million objects in 2016. The scaling was put on hold because the site suffered significant performance problems. For example, a book was taking up to a minute to load, rendering the site unusable. In working with discoverygarden, we found that the bulk of the performance issues identified were with the Mulgara triplestore.
Brian Gregg from the University of Pittsburgh said, “we were essentially stuck; the system became non-responsive whenever you tried to view an object no matter what we did to try to improve the situation. We had deadlines that were being missed and a system that was extremely slow to respond at best – we had to find out how to resolve this and address it fast. We could not continue our plans for our rollout without this issue being resolved. We asked Discoverygarden for assistance and they came up with a path to help move us forward.”
The current Islandora installation includes Fedora Commons as the backend repository software. The Mulgara triplestore is the default backend for Fedora's resource index. Mulgara has a number of shortfalls. For example:
There has been little development activity on the project since 2014.
Undertaking CRUD (Create, Read, Update, Delete) operations on a large Mulgara triplestore results in “severe performance penalties” (Fiz-Karlsruhe, 2008). Anecdotal evidence from our customers has shown that performance is particularly hard hit when Mulgara reaches 50 million triples (2-5 million objects).
It’s unable to cluster or load balance. This functionality was on the product’s roadmap but didn’t come to fruition (Mulgara, n.d.).
It has an incomplete implementation of the SPARQL language. Our developers prefer to use SPARQL over iTQL for a few reasons: SPARQL is standardized while iTQL is not, and iTQL will not be used in Fedora 4 (Dukart, 2014).
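To make the SPARQL point concrete, here is a minimal sketch of how a SPARQL query might be addressed to Fedora's resource index search service. It only builds the request URL and does not contact a server. The base URL, the helper name `build_risearch_url`, and the example query for book objects are assumptions chosen for illustration, not part of the module itself.

```python
from urllib.parse import urlencode

# Assumed location of the Fedora resource index search service;
# adjust the host, port, and path for your own installation.
RISEARCH_URL = "http://localhost:8080/fedora/risearch"

# Illustrative SPARQL query: find objects that use the Islandora book content model.
BOOK_QUERY = (
    "SELECT ?pid WHERE { "
    "?pid <info:fedora/fedora-system:def/model#hasModel> "
    "<info:fedora/islandora:bookCModel> }"
)


def build_risearch_url(sparql, base_url=RISEARCH_URL, limit=10):
    """Return a resource index request URL for a SPARQL tuple query.

    Hypothetical helper: it URL-encodes the query and the usual
    tuple-search parameters, but sends nothing over the network.
    """
    params = {
        "type": "tuples",    # ask for variable bindings rather than triples
        "lang": "sparql",    # query language (as opposed to iTQL)
        "format": "CSV",     # response format
        "limit": str(limit), # cap on the number of rows returned
        "query": sparql,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_risearch_url(BOOK_QUERY))
```

Because the query is standard SPARQL, the same query text should carry over unchanged when the triplestore behind the resource index is swapped out, which is one reason a standardized language is attractive here.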
To address the University of Pittsburgh’s use case we assembled a team of developers, led by Adam Vessey, to replace Mulgara by leveraging existing code to create the Trippi-Sail Triplestore Adapter.
The module allows Sesame Sail-compliant triplestores to plug into the Trippi triplestore interface, so they can be used with the Fedora Commons repository layer of Islandora. During testing for the University of Pittsburgh our developers connected the open source version of Blazegraph. This module was developed to allow the community to implement Blazegraph or other triplestore solutions, such as Jena/Fuseki, recently implemented by the Smithsonian.
Blazegraph scales to 50 billion triples on a single server, allows clustering, and supports SPARQL.
“Congratulations to discoverygarden and the Islandora community on this milestone. With the release of Trippi-Sail Triplestore Adaptor, Islandora joins a growing group of knowledge graph applications such as the Wikidata Query Service powered by Blazegraph. Blazegraph provides an open-source, highly scalable graph database supporting the RDF/SPARQL and the Apache Tinkerpop™ APIs,” said Blazegraph CEO Brad Bebee.
Blazegraph is currently being pegged as the likely default triplestore candidate for Fedora 4.
This development is stretching the capacity of Islandora to scale to larger numbers without negative impacts on performance. Discoverygarden inc. CEO John Eden says,
“This development will help support Islandora users who have immediate or near term scaling requirements; with further growth opportunities with Fedora 4. In our role with the Islandora Foundation and as a service provider for our customers, we are committed to providing sound, viable, long term solutions that will fit people’s timeline, requirements, and budgets.”
After the module was implemented for the University of Pittsburgh, books that had taken 1 minute to load took only 1.5 seconds. That’s 40 times faster (60 s / 1.5 s = 40), or close to a 4000% increase in speed. Brian Gregg from the University of Pittsburgh says,
“Our systems response times are now back to where we were hoping that they would be. The project is now back on track and we are nearly ready to launch our new sites to the public, providing access to the content that we have been putting together to share with them. Without this solution from Discoverygarden utilizing Blazegraph we would have not been where we are now.”
We’re in the process of implementing Blazegraph in several other organizations with large-scale repository needs. We look forward to releasing further enhancements to this module. We invite you to take a look at our public GitHub code repository. We welcome feedback and contributions on this exciting new development.
Works Cited:
Blazegraph by SYSTAP, LLC. (2016). Home. Retrieved from https://www.blazegraph.com/
Dukart, J. (2014, August 8). Don’t be Dreary, Let’s Query! Retrieved from http://islandora.ca/sites/default/files/Copy_Don%E2%80%99t%20be%20Dreary%2C%20Let%E2%80%99s%20Query%21.pdf
Fiz-Karlsruhe. (2008, July 17). Welcome to the Fedora Performance and Scalability Wiki. Retrieved from http://fedora.fiz-karlsruhe.de/docs/
Mulgara. (2014, February 10). Recent News section, paragraph 1. Retrieved from http://www.mulgara.org/
Mulgara. (n.d.). Future. Retrieved from http://www.mulgara.org/future.html