Which RDF Store To Use
See Persistence for a discussion of the persistence strategy for ldp-service and related services such as oslc-service. That page concludes that OSLC and LDP would benefit from using an RDF database that supports a SPARQL endpoint. This page compares several RDF databases to determine which one to use for OSLC4JS.
Ultimately we don't want any OSLC or LDP client application to have any dependency on the persistence platform used by an OSLC4JS server. The persistence platform should be hidden in the ldp-service implementation and not exposed through the RESTful APIs. Similarly, we don't want the oslc-service or ldp-service implementation to be too tightly bound to a particular database; they should rely on standard REST calls and SPARQL to isolate the service implementation from the database. This will make it easier to substitute different RDF databases for different purposes. OSLC4JS could then start with something simple and cost-effective to support a reference implementation, and other branches could be built on something that provides higher QoS at potentially higher cost.
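The isolation described above can be sketched in TypeScript. This is a minimal sketch, assuming the store exposes standard SPARQL 1.1 Protocol query and update URLs; the `SparqlEndpoint` and `HttpRequest` types and the two builder functions are hypothetical names, not part of ldp-service or oslc-service. Because the service only builds standard SPARQL-over-HTTP requests, swapping the database comes down to changing two URLs.

```typescript
// Hypothetical types: the service depends only on these, never on a
// particular RDF database product.
interface SparqlEndpoint {
  queryUrl: string;  // SPARQL 1.1 query URL of the store (assumption)
  updateUrl: string; // SPARQL 1.1 update URL of the store (assumption)
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// SPARQL 1.1 Protocol: query via POST with a URL-encoded "query" parameter.
function buildQueryRequest(ep: SparqlEndpoint, sparql: string): HttpRequest {
  return {
    url: ep.queryUrl,
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Accept: "application/sparql-results+json",
    },
    body: new URLSearchParams({ query: sparql }).toString(),
  };
}

// SPARQL 1.1 Protocol: update via POST with a URL-encoded "update" parameter.
function buildUpdateRequest(ep: SparqlEndpoint, sparql: string): HttpRequest {
  return {
    url: ep.updateUrl,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ update: sparql }).toString(),
  };
}
```

A caller could send the result with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping request construction separate from transport also makes the service testable without a running store.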
These requirements are for the initial implementation of OSLC4JS, including the oslc-service and ldp-service implementations. As such, they are the minimal requirements for the database, so that attention can be focused on OSLC and LDP. Subsequent implementations may address different requirements, depending on what the OSLC and/or LDP applications need.
- Simple, standard, cost-effective RDF triple store that supports a SPARQL endpoint.
- The ability to store and retrieve graphs of triples.
- Can be deployed on Bluemix and possibly other cloud platforms, for flexible deployment targets
- Can be accessed with rdflib.js, the preferred API for dealing with RDF in OSLC4JS applications
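The requirement to store and retrieve graphs of triples maps naturally onto the SPARQL 1.1 Graph Store HTTP Protocol, where `GET` retrieves a named graph, `PUT` replaces it, `POST` merges triples into it and `DELETE` removes it. The sketch below only builds the requests; it assumes one named graph per LDP resource and a hypothetical store URL, and `graphStoreRequest` is an illustrative name rather than an existing API.

```typescript
type GraphMethod = "GET" | "PUT" | "POST" | "DELETE";

interface GraphRequest {
  url: string;
  method: GraphMethod;
  headers: Record<string, string>;
  body?: string;
}

// Build a Graph Store Protocol request using indirect graph identification:
// the graph IRI travels in the "graph" query parameter of the store URL.
function graphStoreRequest(
  storeUrl: string,
  graphIri: string,
  method: GraphMethod,
  turtle?: string,
): GraphRequest {
  const url = new URL(storeUrl);
  url.searchParams.set("graph", graphIri); // URLSearchParams handles encoding
  const headers: Record<string, string> = {};
  if (method === "GET") {
    headers["Accept"] = "text/turtle"; // ask for the graph back as Turtle
  }
  if (turtle !== undefined) {
    // Body for PUT (replace the graph) or POST (merge into the graph).
    headers["Content-Type"] = "text/turtle";
  }
  return { url: url.toString(), method, headers, body: turtle };
}
```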
There are a number of RDF database possibilities to choose from, each explored briefly in the following sections.
rdfstore-js
From the README:
rdfstore-js is a pure Javascript implementation of a RDF graph store with support for the SPARQL query and data manipulation language.
The library is going right now through a major rewrite. Versions > 0.9.X must be considered development versions until version 1.0.0 is finished. Many features present in versions 0.8.X have been removed. Some of them, will be added in the next versions, other like the MongoDB backend will be discarded.
It looks interesting and supports the minimal requirements. But it hasn't had any updates in 7 months, despite the reference to an ongoing major rewrite. A quick attempt at getting a sample to work failed, possibly due to dependency issues in the devDependencies. Moving on...
MongoDB
Persistence already discusses using MongoDB to store and query RDF data and concludes that it is not a good choice. The biggest issue is that no practical query language is supported. The MongoDB query language could be used, but it is impractical when the data is stored as RDF triples or as expanded JSON-LD: it is possible, but the data is not in a sufficiently useful or friendly JSON form to query.
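To illustrate why querying expanded JSON-LD is awkward, here is a hypothetical resource in expanded form: every property key is a full IRI, and every value is wrapped in an array of `@value` (or `@id`) objects. Even a simple "find by title" has to unwrap that structure, and because the IRI keys contain dots, they collide with MongoDB's dot notation for nested field paths. The data and the `findByLiteral` helper are illustrative only.

```typescript
// Illustrative expanded JSON-LD value objects: literals use "@value",
// references to other resources use "@id".
type ExpandedValue = { "@value"?: string | number | boolean; "@id"?: string };

// A node object: "@id" plus property IRIs mapped to arrays of value objects.
type ExpandedNode = { "@id": string; [propertyIri: string]: string | ExpandedValue[] };

const DCTERMS_TITLE = "http://purl.org/dc/terms/title";

// Hypothetical data for one change request in expanded JSON-LD form.
const expanded: ExpandedNode[] = [
  {
    "@id": "http://example.org/bugs/42",
    [DCTERMS_TITLE]: [{ "@value": "Crash on save" }],
    "http://open-services.net/ns/cm#status": [{ "@value": "Open" }],
  },
];

// Return the "@id" of every node whose property has the given literal value.
// Each step unwraps a layer that a plain JSON query would also have to handle.
function findByLiteral(nodes: ExpandedNode[], propertyIri: string, value: string): string[] {
  return nodes
    .filter((node) => {
      const values = node[propertyIri];
      return Array.isArray(values) && values.some((v) => v["@value"] === value);
    })
    .map((node) => node["@id"]);
}
```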
Apache TinkerPop
Apache TinkerPop is an open source Graph Computing Framework representing a large collection of capabilities and technologies, with third-party contributed graph libraries and systems.
TinkerPop is a graph abstraction layer over different graph databases and different graph processors, so there are many Graph instances you can choose from to instantiate in the console.
- Blazegraph - RDF graph database with OLTP support. Uses RDF* to reify RDF statements and store them in property graphs using TinkerPop3.
- Stardog - RDF graph database with OLTP and OLAP support.
- gremlin-javascript - JavaScript graph database client for the TinkerPop3 Gremlin Server
Graph Data Store
Graph Data Store is a Bluemix service built on Apache TinkerPop 3 that includes the Gremlin graph-specific query language.
- persistent property graph data store
- Graph Data Store is in the Bluemix Labs Catalog
- Click the Service Credentials link. What you see next is a block of JSON that includes three important items: the URL of your specific graph store, and the basic authentication user ID and password.
- agnostic about the particular graph store that is used
- adhere to the published REST interface, and we can build and interact with the graph in any programming language
- Graph Data Store is built with latest Apache TinkerPop 3
- Includes Gremlin, the powerful, graph-specific query language, and the Gremlin Server
- use an “unbound” service so that we can access the graph via REST from a program running outside Bluemix
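A program running outside Bluemix would combine the service credentials with a Gremlin script in an HTTP request. This sketch assumes the credentials JSON has `apiURL`, `username` and `password` fields and that scripts are posted to a `/gremlin` path; both are hypothetical and should be checked against the service documentation. The request body follows the `{"gremlin": ..., "bindings": ...}` form accepted by the Gremlin Server HTTP endpoint.

```typescript
// Hypothetical shape of the Service Credentials JSON; the real field names
// should be checked in the Bluemix console.
interface GraphCredentials {
  apiURL: string;
  username: string;
  password: string;
}

interface GremlinRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a request that submits a Gremlin script with parameter bindings,
// using the {"gremlin": ..., "bindings": ...} JSON body of Gremlin Server's
// HTTP endpoint. The "/gremlin" path is an assumption about the service.
function gremlinRequest(
  creds: GraphCredentials,
  script: string,
  bindings: Record<string, unknown> = {},
  path = "/gremlin",
): GremlinRequest {
  // Basic authentication header; btoa only accepts Latin-1 characters.
  const basic = btoa(`${creds.username}:${creds.password}`);
  return {
    url: creds.apiURL.replace(/\/+$/, "") + path, // avoid a double slash
    method: "POST",
    headers: {
      Authorization: `Basic ${basic}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ gremlin: script, bindings }),
  };
}
```

Passing values through `bindings` rather than concatenating them into the script keeps user input out of the Gremlin code itself.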
This would be a good option, but it would require the servers to implement the OSLC query service in order to provide a standard query mechanism. A Gremlin endpoint could also be provided for additional, repository-specific queries, and there may be ways to translate SPARQL to Gremlin to provide a standard SPARQL query capability. That makes it attractive for specific server implementations with high QoS, scalability and performance needs, but it may be unnecessary for an OSLC4JS ldp-service implementation.
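On the client side, an OSLC query is a GET on the query capability's `oslc:queryBase` URL with parameters such as `oslc.prefix`, `oslc.where`, `oslc.select` and `oslc.orderBy`; the server is then free to translate the `oslc.where` clause into SPARQL or Gremlin. A minimal sketch, with a hypothetical query base URL and an illustrative `oslcQueryUrl` helper:

```typescript
// Parameters of an OSLC query; the values use OSLC query syntax.
interface OslcQuery {
  prefixes?: Record<string, string>; // prefix -> namespace IRI
  where?: string;   // e.g. dcterms:title="Crash on save"
  select?: string;  // e.g. dcterms:title,oslc_cm:status
  orderBy?: string; // e.g. -dcterms:modified
}

// Build the GET URL for an OSLC query against a query capability's
// oslc:queryBase. The query base used in the test is hypothetical.
function oslcQueryUrl(queryBase: string, q: OslcQuery): string {
  const url = new URL(queryBase);
  const prefixes = q.prefixes;
  if (prefixes) {
    // e.g. oslc.prefix=dcterms=<http://purl.org/dc/terms/>,oslc_cm=<http://open-services.net/ns/cm#>
    const prefix = Object.keys(prefixes)
      .map((p) => `${p}=<${prefixes[p]}>`)
      .join(",");
    url.searchParams.set("oslc.prefix", prefix);
  }
  if (q.where) url.searchParams.set("oslc.where", q.where);
  if (q.select) url.searchParams.set("oslc.select", q.select);
  if (q.orderBy) url.searchParams.set("oslc.orderBy", q.orderBy);
  return url.toString();
}
```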
GraphDB
GraphDB is an Ontotext semantic graph database supporting RDF and SPARQL. It uses RDFS and OWL as schema languages and supports inferencing. It is offered in five editions: Standard, Enterprise, Database-as-a-Service on S4, Cloud on AWS and Free.
4store, 5store and Virtuoso
4store, 5store, Virtuoso, etc. would be possible candidates. 4store's website is no longer up, since the company changed hands and the site was no longer funded. There is a Wayback Machine download of the website in the 4store git repository, so at least the documentation is not lost and can be reconstructed. In any case, 4store is no longer in active development, and 5store, which replaced it, is no longer available. Too bad; this was a nice triple store.
Virtuoso is a commercial product (its open-source edition is GPL-licensed), which makes it hard to use in IBM and/or open-source projects.
Jena TDB
Jena TDB is already used for LQE and is the basis of TopQuadrant's persistence strategy. IBM has experienced some scalability and performance issues with large triple counts. However, Jena TDB addresses all of the requirements for OSLC4JS and makes a good, cost-effective choice.
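Jena TDB is typically exposed over HTTP through Apache Jena Fuseki, which gives each dataset separate query, update and graph store endpoints. The sketch below assumes a Fuseki dataset named `ds` on the default port 3030 (both assumptions) and shows a SPARQL 1.1 Update that replaces the named graph holding one LDP resource. The IRI check is a deliberately simple guard against injecting SPARQL through a graph name; `replaceResourceGraph` and `checkIri` are hypothetical helpers.

```typescript
// Assumed Fuseki layout: a dataset named "ds" on Fuseki's default port 3030.
const FUSEKI_DATASET = "http://localhost:3030/ds";
const endpoints = {
  query: `${FUSEKI_DATASET}/query`,   // SPARQL 1.1 query
  update: `${FUSEKI_DATASET}/update`, // SPARQL 1.1 update
  data: `${FUSEKI_DATASET}/data`,     // SPARQL 1.1 Graph Store Protocol
};

// Reject the characters SPARQL forbids inside an IRI reference (control
// characters, space, <>"{}|^` and backslash), so a graph name cannot break
// out of <...> and inject extra operations.
function checkIri(iri: string): string {
  if (/[\u0000-\u0020<>"{}|^`\\]/.test(iri)) {
    throw new Error(`unsafe IRI: ${iri}`);
  }
  return iri;
}

// Build a SPARQL 1.1 Update that replaces the named graph holding one
// LDP resource with the given N-Triples. SILENT keeps DROP from failing
// when the graph does not exist yet.
function replaceResourceGraph(graphIri: string, ntriples: string): string {
  const g = checkIri(graphIri);
  return [
    `DROP SILENT GRAPH <${g}> ;`,
    `INSERT DATA { GRAPH <${g}> {`,
    ntriples.trim(),
    "} }",
  ].join("\n");
}
```

The resulting string would be sent to `endpoints.update` as the `update` parameter. A Graph Store Protocol `PUT` to the dataset's `data` endpoint could achieve the same replacement in a single request; the SPARQL Update form is shown because it works against any SPARQL 1.1 update endpoint.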