One of the most powerful features of data.world’s data-first platform approach is that access to data is exposed through a well-standardized query language and web service protocol: SPARQL. Because SPARQL is so well standardized, many tools on the market use it for data access. Today I want to highlight metaphactory (by metaphacts), a semantic application development platform that lets you build web applications on top of a SPARQL endpoint with minimal technical overhead. I’ll show how easy it is to use data.world together with metaphactory to rapidly build out a simple application.
Setting up the integration
metaphactory integrates with a knowledge graph via a SPARQL endpoint. Because every data.world dataset and project exposes a SPARQL endpoint, the integration is straightforward: data.world’s API documentation gives the URI pattern for a SPARQL endpoint:
https://api.data.world/v0/sparql/{owner}/{id}
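To make the pattern concrete, here is a minimal Python sketch (standard library only) that fills in the {owner}/{id} template and builds, without sending, a generic SPARQL 1.1 Protocol GET request. Sending the API token as a Bearer token in the Authorization header and requesting application/sparql-results+json are assumptions to check against data.world’s API docs; the function names and the token value are illustrative.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# URI pattern from data.world's API docs.
SPARQL_ENDPOINT_PATTERN = "https://api.data.world/v0/sparql/{owner}/{id}"


def sparql_endpoint(owner: str, dataset_id: str) -> str:
    """Fill the {owner}/{id} placeholders, percent-encoding each path segment."""
    return SPARQL_ENDPOINT_PATTERN.format(
        owner=quote(owner, safe=""), id=quote(dataset_id, safe="")
    )


def build_sparql_get(owner: str, dataset_id: str, query: str, token: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request.

    Assumption: the data.world API token is accepted as a Bearer token.
    """
    url = sparql_endpoint(owner, dataset_id) + "?" + urlencode({"query": query})
    return Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/sparql-results+json",
        },
        method="GET",
    )
```

Passing the request to urllib.request.urlopen would execute it. metaphactory itself only needs the endpoint URL string.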
Provide that URI in your metaphactory configuration when you start the server. A trial version of the metaphactory platform is available on Amazon Web Services as a CloudFormation template that sets it up together with data.world in one click. See the setup instructions at the end of this post.
Once configured, the metaphactory instance is connected to data.world as a data source and can issue queries. metaphactory lets users build user interfaces in a web-based HTML editor, using a rich set of HTML5 semantic components that dynamically tie page elements to SPARQL queries.
How it works
The example here is drawn from a talk given at Enterprise Data World. The talk used data.world and FIBO (the Financial Industry Business Ontology) to model mortgage data, then used that modeled data to highlight potential cases of money laundering. We’re not going to focus on the data modeling part of the exercise here. Instead, we’ll take that work and expose it as an end-user application built with metaphactory on top of a data.world SPARQL endpoint. In other words, we’ll walk through one way you can help even the least technical teammates across the business find, understand, and use your data for their own self-serve analysis.
In this screenshot, you can see a SPARQL query that was authored in data.world to select all of the loans modeled as part of this exercise:
Being able to run this query ad hoc and get back a table of results is great, but for many less-technical users, seeing the SPARQL query isn’t useful. Your Marketing team, for example, probably prefers to just get a table of results. If you also give them the ability to manipulate query results through a faceted search control, they can spin, slice, and dice the results of your analysis to answer additional questions with the data. You, the analyst, can get out of the way and move on to your next data task.
The combination of data.world and metaphactory makes this possible through a semantic search component: a composite HTML5 component that defines a custom search environment. Within it, you can place a semantic table component that renders the search results. The semantic table can be configured to execute a SPARQL query when rendered, producing an interactive table that looks like this:
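The screenshot’s query isn’t reproduced in the text, so the following is a hypothetical stand-in: a Python string holding a SPARQL query that selects every instance of the MortgageLoan class (the class URI is the one used in the search configuration later in this post), plus a helper that flattens a standard SPARQL 1.1 JSON result into the plain table a less-technical user would see. The helper and constant names are illustrative.

```python
MORTGAGE_LOAN = (
    "https://spec.edmcouncil.org/fibo/ontology/LOAN/LoanTypes/MortgageLoans/MortgageLoan"
)

# Hypothetical stand-in for the query in the screenshot: all MortgageLoan instances.
ALL_LOANS_QUERY = f"""
SELECT ?loan WHERE {{
  ?loan a <{MORTGAGE_LOAN}> .
}}
"""


def bindings_to_rows(results: dict) -> list:
    """Flatten a SPARQL 1.1 JSON results document into a list of row dicts."""
    columns = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable left unbound in a solution is simply absent; show it as "".
        rows.append({col: binding.get(col, {}).get("value", "") for col in columns})
    return rows
```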
For example, our mortgage loan search has the following structure (complete source code can be accessed by clicking the “Edit Page” button in the toolbar):
<semantic-search selector-mode='dropdown' optimizer='none' ...>
  <semantic-search-query-keyword domain='<https://spec.edmcouncil.org/fibo/ontology/LOAN/LoanTypes/MortgageLoans/MortgageLoan>'
                                 placeholder='Search for loans'
                                 min-search-term-length=2
                                 query='...'>
  </semantic-search-query-keyword>
  <div data-flex-layout="row stretch-stretch">
    <div ...>
      <semantic-search-facet></semantic-search-facet>
    </div>
    <semantic-search-result-holder>
      <div ...>
        <bs-tabs … >
          <bs-tab event-key='1' title='Table'>
            <semantic-search-result>
              <semantic-table id='field-results'
                              query='...'>
              </semantic-table>
            </semantic-search-result>
          </bs-tab>
          ...
        </bs-tabs>
      </div>
    </semantic-search-result-holder>
  </div>
</semantic-search>
The semantic-search tag defines the whole search environment and its parameters. Inside, it nests subcomponents such as:
- semantic-search-query-keyword, which defines how the query is formulated from keywords: e.g., that we are looking for instances of MortgageLoan, that a search term must be at least 2 characters long, and that a specific SPARQL query template is used to search by keyword.
- semantic-search-facet, which enables faceted exploration of query results.
- semantic-search-result-holder, which contains one or more semantic-search-result tags. Each one defines a specific visualization of search results, such as our semantic-table.
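In metaphactory, the facet component computes its facets with SPARQL against the endpoint. The underlying idea is easy to see on rows already in memory: count the values of each column, then keep only the rows that match the values a user ticks. This Python sketch illustrates that idea only; it is not how metaphactory implements faceting, and the function names and sample columns are hypothetical.

```python
from collections import Counter


def facet_counts(rows: list, column: str) -> Counter:
    """Count how many result rows carry each value of one column (one facet)."""
    return Counter(row[column] for row in rows if row.get(column))


def apply_facets(rows: list, selected: dict) -> list:
    """Keep rows matching every facet (AND across facets, OR within one facet).

    selected maps a column name to the set of values the user has ticked.
    """
    return [
        row
        for row in rows
        if all(row.get(column) in allowed for column, allowed in selected.items())
    ]
```

With rows such as {"lender": "A", "state": "NY"}, facet_counts(rows, "lender") yields the counts shown next to each facet value, and apply_facets(rows, {"lender": {"A"}}) narrows the table to that lender.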
Notice that many of the elements rendered in the table are shown as web links. These are entities, referenced by their URIs. metaphactory lets you define template pages for different semantic types. Wherever an entity’s URI is referenced, its link leads to a page that executes a SPARQL query to retrieve information about that entity and renders page components to display that information visually.
For example, the page for a loan looks like this:
Notice that at the top of the page we see the URI and type of the referenced entity. The Summary then contains two more table components: one shows the details for this loan, and the other shows the details for any other loans that happen to have the same property address (part of our “find suspicious behavior that indicates money laundering” theme). Finally, because we have geocoded the address into a latitude and longitude, we can render the location in a map control.
If you click through the link to JPMORGAN CHASE & CO, you land on another page, built on a template for displaying bank information. Here we’re leveraging another strength of SPARQL, federated queries, to join in information from a remote knowledge base. This query pulls in details such as the company’s headquarters address and current Officers & Directors from a remote database:
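The “same property address” table boils down to a self-join on the address. In metaphactory this is a SPARQL query on the loan template page; the Python sketch below shows the same check over a hypothetical in-memory mapping from loan ID to address. The IDs and the crude normalization (ignore case and repeated whitespace) are assumptions for illustration; real address matching usually needs more care.

```python
def normalize_address(address: str) -> str:
    """Crude normalization (assumption): upper-case and collapse whitespace."""
    return " ".join(address.upper().split())


def other_loans_at_same_address(loan_id: str, addresses: dict) -> list:
    """Return the other loan IDs whose property address matches loan_id's."""
    target = normalize_address(addresses[loan_id])
    return sorted(
        other
        for other, address in addresses.items()
        if other != loan_id and normalize_address(address) == target
    )


addresses = {
    "loan-1": "12 Main St",
    "loan-2": "12  main st",  # same address, different formatting
    "loan-3": "9 Oak Ave",
}
print(other_loans_at_same_address("loan-1", addresses))  # ['loan-2']
```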
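The remote query itself isn’t reproduced in the text. Federation uses the SERVICE keyword from the SPARQL 1.1 Federated Query specification, and this Python sketch only illustrates that structure: the remote endpoint URL and every predicate (the ex: prefix) are hypothetical placeholders, not the demo’s vocabulary. The iri helper rejects characters that would break out of the angle brackets.

```python
BAD_IRI_CHARS = set('<>"{}|^`\\ ')


def iri(value: str) -> str:
    """Wrap a URI in angle brackets, rejecting characters invalid in a SPARQL IRI."""
    if any(ch in BAD_IRI_CHARS for ch in value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def federated_bank_query(bank_uri: str, remote_endpoint: str) -> str:
    """Join local bank data with details fetched from a remote SPARQL endpoint.

    All ex: predicates are hypothetical placeholders.
    """
    return f"""
PREFIX ex: <http://example.org/hypothetical#>
SELECT ?name ?hqAddress ?officer WHERE {{
  {iri(bank_uri)} ex:name ?name ;
                  ex:sameAs ?remoteBank .
  SERVICE {iri(remote_endpoint)} {{
    ?remoteBank ex:headquartersAddress ?hqAddress ;
                ex:officer ?officer .
  }}
}}
"""
```

The local triple patterns run against the data.world endpoint, while the pattern inside SERVICE is sent to the remote endpoint and joined on ?remoteBank.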
The template page can be accessed by clicking the “Edit Page” button on the toolbar of the instance page and then following the link to the template appearing on top of the editor field (the instance page itself will be empty). The template page contains HTML5 code looking like this:
The semantic map component is configured with a single SPARQL query passed via the query attribute. By convention, this SPARQL query returns four output variables with predefined names:
- latitude and longitude that correspond to the coordinates
- description that contains the textual description associated with the marker
- link that contains the link to the corresponding instance in the knowledge base
Note that the query contains the placeholder ?? in the subject position. In the template pages, this placeholder gets replaced with the actual URI of the instance to which the template is applied.
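Here is a Python sketch of both conventions: substituting the ?? placeholder the way a template page does (as a naive string replacement, which would also touch a literal containing ??), and turning the four conventional variables into map markers. The WGS84 geo:lat/geo:long and rdfs:label predicates in the sample query, and all names in the code, are assumptions; the demo’s actual template query isn’t reproduced in the text.

```python
from dataclasses import dataclass

REQUIRED_VARS = ("latitude", "longitude", "description", "link")

# Hypothetical template query; the predicates are assumptions, not the demo's.
MAP_QUERY = """
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?link ?description ?latitude ?longitude WHERE {
  ?? geo:lat ?latitude ;
     geo:long ?longitude .
  OPTIONAL { ?? rdfs:label ?description }
  BIND(?? AS ?link)
}
"""


@dataclass
class Marker:
    latitude: float
    longitude: float
    description: str
    link: str


def instantiate_template(query: str, instance_uri: str) -> str:
    """Replace every ?? with the instance URI (naive textual substitution)."""
    return query.replace("??", f"<{instance_uri}>")


def to_markers(results: dict) -> list:
    """Turn SPARQL JSON results using the four conventional variables into markers."""
    missing = [v for v in REQUIRED_VARS if v not in results["head"]["vars"]]
    if missing:
        raise ValueError(f"map query must also return: {missing}")
    markers = []
    for b in results["results"]["bindings"]:
        if "latitude" not in b or "longitude" not in b:
            continue  # nothing to place on the map
        markers.append(
            Marker(
                float(b["latitude"]["value"]),
                float(b["longitude"]["value"]),
                b.get("description", {}).get("value", ""),
                b.get("link", {}).get("value", ""),
            )
        )
    return markers
```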
metaphactory gives us a nice “browse” interface over the data that we’ve loaded into data.world, which is a big benefit for the majority of non-technical users who aren’t going to be comfortable seeing SPARQL queries while trying to get answers to their data questions. Remember, in addition to making it easier for anyone to access and use this data, we also kicked off this example hoping to automate the detection of money laundering schemes at least a little bit. We can do that with a set of rules implemented as SPARQL queries. Here’s the query that discovers instances of the “successive selling” pattern:
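The rule’s SPARQL isn’t reproduced in the text, and the post doesn’t define the pattern precisely. Purely as an illustration, suppose “successive selling” means the same property changing hands at least min_sales times within a time window. The Python sketch below applies that hypothetical definition to in-memory (property, sale date) records with a sliding window; the function name and thresholds are arbitrary, not values from the demo.

```python
from collections import defaultdict
from datetime import date, timedelta


def successive_selling(sales, min_sales=3, window=timedelta(days=365)):
    """Flag properties sold at least min_sales times within some window span.

    sales: iterable of (property_id, sale_date) pairs.
    Returns {property_id: sorted sale dates} for each flagged property.
    """
    by_property = defaultdict(list)
    for prop, sold_on in sales:
        by_property[prop].append(sold_on)

    flagged = {}
    for prop, dates in by_property.items():
        dates.sort()
        # Do any min_sales consecutive sales (in date order) fit in the window?
        for i in range(len(dates) - min_sales + 1):
            if dates[i + min_sales - 1] - dates[i] <= window:
                flagged[prop] = dates
                break
    return flagged


sales = [
    ("prop-1", date(2017, 1, 1)),
    ("prop-1", date(2017, 4, 1)),
    ("prop-1", date(2017, 9, 1)),
    ("prop-2", date(2010, 1, 1)),
    ("prop-2", date(2014, 1, 1)),
    ("prop-2", date(2018, 1, 1)),
]
print(sorted(successive_selling(sales)))  # ['prop-1']
```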
In metaphactory, a rule can be configured to execute on a schedule, and rule violations are recorded and reported on a timeline:
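metaphactory’s rule and scheduling configuration isn’t shown in the post, so the following is a generic Python sketch of the behavior described, not metaphactory’s API: each run evaluates every rule, records violations it hasn’t seen before with a timestamp, and appends them to a timeline. The class and rule names are hypothetical; in practice run_once would be triggered by a scheduler such as cron.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Violation:
    rule: str
    entity: str
    detected_at: datetime


@dataclass
class RuleRunner:
    """Hypothetical stand-in for scheduled rule execution (not metaphactory's API)."""

    # Rule name -> callable returning the IDs of entities that violate the rule.
    rules: Dict[str, Callable[[], Iterable[str]]]
    timeline: List[Violation] = field(default_factory=list)

    def run_once(self, now: Optional[datetime] = None) -> List[Violation]:
        """Evaluate every rule once; record and return only new violations."""
        now = now or datetime.now(timezone.utc)
        seen = {(v.rule, v.entity) for v in self.timeline}
        new = []
        for name, check in self.rules.items():
            for entity in check():
                if (name, entity) not in seen:
                    seen.add((name, entity))
                    new.append(Violation(name, entity, now))
        self.timeline.extend(new)
        return new
```

Recording only unseen (rule, entity) pairs keeps the timeline from filling up with the same violation on every scheduled run.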
Each violation is automatically linked to another templated page that shows the nature of the violation, and links to the related entities, so the human in the loop can investigate further. One of the most valuable pieces of that investigation is a visualization of the graph of relationships involved in the rule violation. metaphactory provides graph visualizations by embedding Ontodia, a library for rendering RDF graphs:
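Ontodia does the rendering; the step worth illustrating is choosing which part of the graph to show. One simple approach, sketched here in Python, is a breadth-first search from the violation’s node that collects the triples within a fixed number of hops, ignoring edge direction. The function name, triples, and predicate names are made up, and this is not Ontodia’s API.

```python
from collections import deque


def neighborhood(triples, start, hops=1):
    """Return the triples within `hops` edges of `start`, ignoring edge direction."""
    selected = set()
    visited = {start}
    frontier = deque([(start, 0)])  # breadth-first search from the start node
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for s, p, o in triples:
            if node in (s, o):
                selected.add((s, p, o))
                neighbor = o if s == node else s
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return selected


triples = [
    ("violation-1", "involves", "prop-1"),
    ("loan-1", "securedBy", "prop-1"),
    ("loan-1", "lender", "bank-A"),
    ("bank-A", "headquarters", "addr-9"),
]
print(sorted(neighborhood(triples, "violation-1", hops=2)))
# [('loan-1', 'securedBy', 'prop-1'), ('violation-1', 'involves', 'prop-1')]
```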
This is one of the reasons why a knowledge graph like a data.world dataset is a good match for solving these kinds of problems. Having identified a pattern that indicates an issue worth investigating, it’s straightforward to encapsulate it in a query, run that query as a rule periodically, and give your end users a rich interface to access the knowledge base and do the final research.
I hope you found this post informative. If you’re interested in learning more about how data.world can help you start leveraging the power of knowledge bases, and how you can build rich interactive applications on top of those knowledge bases quickly with metaphactory, please reach out at help@data.world.
How to set up metaphactory with data.world
To create your own Amazon EC2 instance running the metaphactory platform connected directly to your data catalog in data.world, you can sign up here:
You will receive an e-mail with a link to the CloudFormation script, which sets up your metaphactory instance and connects it to data.world in the following steps:
In the CloudFormation script, the mandatory parameters are:
- URL of the data.world SPARQL endpoint. To test the system with the mortgage loans data described in this blog post, you can connect to the public dataset https://api.data.world/v0/sparql/edw-fibo-2018-demo/fibo-modeling.
- Note: If you want to try this with your own dataset instead, you can still download the test mortgage loans data from your instance at
{MetaphactoryURL}/assets/samples/fibo-modeling-export.ttl
and add it to your dataset manually.
- Security access token to connect to data.world. You can get one from your data.world account profile by navigating to https://data.world/settings/advanced. Copy and paste the Read/Write access token.
- EC2SSHKeyPairName: key pair for SSH access to the metaphactory instance
For the purposes of this blog post, a t2.small instance is sufficient for the metaphactory platform.
When stack creation completes, the CloudFormation stack produces three outputs:
- MetaphactoryPassword: admin password of the metaphactory system
- MetaphactoryURL: URL to access the new metaphactory instance
- MetaphactorySSHAccess: SSH command to login to the metaphactory instance server.
Now you can access metaphactory by navigating to {MetaphactoryURL} in your browser. You should use “admin” as the login name with the {MetaphactoryPassword} returned by the CloudFormation stack.
From the start screen you can access the pre-configured semantic search page with the example described in this blog post (you may need to scroll down to the “Start Example” button).