GraphDB Conversion Engine

Structured data is data with a fixed, explicit structure (RDBMS, Excel, CSV, TSV, and RDF). Unstructured data is data that has only the structure of a document (web documents, manuals, etc.). To store either kind of data in GraphDB, it is first converted and integrated into graph data. As a result, complicated relationships among large-scale data scattered inside and outside an organization can be determined more easily and quickly.

The data conversion method maps the structure of a data source onto the knowledge graph data model. It can also extract the data of a given resource as properties and values that correspond to the knowledge graph model and convert that data into graph data. In addition, RDB and RDF can be connected and integrated directly through W3C’s RDF Direct Mapping technology.
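As an illustration of the direct mapping idea, the following minimal sketch turns one relational row into triples, following the IRI patterns of the W3C Direct Mapping (row IRI "table/key=value", property IRI "table#column"). It is a simplified assumption-laden example, not the engine’s implementation: the function name direct_map_row, the base IRI, and the sample row are hypothetical, and foreign keys, composite keys, and literal datatypes are ignored.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map_row(base, table, pk, row):
    """Return (subject, predicate, object) triples for one relational row.

    Hypothetical helper: single-column primary key only, no foreign keys.
    """
    # Row IRI: <base + table/pk=value>
    subject = f"{base}{table}/{pk}={quote(str(row[pk]))}"
    # Every row is typed with the IRI of its table.
    triples = [(subject, RDF_TYPE, f"{base}{table}")]
    for column, value in row.items():
        if value is None:
            # A NULL column produces no triple.
            continue
        # Property IRI: <base + table#column>
        triples.append((subject, f"{base}{table}#{quote(column)}", value))
    return triples


# Hypothetical sample row from a "People" table.
row_triples = direct_map_row(
    "http://example.org/db/",
    "People",
    "ID",
    {"ID": 7, "fname": "Bob", "addr": None},
)
for triple in row_triples:
    print(triple)
```

The table structure alone determines the output here; a mapping language such as R2RML is needed when the target graph model differs from the source schema.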

Saltlux’s data conversion engine supports RML (Rule Mapping Language), the language used for conversion and integration. Together with W3C’s R2RML, the engine supports RDB and various other data sources, and it applies data purification and data filtering during the mapping process to secure and process high-quality graph data.

< GraphDB conversion engine – Conceptual diagram of data conversion >

Introduction

By mapping data sources (DBMS, CSV, RDF, etc.) to the knowledge graph model, the GraphDB conversion engine creates data that corresponds to the knowledge graph. The engine can convert any data that has a structured form, such as RDB, through its support for W3C’s R2RML language and its own RML language (the internal data conversion rule language). It can also convert and process user data into a virtual data view. With the graph data conversion engine, the user can perform data conversion easily and quickly.

< GraphDB conversion engine – Block diagram of functions >

The management function of the data conversion engine lets the user carry out data conversion and processing (pre-processing, conversion, and post-processing). It provides data mapping and conversion, a data source viewer, a data model (schema) viewer, a SPARQL viewer and tester, a CSV/Excel file viewer, an RML editor and tester, and conversion statistics.

< Graph data conversion process >

The data conversion process runs in the following order: data source selection, generation of a data view corresponding to the data source, graph map definition, binding of the data view to the graph map, and graph data generation. The graph map defines the instances that correspond to a graph model. It is also used when value filtering and purification are necessary to create the property value of a given resource.
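The steps of this process can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the engine’s RML syntax or API: the names data_view, GRAPH_MAP, and generate, the example.org IRIs, and the in-memory CSV are all hypothetical. Each property in the graph map carries a filter (decides whether a raw value is kept) and a purification function (cleans the kept value).

```python
import csv
import io

# Step 1: data source selection (an in-memory CSV stands in for a file or table).
SOURCE = "id,name,age\n1, Alice ,34\n2,Bob,\n"


# Step 2: data view over the source, yielding one dict per record.
def data_view(text):
    return csv.DictReader(io.StringIO(text))


# Step 3: graph map definition (hypothetical structure).
GRAPH_MAP = {
    "subject": "http://example.org/person/{id}",
    "class": "http://example.org/Person",
    "properties": [
        # (predicate, source column, purification function, filter)
        ("http://example.org/name", "name", str.strip, bool),
        ("http://example.org/age", "age", int, str.isdigit),
    ],
}


# Steps 4 and 5: bind the data view to the graph map and generate graph data.
def generate(view, graph_map):
    for record in view:
        subject = graph_map["subject"].format(**record)
        yield (subject, "rdf:type", graph_map["class"])
        for predicate, column, purify, keep in graph_map["properties"]:
            raw = record[column]
            if keep(raw):  # value filtering
                yield (subject, predicate, purify(raw))  # value purification


person_triples = list(generate(data_view(SOURCE), GRAPH_MAP))
for triple in person_triples:
    print(triple)
```

In this sketch the second record has an empty age, so the filter drops that property while the rest of the instance is still generated.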

< Large scale unstructured data extraction process and tool >

Main features

The GraphDB conversion engine lets the user directly define data views and filtering functions. It applies them through a virtual data view (which supports large-scale data conversion and various data sources), data purification, and data filtering for data conversion. The biggest advantage of the graph data conversion engine is that the user can create data views as plug-ins. Because each one has a URL address, the same function can be used separately in other projects (operations) through that URL. The configuration of each project in progress can also be managed through integration with configuration management servers (SVN, CVS, Git, etc.). This engine has the following features.

Main Functions and Specifications

The GraphDB conversion engine handles graph data generation for structured and unstructured data in the GraphDB Suite. It is configured as a management tool that supports the data conversion function and conversion operations. Structured data can be extracted and converted through schema mapping. For unstructured data, the property values required by a data model can be extracted and converted by combining the engine with KENT’s data extraction function.

Data conversion function that supports various formats

The data conversion function of the GraphDB Suite provides the procedures and methods for generating graph data. It also lets the user test the result before converting the data and storing the conversion result directly in GraphDB. The core functionality is structured primarily as plug-ins so that it can be optimized for the user environment.

Ultra large-scale GraphDB conversion and augmentation function

The GraphDB conversion engine provides complex data conversion processes and methods, such as knowledge conversion, knowledge GraphDB augmentation, and error correction, for large data sets inside and outside of Wikipedia and Wikidata.

The engine provides a data collection function, an extraction function, a resource integration and correction function, and a GraphDB generation function. These functions can be added or optimized as plug-ins. The engine also provides a management API for managing and controlling the conversion process.

Data Conversion Engine Management Function

The conversion engine’s management tool includes functions such as data conversion rule editing and execution, and viewers for data sources, user functions, SPARQL, and resources. With these functions, users can write a conversion rule easily and quickly.

All functions in the conversion engine are based on namespaces, so functions with the same name can be used separately. The rule editor in the management tool provides autocomplete for variables and constants. When the user’s data model is imported, the editor automatically adds its classes and properties to the autocomplete items. The user can therefore create a conversion rule while referring easily to those classes and properties.
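
The two ideas in this paragraph, namespace-based functions and autocomplete over an imported data model, can be pictured with the short sketch below. It is an assumption-based illustration only: FunctionRegistry, autocomplete, the namespace URLs, and the model terms are hypothetical and do not reflect the management tool’s real interfaces.

```python
class FunctionRegistry:
    """Stores functions under (namespace, name) so equal names do not clash."""

    def __init__(self):
        self._functions = {}

    def register(self, namespace, name, func):
        self._functions[(namespace, name)] = func

    def call(self, namespace, name, *args):
        return self._functions[(namespace, name)](*args)


registry = FunctionRegistry()
# Two projects each define a function called "clean" in their own namespace.
registry.register("http://example.org/projA/", "clean", str.strip)
registry.register("http://example.org/projB/", "clean", str.upper)


def autocomplete(prefix, model_terms):
    """Return the model terms that start with the typed prefix, sorted."""
    return sorted(term for term in model_terms if term.startswith(prefix))


# Classes and properties taken from a hypothetical imported data model.
MODEL_TERMS = {"ex:Person", "ex:Place", "ex:name", "ex:age"}

print(registry.call("http://example.org/projA/", "clean", "  seoul "))
print(registry.call("http://example.org/projB/", "clean", "seoul"))
print(autocomplete("ex:P", MODEL_TERMS))
```

Keying the registry on the namespace as well as the name is what allows both "clean" functions to coexist and be called independently.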