GraphDB Conversion Engine
Structured data is data with a defined structure, such as RDBMS tables, Excel, CSV, TSV and RDF. Unstructured data is document-style data, such as web documents and manuals. To store both kinds of data in GraphDB, the data is first converted and integrated into graph data, so that very complicated relationships among large-scale data scattered inside and outside the organization can be identified more easily and promptly.
Data can be converted in two ways: by mapping the data source structure to the knowledge graph data model, or by extracting data as the properties and values of a resource that corresponds to the knowledge graph model and converting that data into graph data. In addition, an RDB can be connected to and integrated with RDF directly through W3C’s Direct Mapping of relational data to RDF.
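The direct connection between an RDB and RDF can be illustrated with a minimal sketch. It follows the general idea of the W3C Direct Mapping (one subject per row, one predicate per column), but it is simplified: all values become plain strings, and the base IRI, table name and row contents are made-up examples, not part of the Saltlux engine.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map_row(base, table, primary_key, row):
    """Turn one relational row into (subject, predicate, object) triples."""
    table_iri = f"{base}{quote(table)}"
    # The row's subject IRI is built from the table name and the primary key value.
    subject = f"{table_iri}/{quote(primary_key)}={quote(str(row[primary_key]))}"
    triples = [(subject, RDF_TYPE, table_iri)]
    for column, value in row.items():
        if value is None:  # a NULL column produces no triple
            continue
        triples.append((subject, f"{table_iri}#{quote(column)}", str(value)))
    return triples


# Hypothetical row from a hypothetical "People" table.
triples = direct_map_row(
    "http://example.org/db/", "People", "ID",
    {"ID": 7, "fname": "Bob", "addr": None},
)
for triple in triples:
    print(triple)
```

A schema-level mapping (the first method described above) would replace the generated predicates with properties from the knowledge graph model instead of deriving them from column names.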
Saltlux’s data conversion engine supports RML (Rule Mapping Language), the mapping language used for conversion and integration, as well as W3C’s R2RML. The engine supports RDBs and various other data sources, along with data purification and filtering during the mapping process, which makes it suitable for securing and processing high-quality graph data.
< GraphDB conversion engine – Conceptual diagram of data conversion >
The GraphDB conversion engine creates data corresponding to knowledge graphs by mapping data sources (DBMS, CSV, RDF, etc.) to the knowledge graph model. It converts all structured data, such as RDB data, through its support for W3C’s R2RML language. It also provides the RML language, which is its internal data conversion rule language, and a function that converts and processes user data into a virtual data view. With the graph data conversion engine, the user can perform data conversion easily and promptly.
< GraphDB conversion engine – Block diagram of functions >
The management function of the data conversion engine provides the tools the user needs to carry out data conversion and processing (data pre-processing, conversion, and data post-processing): data mapping and conversion, a data source viewer, a data model (schema) viewer, a SPARQL viewer and tester, a CSV/Excel file viewer, an RML editor and tester, and conversion statistics.
< Graph data conversion process >
The data conversion process consists of five steps: data source selection, generation of a data view corresponding to the data source, graph map definition, binding of the data view to the graph map, and graph data generation. The graph map defines the instances corresponding to a graph model. If a value must be filtered or purified when the property value of a resource is created, a function is applied at that point.
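The five steps can be sketched in a few lines of Python. This is only an illustration of the flow, not the engine's RML syntax or API: the CSV content, the `GRAPH_MAP` layout, the function names and the `http://example.org/` IRIs are all assumptions made for the example.

```python
import csv
import io

# Step 1: a hypothetical CSV data source, standing in for a real file or table.
CSV_SOURCE = "id,name,email\n1, Alice ,ALICE@EXAMPLE.ORG\n2,Bob,\n"


def make_data_view(text):
    """Step 2: expose the data source as a data view (a list of row dicts)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_email(value):
    """Purification function: trim whitespace and lowercase the value."""
    return value.strip().lower()


# Step 3: the graph map describes how one row becomes one resource.
GRAPH_MAP = {
    "subject": "http://example.org/person/{id}",
    "class": "http://example.org/model#Person",
    "properties": [
        # (predicate, source column, purification function)
        ("http://example.org/model#name", "name", str.strip),
        ("http://example.org/model#email", "email", clean_email),
    ],
}


def generate_graph(view, graph_map):
    """Steps 4 and 5: bind the data view to the graph map and emit triples."""
    triples = []
    for row in view:
        subject = graph_map["subject"].format(**row)
        triples.append((subject, "rdf:type", graph_map["class"]))
        for predicate, column, func in graph_map["properties"]:
            value = func(row[column])
            if value:  # filtering: empty values produce no triple
                triples.append((subject, predicate, value))
    return triples


graph = generate_graph(make_data_view(CSV_SOURCE), GRAPH_MAP)
for triple in graph:
    print(triple)
```

Keeping the purification functions inside the graph map means the same data view can be bound to different maps without changing the source.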
< Large scale unstructured data extraction process and tool >
The GraphDB conversion engine provides virtual data views that support large-scale data conversion and various data sources, as well as data purification and filtering for data conversion. Users can define their own data views and filtering functions and apply them directly to the engine. The biggest advantage of the graph data conversion engine is that users can create data views and user functions as plug-ins. Each plug-in has a URL address, so the same function can be reused in other projects (operations) through that URL. In addition, the configuration of each project in progress can be managed through integration with configuration management servers (SVN, CVS, Git, etc.). This engine has the following features.
Main Functions and Specifications
The GraphDB conversion engine, which handles graph data generation for structured and unstructured data in the GraphDB Suite, consists of the core data conversion function and a management tool that makes conversion work easy. Structured data can be extracted and converted through schema mapping. For unstructured data, the property values required by a data model can be extracted and converted in combination with KENT’s data extraction function.
- Data conversion function that supports various formats
The data conversion function of the GraphDB Suite provides the procedures and methods for generating graph data, along with functions to test the result before conversion and to store the conversion result directly in GraphDB. Most core functions are implemented as plug-ins, so they can be optimized for the user environment.
- Ultra large-scale GraphDB conversion and augmentation function
The GraphDB conversion engine provides processes and methods for complex conversion tasks on large data sets, both from Wikipedia and Wikidata and from other sources. These tasks include knowledge conversion, knowledge GraphDB augmentation, and error correction.
Functions for data collection, extraction, resource integration and correction, and GraphDB generation are provided, and functions can be added or optimized as plug-ins. A management API for managing and controlling the conversion process is also provided.
- Data conversion engine management function
The conversion engine management tool includes a conversion rule editor and runner, along with viewers for data sources, user functions, SPARQL and resources, so the user can write conversion rules easily and promptly. All functions in the conversion engine are namespace-based, so functions with the same name can be used separately through their namespaces. The rule editor in the management tool provides autocomplete for variables and functions. When the user’s data model is imported, the editor automatically adds its classes and properties to the autocomplete items, so the user can refer to them easily while writing a conversion rule.
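The namespace and autocomplete behavior described above can be pictured with a small sketch. The `FunctionRegistry` class, its method names, and the `projA`, `projB` and `ex` prefixes are hypothetical and only illustrate the idea; they are not the engine's actual interface.

```python
class FunctionRegistry:
    """Keeps user functions apart by namespace, so equal names do not clash."""

    def __init__(self):
        self._functions = {}

    def register(self, namespace, name, func):
        self._functions[(namespace, name)] = func

    def call(self, namespace, name, *args):
        return self._functions[(namespace, name)](*args)

    def complete(self, prefix, model_terms=()):
        """Autocomplete over registered functions plus imported model terms."""
        candidates = {f"{ns}:{name}" for ns, name in self._functions}
        candidates.update(model_terms)  # classes and properties of the data model
        return sorted(c for c in candidates if c.startswith(prefix))


registry = FunctionRegistry()
# Two projects register a function under the same name without conflict.
registry.register("projA", "normalize", str.upper)
registry.register("projB", "normalize", str.lower)

print(registry.call("projA", "normalize", "Seoul"))
print(registry.call("projB", "normalize", "Seoul"))
print(registry.complete("proj"))
print(registry.complete("ex:", model_terms=["ex:Person", "ex:name"]))
```

Keying the registry on the (namespace, name) pair is one simple way to let a duplicated function name resolve to different implementations per project.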