Tornado Data Collection Engine
Tornado data collection solution
In a traditional database (DB) environment, data is generated internally, in the application and the DB's front end, rather than imported from outside, and processing starts from there. In a big data environment, by contrast, data is brought in from outside rather than generated internally, so data processing starts with data collection.
Definition and features
The Tornado data collection solution can be applied to business processes to help businesses improve efficiency.
Various collection features (collection based on user scenarios, RSS collection, web collection, deep web collection, social collection, and OpenAPI-based collection) are built in to collect many types of internal and external big data according to users' needs.
A web-based collection rule editor, designed with usability in mind, makes it easy to extract and collect data from various types of dynamic websites, such as those built with JavaScript and AJAX.
Using the distributed parallel method, it can collect large amounts of data simultaneously under many configured rules, faster and more stably. It can also be installed and operated on multiple operating systems (UNIX, Windows, etc.).
For user convenience, it provides a preview feature: before the actual collection, users can run a collection simulation with previously generated collection rules and check the quality of the data that would be collected.
Operators and managers can easily and quickly check the current status through an integrated dashboard, which monitors the overall condition of the collection engine, and an operation management tool, which monitors the collection policies and schedule settings per collection source in real time.
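The idea of collecting from many sources at once under configured rules can be illustrated with a minimal Python sketch. It is not Tornado's implementation: it uses threads in a single process rather than a distributed cluster, and the source names, the in-memory pages, and the functions collect and collect_all are hypothetical stand-ins for real network collection.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources"; a real collector would fetch these over the network.
SOURCES = {
    "news": "<title>Market update</title>",
    "blog": "<title>Release notes</title>",
    "forum": "<title>Support thread</title>",
}

def collect(name):
    """Apply a trivial rule (extract the <title> text) to one source."""
    page = SOURCES[name]
    start = page.index("<title>") + len("<title>")
    end = page.index("</title>")
    return name, page[start:end]

def collect_all(names, workers=3):
    """Run one collection task per source in parallel and gather the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with names.
        return dict(pool.map(collect, names))

results = collect_all(list(SOURCES))
```

Each source is handled by its own task, so a slow source does not block the others; a distributed setup would spread the same tasks across machines instead of threads.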
Saltlux Technology's Tornado data collection solution supports cross-platform data collection based on RSS, the deep web, metasearch, social networks, and OpenAPI. It also provides functions for operation, simulation, scheduling, operating status monitoring, and more.
Social network data collection function
Scenario-based data collection function
RSS collection function
Deep web collection function
Metasearch collection function
Open API-based collection function
Operation management function
User management function
Management feature per collection target (project)
Definition of collection tasks
Activities performed by users on the internet (input, click, search, etc.) are recorded and stored as collection rules.
Simulation and results preview
Previews the results by running a simulation to check that the configured rules work properly.
Running the collection engine
Collects and stores web data based on the defined rules by running the collection engine.
See the results
Verifies, through the workbench, the results of converting unstructured data collected from the web into semi-structured/structured data.
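The four steps above (define a rule, preview it in a simulation, run the collection, inspect the structured results) can be sketched in Python. This is only an illustration under stated assumptions, not Tornado's API: the CollectionRule class, the regex-based rule format, and the functions apply_rule, preview, and run are all hypothetical, and the pages are inline strings instead of live websites.

```python
import re
from dataclasses import dataclass

@dataclass
class CollectionRule:
    """Hypothetical rule: maps an output field name to a regex with one capture group."""
    name: str
    fields: dict

def apply_rule(rule, html):
    """Turn unstructured HTML into one structured record; missing fields become None."""
    record = {}
    for field, pattern in rule.fields.items():
        match = re.search(pattern, html, re.DOTALL)
        record[field] = match.group(1).strip() if match else None
    return record

def preview(rule, sample_html):
    """Simulation step: run the rule on a sample and report which fields came back empty."""
    record = apply_rule(rule, sample_html)
    missing = [field for field, value in record.items() if value is None]
    return record, missing

def run(rule, pages):
    """Collection step: apply the rule to every page and keep the records."""
    return [apply_rule(rule, page) for page in pages]

# 1. Define the collection task as a rule.
rule = CollectionRule(
    name="article",
    fields={
        "title": r"<h1>(.*?)</h1>",
        "author": r'<span class="author">(.*?)</span>',
    },
)

# 2. Preview the rule on a sample page before collecting.
sample = '<h1>Hello</h1><span class="author">Kim</span>'
record, missing = preview(rule, sample)

# 3. Run the collection, 4. then inspect the structured records.
pages = [sample, "<h1>No author here</h1>"]
stored = run(rule, pages)
```

The preview step returns the list of fields the rule failed to extract, which is one simple way to judge data quality before a full run.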