Data Collection Engine Tornado
Saltlux technology
Data collection solution Tornado
In a traditional database (DB) environment, data is generated internally, by an application at the DB's front end, rather than imported from outside. In a big data environment, by contrast, data is brought in from outside rather than generated internally, so data processing begins with data collection.
The Data Collection Solution by Saltlux crawls multiple platforms and collects big data from RSS feeds, the deep web, metasearch, social networks, and Open APIs.
Definition and features
Real-time
It is a big data processing engine that performs real-time, automatic, parallel data collection according to users' preferences.
Multi-platform
Collects data from multiple platforms, including the deep web, social networks, IoT, metasearch, streaming data, and Open APIs.
Methods
Uses both active and passive collection approaches.
Data processing
Prevents data loss and duplication, and provides data compression, data structuring, encryption of stored data, and data validation, with attention to user convenience.
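As an illustration of the duplication prevention mentioned above, the following is a minimal sketch only, not Tornado's actual implementation. It assumes duplicates are detected by hashing normalized document text; the names `content_fingerprint` and `DuplicateFilter` are hypothetical.

```python
import hashlib


def content_fingerprint(text):
    """Return a SHA-256 hex digest of the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateFilter:
    """Hypothetical helper that remembers fingerprints of documents already seen."""

    def __init__(self):
        self._seen = set()

    def is_new(self, text):
        """Return True the first time a document is seen, False for repeats."""
        fingerprint = content_fingerprint(text)
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        return True


docs = [
    "Breaking  news: markets rally",
    "breaking news: markets rally",  # same content, different case and spacing
    "Weather update",
]
dup_filter = DuplicateFilter()
unique_docs = [doc for doc in docs if dup_filter.is_new(doc)]
```

Exact hashing only catches documents that are identical after normalization; near-duplicate detection would need a different technique.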
Automation
Automatically extracts, converts, and stores big data, including data from hidden web pages, alongside powerful general web collection.
Environment
Provides an optimized big data environment in which users can perform multi-faceted analysis (competitors, products, markets, risk management, and customer voice recognition) in real time.
Application
The Tornado Data Collection solution can be applied to business processes to help businesses operate more efficiently.
Improve efficiency
Helps businesses improve their brand management and their response to feedback from VIP customers, and contributes to new product development.
Forecast
Through in-depth analysis of customer reviews and feedback, businesses can detect anomalies early and operate a real-time feedback system.
Decision making
Provides a basis for businesses to analyze and evaluate customers' reactions and consumption trends, so that they can make timely decisions and set strategy.
Highlights
Various collection features (scenario-based collection, RSS collection, web collection, deep web collection, social collection, and Open API-based collection) are built in, covering many types of internal and external big data collection according to users' needs.
A web-based collection rule editor, designed with usability in mind, makes it easy to extract and collect data from various types of dynamic websites, such as those built with JavaScript and AJAX.
Using the distributed parallel method, it can collect a large amount of data simultaneously under many different rules, with greater speed and stability. It can also be installed and operated on multiple operating systems (UNIX, Windows, etc.).
For user convenience, it provides a preview feature: before the actual collection starts, users can run a collection simulation with previously generated collection rules and confirm the quality of the data that would be collected.
Operators and managers can easily and quickly check the current status through an integrated dashboard, which monitors the overall condition of the collection engine, and an operation management tool, which monitors the collection policies and schedule settings of each collection source in real time.
Functions
Saltlux Technology's Tornado data collection solution collects data across platforms from RSS, the deep web, metasearch, social networks, and Open APIs. It also provides functions for operation, simulation, scheduling, and operating-status monitoring.
Social network data collection function
It collects multiple types of social data, such as Twitter, public Facebook pages, and Weibo timelines, and has a scheduling feature to set the collection cycle for each target. It also has a status history view function to verify the collection status.
Scenario-based Data Collection function
It extracts and collects target data from various sites, such as news sites, blogs, shopping malls, and general homepages, according to user-defined scenarios. It provides a scheduling feature to set the collection cycle and a status history feature to view the collection status within the workbench.
RSS collection function
It reads RSS (Really Simple Syndication) feeds and extracts both the data in the target feed and the original data the feed points to. It includes a scheduling function to set the collection cycle and a collection status history feature that lets users check the collection status from within the workbench.
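To make the feed-extraction step concrete, here is a minimal sketch that parses an RSS 2.0 document with Python's standard library. It is not Tornado's implementation: the sample feed, the function name `parse_rss_items`, and the choice of fields are assumptions for illustration. Fetching the original page behind each link is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real collector would download this from a feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>https://example.com/news/1</link>
      <pubDate>Mon, 01 Jan 2024 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/news/2</link>
      <pubDate>Tue, 02 Jan 2024 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(feed_xml):
    """Return one dict per <item>, with title, link, and publication date."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items


feed_items = parse_rss_items(SAMPLE_FEED)
for entry in feed_items:
    print(entry["published"], entry["title"], entry["link"])
```

Each `link` value is where a collector would go next to retrieve the original document.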
Deep web collection function
It collects the information within websites either by gathering site-wide information from a starting URL or by filtering with URL patterns or keywords. It also provides a scheduling feature to set the collection cycle and a status history view feature to check the collection status.
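The URL-pattern and keyword filtering described above could look roughly like the sketch below. This is an illustration under stated assumptions, not Tornado's rule format: the function `select_urls` is hypothetical, patterns are assumed to be regular expressions, and keywords are matched against the URL string rather than page content.

```python
import re
from urllib.parse import urlparse


def select_urls(urls, base_host, url_pattern=None, keywords=()):
    """Keep URLs on base_host that match the optional pattern and any keyword."""
    pattern = re.compile(url_pattern) if url_pattern else None
    selected = []
    for url in urls:
        # Site-wide collection: stay on the target host.
        if urlparse(url).netloc != base_host:
            continue
        # Optional URL-pattern filter.
        if pattern and not pattern.search(url):
            continue
        # Optional keyword filter (case-insensitive, any keyword may match).
        if keywords and not any(k.lower() in url.lower() for k in keywords):
            continue
        selected.append(url)
    return selected


candidates = [
    "https://example.com/products/101",
    "https://example.com/products/102?ref=home",
    "https://example.com/about",
    "https://other.example.org/products/7",
]
product_urls = select_urls(candidates, "example.com", url_pattern=r"/products/\d+")
```

With no pattern and no keywords, the function keeps every URL on the target host, which corresponds to the site-wide case.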
Metasearch collection function
It has a keyword-based collection feature that sends user keywords to various search engines, including Google, Bing, Daum, Naver, and Yahoo, and consolidates the search results into a single list. It also provides a scheduling feature to set the collection cycle for each collection target and a status history view feature to check the status.
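Consolidating results from several engines into a single list can be sketched as follows. The sketch assumes each engine has already returned a ranked list of hits; the engine names, the sample data, and the function `merge_results` are hypothetical, and no real search engine API is called. Results are interleaved by rank, and duplicates are dropped by a simplified URL comparison (lowercased, trailing slash removed).

```python
from itertools import zip_longest


def merge_results(results_by_engine):
    """Interleave ranked result lists by position and drop duplicate URLs."""
    merged = []
    seen = set()
    # zip_longest yields all rank-1 hits, then all rank-2 hits, and so on.
    for tier in zip_longest(*results_by_engine.values()):
        for hit in tier:
            if hit is None:  # this engine returned fewer results
                continue
            key = hit["url"].rstrip("/").lower()
            if key in seen:
                continue
            seen.add(key)
            merged.append(hit)
    return merged


results_by_engine = {
    "engine_a": [
        {"title": "Tornado overview", "url": "https://example.com/tornado"},
        {"title": "Big data basics", "url": "https://example.com/bigdata"},
    ],
    "engine_b": [
        {"title": "Big data basics", "url": "https://example.com/bigdata/"},
        {"title": "Crawler guide", "url": "https://example.com/crawler"},
    ],
}
merged = merge_results(results_by_engine)
```

Interleaving by rank is only one possible ordering; a production system might score hits by how many engines returned them.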
Open API-based collection function
It collects various documents and open data, including domestic public data, overseas public data, and local government public data, and provides a scheduling function to set the collection cycle for each target. It also provides a collection status history view function to verify the status.
Operation management function
Provides a dashboard for monitoring and operating the Tornado engine features.
User management function
Allows one or more users to access the system and assigns permissions to each user.
Management feature per collection target (project)
Manages each data collection target (project) separately, by objective, data source, or preference.
Operating Process
Definition of collection tasks
Activities performed by users on the internet (input, click, search, etc.) are recorded and stored as collection rules.
Preview on simulations and results
Run a simulation and preview the results to check that the defined rules perform properly.
Running the collection engine
Run the collection engine to collect and store web data based on the defined rules.
See the results
Review, in the workbench, the unstructured data collected from the web after it has been converted to semi-structured or structured data.
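The four steps of the operating process can be sketched end to end as below. This is a simplified illustration, not Tornado's engine or rule format: it assumes a rule is a set of regular expressions applied to page HTML, and the names `CollectionRule`, `apply_rule`, `preview`, and `run_collection` are hypothetical. Real rules recorded from user activity would be richer than this.

```python
import re
from dataclasses import dataclass


@dataclass
class CollectionRule:
    """Step 1: a collection task, here a mapping of field name to regex."""
    name: str
    fields: dict  # each regex must contain one capture group


def apply_rule(rule, html):
    """Turn one unstructured page into a structured record."""
    record = {}
    for field_name, pattern in rule.fields.items():
        match = re.search(pattern, html, re.DOTALL)
        record[field_name] = match.group(1).strip() if match else None
    return record


def preview(rule, sample_pages, limit=1):
    """Step 2: simulate the rule on a few pages before a full run."""
    return [apply_rule(rule, page) for page in sample_pages[:limit]]


def run_collection(rule, pages):
    """Step 3: apply the rule to every page, keeping only complete records."""
    records = [apply_rule(rule, page) for page in pages]
    return [r for r in records if all(v is not None for v in r.values())]


rule = CollectionRule(
    name="news_article",
    fields={
        "title": r"<h1>(.*?)</h1>",
        "body": r'<p class="body">(.*?)</p>',
    },
)
pages = [
    '<html><h1>Markets rally</h1><p class="body">Stocks rose today.</p></html>',
    "<html><h1>No body here</h1></html>",
]

preview_rows = preview(rule, pages)     # check the rule on a sample first
records = run_collection(rule, pages)   # Step 4: structured results to review
```

Regular expressions keep the sketch short; for real pages, an HTML parser would be a more robust basis for extraction rules.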