Web information resources acquisition system
Product introduction: The KLAND-Spider web information resources acquisition system is a system for developing, utilizing and integrating network information resources. It can be customized to track and collect real-time information on the Internet and to build reusable information services. KLAND-Spider collects the specific information users are interested in from a variety of network sources, including webpages, blogs, BBS forums, etc., automatically classifies it, and delivers it to end users in multiple forms.
KLAND-Spider quickly and promptly captures the network information that users need, such as market intelligence, policies and regulations, industry information and hot news. It can be widely applied to enterprise portal construction, intelligence gathering, public opinion analysis and the monitoring of sensitive network information.
Product function: The KLAND-Spider network information resources acquisition system consists of four subsystems: the acquisition navigator, the web spider, the data processor and the publishing system.
The acquisition navigator is used to define acquisition targets. The web spider captures data from the user-defined websites, forms it into data packets (data tables) and sends them to the data processor. The data processor filters and analyzes the captured data, automatically classifies it by site, channel, keyword or another classification model, and saves it in a local database. Finally, the publishing system releases the data in a selected format or style, making it convenient for users to consume.
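The four-stage flow described above can be sketched as a simple pipeline. This is a minimal illustration only, not the actual KLAND-Spider API: all names (AcquisitionTarget, spider_fetch, process, publish) and the keyword-based classification rule are assumptions, and the spider stage is stubbed with in-memory strings instead of real network fetches.

```python
from dataclasses import dataclass

@dataclass
class AcquisitionTarget:
    """Acquisition navigator stage: defines what to collect (assumed shape)."""
    site: str
    channel: str
    keywords: list

def spider_fetch(target, raw_pages):
    """Web spider stage: wrap raw page text into records (stubbed, no network)."""
    return [{"site": target.site, "channel": target.channel, "text": p}
            for p in raw_pages]

def process(records, keywords):
    """Data processor stage: filter records and classify by matching keyword."""
    out = []
    for r in records:
        hits = [k for k in keywords if k in r["text"]]
        if hits:                      # drop records matching no keyword
            r["class"] = hits[0]      # classify by the first matching keyword
            out.append(r)
    return out

def publish(records):
    """Publishing system stage: render each record in a chosen format."""
    return ["[{class}] {site}/{channel}: {text}".format(**r) for r in records]

# Example run through all four stages with stubbed page content.
target = AcquisitionTarget("news.example.com", "finance", ["market", "policy"])
pages = ["market rebounds today", "sports roundup", "new policy announced"]
result = publish(process(spider_fetch(target, pages), target.keywords))
print(result)
```

The unmatched "sports roundup" page is filtered out by the processor stage, so only the two classified records reach the publishing stage.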
Product features: Flexible acquisition methods, diverse acquisition sources, accurate collected data and automated incremental collection.
*Support for multiple webpage formats: static webpages, dynamic webpages and document webpages (Word, Excel, PDF, etc.);
*Support for paging through navigation pages and content pages;
*Support for acquiring embedded tables;
*Support for acquiring and parsing article attachments (Word, Excel, PDF, etc.);
*Automatic verification of metadata extracted during analysis;
*Deduplication of acquisition results;
*Automatic collection of new information from target websites (the time interval can be configured).
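The last two features, deduplication and interval-based incremental collection, can be sketched as follows. This is an illustrative assumption, not KLAND-Spider's actual mechanism: it tracks previously collected items by content hash, and a real deployment would invoke the collector from a scheduler at the user-configured interval rather than calling it directly.

```python
import hashlib

class IncrementalCollector:
    """Collects only items not seen in earlier runs (illustrative sketch)."""

    def __init__(self):
        self.seen = set()  # content hashes of already-collected items

    def collect(self, items):
        """Return the items that are new since the previous run."""
        fresh = []
        for text in items:
            h = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if h not in self.seen:   # deduplicate against earlier runs
                self.seen.add(h)
                fresh.append(text)
        return fresh

# Two simulated scheduled runs against a target site.
collector = IncrementalCollector()
run1 = collector.collect(["story A", "story B"])
run2 = collector.collect(["story B", "story C"])  # only "story C" is new
print(run1, run2)
```

Hashing the content rather than storing it whole keeps the seen-set small even when the history of collected items grows large.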