Wiker: feeding its database from web data
Context
Wiker is the first local network to connect all the actors of medium-sized and rural towns through their economic, cultural and social information. Its aim is to promote local life, the local economy and short supply chains, and to support the sustainable development of town centers and territories.
To do this, Wiker collects relevant local information from numerous internet sources (websites and open data), some of which are accessible by API, and aggregates it into a single service. But this collection phase is time-consuming and error-prone. For example, each employee spends an average of half an hour a day checking the monitored websites, retrieving the relevant information and labeling it (as "event" or "article", for example). Because several employees do this work, the labeling can vary from one person to another.
Examples of source pages
Solution
Scenario used
1. Scraping worker and API connectors: retrieve information from the web
2. ETL worker: cleanses and formats the data
3. Prediction worker: labels the records
4. INSEE worker: adds the INSEE codes to the municipalities
5. Database worker: saves the records to the database
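The five steps of the scenario can be sketched as a small Python pipeline. This is only an illustration under stated assumptions, not the actual SmartMyData workers: every function name, the sample records, the keyword rule standing in for the Machine Learning module, the two-entry INSEE lookup table and the SQLite table layout are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    title: str
    town: str
    label: Optional[str] = None
    insee: Optional[str] = None


def collect() -> list[dict]:
    # Step 1 stand-in: a real worker would scrape pages or call source APIs.
    # Hard-coded sample data keeps the sketch runnable offline.
    return [
        {"title": "  Concert on the   main square ", "town": "lyon "},
        {"title": "New bakery opens downtown", "town": " Marseille"},
    ]


def clean(raw: dict) -> Record:
    # Step 2: collapse stray whitespace and normalize the town name.
    title = " ".join(raw["title"].split())
    town = raw["town"].strip().title()
    return Record(title=title, town=town)


# Hypothetical keyword list; the real step uses a trained model.
EVENT_WORDS = ("concert", "festival", "market")


def predict_label(record: Record) -> Record:
    # Step 3 stand-in: a keyword rule in place of the Machine Learning module.
    text = record.title.lower()
    record.label = "event" if any(w in text for w in EVENT_WORDS) else "article"
    return record


# Hypothetical lookup table; a real worker would use the full INSEE reference.
INSEE_CODES = {"Lyon": "69123", "Marseille": "13055"}


def add_insee(record: Record) -> Record:
    # Step 4: attach the INSEE code of the municipality (None if unknown).
    record.insee = INSEE_CODES.get(record.town)
    return record


def save(records: list[Record], conn: sqlite3.Connection) -> None:
    # Step 5: persist the labeled, enriched records.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(title TEXT, town TEXT, label TEXT, insee TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?, ?)",
        [(r.title, r.town, r.label, r.insee) for r in records],
    )
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> list[Record]:
    # Chain the steps in the order given in the scenario.
    records = [add_insee(predict_label(clean(raw))) for raw in collect()]
    save(records, conn)
    return records
```

Calling run_pipeline(sqlite3.connect(":memory:")) runs the whole chain against an in-memory database. Because a single labeling function is applied to every record, the labels no longer depend on which employee handled the source.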
SmartMyData - Example of final results with labeling and INSEE codes
(Screenshot from a Wiker table)
Return on investment
Automated collection of information from a wide variety of sources
Automatic labeling of records via a Machine Learning module
Reduced risks and errors linked to human intervention
Reallocation of employees to more rewarding tasks