Creating an argus (used-equipment price guide) from classified ads
How can a forklift argus be built from web data?
A company specializing in material handling equipment wants to build an argus from second-hand advertisements published on specialized sites around the world. Data from some of these sites has already been retrieved, normalized and consolidated into a single file. Some attributes have already been extracted, but identifying the equipment still requires a business operator, because the same model can be written in several ways: with or without spaces, sometimes misspelled, or with additional information that varies with the publisher and with each site's own reference list. Without a standard repository, models are therefore difficult to identify.
The mission consists of retrieving the content of the advertisements from all the specialized sites, then standardizing the data and grouping it around equipment that has been correctly identified against the company's repository. The result feeds a database that will be used to build the argus.
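To illustrate the identification problem, here is a minimal sketch of how designation variants (spaces, hyphens, small spelling errors) could be mapped onto a reference repository. The model names in REFERENCE_MODELS, the function names and the 0.8 similarity cutoff are assumptions made for the example, not the company's actual repository or method. It relies only on Python's standard difflib module.

```python
import difflib
import re

# Hypothetical reference repository; the real one belongs to the company.
REFERENCE_MODELS = ["HX25D", "HX30D", "EL16N"]


def normalize(designation: str) -> str:
    """Uppercase the text and keep only letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", designation.upper())


def match_model(designation, reference=REFERENCE_MODELS, cutoff=0.8):
    """Return the closest reference model, or None if nothing is similar enough."""
    key = normalize(designation)
    by_key = {normalize(model): model for model in reference}
    if key in by_key:
        # Exact match once spaces, hyphens and case are ignored.
        return by_key[key]
    # Otherwise fall back to approximate matching to absorb small typos.
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else None


print(match_model("hx 25 d"))
print(match_model("HX-25D"))
print(match_model("HX25DD"))
print(match_model("ZZ999"))
```

In this sketch the first three calls resolve to the same reference entry and the last one returns None. Designations that carry a lot of additional text would need the model token to be isolated first, and ambiguous cases would still go to a business operator.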
Examples of source pages
Scraping worker and API connectors
Retrieve information from the web
Cleanse and format the data
Label the records
Save to database
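The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the HTML markup with data-field attributes, the field names, the table layout and the SQLite storage are all hypothetical, prices are assumed to be whole amounts, and the label step is a simple substring rule standing in for the Machine Learning module. A real worker would also fetch pages over the network, which is left out here so the example runs offline.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical advertisement markup; real sites each have their own structure.
SAMPLE_AD = """
<div class="ad">
  <h1 data-field="title">Forklift  HX-25D  diesel</h1>
  <span data-field="price">12 500 EUR</span>
  <span data-field="country">fr</span>
</div>
"""


class AdParser(HTMLParser):
    """Collect the text of elements that carry a data-field attribute."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None

    def handle_starttag(self, tag, attrs):
        self._field = dict(attrs).get("data-field")

    def handle_data(self, data):
        if self._field and data.strip():
            self.record[self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


def retrieve(html):
    """Step 1: extract the raw fields from one advertisement page."""
    parser = AdParser()
    parser.feed(html)
    return parser.record


def cleanse(record):
    """Step 2: collapse whitespace, parse the price, uppercase the country."""
    digits = "".join(ch for ch in record.get("price", "") if ch.isdigit())
    return {
        "title": " ".join(record.get("title", "").split()),
        "price": int(digits) if digits else None,  # assumes whole amounts
        "country": record.get("country", "").strip().upper(),
    }


def label(record, known_models):
    """Step 3: placeholder rule; the first known model found in the title."""
    compact = record["title"].upper().replace(" ", "").replace("-", "")
    record["model"] = next((m for m in known_models if m in compact), None)
    return record


def save(conn, record):
    """Step 4: store the labeled record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ads "
        "(title TEXT, price INTEGER, country TEXT, model TEXT)"
    )
    conn.execute("INSERT INTO ads VALUES (:title, :price, :country, :model)", record)
    conn.commit()


conn = sqlite3.connect(":memory:")
save(conn, label(cleanse(retrieve(SAMPLE_AD)), ["HX25D"]))
print(conn.execute("SELECT title, price, country, model FROM ads").fetchall())
```

Keeping each step as a separate function means a site-specific connector only needs to replace retrieve, while cleansing, labeling and storage stay shared.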
Return on investment
Automated collection of information from a wide variety of sources
Cleaning, standardization, checking and consolidation of information
Automatic labeling of records via a Machine Learning module
Reallocation of employees to more rewarding tasks
What next?
A natural next step would be a data visualization that presents the data by model, price, country or any other filter required.
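As a rough idea of the aggregation such a visualization could sit on, the sketch below groups labeled records and computes a few price statistics per group. The record fields, the sample values and the choice of the median as the reference price are assumptions made for the example; the source does not specify how the argus price is derived.

```python
from collections import defaultdict
from statistics import median


def price_summary(ads, keys=("model", "country")):
    """Group ads by the given keys and summarize their prices."""
    groups = defaultdict(list)
    for ad in ads:
        if ad.get("price") is not None:  # skip ads without a usable price
            groups[tuple(ad[k] for k in keys)].append(ad["price"])
    return {
        group: {
            "count": len(prices),
            "median": median(prices),
            "min": min(prices),
            "max": max(prices),
        }
        for group, prices in groups.items()
    }


# Hypothetical records, shaped like the output of the labeling step.
ads = [
    {"model": "HX25D", "country": "FR", "price": 12000},
    {"model": "HX25D", "country": "FR", "price": 14000},
    {"model": "HX25D", "country": "DE", "price": 13000},
    {"model": "EL16N", "country": "FR", "price": None},
]

for group, stats in price_summary(ads).items():
    print(group, stats)
```

Passing a different keys tuple, for example ("model",), changes the grouping without touching the rest of the code, which maps directly onto the filters mentioned above.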