Creating an argus from classified ads

Context

How to create a forklift argus from web data?

A company specializing in material handling equipment wants to create an argus (a used-equipment price guide) from second-hand advertisements on specialized sites around the world. Data from some of these sites has already been retrieved, normalized and grouped together in a file. Some attributes have already been extracted, but identifying the equipment requires a business operator because the same model can be written in several ways: with or without spaces, sometimes with spelling errors, or with additional information, depending on the publishers and each site's repository. It is therefore difficult to identify models without a standard repository.
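The matching problem described above can be illustrated with a minimal sketch. It assumes a small reference repository of canonical model names and uses Python's standard `difflib` for approximate matching; the model names, the `match_model` helper and the 0.8 cutoff are all hypothetical and chosen for illustration, not taken from the project.

```python
import difflib
import re
from typing import Optional

# Hypothetical reference repository of canonical model names (illustrative only).
REFERENCE_MODELS = ["FX250", "TL1500", "RM30E"]


def normalize(raw: str) -> str:
    """Uppercase the text and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def match_model(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the reference model that best matches a raw ad designation.

    Step 1: look for a reference name contained in the normalized text,
            which covers spacing variants and trailing extra information.
    Step 2: otherwise fall back to a fuzzy match, which tolerates small typos.
    Returns None when nothing is close enough (assumed cutoff: 0.8).
    """
    key = normalize(raw)
    # Prefer the longest contained name so a short name never shadows a longer one.
    for model in sorted(REFERENCE_MODELS, key=len, reverse=True):
        if model in key:
            return model
    close = difflib.get_close_matches(key, REFERENCE_MODELS, n=1, cutoff=cutoff)
    return close[0] if close else None


print(match_model("fx 250"))              # spacing variant
print(match_model("FX-250 diesel 2015"))  # extra information
print(match_model("TL15O0"))              # typo: letter O instead of zero
print(match_model("pallet truck"))        # no model mentioned
```

Containment matching is deliberately simple and can produce false positives on short model names, which is one reason a sketch like this would complement, not replace, a business operator or a trained model.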

The mission consists of retrieving the content of the advertisements from all the specialized sites, then standardizing the data and grouping it around equipment that has been correctly identified against the company's repository. The result feeds a database that will be used to build the argus.

Examples of source pages

Solution

Scenario used

1. Scraping worker and API connectors: retrieve information from the web

2. ETL worker: cleanses and formats the data

3. Prediction worker: labels the records

4. Database worker: saves the records to the database
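The four steps of the scenario can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the scraping step is replaced by two hard-coded sample ads, the prediction step uses a keyword rule as a stand-in for the machine learning module, prices are assumed to be whole euro amounts, and the database is an in-memory SQLite instance. All function names, field names and model names are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator

# Hypothetical canonical model names (illustrative only).
KNOWN_MODELS = ["FX250", "TL1500"]


def scrape() -> Iterator[dict]:
    """Step 1 stand-in: a real worker would fetch ads from sites or APIs."""
    yield {"title": " fx 250 diesel ", "price": "12 500 EUR", "country": "fr"}
    yield {"title": "TL-1500 electric", "price": "8,900 EUR", "country": "de"}


def etl(ads: Iterable[dict]) -> Iterator[dict]:
    """Step 2: cleanse and format the raw fields."""
    for ad in ads:
        # Assumes whole-euro prices: keep the digits, drop separators and currency.
        digits = "".join(ch for ch in ad["price"] if ch.isdigit())
        yield {
            "title": " ".join(ad["title"].split()).upper(),
            "price_eur": int(digits),
            "country": ad["country"].upper(),
        }


def predict(ads: Iterable[dict]) -> Iterator[dict]:
    """Step 3 stand-in: a keyword rule in place of the machine learning module."""
    for ad in ads:
        compact = ad["title"].replace(" ", "").replace("-", "")
        label = next((m for m in KNOWN_MODELS if m in compact), "UNKNOWN")
        yield {**ad, "model": label}


def save(ads: Iterable[dict], conn: sqlite3.Connection) -> int:
    """Step 4: store the labeled records and return how many were saved."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ads "
        "(title TEXT, price_eur INTEGER, country TEXT, model TEXT)"
    )
    rows = [(a["title"], a["price_eur"], a["country"], a["model"]) for a in ads]
    conn.executemany("INSERT INTO ads VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
saved = save(predict(etl(scrape())), conn)
print(saved, "records saved")
```

Because each step consumes and produces an iterable of records, any stage could be swapped for a real worker without changing the others.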

Return on investment

Automated collection of information from a wide variety of sources

Cleaning, standardization, control and consolidation of information

Automatic labeling of records via a machine learning module

Reallocation of employees to more rewarding tasks

What next?

A natural next step would be a data visualization that presents the data by model, price, country or any other required filter.
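As a rough idea of the figures such a visualization might sit on, the sketch below groups labeled ads by any combination of fields and computes a median price per group. The records, field names and the `median_price` helper are hypothetical, and the median is only one possible choice of statistic for an argus.

```python
from collections import defaultdict
from statistics import median

# Hypothetical labeled records, as they might come out of the database.
ADS = [
    {"model": "FX250", "country": "FR", "price_eur": 12500},
    {"model": "FX250", "country": "FR", "price_eur": 11500},
    {"model": "FX250", "country": "DE", "price_eur": 13000},
    {"model": "TL1500", "country": "DE", "price_eur": 8900},
]


def median_price(ads, *keys):
    """Group ads by the given fields and return the median price per group.

    Group identifiers are tuples, so one function serves any filter combination.
    """
    groups = defaultdict(list)
    for ad in ads:
        groups[tuple(ad[k] for k in keys)].append(ad["price_eur"])
    return {group: median(prices) for group, prices in groups.items()}


by_model = median_price(ADS, "model")
by_model_country = median_price(ADS, "model", "country")

for group, price in sorted(by_model_country.items()):
    print(group, price)
```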