Edge-ML: usage contexts and performance

We provide example datasets and scripts so that you can reproduce several experiments and assess how Edge-ML is used and how it performs in real-life conditions.

If you do not have time to reproduce these experiments, each use case below includes a short description of the dataset, the hardware resources used and the performance reached.

Features of the experiments
  • Size of the dataset (rows x columns)
  • Number of class values to be predicted
  • Time spent on data preparation
  • Processing time for the training of the model
  • Hardware resources used
  • Performance obtained (average AUC over the class values)
  • Robustness of the model (AUC Train vs AUC Test)
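
The two summary figures reported for each experiment can be sketched as follows. This is a minimal illustration, assuming that the average AUC is an unweighted mean over the class values and that robustness is the absolute gap between train and test AUC; the page does not spell out either formula, and the function names are hypothetical.

```python
from statistics import mean

def average_auc(per_class_auc):
    """Unweighted mean of one AUC value per class (assumed averaging scheme)."""
    return mean(per_class_auc.values())

def delta_auc(train_auc, test_auc):
    """Absolute gap between train and test AUC; a smaller gap means a more robust model."""
    return abs(train_auc - test_auc)

# Hypothetical per-class AUC values for a 3-class problem
per_class = {"class_a": 0.90, "class_b": 0.95, "class_c": 0.85}
print(round(average_auc(per_class), 3))

# Train/test AUC as reported for the forest management use case
print(round(delta_auc(0.935, 0.934), 3))
```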

Insurance: Marketing Targeting

The aim of the project is to predict whether customers are interested in taking out a caravan insurance policy (without knowing which customers actually own a caravan). Three types of variables are used to describe the clients: (i) their use of other insurance products; (ii) socio-demographic information; (iii) local information derived from the zip code area. The ROI of a telemarketing campaign can be optimized by using a model to target the most interested customers.
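
Targeting the most interested customers can be sketched as a simple ranking step. This is an illustration only: the scores, customer ids and the `select_targets` helper are hypothetical and are not part of Edge-ML.

```python
def select_targets(scores, fraction):
    """Return the ids of the top `fraction` of customers, ranked by model score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_targets = max(1, round(len(ranked) * fraction))
    return ranked[:n_targets]

# Hypothetical model scores: estimated probability of interest per customer id
scores = {"c1": 0.02, "c2": 0.41, "c3": 0.08, "c4": 0.77, "c5": 0.15}

# Call only the 40% of customers with the highest scores
print(select_targets(scores, 0.4))
```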

Results of the experiment

Features of the experiments

  • Size of the dataset: 12,000 x 10
  • Number of class values: 2
  • Time spent on data preparation: 1 minute
  • Processing time for training: 1 second
  • Hardware: Laptop, Core i7 2014 - 8 GB RAM
  • Performance (AUC): Train = 0.969 / Test = 0.971
  • Robustness (Delta AUC): 0.002

Comments on the obtained results

The learning stage is almost instantaneous for this dataset size. The learned model is both very accurate (AUC close to 1) and very robust (there is no significant difference between the AUC measured on the train and test sets).

Download the script and the dataset to reproduce the experiment: link

Security: Network intrusion detector

The goal of the project is to build a network intrusion detector using Machine Learning techniques. 22 types of known attacks were simulated in a military network environment. The input variables describe the technical characteristics of the network connections (e.g. protocol used, connection time). This use case shows that network security can be improved continuously thanks to Auto ML techniques.

Results of the experiment

Features of the experiments

  • Size of the dataset: 3 million x 42
  • Number of class values: 23
  • Time spent on data preparation: 1 minute
  • Processing time for training: 2h35
  • Hardware: Laptop, Core i7 2014 - 8 GB RAM
  • Performance (AUC): Train = 0.999 / Test = 0.999
  • Robustness (Delta AUC): 0.0001

Comments on the obtained results

Edge ML is optimized to process large amounts of data and requires minimal hardware resources. Here, the model is learned on 3 million rows in 2 hours and 35 minutes on an ordinary laptop.

Download the script and the dataset to reproduce the experiment: link

Environment: Forest Management

The aim of the project is to optimize the choice of tree species for reforestation, based on cartographic data. The dataset contains forest cells described by altitude, slope, exposure, distance to the nearest water point, soil type, etc. The variable to be predicted is one of the 7 types of cover present on the cells. Edge ML solves this problem with very high accuracy.

Results of the experiment

Features of the experiments

  • Size of the dataset: 465,000 x 54
  • Number of class values: 7
  • Time spent on data preparation: 1 minute
  • Processing time for training: 13 minutes
  • Hardware: Laptop, Core i7 2014 - 8 GB RAM
  • Performance (AUC): Train = 0.935 / Test = 0.934
  • Robustness (Delta AUC): 0.001

Comments on the obtained results

The model must predict one of the 7 forest cover types: this is a multi-class learning problem. Edge ML natively learns a multi-class classifier (based on the MODL approach) without resorting to the usual heuristic of learning several binary classifiers (one-versus-all). As a result, the learning stage is faster and the model is easier to use.
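
For comparison, the one-versus-all heuristic mentioned above can be sketched as follows. This illustrates the usual workaround, not how Edge ML works internally; the labels and helper names are hypothetical.

```python
def one_vs_all_targets(labels):
    """Build one binary target vector per class (1 = this class, 0 = any other)."""
    classes = sorted(set(labels))
    return {c: [int(y == c) for y in labels] for c in classes}

def predict_one_vs_all(scores_per_class):
    """Pick the class whose binary classifier returns the highest score."""
    return max(scores_per_class, key=scores_per_class.get)

# Hypothetical cover-type labels for five forest cells
labels = ["spruce", "pine", "spruce", "aspen", "pine"]

# One binary problem per class: 3 here, 7 for the forest cover dataset
binary_problems = one_vs_all_targets(labels)
print(len(binary_problems))
print(binary_problems["pine"])

# Hypothetical scores returned by the per-class binary classifiers for one cell
print(predict_one_vs_all({"spruce": 0.2, "pine": 0.7, "aspen": 0.1}))
```

With this heuristic, each class requires its own training run and the per-class scores have to be reconciled at prediction time, which is the overhead a native multi-class classifier avoids.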

Download the script and the dataset to reproduce the experiment: link

IoT and Sensors: Activity detection

The aim of the project is to detect the type of activity (running or walking) from sensor data. The dataset contains measurements collected every 10 seconds by the gyroscope and the accelerometer of an iPhone 5S. The model was learned on minimal hardware: a Raspberry Pi 2 (fewer resources than a current smartphone). The learned model is accurate and very robust. Edge-ML paves the way for learning secure models directly on devices!

Results of the experiment

Features of the experiments

  • Size of the dataset: 71,000 x 8
  • Number of class values: 2
  • Time spent on data preparation: 1 minute
  • Processing time for training: 30 seconds
  • Hardware: Raspberry Pi 2, Model B
  • Performance (AUC): Train = 0.995 / Test = 0.904
  • Robustness (Delta AUC): 0.001

Comments on the obtained results

The model is learned in 30 seconds on a Raspberry Pi 2. It is therefore possible to learn models directly on the devices, without the data collected by smartphones and IoT devices ever leaving them. Edge ML paves the way for new, 100% privacy-friendly uses of Machine Learning!

Download the script and the dataset to reproduce the experiment: link

Online ads re-targeting

The goal is to estimate the click rate of an online ad when it is presented to a particular user. The volume of data collected for online ad re-targeting is very large, so scalable Machine Learning algorithms are required. Edge ML pushes the limits by processing several tens of millions of rows on a standard server (i.e. Xeon 8 core / 64 GB RAM).

Results of the experiment

Features of the experiments

  • Size of the dataset: 10 million x 40
  • Number of class values: 2
  • Time spent on data preparation: 3 minutes
  • Processing time for training: 14h
  • Hardware: Server, Xeon 4 cores - 64 GB RAM
  • Performance (AUC): Train = 0.763 / Test = 0.755
  • Robustness (Delta AUC): 0.008

Comments on the obtained results

This use case allows you to evaluate the scaling capability of Edge ML by using several training sets with different sizes. Using a standard server, the model is learned in 13 minutes on 1 million examples, in 44 minutes on 2 million examples, in 1h30 on 3 million examples and in 14h on 10 million examples. Edge ML pushes the limits of your hardware :-)
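
The training times quoted above can be turned into a rough scaling estimate. The sketch below is plain arithmetic on the four reported measurements, reading "1h30" as 90 minutes and "14h" as 840 minutes; the helper names are hypothetical. It computes the cost per million rows and the exponent k in "time grows like rows to the power k" between consecutive measurements (k = 1 would mean linear scaling).

```python
import math

# Training times reported above: (millions of rows, minutes)
measurements = [(1, 13), (2, 44), (3, 90), (10, 14 * 60)]

def minutes_per_million(rows_m, minutes):
    """Average training cost per million rows."""
    return minutes / rows_m

def scaling_exponent(p1, p2):
    """Exponent k such that time ~ rows**k between two measurements."""
    (n1, t1), (n2, t2) = p1, p2
    return math.log(t2 / t1) / math.log(n2 / n1)

for rows_m, minutes in measurements:
    cost = minutes_per_million(rows_m, minutes)
    print(f"{rows_m:>2}M rows: {cost:.1f} min per million rows")

for p1, p2 in zip(measurements, measurements[1:]):
    k = scaling_exponent(p1, p2)
    print(f"{p1[0]}M -> {p2[0]}M rows: exponent {k:.2f}")
```

On these four points the exponent stays between roughly 1.7 and 1.9, so training time grows faster than linearly with the number of rows on this server.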

Download the script and the dataset to reproduce the experiment: link

Productivity: E-mail categorization

The goal of the project is to automatically classify e-mails into 10 categories. In this case, the e-mails are characterized by several "sequential" variables. The subject and body of each e-mail are treated as sequences of words. The sender and recipient information (organizations, countries, etc.) is encoded as sets. Edge ML automatically prepares these complex kinds of data by extracting relevant and robust sub-sequences and sub-sets.
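
The difference between the two encodings can be sketched with a small data structure. This is an illustration only: the `Email` class, its field names and the sample values are hypothetical and do not reflect the actual input format expected by Edge ML.

```python
from dataclasses import dataclass

@dataclass
class Email:
    subject: list[str]         # sequence of words: order and repetitions are kept
    body: list[str]            # sequence of words
    sender_info: set[str]      # set: unordered, duplicates collapse
    recipient_info: set[str]   # set

def make_email(subject, body, sender_info, recipient_info):
    """Split raw text on whitespace and turn the metadata lists into sets."""
    return Email(subject.split(), body.split(), set(sender_info), set(recipient_info))

# Hypothetical e-mail
email = make_email(
    "Meeting moved to Friday",
    "Hi all the meeting is moved to Friday",
    ["acme", "FR"],
    ["acme", "DE", "acme"],
)
print(email.subject)
print(sorted(email.recipient_info))
```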

Results of the experiment

Features of the experiments

  • Size of the dataset: 17,000 x 6 sequences
  • Number of class values: 10
  • Time spent on data preparation: 10 minutes
  • Processing time for training: 18 minutes
  • Hardware: Laptop, Core i7 2014 - 8 GB RAM
  • Performance (AUC): Train = 0.979 / Test = 0.959
  • Robustness (Delta AUC): 0.02

Comments on the obtained results

Edge ML processes sequential variables seamlessly: a single command line is enough to extract the relevant sequential rules and learn an ensemble classifier. The slight decrease in robustness is due to the fact that the rules extracted from textual data are generally not independent (in this case, the option '-lessRule' is recommended).

Download the script and the dataset to reproduce the experiment: link

Sentiment analysis: Customer feedback on a product catalog

This project aims to predict the ratings of products sold on an e-commerce website, based on customers' written opinions. This textual data is processed in its raw state, as sequences of words (no pre-processing, e.g. lemmatization, is performed). Edge ML automatically extracts sub-sequences that are both relevant and robust, and then automatically learns a predictive model. Edge ML solves this problem with great precision, while providing an easily interpretable model and rules.

Results of the experiment

Features of the experiments

  • Size of the dataset: 100,000 x 1 sequence
  • Number of class values: 2
  • Time spent on data preparation: 5 minutes
  • Processing time for training: 1h45
  • Hardware: Laptop, Core i7 2014 - 8 GB RAM
  • Performance (AUC): Train = 0.911 / Test = 0.909
  • Robustness (Delta AUC): 0.0023

Comments on the obtained results

The extracted rules are easily interpretable and are a valuable aid for the feature engineering step. For instance, Edge ML extracts the following rules:

  • "I + highly + recommend"
  • "dont + waste + your + money"
  • "This + is + a + great"
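
Rules like these can be reused as binary features in your own feature engineering. The sketch below assumes that a rule matches when its words appear consecutively in the raw review, split on whitespace and compared case-sensitively; the page does not state the exact matching semantics, and the helper names and the sample review are hypothetical.

```python
def parse_rule(rule):
    """Turn 'I + highly + recommend' into ['I', 'highly', 'recommend']."""
    return [word.strip() for word in rule.split("+")]

def matches(rule_words, text):
    """True if the rule's words appear consecutively in the whitespace-split text."""
    words = text.split()
    n = len(rule_words)
    return any(words[i:i + n] == rule_words for i in range(len(words) - n + 1))

rules = ["I + highly + recommend", "dont + waste + your + money", "This + is + a + great"]

# Hypothetical raw review, used without any pre-processing
review = "This is a great blender and I highly recommend it"

# One binary feature per rule
features = {rule: matches(parse_rule(rule), review) for rule in rules}
print(features)
```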

Download the script and the dataset to reproduce the experiment: link