Thursday, May 19, 2022

Improving the forecast using the ML model

Like many other large companies, we do not stand still: we adopt time-tested data models.


Another team in R&D recently built a pipeline for training ML models and serving them in production. Thanks to this, we began to use ML models under real production load.


The Data Science team helps us improve the quality of the forecast. Using the CatBoost library, they trained a gradient boosting model on historical data to predict the Last mile segment. For each prediction, our service calls the experiment service and receives a model identifier. We then call another service, passing that identifier and the data for prediction:

  • restaurant information;
  • information about the courier;
  • the time the order was created.


In response, we get the net time by which the courier should arrive at the client. The whole process is a black box for us, so if we want to retrain and test the model, we only need to get a new experiment ID and use it for new predictions. During the gradual transition to the forecast from the ML model, we reduced the MAE (mean absolute error) for Last mile.
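The two-step lookup described above can be sketched roughly as follows. This is only an illustration: the names `PredictionFeatures`, `predict_last_mile`, `get_model_id` and `predict` are hypothetical, the real services and their APIs are not shown in this article, and we assume here that the model returns a duration in seconds. The two services are passed in as plain callables so the example runs without any network calls.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class PredictionFeatures:
    """Hypothetical container for the data sent with each prediction."""
    restaurant_id: str           # stands in for "restaurant information"
    courier_id: str              # stands in for "information about the courier"
    order_created_at: datetime   # the time the order was created


def predict_last_mile(
    features: PredictionFeatures,
    get_model_id: Callable[[], str],
    predict: Callable[[str, PredictionFeatures], float],
) -> float:
    """Ask the experiment service for a model ID, then request a prediction."""
    model_id = get_model_id()            # step 1: experiment service
    return predict(model_id, features)   # step 2: prediction service


# Stand-ins for the real services (assumptions, for illustration only).
def fake_experiment_service() -> str:
    return "model-v2"


def fake_prediction_service(model_id: str, features: PredictionFeatures) -> float:
    # A real model would use the features; the stub only depends on the ID.
    return 900.0 if model_id == "model-v2" else 1200.0


features = PredictionFeatures(
    restaurant_id="r-1",
    courier_id="c-1",
    order_created_at=datetime(2022, 5, 19, 14, 40),
)
last_mile_seconds = predict_last_mile(
    features, fake_experiment_service, fake_prediction_service
)
```

Because the caller only ever sees a model identifier, switching to a retrained model means returning a different ID from the experiment service; the calling code stays the same.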

How to monitor the order forecast in logistics software

For example, to monitor the forecast error, we listen to another topic that carries the actual order labels. We subtract our forecast from the actual value, divide by the number of orders and get the average error. Let's say the service predicted arrival at the restaurant at 15:00, but the courier actually set the status "came to the restaurant" in the application at 15:05. We can then calculate the error using the formula:

forecast duration = (forecast arrival at the restaurant) - (actual start of the order)

actual duration = (actual arrival at the restaurant) - (actual start of the order)

error in seconds = (forecast duration) - (actual duration)

The absolute value of the error in this case is 300 seconds.
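The same arithmetic can be checked with a short sketch. The function names are hypothetical, and the order start time of 14:40 is an assumption made only so the example runs; it cancels out of the error.

```python
from datetime import datetime
from typing import Iterable


def forecast_error_seconds(
    order_started_at: datetime,
    forecast_arrival: datetime,
    actual_arrival: datetime,
) -> float:
    """Error in seconds = forecast duration - actual duration."""
    forecast_duration = forecast_arrival - order_started_at
    actual_duration = actual_arrival - order_started_at
    return (forecast_duration - actual_duration).total_seconds()


def mean_absolute_error(errors: Iterable[float]) -> float:
    """Average of the absolute errors over a non-empty set of orders."""
    values = [abs(e) for e in errors]
    return sum(values) / len(values)


error = forecast_error_seconds(
    order_started_at=datetime(2022, 5, 19, 14, 40),  # assumed start time
    forecast_arrival=datetime(2022, 5, 19, 15, 0),
    actual_arrival=datetime(2022, 5, 19, 15, 5),
)
# error is -300.0: the courier arrived 300 seconds later than forecast.
absolute_error = abs(error)
```

The sign shows the direction of the miss (negative means the forecast was too optimistic), while the absolute value is what goes into a metric such as MAE.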


We also have a graph of the average delivery time over N orders. It helps us spot outliers and anomalies, and it protects us from accidental changes in the code base. We do not have to wait for analytics charts, which are usually built for the previous day; we can track forecast statistics in real time.
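One simple way to maintain such an average over the last N orders is a fixed-size window. The sketch below is an assumption about how this could be done, not a description of our monitoring stack; the class name `RollingMean` and the sample values are made up for illustration.

```python
from collections import deque


class RollingMean:
    """Keeps the mean of the last `window_size` values."""

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self._values: deque = deque(maxlen=window_size)
        self._total = 0.0

    def add(self, value: float) -> float:
        """Add a new value and return the mean over the current window."""
        if len(self._values) == self._values.maxlen:
            # The oldest value is about to be evicted by append().
            self._total -= self._values[0]
        self._values.append(value)
        self._total += value
        return self._total / len(self._values)


# Delivery times in seconds; the last order is an obvious outlier.
window = RollingMean(window_size=3)
means = [window.add(t) for t in (1800.0, 2100.0, 2400.0, 6000.0)]
```

With a small window, a single slow order moves the average noticeably, which is what makes anomalies visible on the graph soon after they happen.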
