McKinsey - Hack the crash

Predicting damage inflicted in traffic accidents.

Data Preprocessing

Feature Selection

Data Analysis

Plotting the distributions also helps to verify that the selected features behave as expected before training.

img
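As a rough illustration of this distribution check, the sketch below draws one histogram per feature with matplotlib. The feature names and the randomly generated values are hypothetical stand-ins, not the project's data, and the helper `plot_feature_distributions` is not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_feature_distributions(features, bins=20):
    """Draw one histogram per feature; `features` maps a name to a 1-D array."""
    fig, axes = plt.subplots(
        1, len(features), figsize=(4 * len(features), 3), squeeze=False
    )
    for ax, (name, values) in zip(axes[0], features.items()):
        ax.hist(values, bins=bins)
        ax.set_title(name)
    fig.tight_layout()
    return fig


rng = np.random.default_rng(0)
# Hypothetical features, generated only so the example is runnable.
demo_features = {
    "speed_limit": rng.choice([30, 50, 70, 90], size=500),
    "vehicles_involved": rng.poisson(2, size=500),
}
fig = plot_feature_distributions(demo_features)
```

Calling `fig.savefig(...)` afterwards would write the plot to disk. Heavily skewed or near-constant histograms are the kind of behaviour worth catching at this stage.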

Machine Learning Pipeline

I used a custom pipeline strategy to train and test five different ML algorithms. I'll describe it using the Decision Tree Classifier as an example.
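A minimal sketch of such a train-and-test loop with scikit-learn is shown below. It is not the team's actual pipeline. Apart from the Decision Tree, the other four algorithms are assumptions chosen for illustration, the data is synthetic, and macro-averaged F1 is an assumed scoring choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced 3-class data standing in for the accident dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Five candidate algorithms; only the Decision Tree is named in the text.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
}


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model on the training split and score it on the test split."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = f1_score(y_test, model.predict(X_test), average="macro")
    return scores


scores = evaluate(models, X_train, y_train, X_test, y_test)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

Keeping every model behind the same `fit`/`predict` interface is what makes it cheap to swap algorithms in and out of the comparison.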

Model Selection

 

We chose the Decision Tree Classifier as our algorithm.

Model Selection - Part II - choosing features

As a result, none of our feature selection techniques performed better than the full feature set.
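One way to run this kind of comparison is sketched below: the same tree is cross-validated on all features and on several reduced subsets. The source does not name the selection techniques that were tried, so `SelectKBest` with `f_classif` is an assumption, as are the synthetic data, the values of `k`, and the macro-F1 scoring.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the accident dataset.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)


def make_tree():
    """Decision tree with the hyperparameters reported in the summary."""
    return DecisionTreeClassifier(
        criterion="entropy", class_weight="balanced", max_depth=5, random_state=0
    )


# Baseline on all features, plus pipelines that keep only the top-k features.
candidates = {"all features": make_tree()}
for k in (3, 6, 9):
    candidates[f"top {k} features"] = make_pipeline(
        SelectKBest(f_classif, k=k), make_tree()
    )

# Selection sits inside the pipeline, so it is refit on each training fold
# and never sees the held-out fold.
results = {
    name: cross_val_score(estimator, X, y, cv=5, scoring="f1_macro").mean()
    for name, estimator in candidates.items()
}
best = max(results, key=results.get)
print(best, round(results[best], 3))
```

If the full feature set wins or ties, as reported here, the selection step adds complexity without benefit and can be dropped.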

Lesson for future projects

Summary

The best model for predicting damage inflicted in traffic accidents is

DecisionTreeClassifier(criterion='entropy', class_weight='balanced', max_depth=5)

Trained on all features and the whole dataset, it achieved an F1 score of 34.17.
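For readers unfamiliar with the metric, the sketch below computes F1 by hand from per-class precision and recall, then averages across classes. The source does not say which averaging was used, so the macro average here is an assumption, and the class labels are hypothetical. The function returns a value between 0 and 1; the reported 34.17 is presumably on a 0 to 100 scale.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1}, where F1 = 2 * P * R / (P + R) for each class."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    per_class = f1_per_class(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)


# Hypothetical damage classes, used only to exercise the functions.
y_true = ["minor", "minor", "minor", "serious", "serious", "fatal"]
y_pred = ["minor", "minor", "serious", "serious", "minor", "fatal"]

print(f1_per_class(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

Because the macro average weights every class equally, rare classes pull the score down as much as common ones. That is relevant when the classes are imbalanced, which the `class_weight='balanced'` setting suggests was the case.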

 

Mateusz Dorobek, Piotr Podbielski, Aitor Mato, Jaume Mora Viñes - Team Safely - HACK UPC 2019