Abstract
We compare several unsupervised probabilistic machine learning methods for market basket analysis, namely binary factor analysis, two topic models (latent Dirichlet allocation and the correlated topic model), the restricted Boltzmann machine and the deep belief net. After an overview of previous applications of unsupervised probabilistic machine learning methods to market basket analysis we ...
Abstract
We compare several unsupervised probabilistic machine learning methods for market basket analysis, namely binary factor analysis, two topic models (latent Dirichlet allocation and the correlated topic model), the restricted Boltzmann machine and the deep belief net. After an overview of previous applications of unsupervised probabilistic machine learning methods to market basket analysis we shortly present the methods which we investigate and outline their estimation. Performance is measured by tenfold cross-validated log likelihood values. Binary factor analysis vastly outperforms topic models. The restricted Boltzmann machine attains a similar performance advantage over binary factor analysis. Overall, a deep belief net with 45 variables in the first and 15 variables in the second hidden layers turns out to be the best model. We also compare the investigated machine learning methods with respect to ease of interpretation and runtimes. In addition, we show how to interpret the relationships between hidden variables and observed category purchases. To demonstrate managerial implications we estimate the effect of promoting each category both on purchase probability increases of other product categories and the relative increase of basket size. Finally, we indicate several possibilities to extend restricted Boltzmann machines and deep belief nets for market basket analysis.