Gradient boosting

Gradient boosted trees are an ensemble of shallow decision trees (weak learners). A shallow tree can be as small as a tree with just two leaves, known as a decision stump. Boosting methods mainly reduce bias, but they also help reduce variance slightly.

The original papers by Breiman and Friedman, who developed the idea of gradient boosting, are available at the following links:

Prediction Games and Arcing Algorithms by Breiman, L at https://www.stat.berkeley.edu/~breiman/games.pdf
Arcing The Edge by Breiman, L at http://statistics.berkeley.edu/sites/default/files/tech-reports/486.pdf
Greedy Function Approximation: A Gradient Boosting Machine by Friedman, J. H. at http://statweb.stanford.edu/~jhf/ftp/trebst.pdf
Stochastic Gradient Boosting by Friedman, J. H. at https://statweb.stanford.edu/~jhf/ftp/stobst.pdf

Intuitively, in the gradient boosting model, the decision trees in the ensemble are trained over several iterations, as shown in the following image. A new decision tree is added at each iteration, and each new tree is trained to correct the errors of the ensemble built in the previous iterations. This is different from the random forest model, where each decision tree is trained independently of the other decision trees in the ensemble.
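To make the iteration concrete, here is a minimal sketch in plain Python, under some simplifying assumptions that are not from the text: a single numeric feature, squared-error loss (so the negative gradient is just the residual), decision stumps as the weak learners, and a toy dataset. The function names (fit_stump, fit_gradient_boosting, and so on) are made up for this illustration and do not belong to any library.

```python
# Sketch of gradient boosting for regression with squared-error loss.
# Weak learner: a decision stump (a tree with two leaves) on one feature.
# All names and the toy data below are illustrative assumptions.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    order = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(order, order[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    """Add one stump per iteration, each fitted to the current residuals."""
    base = sum(ys) / len(ys)           # initial model: the mean of the targets
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient equals the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink the new tree's contribution by the learning rate.
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumption): y = x^2 on a small grid.
xs = [i / 10.0 for i in range(40)]
ys = [x * x for x in xs]

for n in (1, 10, 100):
    model = fit_gradient_boosting(xs, ys, n_trees=n)
    print(n, "trees -> training MSE", round(mse(model, xs, ys), 4))
```

Running the loop at the end shows the training error shrinking as trees are added, because each stump is fitted to what the previous ensemble still gets wrong. A random forest, by contrast, would fit every tree to the original targets without looking at the other trees.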

A gradient boosting model typically uses fewer trees than a random forest model, but it ends up with a very large number of hyperparameters that need to be tuned to get a decent model.

An interesting explanation of gradient boosting can be found at the following link: http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/.