
Gradient boosting
Gradient boosted trees are an ensemble of shallow decision trees that act as weak learners. Each tree can be as small as a tree with just two leaves (also known as a decision stump). Boosting mainly reduces bias, and it also reduces variance slightly.
- Prediction Games and Arcing Algorithms by Breiman, L. at https://www.stat.berkeley.edu/~breiman/games.pdf
- Arcing The Edge by Breiman, L. at http://statistics.berkeley.edu/sites/default/files/tech-reports/486.pdf
- Greedy Function Approximation: A Gradient Boosting Machine by Friedman, J. H. at http://statweb.stanford.edu/~jhf/ftp/trebst.pdf
- Stochastic Gradient Boosting by Friedman, J. H. at https://statweb.stanford.edu/~jhf/ftp/stobst.pdf
Intuitively, in the gradient boosting model, the decision trees in the ensemble are trained sequentially over several iterations, as shown in the following image. One new decision tree is added at each iteration, and it is trained to correct the errors of the ensemble built in the previous iterations. This differs from the random forest model, where each decision tree is trained independently of the other trees in the ensemble.
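The iterative procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes a squared-error loss (for which the negative gradient is simply the residual), one-dimensional inputs, and decision stumps as the weak learners. The function names (`fit_stump`, `fit_boosted_stumps`, `predict`, `mse`), the toy data, and the default values of `n_trees` and `learning_rate` are all hypothetical choices made for this example.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a two-leaf tree (decision stump) to 1-D inputs by least squares."""
    best = None
    # Try every distinct value except the largest, so both leaves are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1):
    """Add one stump per iteration, each fitted to the current residuals."""
    base = mean(ys)                      # initial constant prediction
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink the new stump's contribution by the learning rate.
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))


# Toy data (an assumption for illustration): y = x^2 on a small grid.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

model = fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1)
print("training MSE after 50 stumps:", mse(model, xs, ys))
```

Each stump on its own is a poor model, but because every new stump is fitted to what the current ensemble still gets wrong, the training error shrinks as stumps are added. A random forest, by contrast, would fit every tree to the original targets `ys` rather than to residuals.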
A gradient boosting model typically uses fewer trees than a random forest model, but it has a large number of hyperparameters (such as the number of trees, the learning rate, and the tree depth) that need to be tuned to get a decent model.
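To give a sense of the hyperparameters involved, here is a small sketch using scikit-learn's `GradientBoostingRegressor`. The choice of library, the synthetic dataset, and every value below are assumptions made for illustration; they are not recommended settings, and in practice these values would be chosen by a search such as cross-validation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data, used only to make the example runnable.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200,      # number of trees (boosting iterations)
    learning_rate=0.05,    # shrinkage applied to each tree's contribution
    max_depth=2,           # depth of each shallow tree
    min_samples_leaf=5,    # minimum number of samples in a leaf
    subsample=0.8,         # fraction of rows sampled per tree (stochastic boosting)
    max_features=0.8,      # fraction of features considered at each split
    random_state=0,
)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # coefficient of determination on held-out data
print("held-out R^2:", r2)
```

These settings interact: a smaller `learning_rate` usually needs a larger `n_estimators` to reach the same fit, which is one reason the hyperparameters are tuned together rather than one at a time.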