Trinity College Dublin, The University of Dublin

Module Descriptor School of Computer Science and Statistics

Module Code: CS7DS1
Module Short Title
Semester Taught: 1st
Contact Hours

4 lectures and 1 lab per week.

Module Personnel: Professor Myra O’Regan
Learning Outcomes

To understand the theory of the following techniques and be able to apply them to a set of data:

  • Classification and Regression Trees
  • Ensemble methods
      ◦ Random forests
      ◦ RuleFit procedure

  • Evaluation of models
Learning Aims

The aim of the course is to introduce the students to a set of techniques including classification and regression trees, and ensemble methods. Methods to evaluate models will also be discussed.

Module Content
  • Overview of Field

  • Handling data

  • Missing data

  • Detailed discussion of Classification and Regression trees

  • General Overview of Ensemble methods

  • Detailed discussion of Bagging, Random Forests and the RuleFit Procedure

  • Detailed discussion of Model evaluation

  • Handling unbalanced datasets

  • Other methods of growing trees

  • Stacking

Recommended Reading List


  • Ayres, I. Super Crunchers: How Anything Can Be Predicted, John Murray, 2007.
  • Berry, M. J. A. & Linoff, G. Data Mining Techniques, John Wiley & Sons, 2011.
  • Bishop, Christopher. Pattern Recognition and Machine Learning, Springer Science, 2006.
  • Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and Regression Trees, Chapman and Hall, 1984.
  • Davenport, T. H. & Harris, J. G. Competing on Analytics: The New Science of Winning, Harvard Business School Press, 2007.
  • Chang, Winston. R Graphics Cookbook, O’Reilly, Sebastopol, CA, 2013.
  • Efron, Bradley & Hastie, Trevor. Computer Age Statistical Inference: Algorithms, Evidence and Data Science, Cambridge University Press, 2016 (available online).
  • Hand, D., Mannila, H. & Smyth, P. Principles of Data Mining, MIT Press, 2001.
  • Hastie, Trevor, Tibshirani, R. & Friedman, J. The Elements of Statistical Learning, 2nd Edition, Springer Series, 2009.
  • James, Gareth, Witten, Daniela, Hastie, Trevor & Tibshirani, Robert. An Introduction to Statistical Learning, Springer Series, 2013.
  • Japkowicz, N. & Shah, Mohak. Evaluating Learning Algorithms, Cambridge University Press, 2011.
  • Kuhn, Max & Johnson, K. Applied Predictive Modeling, Springer, 2013.
  • Ripley, B. D. Pattern Recognition and Neural Networks, Cambridge University
  • Seni, G. & Elder, J. Ensemble Methods in Data Mining, Morgan & Claypool, 2010.
  • Tan, Pang-Ning, Steinbach, M. & Kumar, V. Introduction to Data Mining, Pearson, 2006.
  • Torgo, Luis. Data Mining with R: Learning with Case Studies, Chapman and Hall, 2011.
  • Tuffery, Stephane. Data Mining and Statistics for Decision Making, John Wiley & Sons, 2011.
  • Unwin, A. Graphical Data Analysis with R, CRC Press, 2015.
  • Webb, Andrew & Copsey, K. Statistical Pattern Recognition, 3rd Edition, Wiley, 2011.
  • Zhou, Zhi-Hua. Ensemble Methods: Foundations and Algorithms, Chapman and Hall, 2012.



Module Prerequisites

A course on Multivariate Analysis covering principal components, multiple regression, clustering techniques and logistic regression. A good working knowledge of R is also required.

Assessment Details

Students will be required to carry out a project worth 40% of the total marks, with an exam accounting for the remaining 60%.

Assessment in the Supplemental session will be based 100% on an exam.

Module Website
Academic Year of Data: 2017/18