
Trinity College Dublin, The University of Dublin




Module Descriptor

School of Computer Science and Statistics

Module Code: CS7DS1
Module Name: Data Analytics
ECTS: 10
Semester Taught: 1st
Contact Hours: 4 lectures and 1 lab per week

Module Personnel: Professor Myra O’Regan
Learning Outcomes

To understand the theory behind the following techniques and be able to apply them to a set of data:

  • Classification and regression trees
  • Ensemble methods:
      ◦ Bagging
      ◦ Boosting
      ◦ Random forests
      ◦ RuleFit procedure
  • Evaluation of models
Learning Aims

The aim of the course is to introduce students to a set of techniques, including classification and regression trees and ensemble methods. Methods for evaluating models will also be discussed.

Module Content
  • Overview of the field
  • Handling data
  • Missing data
  • Detailed discussion of classification and regression trees
  • General overview of ensemble methods
  • Detailed discussion of:
      ◦ Bagging
      ◦ Boosting
      ◦ Random forests
      ◦ RuleFit procedure
  • Detailed discussion of model evaluation
  • Handling unbalanced datasets
  • Other methods of growing trees
  • Stacking

Recommended Reading List

  • Ayres, I., Super Crunchers: How Anything Can Be Predicted, John Murray, 2007.
  • Berry, M. J. A. & Linoff, G., Data Mining Techniques, John Wiley & Sons, 2011.
  • Bishop, C. M., Pattern Recognition and Machine Learning, Springer Science, 2006.
  • Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J., Classification and Regression Trees, Chapman and Hall, 1984.
  • Chang, W., R Graphics Cookbook, O’Reilly, Sebastopol, CA, 2013.
  • Davenport, T. H. & Harris, J. G., Competing on Analytics: The New Science of Winning, Harvard Business School Press, 2007.
  • Efron, B. & Hastie, T., Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, Cambridge University Press, 2016 (available online).
  • Hand, D., Mannila, H. & Smyth, P., Principles of Data Mining, MIT Press, 2001.
  • Hastie, T., Tibshirani, R. & Friedman, J., The Elements of Statistical Learning, 2nd edition, Springer, 2009.
  • James, G., Witten, D., Hastie, T. & Tibshirani, R., An Introduction to Statistical Learning, Springer, 2013.
  • Japkowicz, N. & Shah, M., Evaluating Learning Algorithms, Cambridge University Press, 2011.
  • Kuhn, M. & Johnson, K., Applied Predictive Modeling, Springer, 2013.
  • Ripley, B. D., Pattern Recognition and Neural Networks, Cambridge University Press.
  • Seni, G. & Elder, J., Ensemble Methods in Data Mining, Morgan & Claypool, 2010.
  • Tan, P.-N., Steinbach, M. & Kumar, V., Introduction to Data Mining, Pearson, 2006.
  • Torgo, L., Data Mining with R: Learning with Case Studies, Chapman and Hall, 2011.
  • Tufféry, S., Data Mining and Statistics for Decision Making, John Wiley & Sons, 2011.
  • Unwin, A., Graphical Data Analysis with R, CRC Press, 2015.
  • Webb, A. & Copsey, K., Statistical Pattern Recognition, 3rd edition, Wiley, 2011.
  • Zhou, Z.-H., Ensemble Methods: Foundations and Algorithms, Chapman and Hall, 2012.

Module Prerequisites

A course in multivariate analysis covering principal components analysis, multiple regression, clustering techniques and logistic regression. A good working knowledge of R is also required.

Assessment Details

Students will be required to carry out a project worth 40% of the total marks, with an exam accounting for the remaining 60%.

Assessment in the supplemental session will be based 100% on the exam.

Academic Year of Data: 2017/18