Arthur White
Instructor: Arthur White
Email: arwhite@tcd.ie
Office: Room 144, Lloyd Building
Office hours: 10am-12pm Fridays
Email me to schedule a meeting; I can also meet you remotely using Teams
All material will appear on Blackboard and on the class page: scss.tcd.ie/~arwhite/Teaching/CS7DS3.html
Lectures
Monday 1pm LB 1.07
Friday 9am LB 1.07
Supporting videos will also be available on Blackboard
Case studies will accompany lecture material
Email: arwhite@tcd.ie
Discussion board
Your interaction and feedback are crucial
Input from class reps is always useful
This module is assessed 100% by coursework, i.e., no exam
2 x small assignments: 15% each. These will be problem sets
Main assignment: 70%. This will be a report describing a detailed analysis of a complex data set
All assignments will be submitted through Turnitin
These will be scheduled with the goal of giving you plenty of time to complete them, especially the main assignment. More details to follow.
There is no compulsory textbook for this course, but the following cover different aspects of the material:
P.D. Hoff, A first course in Bayesian statistical methods. Springer, 2009. Library e-link: http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb17405199
S.N. Wood, Core Statistics. Cambridge University Press, 2015. Library link: http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb16031862 and free pdf online: https://people.maths.bris.ac.uk/~sw15190/core-statistics.pdf
C.M. Bishop, Pattern recognition and machine learning. Springer, 2006. Library link: http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb16031862 Free pdf: https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/
B. Efron & T. Hastie, Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press, 2016. Free pdf: https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf
This module will provide an overview of statistical models and how to apply them to analyse data.
We will focus on theory and application:
Our models will be motivated by different research problems
Students from 100 different schools take a standardised test.
Can we quantify which schools are best? By how much?
##    school mathscore
## 1       1     52.11
## 2       1     57.65
## 3       1     66.44
## 4       1     44.68
## 5       1     40.57
## 6       1     35.04
## 7       1     50.71
## 8       1     66.17
## 9       1     39.43
## 10      1     46.17
## 11      1     58.76
## 12      1     47.97
Special aerobics vs standard running programme, \(n = 12\)
Can we quantify the programme’s effect on oxygen increase, accounting for age?
##    uptake aerobic age
## 1   -0.87       0  23
## 2  -10.74       0  22
## 3   -3.27       0  22
## 4   -1.97       0  25
## 5    7.50       0  27
## 6   -7.25       0  20
## 7   17.05       1  31
## 8    4.96       1  23
## 9   10.40       1  27
## 10  11.05       1  28
## 11   0.26       1  22
## 12   2.51       1  24
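One way to make "accounting for age" concrete is a linear regression of uptake on the programme indicator and age. The sketch below is in Python using only the standard library, and it assumes a simple additive model, uptake = b0 + b1 * aerobic + b2 * age, fitted by ordinary least squares. That model and the helper names (`solve_linear_system`, `least_squares`) are illustrative assumptions, not necessarily the analysis used later in the module.

```python
# Data copied from the 12 rows printed above.
uptake = [-0.87, -10.74, -3.27, -1.97, 7.50, -7.25,
          17.05, 4.96, 10.40, 11.05, 0.26, 2.51]
aerobic = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
age = [23, 22, 22, 25, 27, 20, 31, 23, 27, 28, 22, 24]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Bring the row with the largest pivot into position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def least_squares(columns, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns]
           for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve_linear_system(xtx, xty)


intercept_col = [1.0] * len(uptake)
# ASSUMPTION: additive model with no aerobic-by-age interaction.
beta = least_squares([intercept_col, aerobic, age], uptake)
fitted = [beta[0] + beta[1] * a + beta[2] * g for a, g in zip(aerobic, age)]
residuals = [y - f for y, f in zip(uptake, fitted)]

print("intercept, aerobic effect, age slope:", [round(b, 2) for b in beta])
```

Under this assumed model, `beta[1]` is the estimated difference in uptake between the two programmes for participants of the same age. With only 12 observations, any such estimate comes with considerable uncertainty.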
Duration of eruption and time between eruptions of Old Faithful geyser, \(n = 272\)
Can we identify groups of similar eruption times?
##    eruptions waiting
## 1      3.600      79
## 2      1.800      54
## 3      3.333      74
## 4      2.283      62
## 5      4.533      85
## 6      2.883      55
## 7      4.700      88
## 8      3.600      85
## 9      1.950      51
## 10     4.350      85
## 11     1.833      54
## 12     3.917      84
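To give a feel for what "identifying groups" might involve, the sketch below splits the eruption durations into two groups with a one-dimensional k-means (k = 2). This is a deliberately simple stand-in rather than the method the module will necessarily use. It runs only on the 12 rows printed above, not the full \(n = 272\) data set, and the function name `two_means_1d` is hypothetical.

```python
# First 12 eruption durations, copied from the rows printed above.
eruptions = [3.600, 1.800, 3.333, 2.283, 4.533, 2.883,
             4.700, 3.600, 1.950, 4.350, 1.833, 3.917]


def two_means_1d(values, max_iter=100):
    """Split values into two groups with 1-D k-means (k = 2).

    Assumes the values contain at least two distinct numbers.
    """
    centres = [min(values), max(values)]  # simple deterministic start
    groups = [[], []]
    for _ in range(max_iter):
        # Assign each value to its nearest centre.
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Move each centre to the mean of its group.
        new_centres = [sum(g) / len(g) for g in groups]
        if new_centres == centres:
            break  # assignments can no longer change
        centres = new_centres
    return centres, groups


centres, groups = two_means_1d(eruptions)
print("group centres:", [round(c, 2) for c in centres])
print("group sizes:", [len(g) for g in groups])
```

The choice of two groups is itself an assumption here. Deciding how many groups the data support is one of the questions a statistical model can help answer.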
“Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine,” Polack et al. (2020)
A sample of 43,548 participants was randomized to receive either the mRNA Covid-19 vaccine or a placebo (i.e., the control group).
The authors report 8 cases of Covid-19 among vaccinated participants and 162 cases in the control arm.
Vaccine efficacy estimated by \(VE = 100\times(1-RR),\) where \(RR\) is the estimated ratio of confirmed cases of Covid-19 in vaccine vs. placebo groups.
How effective is the vaccine?
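As a rough worked example of the formula \(VE = 100\times(1-RR)\), the Python sketch below plugs in the reported case counts. The arm sizes are not given above, so the code assumes the 43,548 participants were split equally between the two arms; that split is an assumption, and the published analysis may use a different denominator. The function name `vaccine_efficacy` is illustrative.

```python
def vaccine_efficacy(cases_vaccine, n_vaccine, cases_placebo, n_placebo):
    """Return VE = 100 * (1 - RR), where RR is the ratio of case rates."""
    rate_vaccine = cases_vaccine / n_vaccine
    rate_placebo = cases_placebo / n_placebo
    relative_risk = rate_vaccine / rate_placebo
    return 100.0 * (1.0 - relative_risk)


# ASSUMPTION: equal-sized arms (the source gives only the total sample size).
n_per_arm = 43548 // 2

ve = vaccine_efficacy(8, n_per_arm, 162, n_per_arm)
print(f"Estimated VE: {ve:.1f}%")
```

With equal arms the sample sizes cancel, so \(RR = 8/162\) and the point estimate is roughly 95%. A point estimate alone does not say how uncertain that figure is, which is where the inference methods in this module come in.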
Generically, a statistical model will have parameter(s) \(\theta.\)
A typical statistical analysis will be interested in answering some or all of these questions:
Often we will have different, related models for consideration, in which case we will have to decide:
We will examine how to answer these questions using both frequentist and Bayesian methods
We are going to study different models:
We will use frequentist and Bayesian inference frameworks to estimate model parameters
We will use optimisation and Monte Carlo computational methods
We will communicate our findings in terms of the original context of the research question
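As a small taste of Bayesian inference by Monte Carlo, the Python sketch below revisits the vaccine example with its reported counts of 8 and 162 cases. It rests on two simplifying assumptions that are not from the source: equal-sized arms, so that each case falls in the vaccine arm with probability \(p = RR/(1+RR)\), and a uniform Beta(1, 1) prior on \(p\). It is an illustration of the idea, not a reproduction of the published analysis.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

CASES_VACCINE = 8
CASES_PLACEBO = 162
N_DRAWS = 20_000

ve_draws = []
for _ in range(N_DRAWS):
    # ASSUMPTION: uniform Beta(1, 1) prior on p, the share of cases that
    # fall in the vaccine arm; the posterior is then Beta(1 + 8, 1 + 162).
    p = random.betavariate(1 + CASES_VACCINE, 1 + CASES_PLACEBO)
    # ASSUMPTION: equal-sized arms, so RR = p / (1 - p).
    relative_risk = p / (1 - p)
    ve_draws.append(100 * (1 - relative_risk))

ve_draws.sort()
posterior_mean = sum(ve_draws) / N_DRAWS
# Empirical 2.5% and 97.5% quantiles give a 95% credible interval.
lower = ve_draws[int(0.025 * N_DRAWS)]
upper = ve_draws[int(0.975 * N_DRAWS)]

print(f"Posterior mean VE: {posterior_mean:.1f}%")
print(f"95% credible interval: ({lower:.1f}%, {upper:.1f}%)")
```

The interval expresses the finding in the terms of the original question: a range of plausible efficacy values rather than a single number.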