##
**Multiple Regression** using `lm()` and `glm()` (Case study: *Carbohydrate Diet*)

This dataset is taken from *An Introduction to Generalized Linear Models* (A. J. Dobson & A. G. Barnett, 3rd edition, p. 96).
The response variable y is the percentage of total calories
obtained from complex carbohydrates for 20 male insulin-dependent diabetics who had been on a high-carbohydrate diet for six months.
Three covariates were recorded for each participant: age (in years), weight (relative to ideal weight)
and the percentage of calories obtained from protein.

```
# data: one entry per participant (n = 20)
carbohydrate <- c(33,40,37,27,30,43,34,48,30,38,50,51,30,36,41,42,46,24,35,37)
age <- c(33,47,49,35,46,52,62,23,32,42,31,61,63,40,50,64,56,61,48,28)
weight <- c(100,92,135,144,140,101,95,101,98,105,108,85,130,127,109,107,117,100,118,102)
protein <- c(14,15,18,12,15,15,14,17,15,14,17,19,19,20,15,16,18,13,18,14)

# fit the linear model by ordinary least squares with lm()
res.lm <- lm(carbohydrate ~ age + weight + protein)
summary(res.lm)

# fit the same model with glm(); the gaussian family uses the identity link by default
res.glm <- glm(carbohydrate ~ age + weight + protein, family = gaussian)
summary(res.glm)
```
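
The two summaries are laid out differently, so it can help to check programmatically that both calls describe the same fit. The following is a minimal sketch, not part of the original example: the names `same_coef`, `same_var` and `same_rss` are introduced here for illustration only, and the data are copied from the block above so that the snippet runs on its own.

```r
# Data copied from the example above
carbohydrate <- c(33,40,37,27,30,43,34,48,30,38,50,51,30,36,41,42,46,24,35,37)
age <- c(33,47,49,35,46,52,62,23,32,42,31,61,63,40,50,64,56,61,48,28)
weight <- c(100,92,135,144,140,101,95,101,98,105,108,85,130,127,109,107,117,100,118,102)
protein <- c(14,15,18,12,15,15,14,17,15,14,17,19,19,20,15,16,18,13,18,14)

res.lm  <- lm(carbohydrate ~ age + weight + protein)
res.glm <- glm(carbohydrate ~ age + weight + protein, family = gaussian)

# Coefficient estimates (all.equal() allows for tiny numerical differences)
same_coef <- isTRUE(all.equal(coef(res.lm), coef(res.glm)))

# For the gaussian family, the dispersion reported by summary.glm() estimates
# the error variance, i.e. the square of lm()'s residual standard error
same_var <- isTRUE(all.equal(summary(res.glm)$dispersion,
                             summary(res.lm)$sigma^2))

# The residual deviance of a gaussian glm is the residual sum of squares
same_rss <- isTRUE(all.equal(deviance(res.glm), sum(residuals(res.lm)^2)))

print(c(same_coef = same_coef, same_var = same_var, same_rss = same_rss))
```

Comparing with `all.equal()` rather than `==` is a deliberate choice here: `lm()` and `glm()` use different fitting routines, so their results may differ in the last few floating-point digits.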

##
**Computation of the least squares estimate** using linear algebra in R (Case study: *Carbohydrate Diet*)

This example reuses the carbohydrate diet dataset from the previous example (*An Introduction to Generalized Linear Models*, A. J. Dobson & A. G. Barnett, 3rd edition, p. 96):
the response variable y is the percentage of total calories obtained from complex carbohydrates for 20 male insulin-dependent diabetics,
and the covariates are age, weight and protein.

The code below uses linear algebra to recover some of the results returned by `lm()` and `glm()` in the previous example:
the coefficient estimates, the residual standard error, the standard errors of the coefficients and the multiple R-squared.
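
For reference, the quantities computed by the code that follows can be written out as below. The notation is an assumption made here for readability: $X$ is the $n \times p$ design matrix (here $n = 20$, $p = 4$, with a first column of ones for the intercept), $y$ is the response vector, and $S_0$ is the total sum of squares about the mean.

$$
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad
\hat{y} = X \hat{\beta}, \qquad
\mathrm{SSE} = (y - \hat{y})^\top (y - \hat{y})
$$

$$
\hat{\sigma}^2 = \frac{\mathrm{SSE}}{n - p}, \qquad
\widehat{\mathrm{Cov}}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}, \qquad
R^2 = \frac{S_0 - \mathrm{SSE}}{S_0}, \quad
S_0 = \sum_{i=1}^{n} (y_i - \bar{y})^2
$$

The residual standard error is $\hat{\sigma}$, and the standard errors of the coefficients are the square roots of the diagonal entries of $\widehat{\mathrm{Cov}}(\hat{\beta})$.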

```
carbohydrate <- c(33,40,37,27,30,43,34,48,30,38,50,51,30,36,41,42,46,24,35,37) # response vector
age <- c(33,47,49,35,46,52,62,23,32,42,31,61,63,40,50,64,56,61,48,28)
weight <- c(100,92,135,144,140,101,95,101,98,105,108,85,130,127,109,107,117,100,118,102)
protein <- c(14,15,18,12,15,15,14,17,15,14,17,19,19,20,15,16,18,13,18,14)

# design matrix: a column of ones (intercept) followed by the three covariates
X <- cbind(1, age, weight, protein)

# least squares estimate: BetaHat = (X'X)^{-1} X'y
XtXinv <- solve(t(X) %*% X)
BetaHat <- XtXinv %*% t(X) %*% carbohydrate
BetaHat
fitted <- X %*% BetaHat
residuals <- carbohydrate - fitted

# residual sum of squares and error variance estimate (n - p degrees of freedom)
SSE <- as.numeric(t(residuals) %*% residuals)
sigmahatsq <- SSE / (length(carbohydrate) - length(BetaHat))

# residual standard error
sqrt(sigmahatsq)

# uncertainty of BetaHat (standard errors)
CovBetaHat <- sigmahatsq * XtXinv
Std.Error.BetaHat <- sqrt(diag(CovBetaHat))
Std.Error.BetaHat

# multiple R-squared (coefficient of determination)
S0 <- sum((carbohydrate - mean(carbohydrate))^2)
Rsq <- (S0 - SSE) / S0
Rsq
```
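
To confirm that the matrix computations reproduce the output of `lm()`, the two sets of results can be compared directly. The following is a hedged sketch rather than part of the original example: it repeats the data and the matrix formulas so that it runs on its own, and the names `fit`, `s`, `res` and `check` are introduced here for illustration.

```r
# Data copied from the example above
carbohydrate <- c(33,40,37,27,30,43,34,48,30,38,50,51,30,36,41,42,46,24,35,37)
age <- c(33,47,49,35,46,52,62,23,32,42,31,61,63,40,50,64,56,61,48,28)
weight <- c(100,92,135,144,140,101,95,101,98,105,108,85,130,127,109,107,117,100,118,102)
protein <- c(14,15,18,12,15,15,14,17,15,14,17,19,19,20,15,16,18,13,18,14)

# Matrix computation, same formulas as in the example above
X <- cbind(1, age, weight, protein)
XtXinv <- solve(crossprod(X))                    # (X'X)^{-1}
BetaHat <- XtXinv %*% crossprod(X, carbohydrate) # (X'X)^{-1} X'y
res <- carbohydrate - X %*% BetaHat
SSE <- sum(res^2)
sigmahatsq <- SSE / (nrow(X) - ncol(X))
Std.Error.BetaHat <- sqrt(diag(sigmahatsq * XtXinv))
S0 <- sum((carbohydrate - mean(carbohydrate))^2)
Rsq <- (S0 - SSE) / S0

# Reference fit with lm()
fit <- lm(carbohydrate ~ age + weight + protein)
s <- summary(fit)

# Compare each quantity up to numerical tolerance; names are dropped
# because the matrix results and the lm() results are labelled differently
check <- c(
  coefficients = isTRUE(all.equal(as.vector(BetaHat), unname(coef(fit)))),
  std_errors   = isTRUE(all.equal(unname(Std.Error.BetaHat),
                                  unname(s$coefficients[, "Std. Error"]))),
  sigma        = isTRUE(all.equal(sqrt(sigmahatsq), s$sigma)),
  r_squared    = isTRUE(all.equal(Rsq, s$r.squared))
)
print(check)
```

Every entry of `check` should be `TRUE` when the hand computation agrees with `lm()`. Note that `lm()` itself solves the least squares problem through a QR decomposition rather than by inverting `t(X) %*% X`, which is numerically more stable for ill-conditioned design matrices.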