Trinity College Statistics Workshop 2012
8th November 2012, Large Conference Room, O'Reilly Institute

The Discipline of Statistics, part of the School of Computer Science and Statistics, is delighted to host the second Trinity College Statistics Workshop. This will be a full one-day meeting, open to anyone in Trinity College and beyond who has research interests in the theory and application of statistical methods. We want to encourage both those who consider themselves statisticians and those with research problems that require statistical analysis to come and present their work.


Location
The Large Conference Room, O'Reilly Institute, Trinity College. The O'Reilly Institute is located at the east end of the campus (see map here). Enter the Institute on the first floor from the steps opposite the Lloyd Institute; the Large Conference Room is on the left.


Programme

9:45    Welcome Reception
in the O'Reilly Institute Lobby.  Coffee and tea will be available.

10:00    Introduction by Head of Discipline

Session 1:

10:10    Robust statistics with applications in computer vision. Rozenn Dahyot (Abstract)

10:35    Robust L2-based model fitting for shape reconstruction.  Claudio Arellano (Abstract)

11:00    Sequential Inference with Iterated Laplace Approximation.  Tiep Mai (Abstract)

11:25    Coffee

Session 2:

11:45    Exploring the effect of biodiversity on multiple ecosystem function in grassland systems.  Aine Dooley (Abstract)

12:10    A Bayesian approach to the construction and inversion of the radiocarbon calibration curves.  Thinh Doan (Abstract)

12:35    Functionality and accuracy of phosphorous load apportionment models.  Lucy Crockford (Abstract)

1:00    Lunch

Session 3:

1:45    Incorporating data from various trial designs into a mixed treatment comparison model.  Susanne Schmitz (Abstract)

2:10    A conjugate class of utility functions for sequential decision problems.  Donnacha Bolger (Abstract)

2:35    Parametric and topological inference for masked system lifetime data.  Louis Aslett (Abstract)

3:00    Coffee

Session 4:

3:20    Model choice for social networks using the collapsed latent position cluster model.  Triona Ryan (Abstract)

3:45    Modelling the risk of spacecraft re-entry explosion.  Cristina De Persis (Abstract)

           

Abstracts

Robust statistics with applications in computer vision.

Rozenn Dahyot


We introduce the Generalised Relaxed Radon Transform (GR2T) as an extension of the Generalised Radon Transform (GRT) [1]. This new modelling allows us to define a framework for robust inference in which the resulting objective functions are probability density functions that can be optimised with gradient ascent methods. The framework is versatile: it also explains standard well-known approaches to inference such as the likelihood function and the Hough transform [2]. We will also point out the relation with L2 estimation [3].

References:
[1] Dahyot, R. and Ruttle, J., Generalised Relaxed Radon Transform (GR2T) for robust inference, Pattern Recognition (in press, 2012).
[2] Goldenshluger, A. and Zeevi, A., The Hough transform estimator, Annals of Statistics, 2004.
[3] Scott, D.W., Parametric statistical modeling by minimum integrated square error, Technometrics, 43, 274-285, 2001.
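
As a purely illustrative aside on the L2 estimation of [3] (not on GR2T itself), the following sketch fits a Gaussian to contaminated data by minimising the empirical integrated squared error criterion; the data, starting values and optimiser choice are invented for the example.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    # Data with a few gross outliers, the setting where L2 estimation helps.
    x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(8.0, 0.5, 10)])

    def l2e_loss(params):
        """Empirical L2 criterion for a Gaussian N(mu, sigma^2):
        integral of f^2 minus twice the mean density at the data."""
        mu, log_sigma = params
        sigma = np.exp(log_sigma)
        int_f_sq = 1.0 / (2.0 * sigma * np.sqrt(np.pi))   # closed form for a Gaussian
        return int_f_sq - 2.0 * norm.pdf(x, mu, sigma).mean()

    res = minimize(l2e_loss, x0=[np.median(x), 0.0], method="Nelder-Mead")
    mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
    print(mu_hat, sigma_hat)   # close to (0, 1) despite the outliers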


Robust L2-based model fitting for shape reconstruction. 

Claudio Arellano


The inference of parameters for shape reconstruction is commonly solved using a Bayesian framework. The likelihood is expressed as a function of vectors containing the observations. Such a formulation requires a known correspondence between points in the vectors. In practice, this correspondence is not trivial to find and its accuracy determines the final result of the reconstructed shape. In this work, we propose to model the likelihood using the L2 distance between two density functions, one modelled using the observations and the other using the shape model. We show the advantages of this modelling and the results obtained when reconstructing faces from data captured using noisy cameras.

References:
[1] Arellano, C. and Dahyot, R., Shape model fitting algorithm without point correspondence, 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania, August 2012.
[2] Arellano, C. and Dahyot, R., Mean shift algorithm for robust rigid registration between Gaussian mixture models, 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania, August 2012.
[3] Arellano, C. and Dahyot, R., Shape model fitting using non-isotropic GMM, 23rd IET Irish Signals and Systems Conference, Maynooth, June 2012.
[4] Jian, B. and Vemuri, B., Robust point set registration using Gaussian mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.
[5] Scott, D.W., Parametric statistical modeling by minimum integrated square error, Technometrics, 43, 274-285, 2001.
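
For readers unfamiliar with the L2 distance between density functions used above, the sketch below evaluates it in closed form for two one-dimensional Gaussian mixtures (one standing in for the observations, one for the shape model). It is a minimal illustration, not the authors' fitting algorithm, and the mixture values are invented.

    import numpy as np
    from scipy.stats import norm

    def cross_term(w1, mu1, s1, w2, mu2, s2):
        """Integral of the product of two 1-D Gaussian mixtures, using
        int N(x;a,s^2) N(x;b,t^2) dx = N(a; b, s^2 + t^2)."""
        total = 0.0
        for wi, mi, si in zip(w1, mu1, s1):
            for wj, mj, sj in zip(w2, mu2, s2):
                total += wi * wj * norm.pdf(mi, mj, np.sqrt(si**2 + sj**2))
        return total

    def l2_distance_sq(gmm_a, gmm_b):
        """Squared L2 distance between two Gaussian mixtures:
        int (f - g)^2 = int f^2 - 2 int f g + int g^2."""
        return (cross_term(*gmm_a, *gmm_a)
                - 2.0 * cross_term(*gmm_a, *gmm_b)
                + cross_term(*gmm_b, *gmm_b))

    # Mixture fitted to noisy observations vs. mixture induced by a shape model
    obs_gmm   = ([0.5, 0.5], [0.0, 2.0], [0.3, 0.3])
    model_gmm = ([0.5, 0.5], [0.1, 1.9], [0.3, 0.3])
    print(l2_distance_sq(obs_gmm, model_gmm))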


Sequential Inference with Iterated Laplace Approximation. 

Tiep Mai


Due to the need for real-time updating, sequential inference is becoming more and more important. However, the particle filter, the main approach to sequential inference, suffers from some problems. Firstly, the whole sample population can degenerate into a single particle of significant weight, especially in the presence of outliers. Secondly, most particle filters assume the parameters are known, as parameters cannot be regenerated by the evolution equation. In this talk, we take another approach, applying a smooth approximation, the Iterated Laplace Approximation (Bornkamp, 2011), sequentially to the filtering distribution of both state vectors and parameters. The method is tested on some dynamic model examples.
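
As background to the talk, the sketch below shows the basic single-Gaussian Laplace approximation (mode plus inverse negative Hessian) that the iterated version builds on; it is not the iterLap algorithm of Bornkamp (2011) nor the sequential filter itself, and the toy target is invented.

    import numpy as np
    from scipy.optimize import minimize

    def laplace_approximation(log_post, theta0):
        """Approximate a density proportional to exp(log_post) by a Gaussian:
        mean at the posterior mode, covariance from the inverse negative Hessian."""
        neg = lambda th: -log_post(th)
        opt = minimize(neg, theta0, method="BFGS")
        mode = opt.x
        cov = np.atleast_2d(opt.hess_inv)   # BFGS estimate of the inverse Hessian of neg
        return mode, cov

    # Toy target: a slightly skewed one-dimensional log-density
    log_post = lambda th: -0.5 * th[0]**2 + 0.1 * th[0]**3 / (1 + th[0]**2)
    mode, cov = laplace_approximation(log_post, np.array([0.0]))
    print(mode, cov)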


Exploring the effect of biodiversity on multiple ecosystem function in grassland systems. 

Aine Dooley


Biodiversity is the variety, relative abundance and composition of the species within an ecosystem. Developing a better understanding of how biodiversity affects ecosystem functioning, such as the biomass produced by the system, is a focus for groups such as ecologists and agronomists. Most studies to date have explored the biodiversity-ecosystem functioning relationship by examining a single ecosystem function; only recently has the study of how biodiversity affects an ecosystem's ability to maintain multiple functions simultaneously begun to be explored.

Our current work in progress aims to develop methods for analysing the effect of biodiversity on multiple ecosystem functions, in order to build a better understanding of the biodiversity-ecosystem functioning relationship.


A Bayesian approach to the construction and inversion of the radiocarbon calibration curves. 

Thinh Doan


In this talk I provide a gentle introduction to Bayesian methodology and demonstrate how it is used for inference of radiocarbon calibration curves. In particular, I discuss a spline smoothing approach reformulated in a Bayesian framework. I then show how the calibration of new radiocarbon samples is carried out via Bayesian inverse prediction. Finally, motivated by the practical need to deal with the complex nature of the radiocarbon dataset, I discuss some potential improvements to the existing calibration curve (INTCAL09). Acknowledgement: Prof. John Haslett, Dr. Andrew Parnell and Dr. Michael Salter-Townshend.
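
To give a flavour of Bayesian inverse prediction for calibration, the sketch below computes a posterior over calendar age on a grid; the straight-line stand-in for the calibration curve and all numbers are invented and bear no relation to INTCAL09 or to the speaker's spline model.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical stand-in for a calibration curve mu(theta) with its own uncertainty;
    # a real curve would be interpolated from published (theta, mu, sigma) tables.
    theta_grid  = np.arange(0, 10000, 1.0)           # calendar ages (years BP)
    mu_curve    = 1.02 * theta_grid + 30.0           # radiocarbon age implied by theta
    sigma_curve = np.full_like(theta_grid, 15.0)     # curve uncertainty

    y_obs, sigma_obs = 3050.0, 25.0                  # new radiocarbon determination

    # Inverse prediction: posterior over calendar age under a flat prior on the grid
    like = norm.pdf(y_obs, mu_curve, np.sqrt(sigma_obs**2 + sigma_curve**2))
    post = like / np.trapz(like, theta_grid)

    mean_age = np.trapz(theta_grid * post, theta_grid)
    print(mean_age)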


Functionality and accuracy of phosphorous load apportionment models. 

Lucy Crockford


Load apportionment models (LAMs) have been developed to apportion riverine P load to either point or diffuse sources based on the relationship between the river discharge rate and the change in river P concentration. It is hypothesised that at extremely low river discharges the predominant contributors are point sources, but that as the river discharge rate increases the contribution comes increasingly from diffuse sources.

Due to field sampling constraints, there has previously been a lack of data available to determine the accuracy of these models. However, a large high-frequency dataset (one data point per hour) covering one year is now available to investigate the functionality and accuracy of these methods in apportioning P load.

One hundred subsets of the large dataset, reflecting a sampling frequency of three times per week, have been constructed manually to obtain the range of proportions produced by each model, indicating model precision. However, to investigate the models' performance in describing the river load apportionment thoroughly, a Monte Carlo simulation has been suggested. The issue here is the time-consuming manual construction of specific datasets with various variables and the labour-intensive modelling procedure.

These models have been developed to provide an easy method for river management, and the outcomes of the statistical analysis of the proportions calculated by the models will provide some boundaries on the models' use in the field.
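
As an illustration of how the thrice-weekly subsets could be drawn programmatically rather than manually, the sketch below assumes the hourly record is a pandas DataFrame with a datetime index; the file name and the subsetting scheme are assumptions for the example, not the speaker's procedure.

    import numpy as np
    import pandas as pd

    def thrice_weekly_subset(hourly, rng):
        """One subset mimicking sampling three times per week: for each week of the
        hourly record, keep one random hour on each of three randomly chosen days."""
        rows = []
        for _, week in hourly.groupby(pd.Grouper(freq="W")):
            days = [day for _, day in week.groupby(week.index.date)]
            for j in rng.choice(len(days), size=min(3, len(days)), replace=False):
                day = days[j]
                rows.append(day.iloc[rng.integers(len(day))])
        return pd.DataFrame(rows)

    rng = np.random.default_rng(1)
    # hourly = pd.read_csv("hourly_p_and_discharge.csv", index_col=0, parse_dates=True)
    # subsets = [thrice_weekly_subset(hourly, rng) for _ in range(100)]
    # Each subset is then run through a load apportionment model; the spread of the
    # resulting point/diffuse proportions summarises the model's precision.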


Incorporating data from various trial designs into a mixed treatment comparison model. 

Susanne Schmitz


Bayesian mixed treatment comparison models (MTCs) provide a powerful methodology for obtaining estimates of relative efficacy between alternative treatments when head-to-head evidence is not available or not sufficient. Most evaluations only consider evidence from randomized controlled trials (RCTs), while information from other trial designs is ignored. In this work we propose three methods to extend MTC models to systematically include evidence from different trial designs, using an application in rheumatoid arthritis (RA).


A conjugate class of utility functions for sequential decision problems. 

Donnacha Bolger


The use of the conjugacy property for members of the exponential family of distributions is commonplace within Bayesian statistical analysis, allowing tractable and simple solutions to problems of inference. However, despite a shared motivation, there has been little previous development of an analogous property for utility functions within Bayesian decision analysis. As such, this work explores a class of utility functions that appear reasonable for modelling the preferences of a decision maker in many real-life situations, but which also permit a tractable and simple analysis within sequential decision problems.
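
For readers less familiar with the inferential conjugacy mentioned in the first sentence, the sketch below shows the standard Beta-Binomial update, where sequential updating never leaves the prior family; it illustrates the motivation only, not the authors' conjugate class of utility functions.

    # Closed-form Beta-Binomial update: the conjugacy property referred to above.
    # Prior Beta(a, b); after observing k successes in n trials the posterior is
    # Beta(a + k, b + n - k), so sequential updating stays within the Beta family.
    def beta_binomial_update(a, b, k, n):
        return a + k, b + n - k

    a, b = 1.0, 1.0                  # uniform prior on the success probability
    for k, n in [(3, 10), (7, 10)]:  # two batches of data, updated sequentially
        a, b = beta_binomial_update(a, b, k, n)
    print(a, b)                      # Beta(11, 11): same as using all data at once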


Parametric and topological inference for masked system lifetime data. 

Louis Aslett


Commonly, reliability data consist of lifetimes (or censoring information) for all components and systems under examination. However, masked system lifetime data represent an important class of problems where the information available for statistical analysis is more limited: one has failure times only for the system as a whole, with no data on the component lifetimes directly, or even on which components had failed. For example, such data can arise when system autopsy is impractical or cost prohibitive. A novel signature-based data augmentation scheme is presented which enables inference for a wide class of component lifetime models for an exchangeable population of systems. It is shown that the approach can be extended to enable topological inference of the underlying system design. A number of illustrative examples are included and the work is linked to last year's TCD Statistics Workshop presentation on inference for repairable redundant systems via Phase-type models.
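
To illustrate what masked system lifetime data look like, the sketch below simulates a 2-out-of-3 system with exponential components and records only the system failure time; the structure and rates are invented, and this is not the signature-based augmentation scheme of the talk.

    import numpy as np

    rng = np.random.default_rng(0)

    def masked_lifetimes(n_systems, rate=1.0):
        """Simulate masked data for a 2-out-of-3 system with exponential components:
        the system fails at the 2nd component failure, and only that time is kept."""
        comp = rng.exponential(1.0 / rate, size=(n_systems, 3))
        system = np.sort(comp, axis=1)[:, 1]   # second order statistic = system lifetime
        return system                           # component times and failure pattern are masked

    t_sys = masked_lifetimes(500, rate=0.5)
    print(t_sys.mean())   # inference must recover component behaviour from these alone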


Model choice for social networks using the collapsed latent position cluster model. 

Triona Ryan


This talk presents the collapsed latent position cluster model - a principled, fast and scalable approach to model choice for social network analysis.  Social network data represent binary interactions between actors or nodes.  The latent position cluster model embeds actors in a latent `social space' whereby positive interaction is more likely for actors that are close in space.  The probability of a link between two actors is assumed to be independent of all other links conditional on their positions.  Drawing conclusions about clustering of actors is often a focus of interest in social network analysis. The latent position cluster model incorporates clustering using a finite multivariate normal mixture model. However, the number of clusters is treated as fixed in the current literature. Information criteria such as BIC or AICM are then used to approximate the model evidence.  This is computationally cumbersome and is not scalable to large networks due to likelihood calculations of order n^2, where n is the number of actors in the network.

To collapse the latent position cluster model, the clustering parameters are integrated out of the posterior distribution. The Markov chain iteratively updates the latent positions in tandem with the allocation vector. It is a fixed-dimension algorithm. It should produce more accurate results, having integrated out some parameter uncertainty, and it is relatively scalable to larger networks. Computation time still involves the full likelihood, but with fewer parameter updates and faster allocation updates.
Results (work in progress) are compared to the more computationally expensive reversible jump Markov chain Monte Carlo, where the Markov chain `jumps' between models with parameter spaces of varying dimension.
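
A minimal version of the latent space likelihood described above, with a logistic link in Euclidean distance, is sketched below; the double loop over actor pairs is what gives the order n^2 cost. It omits the clustering component and the collapsing step, and the tiny network is invented.

    import numpy as np

    def log_likelihood(Y, Z, alpha):
        """Latent position model: P(y_ij = 1) = logistic(alpha - ||z_i - z_j||).
        Y is an n x n binary adjacency matrix, Z an n x d matrix of latent positions.
        The double loop over pairs is what makes each evaluation O(n^2)."""
        n = Y.shape[0]
        ll = 0.0
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                eta = alpha - np.linalg.norm(Z[i] - Z[j])
                ll += Y[i, j] * eta - np.log1p(np.exp(eta))
        return ll

    # Tiny example: 3 actors in a 2-D latent space
    Y = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
    Z = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 1.0]])
    print(log_likelihood(Y, Z, alpha=1.0))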


Modelling the risk of spacecraft re-entry explosion. 

Cristina De Persis


Spacecraft and rocket bodies re-enter via targeted trajectories or naturally decaying orbits at the end of their missions. An object entering the Earth's atmosphere is subject to atmospheric drag forces. The friction caused by these forces during entry heats up the object. The action of the aerodynamic forces and the heating of the structure, with the resulting internal structural stresses and melting of some materials, usually cause the fragmentation of the object. In some cases, under certain physical conditions, the structural integrity of the object can no longer be maintained and the object explodes. The fragments resulting from the explosion or fragmentation which impact the Earth's surface could cause serious damage. While there are various tools able to detect the fragmentation of a spacecraft, the explosion process is a break-up mode that has not yet been adequately modelled. First, I want to demonstrate how fault tree and Bayesian network theories could be applied to assess the probability of an explosion, starting from the combination of the elementary causes that can lead to its occurrence. Next, I want to present a first attempt to model the uncertainty of these elementary causes, i.e. conditions of temperature and pressure, using an autoregressive model, and to show how Cox's proportional hazards model could be useful for this problem.
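
As a minimal illustration of the fault tree calculation mentioned above, the sketch below combines independent basic-event probabilities through AND/OR gates to give a top-event (explosion) probability; the events and numbers are invented, not taken from the talk.

    # Minimal fault-tree combination rules for independent basic events.
    def and_gate(*p):            # all causes must occur
        out = 1.0
        for pi in p:
            out *= pi
        return out

    def or_gate(*p):             # at least one cause occurs
        out = 1.0
        for pi in p:
            out *= (1.0 - pi)
        return 1.0 - out

    # Invented illustration: explosion requires residual propellant AND
    # (tank overpressure OR structural heating beyond a critical temperature).
    p_propellant, p_overpressure, p_heating = 0.30, 0.05, 0.10
    p_explosion = and_gate(p_propellant, or_gate(p_overpressure, p_heating))
    print(p_explosion)   # 0.30 * (1 - 0.95 * 0.90) = 0.0435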