Statistical Methods for ICT Applications

Latest News

1st Dec 2009: Hal Varian, chief economist at Google, on why statistics is the career of the future.

Who are we?

We are a group of statistics researchers in Trinity College Dublin who are interested in statistical problems that arise in computer science, information systems and telecommunications. STATICA is led by Simon Wilson, and currently consists of 2 post-doctoral researchers, 4 postgraduate students and several other associated researchers within the School of Computer Science and Statistics. STATICA is funded by Science Foundation Ireland.

What is our research about?

In STATICA, we are first and foremost statisticians, and so our research interest is in developing new methods of statistical analysis.  This means asking the following questions:

How can we make better use of data to learn about the world around us, and use them to predict how the world will behave in the future?

How certain should we be about what the data tell us and what we predict?

How do we use this knowledge to make decisions and how does uncertainty about the true state of the world affect our decision making?

At STATICA we believe that statistical methods offer a promising way of answering these questions.  We work closely with collaborators in science and engineering who are interested in specific problems and have complicated data where these issues arise.  Our goal in proposing solutions to these problems is always to advance the state of the art in statistical methods for complex data.

For details on specific projects that we are currently working on, click on the Research link.

Why are statistical methods important in computer science and information technology?

There are many new statistical problems arising in computer science and information technology.  These problems are driven by the huge amounts of data that advances in technology are making it possible to collect.  The technology research firm IDC has estimated that in 2007 alone we added 161 million terabytes (or 161 exabytes) of new storage.  The total amount today is of the order of 1,000 million terabytes (or 1 zettabyte).  To put this in perspective, that's 250 thousand million DVDs full of data, and even though a lot of this stored data is replicated, it's clear that there are huge amounts of it around.
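A quick sanity check on these figures (a rough sketch, assuming a DVD capacity rounded down to 4 GB from the usual single-layer 4.7 GB):

```python
# Back-of-envelope check of the storage figures above.
TERABYTE = 10**12          # bytes (decimal terabyte)
DVD_CAPACITY = 4 * 10**9   # bytes; rounded single-layer DVD capacity (assumption)

# 1,000 million terabytes = 1 zettabyte = 10**21 bytes
total_storage = 1_000 * 10**6 * TERABYTE
dvds = total_storage / DVD_CAPACITY

print(f"{total_storage:.0e} bytes is about {dvds:.2e} DVDs")
# -> about 2.5e11 DVDs, i.e. 250 thousand million
```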

Added to this, it is becoming cheaper and cheaper to generate data, driven most recently by the availability of new experimental methods, cheaper sensors, the development of "pervasive" computing and the continued exponential growth in data transmission rates and storage capacity.  Further, these data sets are very complicated.  Examples of the large and complicated data that interest us include: multimedia data (e.g., images, video and sound), biological data (e.g., genomics and proteomics), sensor network data (e.g., road traffic, communications networks) and environmental data (e.g., weather).  As an excellent example of the scale of data we can expect in the future, the Australian national research organisation CSIRO has stated that, within the next decade, astronomers expect to be processing 10 thousand terabytes of data every hour from the Square Kilometre Array telescope.
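To get a feel for that rate, a rough conversion (assuming decimal terabytes) of 10 thousand terabytes per hour into a per-second figure:

```python
# Rough conversion of the SKA figure: 10 thousand terabytes per hour.
TERABYTE = 10**12                # bytes (decimal terabyte)

per_hour = 10_000 * TERABYTE     # bytes arriving each hour
per_second = per_hour / 3600     # bytes arriving each second

print(f"{per_second:.2e} bytes per second")
# -> roughly 2.8e12 bytes, i.e. almost 3 terabytes, every second
```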

While the quantity and complexity of data are increasing quickly, the corresponding methods to make full use of the data have not moved nearly so fast.  That gap is at the heart of STATICA's research.  We develop new statistical methods that can handle the size and complexity of large data sets, allowing us to answer more complicated questions about the data and extract as much information as possible from them.  Just as importantly, we must propose methods that can be computed in a reasonable time.  In fact, developing computational methods that allow us to implement our new statistical analyses in practice is one of the most challenging aspects of our research.