1st Dec 2009: Hal Varian, chief economist at Google, on why
statistics is the career of the future.
Who are we?
We are a group of statistics researchers in Trinity College Dublin
who are interested in statistical problems that arise in computer
science, information systems and telecommunications. STATICA is led by Simon Wilson,
and currently consists of 2 post-doctoral researchers, 4 postgraduate
students and several other associated researchers within the School of
Computer Science and Statistics. STATICA is funded by Science Foundation Ireland.
What is our research about?
At STATICA, we are first and foremost statisticians, and so our research
interest is in developing new methods of statistical analysis.
This means asking the following questions:
How can we make better use of data to learn about the world around us,
and use them to predict how the world will behave in the future?
How certain should we be about what the data tell us and what we
predict?
How do we use this knowledge to make decisions and how does uncertainty
about the true state of the world affect our decision making?
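As a toy illustration of these three questions (not a description of any STATICA project), consider estimating the failure rate of a network link from observed transmissions. The data, the Beta(1, 1) prior and the 5% service threshold below are all invented for the example.

```python
# Toy example: the three questions above for a hypothetical network link.
# All numbers and modelling choices are invented for illustration.
from scipy import stats

n, failures = 200, 6  # observed data: 6 failures in 200 transmissions

# 1. Learning from data: posterior for the failure rate under a Beta(1, 1) prior.
posterior = stats.beta(1 + failures, 1 + n - failures)
print("posterior mean failure rate:", posterior.mean())

# 2. Quantifying uncertainty: a 95% credible interval for the rate.
print("95% credible interval:", posterior.interval(0.95))

# 3. Decision making under uncertainty: the probability that the rate exceeds
#    a 5% service threshold, which might trigger an intervention.
print("P(rate > 0.05):", 1 - posterior.cdf(0.05))
```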
At STATICA we believe that statistical methods offer a promising way of
answering these questions. We work closely with collaborators in
science and engineering who are interested in specific problems and
have complicated data where these issues arise. Our goal in
proposing solutions to these problems is always to advance the state of
the art in statistical methods for complex data.
For details on specific projects that we are currently working on,
click on the Research link.
Why are statistical methods important in computer science and
information technology?
There are many new statistical problems arising in computer science
and information technology. These problems stem from the huge
amounts of data that advances in technology now make it possible to
collect. Technology firm IDC has estimated that in 2007 alone we
added 161 million terabytes (or 161 exabytes) of new storage. The
total amount is now of the order of 1,000 million terabytes (or 1
zettabyte). To put this in perspective, that's 250 thousand
million DVDs full of data, and even though a lot of this stored data
is replicated, it's clear that there are huge amounts of it around.
Added to this, it is becoming ever cheaper to generate data, driven
most recently by the availability of new experimental methods, low-cost
sensors, the development of "pervasive" computing and the continued
exponential growth in data transmission rates and storage
capacity. Further, these data sets are very complicated.
Examples of the large and complicated data that might interest us are:
multimedia data (e.g., images, video and sound), biological data
(e.g., genomics and proteomics), sensor network data (e.g., road
traffic, communications networks) and environmental data (e.g.,
weather).
As an excellent example of the scale of data that we can expect in
the future, the Australian national research organisation CSIRO
has stated that, in the next decade, astronomers expect to be
processing 10 thousand terabytes of data every hour from the Square
Kilometre Array telescope.
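To make these figures concrete, the unit conversions can be checked with a little arithmetic. The short sketch below assumes decimal (SI) units and roughly 4 GB of data per DVD; these assumptions are ours, not figures from IDC or CSIRO.

```python
# Back-of-the-envelope check of the data volumes quoted above.
# Assumptions (ours, not IDC's or CSIRO's): decimal SI units, ~4 GB per DVD.

GB = 10**9    # gigabyte in bytes
TB = 10**12   # terabyte in bytes
EB = 10**18   # exabyte in bytes
ZB = 10**21   # zettabyte in bytes

new_storage_2007 = 161e6 * TB        # 161 million terabytes added in 2007
print(new_storage_2007 / EB)         # -> 161.0 exabytes

total_storage = 1000e6 * TB          # ~1,000 million terabytes in total
print(total_storage / ZB)            # -> 1.0 zettabyte

dvd = 4 * GB                         # assumed DVD capacity
print(total_storage / dvd)           # -> 2.5e11, i.e. 250 thousand million DVDs

ska_per_hour = 10000 * TB            # SKA: 10 thousand terabytes per hour
print(ska_per_hour / EB)             # -> 0.01 exabytes (10 petabytes) each hour
```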
While the quantity and complexity of data are increasing quickly, the
corresponding methods to make full use of the data have not moved
nearly so fast. That fact is at the heart of STATICA's
research. We develop new statistical methods that can handle the
size and complexity of large data sets, thus allowing us to answer more
complicated questions about the data and extract as much information as
possible from them. Just as importantly, the methods we propose must
be computable in a reasonable time. In fact, developing
computational methods that allow us to implement our new statistical
analyses in practice is one of the most challenging aspects of our
research.