Professor in computational linguistics
Room: ORI.LG16
Extension: 1538
Projects that I supervise align with my research interests.
All projects include review of the relevant literature,
and where appropriate, argumentation in support of analyses
given.
Note that implementation is not an essential component of
every project in computational linguistics -- there's
definitely more to the field than computer applications --
however, formal rigor is quite essential.
Don't worry if you don't recognize the names of the
systems/languages mentioned. If the theme itself interests
you, we can sort out the technical details in person. Of
course, these are all just suggestions; we're assuming that
the final project description will be individually tailored
in most cases.
Students who do projects with me will agree to regular
weekly meetings at which we discuss the preceding week's
work, and plans for the following week's. The initial weeks
typically involve a considerable amount of diverse readings.
Students intending to work with me on their project are
encouraged to contact students who have done projects with
me in the past.
Projects listed here are suitable for final year students
on the CSLL/CSL course; students from other
undergraduate and postgraduate courses may also find
suitable topics here.
Develop an HPSG (Head-driven Phrase Structure Grammar)
grammar for a fragment of Irish and implement it in the LKB,
focusing on the syntax of one of the following
construction types:
- Noun Phrases
- Embedding Verbs
- proposition embedding verbs
- question embedding verbs
Some examples of comparable projects are available for
Irish,
French,
and
German.
- Design and implement a chart parser for a CFG grammar
with a dominance interpretation for phrase structure
rules. This is essentially a framework for underspecified
semantics. A disambiguation method must also be provided.
Extend the semantic coverage in one of the
frameworks included in the CLEARS (Computational
Linguistics Education and Research Tool for Semantics)
system.
Particular areas of interest might be: negation, spatial
modifiers, belief reports.
An example of a project that did this in the past is
available
here.
Extend the functionality of a generic interface for
web-based experimentation in cognitive science (this will
involve empirical research in an area of cognitive
science to be agreed upon).
This offers several possible topics with varying
degrees of implementational requirements. For all, some
implementational extensions to the underlying system are
necessary. Some will involve more or less actual
experimentation using the system.
Previous stages of the system are described, among other places,
here,
here, and
here.
- Extend and experiment with a platform for
experimenting with models of dynamic systems, with
particular attention to modeling evolution of linguistic
behaviors. A starting point is described
here, subsequent work is described
here.
- Extend work on utilities for statistical analysis of
linguistic corpora and apply them to specific tasks such
as detection of grammatical errors, and automated
correction suggestion.
- Develop and validate lexical resources for sentiment
analysis.
- Develop methods within computational stylistics for
investigating text-internal linguistic variables with
external variables using large online textual resources.
A comparable project is described
here.
- Develop methods for tracking events under varying
descriptions in journalistic prose.
- Develop a Prolog implementation simulating the
operation of theories in dynamic semantics.
- Develop a Prolog implementation of real-time belief
revision systems.
- Extend an automatic crossword generator implemented in
Java and Prolog. Documentation of its state in 2003
is available here.
A more recent version is
documented here.
One avenue in which to extend this is to establish it as a
system fully anchored on the Suns, with application in language
learning and other topical areas.
- Develop online tools for other forms of fun with words --
an innovative anagram server, a crossword clue generator,
etc.
- Formal syntactic and semantic analysis of dialogue.
Example past attempts at this are available
here and
here.
Implement an efficient spelling checker for Irish in
Java, in the context of a webserver that collects words
and their frequencies of use in checked documents, along
with some other utilities for corpus linguistics.
- Projects in psycholinguistics. Past examples appear
here,
here,
here and
here.
Some specific topics I would like to explore further:
- Linguistic priming and unconscious coordination in written
communication.
- Degrees of grammaticality and acceptability.
- Human reasoning with mildly inconsistent information.
- Computational stylistics (corpus driven syntactic and semantic analysis).
- Some general purpose utilities that can replicate standard
offerings such as "DoodlePolls" and shared calendars, but with local
data stores that accommodate varying levels of privacy and data
protection.
- Develop tools to harvest from online sources a multi-lingual
database of named entities.
- Build computational tools in support of structuralist
analysis of myth and mythic-metaphorical representation
(in the style of Lévi-Strauss).
- Test empirical dimensions of theories of holism in formulaic
language associated with (im)politeness expressions.
- Test empirical predictions of recent theories of (im)politeness
with respect to third-party and projected self-perception.
- Test empirical consequences of theories of gender differences
in language use (for example, see here) and gender effects, more broadly
(see here and here).
- Evaluate operationalizations of a quantitative method of
bibliographic citation analysis that attends to depth of engagement
of a published work with work that it cites in relation to
citation-count methods (e.g. h-index, i10-index, etc.) of measuring
research impact.
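For orientation, the two citation-count baselines named above can be computed in a few lines. This is only a minimal sketch: the function names `h_index` and `i10_index` and the plain list-of-counts input are illustrative assumptions, and the depth-of-engagement method under evaluation is deliberately not sketched here.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least ten citations."""
    return sum(1 for count in citations if count >= 10)
```

Both measures look only at how often a work is cited, which is exactly the property a depth-of-engagement measure would be compared against.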
- Analyze proxy measures of mutual understanding in dialogue
(see here, or here, or here, etc.).
- Examine parameters that influence perception and choice in
the ultimatum game (for example, see here or here).
- Topics in collaboration with Dr. Tim Fernando:
Finite state temporality (FST, see for example
here
and
here)
is a computational approach to the semantics of temporal expressions in
natural language based on finite-state techniques. Of course, one
could take up a project directly in this space (possibilities are
listed here).
One might also explore ramifications of such an approach within
cognitive science. For example, given two events, what relation would
one most likely assert holds between them? How is that likelihood
changed if the two events are selected from the same narrative?
Benchmarks for this discussion are provided here.
In general, topics in this collaborative space attempt to exploit the
representational affordances and computational properties of FST
in characterizing cognitive behaviors, assessing the goodness of
fit between the system and the behaviors.
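To make the question "what relation holds between two events?" concrete, here is a minimal sketch. It assumes (this is my illustration, not necessarily the inventory used in the benchmarks mentioned above) that events are intervals with numeric start and end points and that the candidate relations are Allen's thirteen interval relations. The name `allen_relation` and the tuple representation are hypothetical.

```python
def allen_relation(a, b):
    """Return the Allen interval relation that holds between a and b.

    Each interval is a (start, end) tuple with start < end
    (an assumed representation, chosen for illustration only).
    """
    (s1, e1), (s2, e2) = a, b
    if s1 >= e1 or s2 >= e2:
        raise ValueError("intervals must satisfy start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Only proper partial overlap remains at this point.
    return "overlaps" if s1 < s2 else "overlapped-by"
```

A project in this space would ask which of these labels people actually prefer for pairs of events, rather than which one a timestamp comparison returns.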
- Topics in collaboration with Dr. Maria Koutsombogera:
Analysis and modelling of multimodal and multiparty interactions. The projects will exploit a newly created corpus of multimodal interactions between three participants. The objective of the projects is to address some of the challenges in developing intelligent collaborative systems and agents that are able to hold a natural conversation with human users. A starting point in dealing with these challenges is the analysis and modelling of human-human interactions. The projects consist in the analysis of the low-level signals of speakers (e.g. gaze, head pose, gestures, speech), as well as the perception and inference of high-level features, such as the speakers' attention, the level of engagement in the discussion, and their conversational strategies.
Some examples of similar work are documented here and here. Indicative literature is available here, here, here and here. Samples of other existing corpora will also be made available to interested parties.
- Prediction of the next speaker in multiparty interactions based on multimodal information provided by the participants' (a) gaze, (b) head turn/pose, (c) mouth opening and (d) verbal content.
- Measuring participants' conversational dominance in multiparty interactions by exploring (a) turn length, (b) speech duration, (c) interruptions, (d) feedback responses and (e) non-verbal signals (mouth opening, gaze, etc.).
- Create a successful attentive listener: investigate and decide upon the features that constitute an active listener, based on the analysis of feedback responses, as well as their frequency, duration, and intensity.
- Prediction of success in collaborative task-based interactions: investigate the factors on which the perception of the success on a task depends. This will involve a series of perception tests examining the team role of the speakers and their conversational behavior.
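As a rough illustration of the dominance cues listed above (turn length, speech duration, interruptions), the sketch below summarises them per speaker. The input format, a list of `(speaker, start_sec, end_sec, n_words)` tuples sorted by start time, and the name `dominance_profile` are hypothetical, as is the crude definition of an interruption; the actual corpus annotations may look quite different.

```python
from collections import defaultdict


def dominance_profile(turns):
    """Summarise simple dominance cues per speaker.

    `turns` is a list of (speaker, start_sec, end_sec, n_words) tuples,
    sorted by start time (an assumed, illustrative format).
    """
    stats = defaultdict(
        lambda: {"turns": 0, "words": 0, "speech_sec": 0.0, "interruptions": 0}
    )
    prev = None
    for speaker, start, end, n_words in turns:
        s = stats[speaker]
        s["turns"] += 1
        s["words"] += n_words
        s["speech_sec"] += end - start
        # Assumption: a turn counts as an interruption if it starts before
        # the immediately preceding turn, by another speaker, has ended.
        if prev is not None and prev[0] != speaker and start < prev[2]:
            s["interruptions"] += 1
        prev = (speaker, start, end)
    for s in stats.values():
        s["mean_turn_words"] = s["words"] / s["turns"]
    return dict(stats)
```

Feedback responses and non-verbal signals would need their own annotation layers and are left out of this sketch.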
- Many, if not most, instances of laughter in dialogue function
more like discourse connectives (words and phrases like, "therefore",
"because", "before", "I disagree", and so on) than involuntary releases of
mirth. In another dimension, some instances of laughter are
ratified by others, creating durations of shared laughter, and
sometimes people laugh alone. This project seeks to determine
what acoustic properties of the voice signal separate these
and perhaps other cross-classifications of laughter.
For reference, one might consider relevant works,
here
and
here and
here.
Alternative projects linked to laughter will focus on the nearby
linguistic content and conversational dynamics in related efforts
to discern features of laughter categories.
- In interaction with Arun Thundyill
Saseendran and Professor Khurshid
Ahmad, it would be interesting to harvest UK Hansard data for the
purpose of examining a variety of linguistic complexity metrics,
longitudinally, among contributions to Parliament and the House of
Lords. A starting point in this research may be replication of a
study that uses a measure of lexical complexity to assess
the nature of parliamentary speeches before and after an expansion of
the electorate in the UK (1867) which took full effect in 1869 (Spirling,
2016). Naturally, there is more than one way to measure lexical
complexity, and these are not all perfectly correlated with structural
complexity. This project will thus include an exploration of a range
of well-motivated linguistic complexity metrics.
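To illustrate that complexity can be measured in more than one way, here is a minimal sketch of two simple text-level measures: type-token ratio and the Flesch Reading Ease score. Neither is claimed to be the measure used in the study cited above. The tokenizer, the sentence splitter and, above all, the vowel-group syllable counter are crude assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lower-cased alphabetic tokens; internal apostrophes are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def type_token_ratio(text):
    """Distinct word forms divided by total word tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per vowel group."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores indicate easier text."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Type-token ratio is sensitive to text length, so longitudinal comparisons across speeches of different sizes would need some normalisation.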
- In interaction with Arun Thundyill
Saseendran and Professor Khurshid
Ahmad, it would be interesting to analyze extracts of interactive
debate, even if the transcripts include editorial influence over the
exact wording of what was uttered at the time. Independent sources
may be used to identify issues that were particularly contentious in
their time, where personalities clashed, where sympathies were noted.
It would be useful to identify the manner in which the transcripts may
be analyzed for features of interaction, collaboration and conflict
and the extent to which those features interact with classifications
that arise independently. Such analyses of other data sets have been
conducted
where speech signals
are present
and
where they are not. Those approaches may be adapted to the nature of the
Hansard records of parliamentary debate.
- Topics in collaboration with Dr. Erwan Moreau:
supervised and unsupervised
methods for author verification and related applications.
The author verification problem consists in identifying whether two texts A
and B (or two groups of texts) have been written by the same person. This
task is the keystone of authorship-related questions, and has a range of
applications (e.g. forensics). This problem can be addressed in a number of
different ways, in particular in a supervised or unsupervised setting: in
the former case, an annotated set of cases is provided (each case is a pair
of texts A and B, labelled "yes" or "no" depending on whether they share an author); in
the latter case, no answer is provided.
Given the availability of several datasets as well as
a state-of-the-art
authorship software system, the project consists in exploring a certain
aspect or application of the topic, for example:
- What makes a case more difficult to answer than another? The task would be
to study this question through experiments, and then implement a method to
predict the level of difficulty of a given case.
- Design and implementation of a web interface around the authorship system,
possibly presented as some kind of game with text.
- While ML systems can be good at giving the right answer, they are not
always able to give a human-understandable explanation of the result. The
task would consist in studying how to explain the results of some of the
methods.
- It is harder to answer the question of authorship verification across
genres (e.g. by comparing an email and a research paper). One way to improve
the system in this case is to distinguish the features which are related to
the author from those which are related to the genre.
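For readers new to the verification task described above, the following is a minimal unsupervised baseline: compare character trigram profiles of texts A and B by cosine similarity and answer "yes" when the score passes a threshold. It is a sketch only and not the state-of-the-art system mentioned above; the function names and the default threshold of 0.5 are arbitrary assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Counts of character n-grams after lower-casing and whitespace normalisation."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def same_author(text_a, text_b, threshold=0.5, n=3):
    """Return (decision, score); the threshold is an illustrative assumption."""
    score = cosine(char_ngrams(text_a, n), char_ngrams(text_b, n))
    return score >= threshold, score
```

In the supervised setting, the annotated cases could be used to tune the threshold instead of fixing it by hand.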
- Social media analytics open up a range of possibilities. Some are
in the development of systems that support analysts without a
computing background in scraping data that is visible to the
general public and recording the (potentially multi-modal) data with
indexing supported by appropriate meta-data (e.g. location, date,
provenance, etc.). Other possibilities involve data analytics in
relation to social media content (but presuppose that appropriate
data sources are available).
- Machine learning applied to gesture identification and
classification.
- Explore computational models of dreaming.
- Other topics to appear.
- Still other topics to be agreed upon individually.
Last Modified: Fri Aug 30 06:22:13 2024 (vogel)