Dr Carl Vogel

Professor in computational linguistics

Room: ORI.LG16

Extension: 1538

Projects that I supervise align with my research interests.

All projects include review of the relevant literature, and where appropriate, argumentation in support of analyses given.

Note that implementation is not an essential component of every project in computational linguistics -- there's definitely more to the field than computer applications -- however, formal rigor is quite essential.

Don't worry if you don't recognize the names of the systems/languages mentioned. If the theme itself interests you, we can sort out the technical details in person. Of course, these are all just suggestions; we're assuming that the final project description will be individually tailored in most cases.

Students who do projects with me will agree to regular weekly meetings at which we discuss the preceding week's work, and plans for the following week's. The initial weeks typically involve a considerable amount of diverse readings. Students intending to work with me on their project are encouraged to contact students who have done projects with me in the past.

Projects listed here are suitable for final year students on the CSLL/CSL course; students from other undergraduate and postgraduate courses may also find suitable topics here.

  1. Develop an HPSG (Head-driven Phrase Structure Grammar) grammar for a fragment of Irish and implement it in the LKB, focusing on the syntax of one of the following construction types:
    Some examples of comparable projects are available for Irish, French, and German.
  2. Design and implement a chart parser for a CFG grammar with a dominance interpretation for phrase structure rules. This is essentially a framework for underspecified semantics. A disambiguation method must also be provided.
  3. Extend the semantic coverage in one of the frameworks included in the CLEARS (Computational Linguistics Education and Research Tool for Semantics) system.
    Particular areas of interest might be: negation, spatial modifiers, belief reports. An example of a project that did this in the past is available here.
  4. Extend the functionality of a generic interface for web-based experimentation in cognitive science (this will involve empirical research in an area of cognitive science to be agreed upon).
    This offers several possible topics with varying degrees of implementational requirements. For all, some implementational extensions to the underlying system are necessary. Some will involve more or less actual experimentation using the system. Previous stages of the system are described, among other places, here, here, and here.
  5. Extend and experiment with a platform for experimenting with models of dynamic systems, with particular attention to modeling evolution of linguistic behaviors. A starting point is described here; subsequent work is described here.
  6. Extend work on utilities for statistical analysis of linguistic corpora and apply them to specific tasks such as detection of grammatical errors, and automated correction suggestion.
  7. Develop and validate lexical resources for sentiment analysis.
  8. Develop methods within computational stylistics for investigating text-internal linguistic variables with external variables using large online textual resources. A comparable project is described here.
  9. Develop methods for tracking events under varying descriptions in journalistic prose.
  10. Develop a Prolog implementation simulating the operation of theories in dynamic semantics.
  11. Develop a Prolog implementation of real-time belief revision systems.
  12. Extend an automatic crossword generator implemented in Java and Prolog. Documentation of its state in 2003 is available here. A more recent version is documented here. One avenue in which to extend this is to establish it as a system fully anchored on the Suns, with application in language learning and other topical areas.
  13. Develop online tools for other forms of fun with words -- an innovative anagram server, a crossword clue generator, etc.
  14. Formal syntactic and semantic analysis of dialogue. Example past attempts at this are available here and here.
  15. Implement an efficient spelling checker for Irish in Java, in the context of a webserver that collects words and their frequencies of use in checked documents, along with some other utilities for corpus linguistics.
  16. Projects in psycholinguistics. Past examples appear here, here, here and here.
    Some specific topics I would like to explore further:
    1. Linguistic priming and unconscious coordination in written communication.
    2. Degrees of grammaticality and acceptability.
    3. Human reasoning with mildly inconsistent information.
    4. Computational stylistics (corpus driven syntactic and semantic analysis).
  17. Some general purpose utilities that can replicate standard offerings such as "DoodlePolls" and shared calendars, but with local data stores that accommodate varying levels of privacy and data protection.
  18. Develop tools to harvest from online sources a multi-lingual database of named entities.
  19. Build computational tools in support of structuralist analysis of myth and mythic-metaphorical representation (in the style of Lévi-Strauss).
  20. Test empirical dimensions of theories of holism in formulaic language associated with (im)politeness expressions.
  21. Test empirical predictions of recent theories of (im)politeness with respect to third-party and projected self-perception.
  22. Test empirical consequences of theories of gender differences in language use (for example, see here) and gender effects, more broadly (see here and here).
  23. Evaluate operationalizations of a quantitative method of bibliographic citation analysis that attends to the depth of engagement of a published work with the work that it cites, in relation to citation-count methods (e.g. h-index, i10-index, etc.) of measuring research impact.
  24. Analyze proxy measures of mutual understanding in dialogue (see here, or here, or here, etc.).
  25. Examine parameters that influence perception and choice in the ultimatum game (for example, see here or here).
  26. Topics in collaboration with Dr. Tim Fernando: Finite state temporality (FST, see for example here and here) is a computational approach to the semantics of temporal expressions in natural language based on finite-state techniques. Of course, one could take up a project directly in this space (possibilities are listed here). One might also explore ramifications of such an approach within cognitive science. For example, given two events, what relation would one most likely assert holds between them? How is that likelihood changed if the two events are selected from the same narrative? Benchmarks for this discussion are provided here. In general, topics in this collaborative space attempt to exploit the representational affordances and computational properties of FST in characterizing cognitive behaviors, assessing the goodness of fit between the system and the behaviors.
  27. Topics in collaboration with Dr. Maria Koutsombogera: Analysis and modelling of multimodal and multiparty interactions. The projects will exploit a newly created corpus of multimodal interactions between three participants. The objective of the projects is to address some of the challenges in developing intelligent collaborative systems and agents that are able to hold a natural conversation with human users. A starting point in dealing with these challenges is the analysis and modelling of human-human interactions. The projects consist in the analysis of the low-level signals of speakers (e.g. gaze, head pose, gestures, speech), as well as the perception and inference of high-level features, such as the speakers' attention, the level of engagement in the discussion, and their conversational strategies. Some examples of similar work are documented here and here. Indicative literature is available here, here, here and here. Samples of other existing corpora will also be made available to interested parties.
    1. Prediction of the next speaker in multiparty interactions based on multimodal information provided by the participants' (a) gaze, (b) head turn/pose, (c) mouth opening and (d) verbal content.
    2. Measuring participants' conversational dominance in multiparty interactions by exploring (a) turn length, (b) speech duration, (c) interruptions, (d) feedback responses and (e) non-verbal signals (mouth opening, gaze, etc.).
    3. Create a successful attentive listener: investigate and decide upon the features that constitute an active listener, based on the analysis of feedback responses, as well as their frequency, duration, and intensity.
    4. Prediction of success in collaborative task-based interactions: investigate the factors on which the perception of the success on a task depends. This will involve a series of perception tests examining the team role of the speakers and their conversational behavior.
  28. Many, if not most, instances of laughter in dialogue function more like discourse connectives (words and phrases like "therefore", "because", "before", "I disagree", and so on) than involuntary releases of mirth. In another dimension, some instances of laughter are ratified by others, creating durations of shared laughter, and sometimes people laugh alone. This project seeks to determine what acoustic properties of the voice signal separate these and perhaps other cross-classifications of laughter. For reference, one might consider relevant works, here and here and here. Alternative projects linked to laughter will focus on the nearby linguistic content and conversational dynamics in related efforts to discern features of laughter categories.
  29. In interaction with Arun Thundyill Saseendran and Professor Khurshid Ahmad, it would be interesting to harvest UK Hansard data for the purpose of examining a variety of linguistic complexity metrics, longitudinally, among contributions to Parliament and the House of Lords. A starting point in this research may be replication of a study that uses a measure of lexical complexity to assess the nature of parliamentary speeches before and after an expansion of the electorate in the UK (1867) which took full effect in 1869 (Spirling, 2016). Naturally, there is more than one way to measure lexical complexity, and these are not all perfectly correlated with structural complexity. This project will thus include an exploration of a range of well-motivated linguistic complexity metrics.
  30. In interaction with Arun Thundyill Saseendran and Professor Khurshid Ahmad, it would be interesting to analyze extracts of interactive debate, even if the transcripts include editorial influence over the exact wording of what was uttered at the time. Independent sources may be used to identify issues that were particularly contentious in their time, where personalities clashed, where sympathies were noted. It would be useful to identify the manner in which the transcripts may be analyzed for features of interaction, collaboration and conflict and the extent to which those features interact with classifications that arise independently. Such analyses of other data sets have been conducted where speech signals are present and where they are not. Those approaches may be adapted to the nature of the Hansard records of parliamentary debate.
  31. Topics in collaboration with Dr. Erwan Moreau: supervised and unsupervised methods for author verification and related applications.
    The author verification problem consists in identifying whether two texts A and B (or two groups of texts) have been written by the same person. This task is the keystone of authorship-related questions, and has a range of applications (e.g. forensics). This problem can be addressed in a number of different ways, in particular in a supervised or unsupervised setting: in the former case, an annotated set of cases is provided (each case is a pair of texts A and B, provided with "yes" or "no" depending on whether A and B have the same author); in the latter case, no answer is provided.
    Given the availability of several datasets as well as a state of the art authorship software system, the project consists in exploring a certain aspect or application of the topic, for example:
    1. What makes a case more difficult to answer than another? The task would be to study this question through experiments, and then implement a method to predict the level of difficulty of a given case.
    2. Design and implementation of a web interface around the authorship system, possibly presented as some kind of game with text.
    3. While ML systems can be good at giving the right answer, they are not always able to give a human-understandable explanation of the result. The task would consist in studying how to explain the results of some of the methods.
    4. It is harder to answer the question of authorship verification across genres (e.g. by comparing an email and a research paper). One way to improve the system in this case is to distinguish the features which are related to the author from those which are related to the genre.
  32. Social media analytics open up a range of possibilities. Some are in the development of systems that support analysts without a computing background in scraping data that is visible to the general public and recording the (potentially multi-modal) data with indexing supported by appropriate meta-data (e.g. location, date, provenance, etc.). Other possibilities involve data analytics in relation to social media content (but presuppose that appropriate data sources are available).
  33. Machine learning applied to gesture identification and classification.
  34. Explore computational models of dreaming.
  35. Other topics to appear.
  36. Still other topics to be agreed upon individually.
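
As a rough illustration of the author verification setting described under topic 31, the sketch below treats a case as a pair of texts with a yes/no label. It is not the authorship system mentioned there: the character n-gram profiles, the cosine similarity, the threshold of 0.5 and all function names are assumptions chosen only to make the task format concrete.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lower-cased text."""
    text = " ".join(text.lower().split())  # normalise whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = sqrt(sum(c * c for c in p.values())) * sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0


def same_author(text_a, text_b, threshold=0.5, n=3):
    """Unsupervised guess: True ("yes") if the two profiles are similar enough.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return cosine(char_ngrams(text_a, n), char_ngrams(text_b, n)) >= threshold


def accuracy(cases, threshold=0.5):
    """Supervised setting: score the guesses against annotated cases.

    Each case is a tuple (text_a, text_b, label), with label True for "yes".
    """
    hits = sum(same_author(a, b, threshold) == label for a, b, label in cases)
    return hits / len(cases)
```

In the unsupervised setting only `same_author` would be available; in the supervised setting, the annotated cases could be used with `accuracy` to compare thresholds or feature choices.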

Last Modified: Fri Aug 30 06:22:13 2024 (vogel)