Links to the papers cited below are on the papers page.
This is a summary of my career and research, put together
for a 10-minute presentation to non-specialists.
CAREER TO DATE
Edinburgh University
1987-1992 PhD in Cognitive Science
Munich University
1993-1995 Research Assistant on the DYANA II project
1995-1996 Research Assistant in project Semiunifikation
1996-1998 Research/Lecturing position
Trinity College, Dublin
1999-today Computational Linguistics lecturer
TEACHING
in Computer Science, Linguistics and Language courses:
fundamentals of computing
logic and Prolog programming
programming in C
fundamentals of NLP
statistical methods in NLP
RESEARCH INTERESTS
1. Linear Logic and Polymorphic Categorial Grammar
2. Polymorphic Type Inference for Programming Languages
   - verification as proof theory
3. Language Engineering
   - large-scale knowledge sources for parsing
   - robust part-of-speech tagging/parsing
   - disambiguation techniques
   - corpus tools: an ongoing web-based parsing and tagging environment
     can be accessed here: frontend
   - information extraction from biological texts: for a
     brief overview of this see bioextraction
CATEGORIAL GRAMMAR
LINEAR LOGIC AND CATEGORIAL GRAMMAR
Practical Issues
- LINGUISTIC APPLICATIONS (Emms 1993, DYANA R1.3A)
- IMPLEMENTATION (Emms 1993, EACL)
- decidability with particular quantifier patterns
LINEAR LOGIC AND CATEGORIAL GRAMMAR
Logical Issues
TYPE INFERENCE FOR PROGRAMMING LANGUAGES
- MONOMORPHIC RECURSION (SML)
- POLYMORPHIC RECURSION (SML+)
TYPE INFERENCE
- MONOMORPHISM RESTRICTS TYPING, for example:
    datatype key = Atom of int | Pair of key * key
    datatype 'a trie = Empty
                     | Branch of ((int * 'a) list) * (('a trie) trie)
    fun find (Branch(_,t), Pair(p,q)) = find (find (t,p), q)
      :
      :
- find has no SML type
- find has SML+ type :
TYPE INFERENCE FOR PROGRAMMING LANGUAGES
- principal types obtained in SML
- Milner's algorithm
- type assumptions refined by solutions to local EQUATIONS
- hence UNIFICATION
- principal types obtained in SML+
- Milner+ algorithm:
- type assumptions refined by solutions to accumulated INEQUATIONS
- hence SEMIUNIFICATION
- Non-toy implementation: modified the compiler for SML allowing mono/poly switch
- correctness proof (TCS 99), tricky because
  - unlike unifiers, semiunifiers are not closed under arbitrary specialisation
  - quantifier introduction must respect polymorphic unknowns
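The equational half of the picture above (Milner's algorithm refining type assumptions by solving equations) can be illustrated with a small unifier. This is a minimal sketch in Python, not the modified SML compiler described here. The term encoding is an assumption made for the example: a type variable is a string such as "'a", and a constructed type is a tuple whose first element is the constructor name. It covers unification only; the inequations of SML+ need semiunification, which this sketch does not attempt.

```python
def is_var(t):
    # Assumed encoding: type variables are plain strings, e.g. "'a".
    return isinstance(t, str)


def resolve(t, subst):
    # Follow variable bindings until an unbound variable or a constructor.
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    # True if variable v appears inside t (under the current bindings).
    t = resolve(t, subst)
    if is_var(t):
        return t == v
    return any(occurs(v, arg, subst) for arg in t[1:])


def unify(a, b, subst=None):
    # Solve the equation a = b, returning an extended substitution.
    subst = dict(subst or {})
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = resolve(x, subst), resolve(y, subst)
        if x == y:
            continue
        if is_var(x) or is_var(y):
            var, other = (x, y) if is_var(x) else (y, x)
            if occurs(var, other, subst):
                raise TypeError(f"occurs check failed: {var} in {other}")
            subst[var] = other
        elif x[0] == y[0] and len(x) == len(y):
            # Same constructor: unify the arguments pairwise.
            stack.extend(zip(x[1:], y[1:]))
        else:
            raise TypeError(f"constructor clash: {x} vs {y}")
    return subst


def substitute(t, subst):
    # Apply the substitution to a type, fully expanding bound variables.
    t = resolve(t, subst)
    if is_var(t):
        return t
    return (t[0],) + tuple(substitute(arg, subst) for arg in t[1:])


# Equation: 'a -> 'a list  =  int -> 'b
s = unify(("->", "'a", ("list", "'a")), ("->", ("int",), "'b"))
result = substitute("'b", s)  # ("list", ("int",)), i.e. int list
```

The occurs check is what rejects an equation such as 'a = 'a list, which has no finite solution.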
LANGUAGE ENGINEERING
- LARGE KNOWLEDGE SOURCES FOR PARSING
- ROBUST PART OF SPEECH TAGGING
- ROBUST PARSING
- DISAMBIGUATION TECHNIQUES
- CORPUS TOOLS
LEXICON IMPLEMENTATION
- FULL-FORM LEXICON
    walked          walk + V Past
- BASE-FORM LEXICON generates
    mow V57
      prs 3rd sing    mows
      prs other       mow
      past part       mowed or mown
- RAPID RETRIEVAL and COMPACT STORAGE
- Prolog Database
- Hashing
- Minimised Trie (50,000 w/s, 500k)
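The idea behind a minimised trie can be sketched as follows: build an ordinary trie, then merge subtrees that accept exactly the same set of suffixes, so that shared endings such as -s and -ed are stored once. This is an illustrative Python sketch under that assumption, not the implementation behind the figures quoted above; the class and function names are hypothetical.

```python
class Node:
    __slots__ = ("edges", "final")

    def __init__(self):
        self.edges = {}     # character -> child Node
        self.final = False  # True if a word ends at this node


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def minimise(node, registry):
    # Bottom-up: canonicalise the children first, then look this node up
    # by its signature (finality plus labelled edges to canonical children).
    for ch, child in list(node.edges.items()):
        node.edges[ch] = minimise(child, registry)
    key = (node.final,
           tuple(sorted((ch, id(c)) for ch, c in node.edges.items())))
    return registry.setdefault(key, node)


def contains(root, word):
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


def count_nodes(root):
    # Count distinct nodes reachable from the root.
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


words = ["walk", "walks", "walked", "talk", "talks", "talked"]
trie = build_trie(words)
before = count_nodes(trie)
trie = minimise(trie, {})
after = count_nodes(trie)  # fewer nodes, same set of words accepted
```

Lookup cost is unchanged by minimisation (one step per character); only the storage shrinks.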
LEXICON MAINTENANCE
- Algorithms for automatic acquisition of new
entries eg
- Also used in a tool for machine-assisted entry creation
PARSER IMPLEMENTATION
- COMBINING KNOWLEDGE SOURCES
- standard grammar eg.
- categorial grammar eg. cut up vp/np into np
- MULTISTRATEGY
- TOP DOWN
- elimination of left-recursion
- but preservation of syntactic structure
- BOTTOM UP
- chart managed as trie for rapid access/update
- COVERAGE 85 %
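The left-recursion elimination mentioned for the top-down strategy can be illustrated with the textbook transformation for immediate left recursion: A -> A x | y becomes A -> y A', A' -> x A' | (empty). The Python sketch below is a minimal version under stated assumptions: the grammar encoding (a dict from nonterminal to a list of tuples) and the primed names are hypothetical, and, unlike the parser described above, it makes no attempt to preserve the original syntactic structure.

```python
def eliminate_immediate_left_recursion(grammar):
    """Rewrite rules of the form A -> A x | y as A -> y A', A' -> x A' | ().

    grammar maps each nonterminal to a list of right-hand sides, each a
    tuple of symbols; the empty tuple stands for the empty string.
    Assumes no rule A -> A and that the name A' is not already in use.
    """
    result = {}
    for head, bodies in grammar.items():
        # Tails of the left-recursive alternatives (the x in A -> A x).
        recursive = [b[1:] for b in bodies if b and b[0] == head]
        others = [b for b in bodies if not b or b[0] != head]
        if not recursive:
            result[head] = list(bodies)
            continue
        tail = head + "'"
        result[head] = [b + (tail,) for b in others]
        result[tail] = [r + (tail,) for r in recursive] + [()]
    return result


# Illustrative grammar: NP -> NP PP | Det N ; PP -> P NP
grammar = {
    "NP": [("NP", "PP"), ("Det", "N")],
    "PP": [("P", "NP")],
}
rewritten = eliminate_immediate_left_recursion(grammar)
```

After the rewrite a top-down parser no longer loops on NP, but the trees it builds are right-branching through NP', so recovering the original left-branching structure needs an extra mapping step.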
TAGGING: DISAMBIGUATING WORDS
PARSING: DISAMBIGUATING SENTENCES
LEXICON
    cut up      vp/np into np
    cut up      vp/np
    sweep up    vp/np
RULES
Using:
- lexical specificity
- word-tag probabilities
can achieve hit rates above 90%.
Distinctive aspects of the strategy are:
- use of a realistically detailed grammar
- on simple texts (no relative clauses, no passives, etc.)
- performing disambiguation, which is crucial for Machine Translation
By contrast, the general trend is:
- to use inexact (treebank) grammars
- on unrestricted texts
- and to deal with explosive numbers of analyses: > 1000
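One way word-tag probabilities can rank competing analyses is to score each analysis by the summed log-probability of its word-tag pairs and keep the best. The Python sketch below assumes that reading; the counts, tag names and add-one smoothing are illustrative choices, not figures or methods taken from the system described here, and lexical specificity is not modelled.

```python
import math
from collections import Counter


def tag_log_prob(word, tag, counts, tagset_size):
    # Add-one smoothed estimate of log P(tag | word) from word-tag counts.
    word_total = sum(c for (w, _), c in counts.items() if w == word)
    return math.log((counts[(word, tag)] + 1) / (word_total + tagset_size))


def best_analysis(analyses, counts, tagset_size):
    # Each analysis is a list of (word, tag) pairs; pick the most probable.
    def score(analysis):
        return sum(tag_log_prob(w, t, counts, tagset_size)
                   for w, t in analysis)
    return max(analyses, key=score)


# Hypothetical word-tag counts, for illustration only.
counts = Counter({("saw", "V"): 8, ("saw", "N"): 2, ("the", "Det"): 10})
analyses = [
    [("the", "Det"), ("saw", "N")],
    [("the", "Det"), ("saw", "V")],
]
chosen = best_analysis(analyses, counts, tagset_size=2)
```

With these toy counts the verbal reading of "saw" wins; in a real setting the analyses would come from the parser and the counts from a tagged corpus.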
A SILVER-STANDARD CORPUS OF SIMPLE CLAUSES
- automatically extract parsed simple clauses from
'gold-standard' corpus of parsed complex clauses.
- use extracted 'silver-standard' corpus for testing
parsing/disambiguation strategies
CROSS-CHECKING SILVER STANDARD CORPUS
can compare
- extracted 'silver standard' corpus of 5000 sentences
- handmade 'gold standard' corpus of 250 sentences
- both show the same vital statistics (extent of ambiguity, etc.)
- the ranking of disambiguation techniques is the same relative to both
CORPUS RELATED
- MORPHOLOGY IN RETRIEVAL
- automatically expand query item
    ride V: rode, ridden, riding, ride
- TAGGER IN INDEXING
- intelligent automatic stemming
- COLLOCATIONS/SELECTIONAL RESTRICTIONS
- all verbal forms of ride rapidly retrieved
- FSA definition of following head-noun.
%ride % their %bikes on the road;
%riding % round and round on %bikes
%ride % their %bikes, push prams,
%ride % off on %bikes with their s
:
:
%ride % around your proposed %route
%ride % along the bike %routes
- noun counts for forms of ride
145 horse
77 bike
59 horses
53 bicycle
37 winners
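The query expansion and noun counting above can be approximated in a few lines. The Python sketch below is a rough stand-in under stated assumptions: the form table holds only the forms of ride listed above, the noun set is a hypothetical substitute for tagger output, and a fixed word window with "take the last noun of a noun run" replaces the FSA definition of the following head noun.

```python
import re
from collections import Counter

# Forms listed in the text for "ride"; a real system would generate
# them from the lexicon rather than store them by hand.
FORMS = {"ride": ["ride", "rode", "ridden", "riding"]}

# Hypothetical stand-in for part-of-speech tags.
NOUNS = {"bike", "bikes", "horse", "horses", "bicycle", "route", "routes"}


def expand(lemma):
    # Expand a query item to all of its known inflected forms.
    return FORMS.get(lemma, [lemma])


def following_nouns(lemma, lines, window=5):
    # Count the head noun that follows each occurrence of the verb.
    forms = set(expand(lemma))
    counts = Counter()
    for line in lines:
        words = re.findall(r"[a-z]+", line.lower())
        for i, word in enumerate(words):
            if word not in forms:
                continue
            for j in range(i + 1, min(i + 1 + window, len(words))):
                if words[j] in NOUNS:
                    # In a run of nouns ("bike routes") take the last one.
                    while j + 1 < len(words) and words[j + 1] in NOUNS:
                        j += 1
                    counts[words[j]] += 1
                    break
    return counts


lines = [
    "ride their bikes on the road",
    "riding round and round on bikes",
    "ride along the bike routes",
]
noun_counts = following_nouns("ride", lines)
```

On these three concordance lines the sketch counts bikes twice and routes once, matching the marked head nouns in the sample output above.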
Martin Thomas Emms
2001-01-18