We are pleased to announce the following tutorials for STIL 2009
and the collocated events:
- Robust Wide-Coverage Parsing: Evaluations, Representations, Issues, and Applications (Sept 8, 16:30h)
Prof. Dr. Ted Briscoe (University of Cambridge, UK)
- Fast and Practical Corpus Processing using Standard Linux Tools (Sept 9, 17:30h)
Caroline Gasperin (USP/ICMC, Brazil)
About the Tutorials

Robust Wide-Coverage Parsing: Evaluations, Representations, Issues, and Applications
Prof. Dr. Ted Briscoe
(University of Cambridge, UK)
Description: In this tutorial, I'll
firstly define the parsing task and discuss evaluation schemes. I'll
then address some of the issues in parser design: optimal
representation of syntactic information, statistical vs. heuristic
parse ranking, efficiency vs. accuracy, degree of lexicalization, etc.
I'll evaluate the strengths and weaknesses of different approaches,
considering RASP, XLE, Enju, the C&C parser, PTB (reranking)
parsers, and greedy/efficient shift-reduce dependency parsers. Finally,
I'll describe some (rare) experiments we have recently undertaken to
rigorously quantify the contribution of parsing to various text
classification tasks, such as topic categorization, spam detection,
information extraction, and language proficiency assessment.
The tutorial will last 2 hours, with a 10-minute break in the middle.
I'll assume an introductory-course-level understanding of computational
linguistics and of probability theory.
I'll make my slides and bibliography available after the tutorial.
About the Speaker: Ted Briscoe has a
Linguistics degree (1980) from the University of Lancaster, UK, and an MSc
(1981) and a PhD (1984) from the University of Cambridge, UK. He is a
Professor at the Computer Laboratory, University of Cambridge, and his
research interests include evolutionary linguistics and statistical
language processing.
He has published over 70 research articles, edited three books, and
been Principal/Co-Investigator or Coordinator of fourteen EU- and
UK-funded projects since 1985. He is joint editor of Computer Speech and
Language and on the editorial board of Natural Language Engineering.
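Fast and Practical Corpus Processing using Standard Linux Tools
Caroline Gasperin
(USP/ICMC, Brazil)
Description: This two-hour course
will give an introduction to some Linux commands for processing
both plain text and annotated files. The tools that will be presented are: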
- grep - for searching specific text passages or corpus annotations,
- sed - for replacing strings or annotations,
- awk - for filtering a corpus in different ways, and
- uniq - for merging identical elements in the corpus.
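These tools are typically chained into a single pipeline. As a rough illustration only, the Python sketch below mimics the pipeline `sed -E 's#/([^/]*)$# \1#' corpus.txt | grep -E ' NNS?$' | awk '{print $1}' | LC_ALL=C sort | uniq -c`, which counts noun tokens in a corpus stored one `word/TAG` token per line. The corpus layout, the Penn Treebank-style tags, and the helper names (`sed_sub`, `grep`, `awk_field`, `uniq_c`, `noun_frequencies`) are hypothetical and not taken from the course; the regular expressions use Python syntax, which only approximates POSIX extended regular expressions. Note that `uniq` merges only adjacent identical lines, which is why the words are sorted first (`sort` is the standard sort command, not one of the tools listed above).

```python
import re

# Hypothetical tagged corpus (illustrative assumption): one token per line,
# written as "word/TAG" with Penn Treebank-style tags.
CORPUS = """The/DT
cat/NN
saw/VBD
two/CD
dogs/NNS
and/CC
a/DT
cat/NN
./."""


def sed_sub(pattern, repl, lines):
    # Like `sed -E 's#pattern#repl#'`: replace the first match on each line.
    rx = re.compile(pattern)
    return [rx.sub(repl, line, count=1) for line in lines]


def grep(pattern, lines):
    # Like `grep -E pattern`: keep only lines that contain a match.
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


def awk_field(n, lines):
    # Like `awk -v n=N 'NF >= n {print $n}'`: emit the n-th (1-based)
    # whitespace-separated field of every line that has at least n fields.
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) >= n:
            out.append(fields[n - 1])
    return out


def uniq_c(lines):
    # Like `uniq -c`: collapse runs of *adjacent* identical lines into
    # (count, line) pairs. Unsorted input leaves duplicates unmerged.
    runs = []
    for line in lines:
        if runs and runs[-1][1] == line:
            runs[-1] = (runs[-1][0] + 1, line)
        else:
            runs.append((1, line))
    return runs


def noun_frequencies(text):
    # Mirrors: sed -E 's#/([^/]*)$# \1#' corpus.txt | grep -E ' NNS?$'
    #          | awk '{print $1}' | LC_ALL=C sort | uniq -c
    lines = text.splitlines()
    lines = sed_sub(r"/([^/]*)$", r" \1", lines)  # "cat/NN" -> "cat NN"
    lines = grep(r" NNS?$", lines)                 # keep NN and NNS tokens
    words = awk_field(1, lines)                    # keep only the word column
    return uniq_c(sorted(words))                   # code-point sort, then count runs


if __name__ == "__main__":
    for count, word in noun_frequencies(CORPUS):
        print(f"{count:7d} {word}")
```

Run as a script, it prints one line per noun type (`cat` with a count of 2, `dogs` with a count of 1), with right-aligned counts in roughly the layout GNU `uniq -c` uses. Replacing only the last slash keeps tokens such as `and/or/CC` intact.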
The course will be divided into two parts: the commands will be
presented in the first hour, and the second hour will consist of
hands-on laboratory practice.
Duration: 2 hours.
- There is a limited number of places in this course due to the size of
the laboratories.
- These tools can also be used on Windows.
About the Speaker: Caroline Gasperin has BSc and MSc degrees in
Computer Science from
the Pontifical Catholic University of Rio Grande do Sul, Brazil, and a
PhD degree from the University of Cambridge, UK. She is currently a
post-doctoral researcher at USP/ICMC, Brazil, working on the PorSimples
project in text simplification for Brazilian Portuguese. Her main
research interests include corpus-based techniques for NLP, anaphora
resolution and information extraction.