STIL 2009, 8-11 September 2009


	STIL 2009 September 8-11, 2009 São Carlos, Brazil

The 7th Brazilian Symposium in Information and Human Language Technology

Tutorials

We are pleased to announce the following tutorials for STIL 2009 and the collocated events:

Robust Wide-Coverage Parsing: Evaluations, Representations, Issues, and Applications (Sept 8, 16:30h)

Prof. Dr. Ted Briscoe (University of Cambridge, UK)

Fast and Practical Corpus Processing using Standard Linux Tools (Sept 9, 17:30h)

Dr. Caroline Gasperin (USP/ICMC, Brazil)

About the Tutorials

Robust Wide-Coverage Parsing: Evaluations, Representations, Issues, and Applications

Prof. Dr. Ted Briscoe (University of Cambridge, UK)

Description: In this tutorial, I'll first define the parsing task and discuss evaluation schemes. I'll then address some of the issues in parser design: optimal representation of syntactic information, statistical vs. heuristic parse ranking, efficiency vs. accuracy, degree of lexicalization, etc. I'll evaluate the strengths and weaknesses of different approaches, considering RASP, XLE, Enju, the C&C parser, PTB (reranking) parsers, and greedy/efficient shift-reduce dependency parsers. Finally, I'll describe some (rare) experiments we have recently undertaken to rigorously quantify the contribution of parsing to various text classification tasks, such as topic categorization, spam detection, information extraction, and language proficiency assessment.

Duration: The tutorial will last 2 hours, with a 10-minute break in the middle.

Pre-requisites: I'll assume an introductory-course-level understanding of computational linguistics and of probability theory.

Notes: I'll make my slides and bibliography available after the tutorial.

About the Speaker: Ted Briscoe has a Linguistics degree (1980) from the University of Lancaster, UK, MSc (1981) and PhD (1984) from the University of Cambridge, UK. He is a Professor at the Computer Laboratory, University of Cambridge, and his research interests include evolutionary linguistics and statistical language processing.

He has published over 70 research articles, edited three books, and been Principal/Co-Investigator or Coordinator of fourteen EU- and UK-funded projects since 1985. He is joint editor of Computer Speech and Language and is on the editorial board of Natural Language Engineering.

Fast and Practical Corpus Processing using Standard Linux* Tools

Dr. Caroline Gasperin (USP/ICMC, Brazil)

Description: This two-hour course will give an introduction to some Linux commands for processing both plain text and annotated files. The tools that will be presented include:

grep - for searching specific text passages or corpus annotations,
sed - for replacing strings or annotations,
awk - for filtering a corpus in different ways, and
uniq - for merging identical elements in the corpus.
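
As a rough illustration of how these four tools fit together, here is a minimal sketch. It is not taken from the course material: the toy corpus, its one-token-per-line word/TAG format, and the tag names are all assumptions made for the example. Note that uniq only merges adjacent identical lines, so its input is sorted first.

```shell
#!/bin/sh
# Hypothetical toy corpus (an assumption for this sketch):
# one token per line, annotated as word/TAG.
corpus=$(mktemp)
printf '%s\n' 'the/DET' 'cat/NOUN' 'sat/VERB' 'on/PREP' 'the/DET' 'mat/NOUN' > "$corpus"

# grep: search for a specific annotation (tokens tagged as nouns).
nouns=$(grep '/NOUN$' "$corpus")

# sed: replace an annotation (rename the PREP tag to ADP).
retagged=$(sed 's|/PREP$|/ADP|' "$corpus")

# awk: filter the corpus, here keeping only the word form (field 1).
words=$(awk -F/ '{ print $1 }' "$corpus")

# uniq: merge identical elements; input is sorted first because
# uniq only collapses adjacent duplicate lines.
tags=$(awk -F/ '{ print $2 }' "$corpus" | sort | uniq)

echo "Nouns:";         echo "$nouns"
echo "Retagged:";      echo "$retagged"
echo "Words:";         echo "$words"
echo "Distinct tags:"; echo "$tags"

# uniq -c additionally prefixes each merged line with its frequency.
echo "Tag frequencies:"
awk -F/ '{ print $2 }' "$corpus" | sort | uniq -c

rm -f "$corpus"
```

The same pipeline style should carry over to larger files: each command reads lines and writes lines, so the steps can be chained with pipes instead of stored in variables.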

The course will be divided into two parts: the commands will be presented in the first hour and the second hour will consist of a hands-on laboratory practice.

Duration: 2 hours.

Pre-requisites: none.

Notes:

There is a limited number of places in this course due to the size of the laboratories.
*These tools can also be used on Windows.

About the Speaker: Caroline Gasperin has BSc and MSc degrees in Computer Science from the Pontifical Catholic University of Rio Grande do Sul, Brazil, and a PhD degree from the University of Cambridge, UK. She is currently a post-doctoral researcher at USP/ICMC, Brazil, working on the PorSimples project in text simplification for Brazilian Portuguese. Her main research interests include corpus-based techniques for NLP, anaphora resolution and information extraction.