Utility Functions

Module nlpnet.utils

This module includes some general utility functions. Most of them are specific to the internals of nlpnet, but the following ones may also be useful for other purposes.

nlpnet.utils.clean_text(text, correct=True)

Applies some transformations to the text, such as replacing digits with 9 and simplifying quotation marks.

Parameters: correct – If True, tries to correct punctuation misspellings.
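
The following is a minimal, self-contained sketch of the kind of normalization described above, written without nlpnet. It is not nlpnet's implementation: the name `clean_text_sketch` and the particular set of quotation marks it simplifies are assumptions for illustration, and the `correct` punctuation fix is omitted.

```python
import re

# Hypothetical mapping of typographic quotes to plain ASCII quotes
# (an assumption, not nlpnet's actual table).
_QUOTE_MAP = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u00ab": '"',  # left-pointing double angle quotation mark
    "\u00bb": '"',  # right-pointing double angle quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def clean_text_sketch(text: str) -> str:
    """Replace every digit with 9 and simplify quotation marks (illustrative only)."""
    # Map each digit to 9, so numbers with the same shape look alike.
    text = re.sub(r"\d", "9", text)
    # Replace each typographic quote with its plain counterpart.
    for variant, plain in _QUOTE_MAP.items():
        text = text.replace(variant, plain)
    return text


print(clean_text_sketch("Em 2014, ele disse \u201col\u00e1\u201d."))
# Em 9999, ele disse "olá".
```

Collapsing every digit to the same symbol means, for instance, that "2014" and "1987" become the same token shape, which is a common way to reduce vocabulary size.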

nlpnet.utils.tokenize(text, language)

Calls the tokenizer function for the given language. The tokens are returned as a list of lists, with one list per sentence.

Parameters: language – two-letter language code (en or pt)
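
To illustrate the shape of the result, here is a toy, dependency-free sketch. With nlpnet installed, the documented call would be `nlpnet.utils.tokenize(text, 'en')`; the function `tokenize_sketch` below is hypothetical, and its naive regular expressions are a simplification, not the language-specific tokenizers nlpnet actually uses.

```python
import re
from typing import List

# Naive sentence boundary: end punctuation followed by whitespace (an assumption;
# real tokenizers also handle abbreviations, quotes, and so on).
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# A word (with optional internal hyphens or apostrophes) or a single punctuation mark.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize_sketch(text: str, language: str) -> List[List[str]]:
    """Return one list of tokens per sentence, mirroring the documented output shape."""
    # Assumption: reject codes other than the two documented ones.
    if language not in ("en", "pt"):
        raise ValueError(f"unsupported language: {language!r}")
    sentences = [s for s in _SENT_BOUNDARY.split(text.strip()) if s]
    return [_TOKEN.findall(s) for s in sentences]


print(tokenize_sketch("The cat sat. It purred!", "en"))
# [['The', 'cat', 'sat', '.'], ['It', 'purred', '!']]
```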

nlpnet.utils.contract(w1, w2)

Contracts two Portuguese words into a single word.

For example, contract('de', 'os') returns 'dos'. If no contraction between the given words exists in Portuguese, a ValueError exception is raised.
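
The documented behavior can be sketched as a lookup table that raises ValueError on a miss. This is an assumption about how such a function could work, not nlpnet's code: the name `contract_sketch` is hypothetical, the table covers only a few common Portuguese contractions, and the lookup is case-sensitive.

```python
from typing import Dict, Tuple

# A small, partial table of Portuguese contractions (illustrative, not nlpnet's data).
_CONTRACTIONS: Dict[Tuple[str, str], str] = {
    ("de", "o"): "do", ("de", "a"): "da", ("de", "os"): "dos", ("de", "as"): "das",
    ("em", "o"): "no", ("em", "a"): "na", ("em", "os"): "nos", ("em", "as"): "nas",
    ("a", "o"): "ao", ("a", "os"): "aos", ("a", "a"): "à", ("a", "as"): "às",
    ("por", "o"): "pelo", ("por", "a"): "pela",
}


def contract_sketch(w1: str, w2: str) -> str:
    """Return the contraction of w1 and w2, or raise ValueError if none is known."""
    try:
        return _CONTRACTIONS[(w1, w2)]
    except KeyError:
        raise ValueError(f"no contraction for {w1!r} + {w2!r}") from None


print(contract_sketch("de", "os"))  # dos
try:
    contract_sketch("de", "gato")
except ValueError as err:
    print(err)  # no contraction for 'de' + 'gato'
```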
