SentiLex-PT01 in English
This resource is no longer supported by the XLDB Group. Please check the new location.
This page is also available in Portuguese.
SentiLex-PT01 is a sentiment lexicon for Portuguese made up of 6,321 adjective lemmas (conventionally, the masculine singular form), and 25,406 inflected forms.
The sentiment entries correspond to human predicate adjectives (i.e. adjectives modifying human nouns), compiled from various publicly available resources. The attributes for each entry are:
- the predicate polarity,
- the target of sentiment and
- the polarity assignment.
Some of the entries have their attributes assigned automatically by software that we developed for this purpose.
SentiLex-PT01 is available for download from this page (see below) under a Creative Commons license.
Frequently Asked Questions (FAQ)
What can I do with SentiLex-PT01?
SentiLex-PT01 is especially useful for opinion mining applications involving Portuguese, in particular for detecting and classifying sentiments and opinions targeting human entities.
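As a rough illustration of this kind of use, the sketch below sums the polarities of the lexicon adjectives found in a sentence. The `TOY_LEXICON` dictionary and the `score_text` function are hypothetical names introduced here; the five entries are copied from the sample entries shown further down this page. A real application would load the full lexicon and would also need to check that the adjective actually applies to a human entity and handle negation, which this sketch does not attempt.

```python
import re

# Hypothetical in-memory lexicon: inflected form -> polarity.
# The values come from the sample entries listed on this page.
TOY_LEXICON = {
    "bonito": 1,
    "bonita": 1,
    "bonitos": 1,
    "bonitas": 1,
    "desligado": -1,
}


def score_text(text, lexicon):
    """Return (total polarity, list of (token, polarity) hits) for a text.

    Tokens are lowercased word sequences; anything not in the lexicon
    is ignored. This is a naive bag-of-words sum, not a full classifier.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = [(token, lexicon[token]) for token in tokens if token in lexicon]
    return sum(polarity for _, polarity in hits), hits


total, hits = score_text("A atriz é bonita, mas o ministro é desligado.", TOY_LEXICON)
```

Here one positive and one negative adjective are found, so the total is 0; inspecting `hits` shows which adjectives contributed.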
What is the format of SentiLex-PT01?
SentiLex-PT01 is available in two separate .txt files:
- SentiLex-lem-PT01.txt (dictionary of lemmas): each line includes a lemma (conventionally, the masculine singular form) and its part-of-speech (Adj), followed by its sentiment attributes:
- polarity (POL), which can be positive (1), negative (-1) or neutral (0);
- target of polarity (TG), which corresponds to a human subject (HUM);
- polarity annotation (ANOT), which was performed manually (MAN) or automatically, by the Judgment Analysis Lexicon Classifier (JALC) tool, developed by the project team.
Below are two entries of SentiLex-lem-PT01.txt:
bonito. PoS = Adj; POL = 1; TG = HUM; ANOT = MAN
desligado. PoS = Adj; POL = -1; TG = HUM; ANOT = JALC
- SentiLex-flex-PT01.txt (dictionary of inflected forms): each line contains an adjective inflected for gender (G) and number (N), associated with its lemma. In addition to the linguistic information described for the dictionary of lemmas, each adjective is classified as masculine (m) or feminine (f) and singular (s) or plural (p).
Below are four entries of SentiLex-flex-PT01.txt:
bonita,bonito. PoS = Adj; GN = fs; POL = 1; TG = HUM; ANOT = MAN
bonitas,bonito. PoS = Adj; GN = fp; POL = 1; TG = HUM; ANOT = MAN
bonito,bonito. PoS = Adj; GN = ms; POL = 1; TG = HUM; ANOT = MAN
bonitos,bonito. PoS = Adj; GN = mp; POL = 1; TG = HUM; ANOT = MAN
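The two line formats described above can be read with one small parser. The following is a minimal sketch, assuming the files follow exactly the layout of the sample entries (a head terminated by the first period, then `KEY = value` pairs separated by semicolons); the actual files may differ in whitespace or encoding. The function name `parse_entry` and the variable names are hypothetical.

```python
def parse_entry(line):
    """Parse one lexicon line into a dict.

    Works for both layouts shown on this page:
      lemma file:  "lemma. PoS = Adj; POL = 1; ..."
      flex file:   "form,lemma. PoS = Adj; GN = fs; POL = 1; ..."
    For lemma-file lines the form and the lemma are the same string.
    """
    head, _, attrs = line.strip().partition(".")
    form, comma, lemma = head.partition(",")
    entry = {"form": form.strip(), "lemma": (lemma if comma else form).strip()}
    for field in attrs.split(";"):
        key, equals, value = field.partition("=")
        if equals:  # skip empty or malformed fragments
            entry[key.strip()] = value.strip()
    entry["POL"] = int(entry["POL"])  # polarity as an integer: 1, 0 or -1
    return entry


# Sample lines copied from the entries shown on this page.
LEMMA_LINES = [
    "bonito. PoS = Adj; POL = 1; TG = HUM; ANOT = MAN",
    "desligado. PoS = Adj; POL = -1; TG = HUM; ANOT = JALC",
]
FLEX_LINES = [
    "bonita,bonito. PoS = Adj; GN = fs; POL = 1; TG = HUM; ANOT = MAN",
    "bonitas,bonito. PoS = Adj; GN = fp; POL = 1; TG = HUM; ANOT = MAN",
    "bonito,bonito. PoS = Adj; GN = ms; POL = 1; TG = HUM; ANOT = MAN",
    "bonitos,bonito. PoS = Adj; GN = mp; POL = 1; TG = HUM; ANOT = MAN",
]

lemma_entries = [parse_entry(line) for line in LEMMA_LINES]
flex_entries = [parse_entry(line) for line in FLEX_LINES]

# Lookup table from inflected form to polarity, e.g. for tagging running text.
polarity_by_form = {entry["form"]: entry["POL"] for entry in flex_entries}
```

To process the real files, the same function could be applied to each non-empty line read from disk; the file encoding would need to be checked first, since this page does not state it.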
Can an adjective be associated with different polar attributes?
Each adjective is associated with a unique polar attribute, targeting a specific syntactic-semantic category.
Homographs are treated as distinct entries, and their attributes take different values. For example, the adjective fresh can modify a human noun, where it can be replaced by the adjectives impertinent or impudent, which have a negative semantic orientation. However, fresh can also modify non-human nouns, with the opposite polarity. For example, when combined with an abstract noun such as portrayal, fresh is interpreted as new or novel and conveys a positive semantic orientation.
What is the distribution of adjectives in SentiLex-PT01?
In SentiLex-PT01, 3,494 adjectives have a negative polarity, 1,243 adjectives have a positive polarity, and 1,584 have a neutral polarity. 3,585 adjectives were manually labeled and 2,736 adjectives were automatically labeled.
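As a quick consistency check, the counts above can be compared with the 6,321 lemmas reported at the top of the page and converted to percentages. The variable names below are illustrative only.

```python
TOTAL_LEMMAS = 6321  # number of adjective lemmas stated in the introduction

by_polarity = {"negative": 3494, "positive": 1243, "neutral": 1584}
by_annotation = {"manual": 3585, "automatic": 2736}

# Both breakdowns should account for every lemma exactly once.
polarity_total = sum(by_polarity.values())
annotation_total = sum(by_annotation.values())

# Share of each polarity class, in percent, rounded to one decimal place.
polarity_share = {
    label: round(100 * count / TOTAL_LEMMAS, 1)
    for label, count in by_polarity.items()
}
```

Both totals come to 6,321, and negative adjectives make up a little over half of the lexicon.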
Why are adjectives the only grammatical category included in SentiLex-PT01?
At present, only adjectives are included, but other categories will be added in the future. We tuned the software for automatic polarity classification on adjectives, but it works with other categories as well.
What is the accuracy of the automatic polarity classification?
The algorithm has an overall accuracy of 67%. It is more precise in classifying negative adjectives (precision of 82%) than positive adjectives (67%). The most problematic cases involve neutral polarity, which is, on average, correctly assigned in only 45% of cases.
Contingency Table for the Automatic Polarity Classification
| Positive (%) | Neutral (%) | Negative (%) | Recall (%) |
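For readers unfamiliar with how accuracy, precision and recall are derived from a contingency table, here is a minimal sketch. The counts in `toy_confusion` are invented for illustration and are not the SentiLex-PT01 evaluation figures; the function name `classification_metrics` is also hypothetical. Rows are the true class and columns the predicted class, which is an assumption about the table's orientation.

```python
LABELS = ("positive", "neutral", "negative")


def classification_metrics(confusion):
    """Compute accuracy and per-class precision/recall.

    confusion[true_label][predicted_label] holds a count.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[label][label] for label in LABELS)
    metrics = {"accuracy": correct / total, "precision": {}, "recall": {}}
    for label in LABELS:
        predicted = sum(confusion[true][label] for true in LABELS)  # column sum
        actual = sum(confusion[label].values())                     # row sum
        hit = confusion[label][label]
        metrics["precision"][label] = hit / predicted if predicted else 0.0
        metrics["recall"][label] = hit / actual if actual else 0.0
    return metrics


# Invented toy counts, NOT the real evaluation data.
toy_confusion = {
    "positive": {"positive": 6, "neutral": 2, "negative": 2},
    "neutral": {"positive": 2, "neutral": 5, "negative": 3},
    "negative": {"positive": 1, "neutral": 1, "negative": 8},
}

toy_metrics = classification_metrics(toy_confusion)
```

Precision for a class divides the correct predictions by everything predicted as that class, while recall divides them by everything that truly belongs to the class.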
What are the licensing terms of SentiLex-PT01?
Creative Commons Attribution 3.0 License (CC-BY).
How can I obtain SentiLex-PT01?
You can download the two files directly from here:
SentiLex-PT01 was developed by the following researchers:
With support in part by the following grants from FCT:
- UTA-Est/MAI/0006/2009 (project REACTION)
Mário J. Silva, Paula Carvalho, Carlos Costa, Luís Sarmento. Automatic Expansion of a Social Judgment Lexicon for Sentiment Analysis. Technical Report TR 10-08, University of Lisbon, Faculty of Sciences, LASIGE, December 2010. doi: 10455/6694