A Python library for working with analogies between word forms.
- Vector representation of words
- Extraction of analogical clusters from a set of words
- Construction of analogical grids
- Analogical equation solver
The main purpose of this library is to construct analogical grids from a given set of strings.
Word forms → analogical grids
Each word form is represented as a vector whose components give the number of occurrences of each character of the alphabet in that word form.
Using the number of occurrences of characters as the formal level of vector representation
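The character-count representation described above can be sketched in a few lines of plain Python. This is only an illustration and not the library's own code: the function name `char_count_vector` and the choice of taking the alphabet as the sorted set of characters seen in the input are assumptions made for this example.

```python
from collections import Counter

def char_count_vector(word, alphabet):
    """Return the number of occurrences in `word` of each character of `alphabet`."""
    counts = Counter(word)
    # Counter returns 0 for characters that do not occur in the word.
    return [counts[ch] for ch in alphabet]

words = ["walk", "walked", "talk", "talked"]

# Assumption for this sketch: the alphabet is every character seen in the input, sorted.
alphabet = sorted(set("".join(words)))  # ['a', 'd', 'e', 'k', 'l', 't', 'w']

vectors = {w: char_count_vector(w, alphabet) for w in words}
for w in words:
    print(w, vectors[w])
```

Every component is an integer, and all word forms share the same dimensions because they are counted against the same alphabet.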
Morphological features → paradigm tables
When morphological features are used as the vector representation, the output is a set of paradigm tables.
Using the morphological features, e.g. the POS tag, of the word form as vector representation
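One way to turn morphological features into an integer vector is a 0/1 indicator per feature. The sketch below is hypothetical: the encoding scheme, the tag names (such as `POS=VERB`) and the function `feature_vector` are assumptions chosen for illustration, and the library may encode features differently.

```python
def feature_vector(features, inventory):
    """Return a 0/1 vector: 1 where the word form carries the feature, 0 otherwise."""
    return [1 if f in features else 0 for f in inventory]

# Hypothetical annotations; the tag names are illustrative only.
annotated = {
    "walk":   {"POS=VERB", "Tense=Pres"},
    "walked": {"POS=VERB", "Tense=Past"},
    "cat":    {"POS=NOUN", "Number=Sing"},
    "cats":   {"POS=NOUN", "Number=Plur"},
}

# Fixed feature inventory shared by all word forms, sorted for a stable dimension order.
inventory = sorted(set().union(*annotated.values()))

feature_vectors = {w: feature_vector(fs, inventory) for w, fs in annotated.items()}
for w, v in feature_vectors.items():
    print(w, v)
```

Because the values are 0 or 1, this representation stays within the integer-valued vectors the library currently works with.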
User-defined features → new type of grids
You can also supply your own features. For example, you can combine:
- the number of occurrences of all the characters
- morphological features
Using both the formal and morphological levels as vector representation to organise regular conjugations
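Combining the two levels can be sketched as a simple concatenation of the character counts and the morphological indicators. This is a minimal illustration under stated assumptions: the names `combined_vector` and `difference`, the tag names, and the use of concatenation are all hypothetical and are not taken from the library. The example also compares difference vectors, which is one simple way to see that two pairs of word forms stand in the same relation; it is not presented as the library's exact criterion.

```python
from collections import Counter

def combined_vector(word, features, alphabet, inventory):
    """Concatenate character counts (formal level) with 0/1 feature indicators."""
    counts = Counter(word)
    formal = [counts[ch] for ch in alphabet]
    morph = [1 if f in features else 0 for f in inventory]
    return formal + morph

def difference(u, v):
    """Component-wise difference u - v of two vectors of equal length."""
    return [a - b for a, b in zip(u, v)]

# Hypothetical annotated input; the tag names are illustrative only.
annotated = {
    "walk":   {"POS=VERB", "Tense=Pres"},
    "walked": {"POS=VERB", "Tense=Past"},
    "talk":   {"POS=VERB", "Tense=Pres"},
    "talked": {"POS=VERB", "Tense=Past"},
}
alphabet = sorted(set("".join(annotated)))
inventory = sorted(set().union(*annotated.values()))

vec = {w: combined_vector(w, fs, alphabet, inventory) for w, fs in annotated.items()}

# Both pairs add "e" and "d" and switch Tense=Pres to Tense=Past,
# so their difference vectors are identical.
same_relation = difference(vec["walked"], vec["walk"]) == difference(vec["talked"], vec["talk"])
print(same_relation)  # True
```

With the morphological part included, two pairs only share a difference vector when they agree both in form and in features, which is what lets regular conjugations line up.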
Future work: word embeddings → semantic analogical grids
A current limitation of this library is that the components of the vector representation must be integers. Future work is to accept floating-point values, so that word embeddings can be used to construct semantic analogical grids.
Using word embeddings as input to construct semantic analogical grids
Code
GitHub repository: https://github.com/famrashel/nlg.git
A Jupyter notebook illustrates some basic usage of the package mentioned above.
Publication
Rashel Fam and Yves Lepage. Tools for the production of analogical grids and a resource of n-gram analogical grids in 11 languages. In Proceedings of the 11th edition of the Language Resources and Evaluation Conference (LREC 2018), pages 1060–1066, Miyazaki, Japan, May 2018.