Is Spanish supported?
Hi! I'm trying to work in Spanish, and it seems the word tokenizer is not working well. The system appears to take into account only part of each word.
Official comment
Hello Jorge,
In fact, it is working correctly: the endings of the words are removed to reduce them to their lemmas. The results of your analysis, such as the main topics and the graph, are not affected by this.
This only happens in Spanish, Portuguese, and Italian because of how the lemmatizers work for those languages. We are looking for better solutions, but so far we haven't found anything reliable, so we will probably write our own modules to solve this aesthetic problem for our users.
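For readers wondering why the displayed words look truncated: suffix-stripping stemmers remove inflectional endings so that related word forms collapse to a shared stem. Here is a minimal toy sketch of that idea in Python (this is only an illustration of the general technique, not the actual lemmatizer used by the product; the suffix list is invented for the example):

```python
# Toy suffix-stripping stemmer for Spanish-like words.
# NOT the product's real lemmatizer; a crude sketch of why
# "trabajando", "trabajador", and "trabajadores" all display
# as the same shortened stem after analysis.

SUFFIXES = [
    "amientos", "imientos", "aciones", "adores", "ancias",
    "iendo", "ación", "ando", "adas", "ados", "ador",
    "able", "ible", "es", "os", "as", "a", "o",
]

def crude_stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least 3 characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["trabajando", "trabajador", "trabajadores"]:
    print(w, "->", crude_stem(w))  # all three collapse to "trabaj"
```

Because all inflected forms map to one stem, topic extraction and the graph treat them as the same word, which is why the shortened display is cosmetic rather than a tokenization bug.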