InfraNodus uses the following default text processing algorithm, which is highly customizable:
1) Scan the text uploaded by the user
2) Split it into statements (separated by a carriage return or new line, with a maximum of 1000 characters per statement)
3) Remove stopwords from the statement (most frequent words, such as "are", "and", "is")
4) Check if the statement contains #hashtags or [[wiki links]]. If it does, only process those entities. If it doesn't, process words.
5) When processing words, convert them to lemmas (e.g. "networks" becomes "network" and "looked" becomes "look")
6) Build a graph where each lemma (or #hashtag or [[wiki link]]) is a node, and each co-occurrence of two nodes within a 4-lemma window is a connection (the closer the words, the stronger the connection).
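The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not InfraNodus's actual code: the stopword list and lemma table are tiny hypothetical stand-ins, long lines are simply cut every 1000 characters, and the weighting (3 for adjacent lemmas, 2 and 1 for lemmas further apart within the window) is one assumed way to make closer words connect more strongly.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for a real stopword list and lemmatizer.
STOPWORDS = {"are", "and", "is", "the", "a", "of", "at"}
LEMMAS = {"networks": "network", "looked": "look"}

MAX_STATEMENT_CHARS = 1000  # maximum statement length from step 2
WINDOW = 4                  # co-occurrence window from step 6

ENTITY_RE = re.compile(r"#\w+|\[\[[^\]]+\]\]")  # #hashtags or [[wiki links]]
WORD_RE = re.compile(r"[a-z']+")


def split_statements(text):
    """Step 2: one statement per line, cut into chunks of at most 1000 characters."""
    statements = []
    for line in text.splitlines():
        line = line.strip()
        for start in range(0, len(line), MAX_STATEMENT_CHARS):
            statements.append(line[start:start + MAX_STATEMENT_CHARS])
    return statements


def tokens_for(statement):
    """Steps 3-5: use only the entities if there are any, otherwise lemmatized words."""
    entities = ENTITY_RE.findall(statement)
    if entities:
        return [entity.lower() for entity in entities]
    words = WORD_RE.findall(statement.lower())
    return [LEMMAS.get(word, word) for word in words if word not in STOPWORDS]


def build_graph(text):
    """Step 6: weighted co-occurrence edges within a 4-token window."""
    edges = defaultdict(int)
    for statement in split_statements(text):
        tokens = tokens_for(statement)
        for i, left in enumerate(tokens):
            for gap in range(1, WINDOW):
                if i + gap >= len(tokens):
                    break
                right = tokens[i + gap]
                if left == right:
                    continue  # skip self-loops
                edge = tuple(sorted((left, right)))
                edges[edge] += WINDOW - gap  # assumed weighting: closer = stronger
    return dict(edges)


print(build_graph("Networks are looked at"))  # {('look', 'network'): 3}
```

Edges are stored as sorted pairs so that "network, look" and "look, network" count as the same undirected connection.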
Changing the Default Text Processing Settings
You can change the default processing settings by clicking the "Globe" icon in the left Statements menu. It has multiple options that allow you to change the processing logic:
For example, you may set:
1) Show on Graph: "process both hashtags"
2) [[Wiki Links]]: "process as hashtags"
In this case, if your text contains both [[wiki links]] and words, they will all be visualized on the graph.
You may also choose to show only #hashtags or [[wiki links]], omitting the words. In this case, set:
1) Show on Graph: "hashtags only"
2) [[Wiki Links]]: either "prioritize over words" or "as hashtags" (in the latter case, since only hashtags are shown, the wiki links will be the only thing shown)
You can also use the Text Graph Processing Settings to add:
1) A custom list of stopwords (words that should not be shown on the graph); see "In Your Own Language: Lemmatization and Stopwords Removal"
2) A custom list of synonyms (words that should be converted to a root form that you choose); see "How to Merge Nodes into Topics and Unlock Merged Nodes"
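As a rough illustration of what these two lists do to the lemmas before they reach the graph, here is a minimal Python sketch. The function name `apply_custom_settings` and the order of operations (drop custom stopwords first, then replace synonyms) are assumptions made for the example, not a description of how InfraNodus implements these settings.

```python
def apply_custom_settings(lemmas, custom_stopwords, synonyms):
    """Drop custom stopwords, then map each remaining lemma to its chosen root form.

    lemmas           -- list of lemmas from one statement
    custom_stopwords -- set of words to hide from the graph
    synonyms         -- dict mapping a word to the root form it should merge into
    """
    result = []
    for lemma in lemmas:
        if lemma in custom_stopwords:
            continue  # this word should not appear on the graph
        result.append(synonyms.get(lemma, lemma))
    return result


lemmas = ["car", "automobile", "thing", "road"]
print(apply_custom_settings(lemmas, {"thing"}, {"automobile": "car"}))
# ['car', 'car', 'road']
```

After this step "car" and "automobile" would be drawn as a single node, and "thing" would not appear at all.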