By default, InfraNodus creates knowledge graphs based on the words used in your text that are converted into lemmas. These words become the nodes in the graph and their co-occurrences form the connections:
Based on our research and experiments, this approach generates the highest-resolution graph possible and is often the best for further interpretation with AI models, because you are not influencing them with an additional layer of semantic interpretation on top (instead, we use the knowledge graph structure to provide additional context to the AI models). For instance, you can see that we have two distinct clusters of topics (SpaceX + Tesla) identified with different colors, and "elon" and "musk" are the two most influential nodes because they link those two groups of concepts together.
When you click on the nodes, you'll see all the statements that describe their relation.
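The lemma-and-co-occurrence approach described above can be sketched as follows. This is a minimal illustration, not InfraNodus's actual pipeline: the lemma table and stopword list here are toy stand-ins for the real lemmatizer and dictionaries, and the sliding-window size is an assumption.

```python
from collections import Counter

# Toy lemma table and stopword list; stand-ins for InfraNodus's real
# lemmatizer and stopword dictionary (an assumption for this sketch).
LEMMAS = {"started": "start", "based": "base", "companies": "company"}
STOPWORDS = {"is", "a", "by", "who", "also", "are", "in", "the", "both"}

def cooccurrence_graph(text, window=4):
    """Nodes are lemmas; edge weights count co-occurrences within a sliding window."""
    tokens = [LEMMAS.get(w, w) for w in text.lower().split() if w not in STOPWORDS]
    edges = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                # Sort the pair so "elon"-"musk" and "musk"-"elon" count as one edge
                edges[tuple(sorted((tokens[i], tokens[j])))] += 1
    return edges

graph = cooccurrence_graph(
    "Tesla is a company run by Elon Musk who also started SpaceX"
)
```

In this sketch, frequently co-occurring lemmas accumulate higher edge weights, which is what lets a graph layout pull related concepts into the visible topical clusters.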
You can also choose to have InfraNodus extract entities from your text and build several types of knowledge graphs:
1) using both words and entities extracted (recommended setting) —
this will visualize both words and entities identified in your text — an optimal mix between the graph's granularity and semantic interpretation of content. This also helps retain a distinct topical structure.
2) using only the entities extracted (for sparse graphs) —
this will visualize only the entities extracted from your text — this builds sparse graphs with a less pronounced topical structure; however, the graph may be more readable, especially for larger texts.
3) using only the extracted entities, replaced with their root form —
In this scenario, we not only identify and tag the entities in your original text, but also replace them with their root form. So, for instance, "Tesla" becomes "Tesla, Inc", while "the USA" becomes "the United States". This ensures consistency across different texts but also generates sparser graphs, which may be easier to interpret visually but contain less structural information for AI models:
While we build graphs using the lemmas of the words you provide, there are some instances where adding entity extraction insights can be useful.
For instance, when you mix "words and entities", you get the best of both worlds, as concepts that are composed of multiple words are not broken apart.
It is also an advantage when you want to perform a visual analysis of the content, or when you are actually interested in feeding this additional level of interpretation (the extracted entities) to an AI model.
It can also be useful for comparing texts that may use different terms that are synonyms or that relate to the same root concepts.
Finally, it is also a great way to tag your text with the most relevant concepts automatically. When an entity is detected, it will be tagged using a [[wiki link]] — the industry standard for tagging entities in markdown text format. You can then export your tagged texts from InfraNodus into PKM systems such as Obsidian or Roam Research and have an interconnected database of concepts as a result.
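The tagging step can be sketched like this. The entity dictionary below is hypothetical; InfraNodus actually looks entities up in Wikipedia, Freebase, and custom dictionaries rather than a hard-coded list.

```python
# Hypothetical entity dictionary; InfraNodus resolves entities against
# Wikipedia, Freebase, and custom dictionaries instead.
ENTITIES = ["Elon Musk", "SpaceX", "Tesla", "the USA"]

def tag_entities(text):
    """Wrap every known entity in a [[wiki link]], longest names first
    so multi-word entities are matched before their shorter substrings."""
    for name in sorted(ENTITIES, key=len, reverse=True):
        text = text.replace(name, f"[[{name}]]")
    return text
```

The [[wiki link]] syntax is what makes the output directly importable into PKM tools such as Obsidian or Roam Research.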
How does Entity Extraction Work?
We use multiple algorithms to extract entities from a text. Our system will scan your text for any named entities and concepts that are contained in either Wikipedia or Freebase, as well as custom dictionaries of known entities. Once an entity is detected, it will be tagged using [[wiki links]].
So, for instance, if your original statement is:
Tesla is a company run by Elon Musk who also started SpaceX -
both are based in the USA
it will be changed to the following format if you turn entity detection on:
[[Tesla]] is a company run by [[Elon Musk]] who also started [[SpaceX]] -
both are based in [[the USA]]
Then the standard approach to processing wiki links will apply. By default, they are the only entities that are processed. However, you can also have them treated as hashtags and activate "both words and hashtags" processing, so that the graph is built both from the named entities encapsulated in [[wiki links]] and from the words used in your text (e.g. "company" and "start", which were not detected as named entities).
When you choose to analyze both words and entities as you import a text, we will set this particular graph to process both "hashtags and words" while setting [[wiki links]] to be treated as standard hashtags. This is why when you visualize this graph, you'll see both the individual concepts and the entities extracted at the same time on the graph. You can change this in the Text Processing Settings.
How to Activate Entity Detection?
In order to extract entities from an analyzed text, just choose "the words and the entities extracted" or "only the detected entities" when you're adding your data:
Note that you can choose not only to detect entities in your text, but also to replace those entities with their root form. For instance, "ai" would be replaced with "artificial intelligence" directly in your text. To activate this feature, turn on the "Replace entities with root concepts" switch.
Use this feature with care, as it completely rewrites your original text. It may be useful, however, if you want to create a sparse knowledge graph for your original text and also make it compatible with other texts, which may refer to similar concepts but use a different language.
For instance, in the example above,
Tesla is a company run by Elon Musk who also started SpaceX -
both are based in the USA
if we turn both of these options on, we'll get:
[[Tesla, Inc]] is a company run by [[Elon Musk]] who also started [[SpaceX]] -
both are based in the [[United States]]
As you can see, some entities are replaced with their root form, which transforms your original text but also ensures that if another text uses "the USA" instead of "United States", both will be converted to "[[United States]]" and thus be detected as the same entity.
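At its core, the root-form replacement amounts to a lookup against a table of canonical names. The mapping below is hypothetical; in InfraNodus the canonical forms come from its entity knowledge base, not a hard-coded dictionary.

```python
# Hypothetical root-form table; InfraNodus resolves canonical names via
# its entity knowledge base, not a hard-coded mapping.
ROOT_FORMS = {
    "Tesla": "Tesla, Inc",
    "the USA": "United States",
    "ai": "artificial intelligence",
}

def to_root_form(entity):
    """Return the canonical form of an entity, or the entity unchanged
    if no root form is known."""
    return ROOT_FORMS.get(entity, entity)
```

Because every variant maps to the same canonical name, two texts that use different surface forms end up sharing the same node in the graph.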
Lemmatization of Entities
Lemmatization is the process of transforming words into their root lemmas. It helps reduce dimensionality and makes graphs sparser. For instance, "words" becomes "word", while "taken" becomes "take".
By default, lemmatization is turned on for standard words in InfraNodus, which means that we convert the words to their lemmas and also remove some stopwords from the graph (like "is", "the", etc.).
By default, we do not apply lemmatization to entities tagged with #hashtags or [[wiki links]], but you can turn this feature on in User Settings or in the context settings (by clicking the Globe icon). In this case, we will convert the words and phrases inside [[wiki links]] to their lemmas and also apply the stopword dictionary to remove some of them.
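Applied to an entity, the lemmatization step looks roughly like this. The lemma table and stopword list are toy stand-ins for the real dictionaries.

```python
# Toy lemma table and stopword list; stand-ins for the real dictionaries.
LEMMAS = {"states": "state", "links": "link", "concepts": "concept"}
STOPWORDS = {"the", "of"}

def lemmatize_entity(entity):
    """Lemmatize each word inside a [[wiki link]] and drop stopwords,
    as happens when entity lemmatization is turned on."""
    words = [LEMMAS.get(w, w) for w in entity.lower().split() if w not in STOPWORDS]
    return " ".join(words)
```

So with this option enabled, an entity like [[United States]] would contribute the node "united state" rather than the original phrase.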
Exporting Converted Text / Knowledge Graph with Tagged Entities
You can export the converted text with tagged entities or your resulting knowledge graph as is shown here:
Export your Graphs, Text Data and Analytics Results