One of the most powerful features of InfraNodus is its ability to generate a list of topics from any text, which can be used for classification or to gain a deeper understanding of the content. This feature is available in the web interface, the browser extension, the Obsidian plugin and via our API.
For example, here are the main topics retrieved for this support portal:
- AI Networks
- Topic Selection
- Idea Generation
- Search Queries
The list above is a good representation of the top four topics for this portal. But how are those topics generated? We use a combination of our peer-reviewed network analysis algorithm and LLM prompting. As a result, you can generate topics for large texts without spending a fortune on text processing. We condense your original text, extract only the most important relations and snippets of content, and generate a name and a short summary for each topic. You can also use the graph to focus on a specific topic and generate a summary and description just for that topical cluster.
Here is a description of our topic modeling algorithm, step by step.
Topical Modeling in InfraNodus
1. First, the text is represented as a network where the words (or entities, depending on your knowledge graph setting) are the nodes and their co-occurrences are the connections between them.
2. Once the text is represented this way, we identify the topical clusters using the community detection algorithm described in our peer-reviewed InfraNodus whitepaper.
3. The nodes that belong to each cluster are converted into a condensed subgraph (in the dotGraph format), so instead of using a list of words for each topic, we extract extra information about how those nodes are actually connected to each other in every cluster.
4. We also extract top statements for every topical cluster along with the statements that have the highest concentration of terms from each particular cluster.
5. The nodes contained in each cluster, the underlying relations, and the top statements are then used to generate a prompt for an LLM, which comes up with the best possible name for this cluster.
6. These names are then shown next to the topics. As we use the underlying graph structure and some context (but not all of it), our system is able to generate highly precise results even for large contexts, without having to process all the topics at once.
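Steps 1 to 3 can be illustrated with a minimal Python sketch. It is not the InfraNodus implementation: the stopword list, the co-occurrence window, and all function names are hypothetical, and a simple label propagation pass stands in for the community detection algorithm described in the whitepaper. The export uses Graphviz DOT-style text as an assumption about what a condensed subgraph could look like.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "with", "on"}


def tokenize(statement):
    """Lowercase a statement and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", statement.lower()) if w not in STOPWORDS]


def build_graph(statements, window=2):
    """Step 1: words are nodes; each word is linked to the next `window` words."""
    graph = defaultdict(Counter)
    for statement in statements:
        words = tokenize(statement)
        for i, a in enumerate(words):
            for b in words[i + 1 : i + 1 + window]:
                if a != b:
                    graph[a][b] += 1
                    graph[b][a] += 1
    return graph


def detect_clusters(graph, max_rounds=20):
    """Step 2 (stand-in): label propagation, not the whitepaper's algorithm.

    Each node repeatedly adopts the label carrying the most edge weight among
    its neighbours; ties go to the alphabetically smallest label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            scores = Counter()
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = min(scores, key=lambda label: (-scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return sorted(clusters.values(), key=sorted)


def cluster_to_dot(graph, cluster, name="topic"):
    """Step 3: keep only the edges inside one cluster, as DOT-style text."""
    lines = [f"graph {name} {{"]
    for a, b in combinations(sorted(cluster), 2):
        weight = graph[a].get(b, 0)
        if weight:
            lines.append(f'  "{a}" -- "{b}" [weight={weight}];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    text = ["cats chase mice", "mice fear cats", "stocks bonds markets", "markets move stocks"]
    g = build_graph(text)
    for index, cluster in enumerate(detect_clusters(g)):
        print(cluster_to_dot(g, cluster, name=f"topic_{index}"))
```

The point of the DOT-style export is that the model later sees how the terms in a cluster are connected, with edge weights, rather than a flat list of words.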
You can think of it in the following way: instead of feeding the whole text into the LLM (which would be impossible and expensive for large texts), we traverse the graph, extracting the most important parts based on the clustering, and only feed that data to the model to produce better results.
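As a rough illustration of this idea, the sketch below assembles a naming prompt from condensed data only: the terms of one cluster, a DOT-style description of their relations, and the few statements with the highest concentration of cluster terms. The function names, the scoring rule, and the prompt wording are all assumptions made for this example; the actual prompts used by InfraNodus are not shown here, and no LLM call is made.

```python
import re


def cluster_score(statement, cluster_terms):
    """Share of the statement's words that belong to the cluster (0.0 to 1.0)."""
    words = re.findall(r"[a-z]+", statement.lower())
    if not words:
        return 0.0
    return sum(word in cluster_terms for word in words) / len(words)


def top_statements(statements, cluster_terms, limit=2):
    """Pick the statements with the highest concentration of cluster terms."""
    ranked = sorted(statements, key=lambda s: cluster_score(s, cluster_terms), reverse=True)
    return [s for s in ranked[:limit] if cluster_score(s, cluster_terms) > 0]


def build_naming_prompt(cluster_terms, dot_subgraph, statements, limit=2):
    """Combine terms, relations, and excerpts into one compact prompt string.

    The wording is hypothetical; only the condensed data reaches the model,
    never the full text.
    """
    excerpts = top_statements(statements, cluster_terms, limit)
    return "\n".join(
        [
            "Suggest a short name (2-3 words) for the topic described below.",
            "Key terms: " + ", ".join(sorted(cluster_terms)),
            "Relations between the terms (DOT format):",
            dot_subgraph,
            "Representative statements:",
            *[f"- {s}" for s in excerpts],
        ]
    )


if __name__ == "__main__":
    statements = [
        "Cats chase mice",
        "Stocks and bonds move markets",
        "Mice fear cats at night",
    ]
    terms = {"cats", "chase", "fear", "mice"}
    dot = 'graph topic {\n  "cats" -- "mice" [weight=2];\n}'
    print(build_naming_prompt(terms, dot, statements))
```

The prompt size depends on the number of clusters and the `limit` on excerpts rather than on the length of the original text, which is what keeps the cost manageable for large inputs.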
You might find it interesting that a part of this algorithm is inspired by the way divinatory narratives such as Tarot work. In Tarot specifically, each card belongs to a specific topical cluster and is linked to other cards that represent all the other topical clusters in the original narrative. So picking a random card ensures that you also touch upon all the other clusters, which is a highly efficient way of traversing the knowledge graph of the narrative.
Generating Summaries for Each Topical Cluster
Additionally, you can generate short summaries for each of the topical clusters if you click AI: Summarize Topics in the Analytics panel (or choose Topics > Summary in the 3D graph in the InfraNodus Obsidian plugin or the InfraNodus browser extension).
A similar approach to generating summaries is used by Microsoft's GraphRAG implementation, except that their solution takes days to set up, uses much more resources, and implements a very bulky and inefficient way to build a graph. In fact, they seem to have been inspired by the idea of clustering presented in our 2019 paper but could not get it to work well.
You can also generate summaries for specific topic(s) by selecting them first and then clicking the AI Summary button. The summarization algorithm works the same way, using the part of the graph you selected and the underlying connectivity / statement data.
...
To generate topics and summaries for your content, create a new graph in InfraNodus.Com or use our API.