When you make a request to the InfraNodus API, you receive a JSON response object. It describes the underlying graph structure, including graph statistics, concept relations, topical clusters, and structural (content) gaps, along with the original statements and their tags.
The JSON object has the following main properties (this is not an exhaustive list):
entriesAndGraphOfContext: contains the statements and the graph
statements: the original statements
graph: graph in Graphology format with statistical attributes, nodes, and edges
extendedGraphSummary: graph analytics with main topics, gaps, main concepts, and stats
graphSummary: same as extendedGraphSummary but in string format
userName, graphUrl, graphName
Here is what the object looks like in JSON:
{
  "entriesAndGraphOfContext": {
    "statements": [],
    "graph": {
      "graphologyGraph": {
        "attributes": {},
        "options": {},
        "nodes": [],
        "edges": []
      }
    },
    "extendedGraphSummary": {
      "contentGaps": [],
      "mainTopics": [],
      "mainConcepts": [],
      "topRelations": [],
      "diversityStatistics": {}
    },
    "graphSummary": "",
    "userName": "",
    "graphUrl": "",
    "graphName": ""
  }
}
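As a minimal sketch of navigating this structure in Python, the snippet below parses a response body and pulls out a few of the main properties. The `summarize_response` helper and the sample payload are illustrative assumptions, not part of the InfraNodus API. The nesting follows the `entriesAndGraphOfContext.extendedGraphSummary` path used in this article, and missing keys are tolerated because the property list above is not exhaustive.

```python
import json

# Hypothetical sample payload shaped like the skeleton above (not real API output).
SAMPLE_BODY = """
{
  "entriesAndGraphOfContext": {
    "statements": [],
    "graph": {
      "graphologyGraph": {"attributes": {}, "options": {}, "nodes": [], "edges": []}
    },
    "extendedGraphSummary": {"mainTopics": [], "contentGaps": []},
    "graphSummary": "",
    "graphName": "sample_graph"
  }
}
"""


def summarize_response(body: str) -> dict:
    """Return a few counts and names from a response body (illustrative helper)."""
    context = json.loads(body).get("entriesAndGraphOfContext", {})
    graph = context.get("graph", {}).get("graphologyGraph", {})
    summary = context.get("extendedGraphSummary", {})
    return {
        "statements": len(context.get("statements", [])),
        "nodes": len(graph.get("nodes", [])),
        "edges": len(graph.get("edges", [])),
        "main_topics": summary.get("mainTopics", []),
        "graph_name": context.get("graphName"),
    }


print(summarize_response(SAMPLE_BODY))
```

Using `.get()` with defaults keeps the helper from failing when a property is absent from a particular response.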
Diversity Statistics
One of the main features of InfraNodus is its ability to assess the graph diversity of a text and determine whether its structure is optimal or if there's an excessive focus on a specific topic. This can be useful for classification, enhancing features, or offering additional context to your LLM clients.
Insights into the graph's structure, including its diversity and bias, are added to entriesAndGraphOfContext.graph.graphologyGraph.attributes as:
"diversity_stats": {
  "diversity_score": "focused",
  "modularity_score": "medium",
  "too_focused_on_top_nodes": true,
  "too_focused_on_top_clusters": true,
  "ratio_of_top_nodes_influence_by_betweenness": 0.63,
  "top_nodes_entropy": 0,
  "ratio_of_top_cluster_influence_by_betweenness": 0.63,
  "total_clusters": 3,
  "fair_influence_by_cluster": 0.33
},
If you choose to show extendedGraphSummary, this data is also provided in entriesAndGraphOfContext.extendedGraphSummary.diversityStatistics as:
"diversityStatistics": {
  "diversity_score": "focused",
  "modularity_score": "medium",
  "too_focused_on_top_nodes": true,
  "too_focused_on_top_clusters": true,
  "ratio_of_top_nodes_influence_by_betweenness": 0.63,
  "top_nodes_entropy": 0,
  "ratio_of_top_cluster_influence_by_betweenness": 0.63,
  "total_clusters": 3,
  "fair_influence_by_cluster": 0.33
},
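Since the same statistics can appear in two places, a client may want to check both. The sketch below shows one possible way to do that in Python. The helper names `find_diversity_stats` and `focus_warnings` are hypothetical, the input is assumed to be the already-parsed `entriesAndGraphOfContext` object, and the sample values are copied from the fragment above.

```python
def find_diversity_stats(context: dict) -> dict:
    """Look for diversity stats in the graph attributes first, then in the summary."""
    attributes = (
        context.get("graph", {}).get("graphologyGraph", {}).get("attributes", {})
    )
    summary = context.get("extendedGraphSummary", {})
    return attributes.get("diversity_stats") or summary.get("diversityStatistics") or {}


def focus_warnings(stats: dict) -> list:
    """Turn the two boolean flags into short human-readable warnings."""
    warnings = []
    if stats.get("too_focused_on_top_nodes"):
        warnings.append("too focused on top nodes")
    if stats.get("too_focused_on_top_clusters"):
        warnings.append("too focused on top clusters")
    return warnings


# Assumed input: the parsed "entriesAndGraphOfContext" object.
sample_context = {
    "graph": {
        "graphologyGraph": {
            "attributes": {
                "diversity_stats": {
                    "diversity_score": "focused",
                    "too_focused_on_top_nodes": True,
                    "too_focused_on_top_clusters": True,
                    "total_clusters": 3,
                }
            }
        }
    }
}

stats = find_diversity_stats(sample_context)
print(stats["diversity_score"], focus_warnings(stats))
```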
Graph and Statements
Below is a more detailed version of the return object with some sample data:
{
  "entriesAndGraphOfContext": {
    "statements": [
      [
        {
          "id": 1,
          "content": "Some content",
          "contextId": 1,
          "categories": [],
          "createdAt": "2024-04-19T10:47:33.528Z",
          "sordId": 1,
          "statementHashtags": [
            "content"
          ],
          "statementCommunities": [
            "1" // what topical cluster each concept belongs to
          ],
          "topStatementCommunity": "1", // most common (median) value from "statementCommunities"
          "topStatementOfCommunity": "1" // taken from the topStatementId of community "1" in graph.graphologyGraph.attributes.top_clusters: this statement appears most frequently in the list of statements that belong to that community
        }
      ]
    ],
    "graph": {
      "graphologyGraph": {
        "attributes": {
          "nodes_to_statements_map": { // maps nodeName to the statementIds where it appears
            "infranodus": [
              107145146
            ]
          },
          "modularity": 0, // community structure modularity score
          "top_nodes": [
            "infranodus",
            "tool"
          ],
          "top_clusters": [
            {
              "community": "0",
              "nodes": [
                {
                  "nodeName": "infranodus",
                  "degree": 1,
                  "bc": 0,
                  "x": -7.081424236297607,
                  "y": 4.571528401768881e-16
                }
              ],
              "number": 2, // number of nodes inside the community cluster
              "numberRatio": 1,
              "bcRatio": 0.2, // cumulative betweenness centrality score for the cluster
              "averagePosition": { // cluster position
                "x": 0,
                "y": 2.354271166867184e-17
              },
              "statementIds": [ // for every node in this cluster, the statements that contain it at least once
                107145146,
                107145147,
                107145146
              ],
              "topStatementId": 107145146 // the statement that contains the widest range of nodes from that cluster
            }
          ],
          "gaps": [] // gaps between clusters with additional information
        }, // end of attributes
        "options": {
          "allowSelfLoops": true,
          "multi": false,
          "type": "undirected"
        },
        "nodes": [
          {
            "label": "infranodus",
            "id": "ab9245a5-5e98-59e2-ad79-c82bd0ea2287",
            "weighedDegree": 3,
            "degree": 1,
            "bc": 0, // betweenness centrality
            "community": 0, // the topic the node belongs to
            "x": -7.081424236297607, // position on a 2D plane
            "y": 4.571528401768881e-16
          }
        ],
        "edges": [
          {
            "source": "ab9245a5-5e98-59e2-ad79-c82bd0ea2287",
            "target": "b5a3c19e-66bf-59a9-b4d3-37b1c2cdc351",
            "id": "15e49fb4-02d1-5dba-88ff-e8d8e5da0407",
            "weight": 3,
            "context_matrix": { // which graphs contain this statement
              "sample_graph": { // the name of the graph
                "107145146": 3
              }
            }
          }
        ]
      }, // end of graphologyGraph
      "statementHashtags": { // concepts found in each statement (careful, may reveal your content)
        "107145146": [
          "#infranodus",
          "#tool"
        ]
      }
    }, // end of graph
    "extendedGraphSummary": {
      "contentGaps": [
        "Gap 1: 1. Text Processing (process string statement text separate) -> 2. Symbolic Relations (symbol concept relation)"
      ],
      "mainTopics": [
        "1. Text Processing: process string statement text separate (0 | 63% | 100%)",
        "2. Symbolic Relations: symbol concept relation (1 | 38% | 0%)"
      ],
      "mainTopicNames": [
        "1. Text Processing",
        "2. Symbolic Relations"
      ],
      "mainConcepts": [
        "process",
        "string",
        "statement"
      ],
      "topInfluentialNodes": [
        {
          "node": "process",
          "bc": 0.09523809523809523,
          "degree": 4
        }
      ],
      "topRelations": [
        "1) text <-> string",
        "2) string <-> process",
        "3) process <-> statement"
      ],
      "topBigrams": [
        "text string",
        "string process",
        "process statement"
      ],
      "topicsToDevelop": [
        "string <-> process [label=\"text, statement, separate\"]",
        "symbol <-> concept [label=\"relation\"]"
      ],
      "conceptualGateways": [
        "process",
        "string",
        "statement"
      ],
      "conceptualGatewaysGraph": [],
      "diversityStatistics": {
        "modularity": "0.41",
        "diversity_score": "biased",
        "modularity_score": "high",
        "too_focused_on_top_nodes": true,
        "too_focused_on_top_clusters": true,
        "ratio_of_top_nodes_influence_by_betweenness": 1,
        "top_nodes_entropy": 0,
        "ratio_of_top_cluster_influence_by_betweenness": 1,
        "total_clusters": 2,
        "fair_influence_by_cluster": 0.5
      }
    },
    "graphSummary": "...",
    "userName": "...", // name of the user who generated the response
    "graphUrl": "...", // link to the graph (if the doNotSave query parameter is false)
    "graphName": "..." // name of the graph (if the doNotSave query parameter is false)
  }
}
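Edges in the sample above refer to nodes by `id`, while the readable concept is stored in each node's `label`. A client that wants to print relations will usually need to join the two. The following Python sketch shows one way to do it; `label_edges` is a hypothetical helper, and the label "tool" for the second node is an assumption made for illustration, because the sample only shows the first node.

```python
def label_edges(graphology_graph: dict) -> list:
    """Return (source label, target label, weight) tuples, heaviest edge first."""
    labels = {node["id"]: node["label"] for node in graphology_graph.get("nodes", [])}
    rows = []
    for edge in graphology_graph.get("edges", []):
        # Fall back to the raw id when a node is missing from the node list.
        source = labels.get(edge["source"], edge["source"])
        target = labels.get(edge["target"], edge["target"])
        rows.append((source, target, edge.get("weight", 1)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Trimmed copy of the "graphologyGraph" object from the sample above.
sample_graph = {
    "nodes": [
        {"id": "ab9245a5-5e98-59e2-ad79-c82bd0ea2287", "label": "infranodus"},
        # Assumed label: the sample response does not show this node.
        {"id": "b5a3c19e-66bf-59a9-b4d3-37b1c2cdc351", "label": "tool"},
    ],
    "edges": [
        {
            "source": "ab9245a5-5e98-59e2-ad79-c82bd0ea2287",
            "target": "b5a3c19e-66bf-59a9-b4d3-37b1c2cdc351",
            "weight": 3,
        }
    ],
}

for source, target, weight in label_edges(sample_graph):
    print(f"{source} <-> {target} (weight {weight})")
```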
Graph and Statements Attributes
Graph attributes are used to provide statistics on the main clusters and nodes calculated by InfraNodus. Each statement is categorized by the cluster it belongs to as well as the cluster that it represents the most.
A few notes on how parameters in the entriesAndGraphOfContext > graph > graphologyGraph > attributes > top_clusters as well as entriesAndGraphOfContext > extendedGraphSummary > mainTopics are calculated:
top_clusters[0].numberRatio: The ratio of the number of nodes in the cluster to the total number of nodes in the graph. It shows the relative weight of a topic in terms of linguistic variety: the value is higher if the topic uses a rich vocabulary or simply contains many words.
top_clusters[0].bcRatio: The cumulative betweenness centrality (influence) of the cluster relative to the total cumulative influence of all the concepts that appear in the graph. The higher the value, the more influential terms this topical cluster contains.
top_clusters[0].statementIds: Each topical cluster contains a set of concepts, and each of these concepts may appear in several statements. This parameter lists the statements that contain at least one concept from the topical cluster, so every statement listed here touches on the cluster in some way. A statement ID can appear more than once, once for each concept of the cluster that the statement contains.
top_clusters[0].topStatementId: The statement ID that occurs most frequently in `statementIds`. A high frequency means that the statement has a particularly high concentration of concepts from this topic, so it is designated as the top statement for the topic. In other words, if we select this cluster on the graph, this statement covers more of its nodes and edges than any other statement.
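The cluster parameters described above can be sketched in Python as follows. This mirrors the descriptions in this article only and is not InfraNodus's actual implementation. The `cluster_metrics` helper, its arguments, and the tie-breaking rule (the first ID seen wins) are assumptions, and the sample values are invented for illustration.

```python
from collections import Counter


def cluster_metrics(cluster: dict, total_nodes: int, total_bc: float) -> dict:
    """Recompute numberRatio, bcRatio and topStatementId for one cluster."""
    nodes = cluster.get("nodes", [])
    cluster_bc = sum(node.get("bc", 0) for node in nodes)
    # The statement ID that occurs most often in statementIds.
    counts = Counter(cluster.get("statementIds", []))
    top_statement_id = counts.most_common(1)[0][0] if counts else None
    return {
        "numberRatio": len(nodes) / total_nodes if total_nodes else 0.0,
        "bcRatio": cluster_bc / total_bc if total_bc else 0.0,
        "topStatementId": top_statement_id,
    }


# Invented sample: a cluster with 2 of the graph's 4 nodes and half of its influence.
sample_cluster = {
    "community": "0",
    "nodes": [
        {"nodeName": "infranodus", "bc": 0.25},
        {"nodeName": "tool", "bc": 0.25},
    ],
    "statementIds": [107145146, 107145147, 107145146],
}

print(cluster_metrics(sample_cluster, total_nodes=4, total_bc=1.0))
```

In the sample, statement 107145146 is listed twice because it contains both concepts of the cluster, which is why it becomes the top statement.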
How the parameters in `statements` are calculated:
statementCommunities: Each word in a statement belongs to a certain topical cluster. This parameter lists the clusters of all the concepts that appear in the statement.
topStatementCommunity: The most common community among the words that appear in this statement (that is, if we look at this statement in the graph, the topical cluster whose nodes and edges it covers the most). Use it to find the community that this statement is focused on the most. Note that this does not make it the top statement for that community: other statements may contain a wider range of concepts from that community and therefore be better embedded into it.
topStatementOfCommunity: For each cluster, the statement ID that occurs most often in its `statementIds` list is saved in the cluster's `topStatementId` parameter. The community ID of that cluster is then assigned to that statement as topStatementOfCommunity. In other words, if we select this topical cluster on the graph, this statement covers most of its nodes and edges. Use it to find the statement (and the edges) with the widest range of concepts from that community.
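To make the two statement-level parameters concrete, here is a small Python sketch that derives them from `statementCommunities` and from the clusters' `topStatementId` values. It follows the descriptions above and is not the actual InfraNodus code. The `annotate_statements` helper is hypothetical, it assumes a flat list of statement objects, and it keeps only one community per statement even if a statement happens to be the top statement of several clusters.

```python
from collections import Counter


def annotate_statements(statements: list, top_clusters: list) -> list:
    """Add topStatementCommunity and topStatementOfCommunity to each statement."""
    # Map each cluster's top statement ID to that cluster's community ID.
    top_of = {c["topStatementId"]: c["community"] for c in top_clusters}
    annotated = []
    for statement in statements:
        communities = statement.get("statementCommunities", [])
        most_common = Counter(communities).most_common(1)
        annotated.append({
            **statement,
            "topStatementCommunity": most_common[0][0] if most_common else None,
            "topStatementOfCommunity": top_of.get(statement["id"]),
        })
    return annotated


# Invented sample data for illustration.
sample_statements = [
    {"id": 1, "statementCommunities": ["0", "0", "1"]},
    {"id": 2, "statementCommunities": ["0"]},
]
sample_clusters = [
    {"community": "0", "topStatementId": 1},
]

for row in annotate_statements(sample_statements, sample_clusters):
    print(row["id"], row["topStatementCommunity"], row["topStatementOfCommunity"])
```

Both sample statements focus on community "0", but only statement 1 is the top statement of that community.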
On the difference between topStatementCommunity and topStatementOfCommunity:
- If you select a cluster and want the one statement that matches the most nodes in it (and has the widest range of nodes from that community), use topStatementOfCommunity. It identifies the single statement that spans the cluster best.
- If, however, you want to find all the statements that belong to a certain community, use topStatementCommunity, as it shows which community each statement is embedded in the best.
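The two lookups can be sketched in Python as simple filters. The helper names are hypothetical, the input is assumed to be a flat list of statement objects that already carry both parameters, and the sample data is invented.

```python
def statements_in_community(statements: list, community: str) -> list:
    """All statements whose main focus (topStatementCommunity) is the given community."""
    return [s for s in statements if s.get("topStatementCommunity") == community]


def representative_statement(statements: list, community: str):
    """The single statement marked as top statement of the community, or None."""
    return next(
        (s for s in statements if s.get("topStatementOfCommunity") == community),
        None,
    )


# Invented sample data for illustration.
sample = [
    {"id": 1, "topStatementCommunity": "1", "topStatementOfCommunity": None},
    {"id": 2, "topStatementCommunity": "1", "topStatementOfCommunity": "1"},
    {"id": 3, "topStatementCommunity": "0", "topStatementOfCommunity": "0"},
]

print([s["id"] for s in statements_in_community(sample, "1")])
print(representative_statement(sample, "1")["id"])
```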