Knowledge graphs represent relations between concepts. However, most knowledge graph tools do not take advantage of the advanced network analysis methods that can derive insights from the graph's structure. InfraNodus applies text network analysis to knowledge graphs, and in this article we will demonstrate how to do that using Barack Obama's 2013 inauguration speech as an example.
Text network graphs are a powerful tool for visualizing and analyzing complex textual data, uncovering hidden patterns, relationships, and insights that may not be apparent through traditional linear analysis. By transforming text into an interconnected network of nodes (words or phrases) and edges (relationships based on co-occurrence), InfraNodus provides a unique perspective on the structure, themes, and potential for insight and innovation within a given discourse.
This guide will walk you through the process of reading and interpreting knowledge graphs that are built as text networks. We'll explore the fundamental concepts behind text network analysis and the step-by-step process of creating and navigating these graphs. Throughout, we use Barack Obama’s last inauguration speech, delivered in 2013, as the example text.
How Do Text Networks Work?
Before we can use the tools and capabilities available within InfraNodus, we need to understand the component parts of the graph and how they relate to each other. The fundamental components of the graph include:
- Nodes: The basic units of analysis, representing words extracted from the text after preprocessing. Preprocessing includes (a) removing stop words (i.e. common words that don't contribute to the text's meaning, such as "is", "the", and "all") in order to focus on meaningful terms and (b) reducing words to their base forms (lemmatization) to ensure consistency (e.g. "taken" becomes "take", and "regarded" becomes "regard").
- Edges: The connections between nodes. Edges are created based on the proximity of words within the text (known as co-occurrence), with weights assigned according to their distance (e.g. adjacent words have a weight of 3, words separated by one word have a weight of 2, and words separated by two words have a weight of 1). If the same connection repeats multiple times, the edge weights are added, resulting in thicker edges on the graph.
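To make the node and edge rules above concrete, here is a minimal Python sketch of the transformation. The stop-word list, the lemma map, and the sample sentence are toy assumptions made up for illustration (they are not what InfraNodus uses internally), and we assume that word distances are counted after stop words have been removed.

```python
from collections import defaultdict

# Toy stop-word list and lemma map: assumptions for illustration only.
STOP_WORDS = {"is", "the", "all", "we", "that", "are", "a", "of", "and", "to"}
LEMMAS = {"taken": "take", "regarded": "regard",
          "citizens": "citizen", "created": "create"}


def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, map words to lemmas."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [LEMMAS.get(w, w) for w in words if w and w not in STOP_WORDS]


def build_graph(tokens, window_weights=(3, 2, 1)):
    """Return {(word_a, word_b): weight} for an undirected co-occurrence graph.

    window_weights[d - 1] is the weight added when two words are d positions
    apart: 3 for adjacent words, 2 with one word between, 1 with two between.
    """
    edges = defaultdict(int)
    for i, source in enumerate(tokens):
        for distance, weight in enumerate(window_weights, start=1):
            j = i + distance
            if j >= len(tokens):
                break
            target = tokens[j]
            if source == target:
                continue  # a word does not connect to itself
            # Sort the pair so (a, b) and (b, a) share one undirected edge;
            # repeated co-occurrences add up to a heavier (thicker) edge.
            edges[tuple(sorted((source, target)))] += weight
    return dict(edges)


# Made-up sample sentence, not a quote from the speech.
tokens = preprocess("We, the people, declare that all citizens are created equal.")
graph = build_graph(tokens)
```

Here `tokens` contains only the lemmatized content words, and `graph` maps each pair of words to the summed weight of their co-occurrences.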
With these building blocks, we can begin to understand how the text of Barack Obama’s last inauguration speech is transformed into a network graph.
One of the primary advantages of InfraNodus is its ability to present a text or group of texts in a non-linear, visual way. As such, an important first step in analyzing a discourse is to orient yourself to the graph. This initial visual inspection communicates the gestalt of the discourse, familiarizing us with an unfamiliar text or reacquainting us with a text we already know well. The following visual cues can facilitate this graph orientation:
- Node Size: Represents the influence of a word within the network, based on betweenness centrality. Larger nodes are more influential in connecting different contexts or topics within the text.
- Color Coding: Nodes that have the same color belong to the same community and form a distinct topical cluster. This grouping is based on the iterative Louvain community detection algorithm, which detects the words that co-occur more often with each other than with the other words in the text and assigns a specific community and color to them. With colored clusters, even a cursory inspection of the graph gives us a good feel for the existence, relationship, and structure of the discourse’s overall themes.
- Network Topology: The nodes’ positions on the graph are computed by the iterative Force-Atlas algorithm, where the most connected hubs are pushed apart, while the nodes that are connected to the hubs are pulled towards them. The layout correlates with the community detection above, but it is less precise and better suited to visual analysis. The resulting structure reveals features such as densely connected clusters (main themes) and bridging nodes that connect different parts of the network (key transitional terms or concepts). A node’s location and its neighbors also show how a word relates to the other words and to the discourse at large: nodes on the periphery represent peripheral terms, and nodes in the center represent central terms. In this discourse, we see a dense intermingling of thematic clusters; for reference, the following is a graph where the topology shows a much clearer separation between clusters.
Having familiarized ourselves with the basic components and structure of the graph, we can begin to interact with the graph:
- Zooming In and Out: Graphs can be overwhelming. There are ways to simplify the graph as we process it, but for an initial exploration, zooming in and out gives you a closer look at specific nodes and their connections.
- Node Selection and Connection Tracing: When we select a specific node, the graph highlights that node and all of the edges leading out of it. An initial inspection might include selecting nodes that catch our attention and briefly exploring the words that are one or two degrees away. In the text we are exploring, this gives us an initial idea about the role a word like “citizen” plays in the speech (i.e. the word citizen is used in conjunction with skill+learn+courage and country+obligation).
- Contextualization and Cleanup: When we select nodes, we can also contextualize them by looking through the imported statements in which they appear. Moreover, we can individually remove nodes that are irrelevant to our exploration. In the text we are exploring, we quickly notice that the word “applause” is the largest node. The reason may not be immediately obvious, but by selecting the node and viewing the statements where it appears, we discover that it marks applause breaks in the transcript. While it may be useful to see which clusters of words are associated with applause breaks, we may also want to remove the node altogether to focus only on the content of the speech.
- Discourse Construction and De-Construction: Inside “Adjust Settings” you can adjust the Node Filter threshold, which controls which nodes are shown based on the number of connections they have. In doing so, you can construct and deconstruct the graph on the basis of influential terms.
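The cleanup and node-filter steps described above happen in the InfraNodus interface, but their effect on the graph can be sketched in a few lines of Python. The edge weights below are hypothetical, and `remove_node` and `filter_by_degree` are illustrative helper names, not part of any InfraNodus API.

```python
def remove_node(edges, node):
    """Drop a node (for example a transcript marker) and all of its edges."""
    return {pair: w for pair, w in edges.items() if node not in pair}


def filter_by_degree(edges, min_degree):
    """Keep edges whose two endpoints each have at least min_degree connections.

    This is a single pass: degrees are counted once on the input graph.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return {(a, b): w for (a, b), w in edges.items() if a in keep and b in keep}


# Hypothetical edge weights, not taken from the actual speech graph.
edges = {
    ("applause", "country"): 4,
    ("citizen", "country"): 3,
    ("citizen", "obligation"): 2,
    ("country", "obligation"): 2,
    ("citizen", "skill"): 1,
}
cleaned = remove_node(edges, "applause")        # cleanup step
core = filter_by_degree(cleaned, min_degree=2)  # node filter threshold
```

Raising `min_degree` strips the graph down to its best-connected terms, and lowering it brings the peripheral terms back.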
Now that we understand the nature and dynamics of the graph from a visually interactive point of view, we can dive into the specifics of the text analytics panel.
Text Analytics Panel, Basic
MAIN IDEAS, Main Topics:
InfraNodus identifies the main themes or topics within the text using a combination of clustering and graph community detection algorithms, such as the Louvain algorithm (Blondel et al.). This process involves detecting groups of nodes (words) that tend to co-occur together in the same context (next to each other) more frequently than with other words in the text. These groups of nodes form distinct topical clusters, which are aligned closer to each other on the graph using the Force Atlas algorithm (Jacomy et al.) and assigned a unique color for easy identification. In the Text Analytics Panel, under the Main Ideas section, you can review the Main Topics.
By default, the top four clusters of terms are displayed, aligning with the colored clusters explored visually in the graph. Each cluster includes a percentage measure of the influence that the respective topic has over the entire discourse, as well as the total number of nodes contained within the cluster. To gain a more intuitive understanding of these topical clusters, InfraNodus offers the "Reveal High-Level Ideas" feature, which leverages GPT to generate thematic names for the clusters based on the words they contain. By clicking this option, you can see the difference between the Main Topics with and without the high-level ideas revealed, providing a clearer sense of the key themes present in the text. Furthermore, you can interact with the Main Topics by selecting one or more words from a cluster, which will be represented visually in the graph. This allows for a more focused exploration of specific topics and their relationships within the broader context of the discourse.
MAIN IDEAS, Most Influential Concepts:
InfraNodus identifies the most influential nodes within the network using betweenness centrality, a measure from graph theory that calculates how often a node appears on the shortest path between any two randomly chosen nodes. Nodes with high betweenness centrality are considered the most influential, as they often bridge different topical clusters and play a crucial role in the flow of meaning throughout the text. The size of the nodes in the graph is based on their betweenness centrality, with larger nodes indicating higher influence. In our example, the most influential nodes are: american, require, time, people, believe, citizen, country, journey, and america.
Note that the nodes with the highest betweenness centrality are not simply the ones that occur most frequently in the text but those that connect different contexts or topics within the discourse. While there is some correlation with frequency and tf-idf measures, betweenness centrality is more context-aware, taking into account the node's role in linking distinct communities.
One of the most powerful techniques in this section is the "Reveal Underlying Ideas" function. Similar to manually selecting and hiding individual nodes in the graph, this feature allows you to batch remove all of the current most influential concepts. This is particularly useful because the most influential concepts are often quite obvious, and by hiding them from the graph, you can uncover the words or nodes that substantiate these influential concepts. By iteratively using "Reveal Underlying Ideas," you can gradually reveal more and more of the substrata of ideas, learning about the crucial but less immediately obvious ideas in the text. As you do this, you'll notice that not only does the graph change, but the naming of the Main Topics (High-Level Ideas) also adapts. Paying attention to these changes can provide valuable insights into the discourse being explored.
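To make the betweenness centrality measure used in this section concrete, here is a minimal sketch of Brandes' algorithm in Python. It treats the graph as unweighted and undirected, which is a simplifying assumption (we do not know exactly how InfraNodus handles edge weights), and the small word graph is hypothetical.

```python
from collections import deque


def to_adjacency(edges):
    """Turn a list of (a, b) pairs into {node: set of neighbours}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def betweenness_centrality(adjacency):
    """Raw (unnormalised) betweenness for an unweighted, undirected graph."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        paths[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if distance[neighbour] < 0:  # first time we reach it
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = paths[pred] / paths[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical word graph: two hubs joined by a single edge.
edges = [("freedom", "citizen"), ("liberty", "citizen"), ("citizen", "country"),
         ("country", "nation"), ("country", "journey")]
scores = betweenness_centrality(to_adjacency(edges))
```

In this toy graph the two hubs, "citizen" and "country", receive the highest scores because every shortest path between the two halves of the graph runs through them, while the leaf words score zero.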
To select the most prominent nodes, those with significantly higher influence than the rest, InfraNodus employs the Jenks elbow cutoff algorithm. This ensures that the most influential concepts identified are truly standout nodes within the network.
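One plausible reading of such a cutoff is a two-class natural-breaks split: sort the scores and cut where the combined within-group variance of the "high" and "low" groups is smallest. The Python sketch below implements that idea only. It is an assumption about the general approach, not the exact procedure InfraNodus runs, and the scores are hypothetical.

```python
def top_nodes_by_natural_break(scores):
    """Return the nodes in the high group of a two-class natural-breaks split."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    values = [value for _, value in ranked]

    def squared_deviation(group):
        mean = sum(group) / len(group)
        return sum((v - mean) ** 2 for v in group)

    best_split, best_cost = 1, float("inf")
    # Try every cut point and keep the one with the tightest two groups.
    for split in range(1, len(values)):
        cost = squared_deviation(values[:split]) + squared_deviation(values[split:])
        if cost < best_cost:
            best_split, best_cost = split, cost
    return [node for node, _ in ranked[:best_split]]


# Hypothetical influence scores, not measured on the speech.
influence = {"american": 0.30, "people": 0.27, "time": 0.25,
             "road": 0.05, "season": 0.04, "path": 0.03}
top = top_nodes_by_natural_break(influence)
```

With these numbers the cut falls at the large drop between 0.25 and 0.05, so only the three standout nodes are kept.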
MAIN IDEAS, Topical Diversity:
InfraNodus provides a topical diversity score that assesses the level of bias or focus within a text. This score indicates whether the discourse is centered around a few central concepts (Biased), has a narrow focus on specific themes (Focused), covers a wide range of topics (Diverse), or has loosely connected themes (Dispersed).
The topical diversity score is based on several measures:
- Modularity: Calculated using the Louvain community detection algorithm (Blondel et al., 2008), modularity measures the strength of division of a network into clusters. A modularity value greater than 0.4 indicates medium diversity, while a value greater than 0.65 suggests high diversity.
- Influence Distribution: This measure looks at the entropy of the top nodes' distribution among the top clusters, assessing how evenly the influential nodes are spread across the main topics.
- Percentage of Nodes in the Top Topic: The proportion of nodes belonging to the most dominant topic is considered.
- Relative Influence of the Top 2 Topical Clusters: The balance of influence between the two most prominent themes is evaluated.
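Two of the measures above, modularity and influence distribution entropy, are standard quantities that can be sketched in Python. The code computes modularity for a partition that is already given (it does not run Louvain itself) and the normalized Shannon entropy of a distribution. How InfraNodus combines the four measures into a final label is not spelled out here, so the sketch stops at the individual measures; the "low" band name and the toy graph are our own assumptions.

```python
import math


def modularity(edges, community):
    """Newman modularity Q of a weighted, undirected graph for a partition.

    edges maps (node_a, node_b) to a weight; community maps node to a label.
    Q = sum over communities of (internal weight / m) - (strength / 2m) ** 2.
    """
    total = sum(edges.values())  # m: total edge weight
    internal = {}                # edge weight inside each community
    strength = {}                # summed weighted degree per community
    for (a, b), weight in edges.items():
        strength[community[a]] = strength.get(community[a], 0) + weight
        strength[community[b]] = strength.get(community[b], 0) + weight
        if community[a] == community[b]:
            internal[community[a]] = internal.get(community[a], 0) + weight
    return sum(internal.get(c, 0) / total - (strength[c] / (2 * total)) ** 2
               for c in strength)


def modularity_band(q):
    """Thresholds from the text: > 0.65 high, > 0.4 medium ("low" is assumed)."""
    if q > 0.65:
        return "high"
    if q > 0.4:
        return "medium"
    return "low"


def normalized_entropy(counts):
    """Shannon entropy scaled to [0, 1]; 1 means a perfectly even spread."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


# Toy graph: two triangles joined by one bridge edge, all weights 1.
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
         ("d", "e"): 1, ("d", "f"): 1, ("e", "f"): 1, ("c", "d"): 1}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(edges, community)
# Hypothetical count of top nodes falling into each of four top clusters.
spread = normalized_entropy([3, 3, 2, 1])
```

For the toy graph, `q` works out to 5/14 (about 0.36), which falls below the 0.4 threshold, while `spread` is close to 1 because the influential nodes are distributed fairly evenly across the clusters.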
In most cases, the optimal state for a text is "Diverse," as it suggests a balanced and comprehensive coverage of multiple distinct topics. However, the desired level of topical diversity may vary depending on the specific objectives of the analysis.
Practical Applications:
- Content Optimization: By aiming for a "Diverse" topical diversity score, content creators can ensure that their texts cover a broad range of relevant themes, providing a more engaging and informative reading experience for their audience.
- Bias Detection: A "Biased" or "Focused" score may indicate an overemphasis on certain concepts or a lack of diverse perspectives. Identifying such biases can help users to critically examine their sources and strive for more balanced and inclusive coverage.
- Discourse Evolution Tracking: Monitoring changes in the topical diversity score over time can reveal shifts in the focus and scope of a discourse.
- Comparative Analysis: By comparing the topical diversity scores of different texts or collections, users can gain insights into the relative breadth and depth of coverage across various sources.
- Interdisciplinary Research: A "Diverse" topical diversity score can indicate potential for interdisciplinary connections and knowledge integration.
By considering the topical diversity score and its practical applications, users can leverage InfraNodus to assess the balance and comprehensiveness of their texts, identify potential biases or gaps, and make informed decisions about content creation, analysis, and strategic planning.
BLIND SPOTS, Topics to Connect:
InfraNodus identifies structural gaps in the graph: pairs of distinct communities (clusters of words) that are each important but not well connected to one another. These gaps are potential goldmines for uncovering new and innovative ideas, as they highlight areas where the discourse could be expanded or enriched by bridging the divide between seemingly disparate themes.
The "Topics to Connect" feature is based on a combination of the graph's connectivity and community structure. It selects groups of nodes that would either make the graph more connected if it's too dispersed or help maintain diversity if it's too interconnected. By suggesting potential bridges across these structural gaps, InfraNodus encourages users to explore novel connections and synthesize ideas from different domains.
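The exact selection rule behind "Topics to Connect" is not given here, but the underlying idea of a structural gap can be sketched in Python: take the largest communities and rank every pair of them by how little edge weight runs between them. The function name, the community labels, and the edge weights below are all hypothetical.

```python
from itertools import combinations


def weakest_links(edges, community, top_n=4):
    """Rank pairs of the largest communities from least to most connected.

    edges maps (node_a, node_b) to a weight; community maps node to a label.
    """
    size = {}
    for label in community.values():
        size[label] = size.get(label, 0) + 1
    # Largest communities first; ties are broken alphabetically by label.
    largest = sorted(size, key=lambda label: (-size[label], label))[:top_n]
    between = {pair: 0 for pair in combinations(sorted(largest), 2)}
    for (a, b), weight in edges.items():
        pair = tuple(sorted((community[a], community[b])))
        if pair in between:  # skips edges that stay inside one community
            between[pair] += weight
    return sorted(between.items(), key=lambda item: item[1])


# Hypothetical clusters and edge weights, for illustration only.
community = {"freedom": "liberty", "equal": "liberty",
             "job": "economy", "wage": "economy",
             "storm": "climate", "drought": "climate"}
edges = {("freedom", "equal"): 3, ("job", "wage"): 3, ("storm", "drought"): 3,
         ("freedom", "job"): 5, ("wage", "storm"): 1}
gaps = weakest_links(edges, community)
```

The first entry of `gaps` is the pair of clusters with no direct connection at all, which makes it the most obvious candidate for a bridging idea.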
Leveraging this functionality can be a catalyst for expanding and enriching the discourse in several ways:
- Idea Generation: By identifying unexplored connections between topics, users can brainstorm new ideas, research questions, or hypotheses that emerge from the intersection of distinct themes. This can lead to innovative solutions, original insights, or creative combinations of existing knowledge.
- Comprehensive Understanding: Bridging structural gaps helps users develop a more robust and holistic understanding of the discourse. By considering how seemingly unrelated themes might be connected, users can uncover hidden relationships, contextual factors, or underlying patterns that contribute to a deeper, more nuanced perspective on the subject matter.
- Interdisciplinary Insights: Structural gaps often represent opportunities for interdisciplinary collaboration and knowledge integration. By connecting topics from different fields or domains, users can facilitate the exchange of ideas, methods, and perspectives, leading to novel approaches and synergistic insights that might not have been possible within a single discipline.
- Narrative Enhancement: For writers and content creators, identifying and bridging structural gaps can help enrich their narratives by introducing new subplots, characters, or themes that add depth and complexity to the story. By weaving together seemingly disparate elements, authors can create more engaging, surprising, and thought-provoking content.
- Strategic Planning: In business or organizational contexts, uncovering structural gaps can inform strategic decision-making by highlighting untapped market opportunities, potential partnerships, or areas for innovation. By connecting previously unrelated areas, companies can develop unique value propositions, differentiate themselves from competitors, and create new sources of growth.
The visual dynamic of the "Topics to Connect" feature is helpful for developing a more robust understanding of the discourse, and it can be inspiring to see where the graph could become more intertwined. As users cycle through the various structural gaps, they can watch the potential for new connections and insights unfold, sparking curiosity and encouraging further exploration.
BLIND SPOTS, Conceptual Gateways:
InfraNodus identifies certain nodes as "Conceptual Gateways," which serve as effective entry points or connectors for embedding new ideas into the discourse or initiating conversations on related topics without relying on the most obvious terms. These concepts bridge the gap between the main ideas and peripheral topics, making them valuable for expanding the scope and depth of the discourse.
Conceptual Gateways have a unique combination of properties that make them particularly useful for navigating and enriching the text:
- High Influence: These nodes have a significant impact on the flow of meaning throughout the network, as measured by their betweenness centrality. They often appear on the shortest paths between other nodes, facilitating the connection between different themes and ideas.
- Moderate Connectivity: While influential, Conceptual Gateways do not have an excessive number of direct connections, which makes them less congested and more accessible entry points into the discourse. This allows users to introduce new ideas or perspectives without being overshadowed by the most dominant themes.
- Diversity: Conceptual Gateways have an unusually high ratio of influence (betweenness centrality) to frequency, meaning that while they may not appear as often as the most influential nodes, they play a crucial role in shifting the narrative and connecting diverse parts of the network.
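The influence-to-frequency ratio mentioned above is simple to express in Python. The sketch below ranks nodes by betweenness centrality divided by word frequency. It covers only that one property (not the connectivity criterion), uses hypothetical numbers, and is not the exact scoring InfraNodus applies.

```python
def conceptual_gateways(betweenness, frequency, top_n=3):
    """Rank nodes by influence per occurrence (betweenness / frequency)."""
    ratio = {
        node: betweenness[node] / frequency[node]
        for node in betweenness
        if frequency.get(node, 0) > 0  # ignore nodes with no recorded count
    }
    return sorted(ratio, key=ratio.get, reverse=True)[:top_n]


# Hypothetical scores and counts, not measured on the speech.
betweenness = {"american": 0.30, "principle": 0.12, "risk": 0.10, "season": 0.01}
frequency = {"american": 30, "principle": 3, "risk": 2, "season": 2}
gateways = conceptual_gateways(betweenness, frequency, top_n=2)
```

In this example "american" has the highest raw influence but also by far the highest frequency, so the less frequent "risk" and "principle" rank above it.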
In the context of the text we are analyzing, InfraNodus identifies the following nodes as Conceptual Gateways: believe, principle, citizen, risk, ill, nation, man, government. By examining their location and connections in the graph, we can understand why they are considered influential:
- Strategic Positioning: Conceptual Gateways are often directly connected to one or more highly influential nodes, allowing them to leverage the centrality of these hubs to amplify their own impact on the discourse.
- Cluster Bridging: These nodes frequently connect to nodes from a variety of clusters, enabling them to facilitate the flow of ideas between different themes and contexts. By bridging these clusters, Conceptual Gateways help to maintain the coherence and diversity of the discourse.
Practical Applications:
- Discourse Expansion: By focusing on Conceptual Gateways, users can introduce new ideas, perspectives, or themes into the discourse in a way that is both relevant and non-disruptive. These nodes provide a natural starting point for expanding the scope of the conversation and encouraging the exploration of adjacent topics.
- Targeted Communication: When crafting messages or arguments related to the text, emphasizing Conceptual Gateways can help to engage a broader audience and establish connections between different stakeholder groups. By framing the discussion around these influential yet accessible nodes, communicators can increase the resonance and impact of their ideas.
- Interdisciplinary Collaboration: Conceptual Gateways can serve as bridge points for facilitating collaboration and knowledge sharing between different disciplines or domains. By identifying the nodes that connect diverse clusters, researchers or practitioners can pinpoint opportunities for interdisciplinary work and foster the exchange of ideas and methods.
- Learning Strategies: In educational contexts, focusing on Conceptual Gateways can help learners to grasp the key ideas and relationships within a complex text more effectively. By using these nodes as entry points and emphasizing their connections to both main ideas and peripheral topics, educators can create more engaging and coherent learning experiences.
- Creative Ideation: For those seeking creativity or innovation, exploring the connections and potential of Conceptual Gateways can spark new creative ideas and inspire novel combinations of themes and concepts. By using these nodes as prompts or starting points, creatives can generate fresh perspectives and insights that push the boundaries of the original discourse.
Advanced Text Analytics Panel
These advanced features provide users with granular insights into the relationships between nodes, the sentiment distribution across the discourse, the evolution of themes over time, and the structural properties of the network itself. By leveraging these data-rich analytics, users can uncover hidden patterns, track the progression of ideas, and assess the overall resilience and adaptability of the discourse.
Relations:
The Relations section allows users to explore the contextual connections between nodes in the graph. The "Select Relations" feature enables the comparison of word co-occurrences for a chosen group of nodes, revealing the strength and significance of their relationships within the broader discourse. Users can expand the list of connected nodes and assess their total influence score based on betweenness centrality. This text-data oriented perspective complements the visual exploration of the graph, providing quantifiable insights into the impact of specific nodes on the overall narrative.
The "Top Relations" view displays the most prominent pairwise connections in the graph, treating it as an undirected network. The occurrence count and weight of each relationship offer a measure of its importance within the discourse. For applications beyond language analysis, users can also download directed bigrams to study the directionality of these relationships.
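The difference between directed bigrams and undirected relations can be illustrated with a short Python sketch. It counts only adjacent word pairs, which is a simplification, and the token list is hypothetical.

```python
from collections import Counter


def count_relations(tokens):
    """Count adjacent word pairs as directed bigrams and as undirected pairs."""
    directed = Counter(zip(tokens, tokens[1:]))
    undirected = Counter()
    for (a, b), count in directed.items():
        # Sorting the pair merges (a, b) and (b, a) into a single relation.
        undirected[tuple(sorted((a, b)))] += count
    return directed, undirected


# Hypothetical preprocessed tokens.
tokens = ["citizen", "country", "citizen", "country", "journey"]
directed, undirected = count_relations(tokens)
top_relation = undirected.most_common(1)
```

The directed counts keep "citizen → country" and "country → citizen" apart, while the undirected view adds them together into one stronger relation.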
Sentiment Analysis:
InfraNodus includes a sentiment analysis feature that classifies each statement in the text as positive, negative, or neutral. Based on the AFINN and Emoji Sentiment Ranking approaches, this tool allows users to filter the discourse by sentiment and explore the correlations between emotional tone and specific topics or themes.
By examining the sentiment distribution across the graph, users can gain insights into the affective dimensions of the discourse, identifying areas of consensus, conflict, or ambivalence. This emotional mapping can inform strategies for crafting more resonant narratives, anticipating audience reactions, or identifying potential points of contention.
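A lexicon-based classifier in the AFINN style sums a valence score for every known word in a statement and reads the sign of the total. The Python sketch below shows the idea with a tiny toy lexicon; the words, scores, and sample statements are illustrative assumptions rather than entries from the real AFINN list or quotes from the speech, and emoji handling is left out.

```python
# Toy lexicon in the AFINN style (integer valence per word).
# The words and scores are illustrative assumptions, not the real list.
LEXICON = {"hope": 2, "freedom": 2, "peace": 2,
           "war": -2, "crisis": -3, "fear": -2}


def classify(statement):
    """Label a statement positive, negative, or neutral by summed valence."""
    words = (w.strip(".,;:!?\"'").lower() for w in statement.split())
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up statements for illustration.
statements = [
    "We carry hope and freedom.",
    "A decade of war and crisis.",
    "We gather here today.",
]
labels = [classify(s) for s in statements]
```

Filtering the statements by these labels mirrors, in miniature, filtering the discourse by sentiment.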
Trends and Evolution:
The Trends section offers dynamic tools for tracking the progression of ideas and themes over time. The "Emerging Keywords" feature highlights the most recently added nodes that have a high local influence, allowing users to identify nascent trends and potential shifts in the discourse.
The "Evolution of Topics" chart provides a powerful visualization of how the main themes and influential keywords evolve as the narrative unfolds. By splitting the discourse into temporal segments and plotting the cumulative occurrence of key terms, this tool enables users to pinpoint critical junctures, detect the rise and fall of specific ideas, and assess the overall trajectory of the conversation.
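The idea of splitting the discourse into segments and plotting cumulative occurrences can be sketched in Python as follows. The function name, the number of segments, and the token list are assumptions for illustration; the result is one cumulative series per keyword.

```python
def cumulative_occurrences(tokens, keywords, segments=4):
    """Cumulative count of each keyword at the end of each text segment.

    The text is cut into chunks of equal size (the last one may be shorter),
    so the number of chunks is at most `segments`.
    """
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    running = {k: 0 for k in keywords}
    series = {k: [] for k in keywords}
    for start in range(0, len(tokens), size):
        for token in tokens[start:start + size]:
            if token in running:
                running[token] += 1
        for k in keywords:
            series[k].append(running[k])
    return series


# Hypothetical preprocessed tokens in reading order.
tokens = ["journey", "citizen", "journey", "people",
          "citizen", "citizen", "people", "journey"]
series = cumulative_occurrences(tokens, ["journey", "citizen"], segments=4)
```

A series that climbs early and then flattens points to a theme introduced at the start, while a late jump points to a theme that gains weight towards the end.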
Network Structure Insights:
InfraNodus provides a set of metrics and visualizations to analyze the structural properties of the text network graph. The "Mind Viral Immunity" score assesses the resilience and adaptability of the discourse based on its diversity and the distribution of influence among nodes. A high mind-viral immunity suggests that the text incorporates multiple perspectives and leverages both central and peripheral themes to propagate its ideas.
The "Structure" metric evaluates the level of diversity within the network using a combination of modularity, influence distribution entropy, and the relative prominence of the top topical clusters. This multidimensional assessment helps users identify whether the discourse is biased, focused, diverse, or dispersed, guiding strategies for optimizing the balance and coherence of the narrative.
The "Narrative Influence Propagation" chart visualizes how influence spreads through the network over time. A smooth, rhythmic propagation pattern indicates the dominance of a central idea or agenda, while a more variable profile suggests a greater role of secondary themes in shaping the discourse. This temporal analysis can reveal the underlying dynamics of the narrative and inform strategies for managing the flow of influence.
Finally, the "Degree Distribution" plot allows users to characterize the network as scale-free (long-tail power law distribution) or random (normal, bell-shaped distribution). Scale-free networks are more resilient to random disruptions and efficient in spreading information, while random networks exhibit a more equitable distribution of influence. By comparing the observed distribution to ideal power-law curves, users can assess the overall structure and robustness of the discourse.
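A degree distribution is straightforward to compute yourself. The Python sketch below counts how many nodes have each degree and fits a straight line to the distribution on log-log axes. A clearly negative slope is consistent with a long tail, but this is only a crude check under our own assumptions, not a rigorous power-law test and not the comparison InfraNodus performs; the star-shaped graph is hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of nodes that have it."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(sorted(Counter(degree.values()).items()))


def log_log_slope(distribution):
    """Least-squares slope of log(count) against log(degree).

    Returns None when there are fewer than two distinct degree values.
    """
    points = [(math.log(k), math.log(n)) for k, n in distribution.items()]
    if len(points) < 2:
        return None
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return covariance / variance


# Hypothetical star-shaped graph: one hub connected to four other words.
edges = [("country", "nation"), ("country", "journey"),
         ("country", "people"), ("country", "time")]
distribution = degree_distribution(edges)
slope = log_log_slope(distribution)
```

Here four words have a single connection and one hub has four, which is the kind of lopsided distribution a long-tail network shows on a much larger scale.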
Statistics:
The Stats section provides a summary of key text statistics, such as word count, unique lemma count, and lemma density. The "Top Lemmas" view ranks the most influential nodes in the network based on their betweenness centrality, offering a quick snapshot of the central concepts driving the discourse.
Tips for Effective Knowledge Graph Interpretation
- Start with clear objectives:
  - Define the purpose of your analysis, whether it's to identify main themes, uncover hidden connections, or track the evolution of ideas.
  - Formulate specific questions, hypotheses, and/or initial expectations to guide your exploration of the text network graph.
  - Align your objectives with the relevant features and metrics provided by InfraNodus to ensure a focused and meaningful analysis.
- Familiarize yourself with the graph's components:
  - Understand the significance of nodes (words or phrases), edges (connections based on co-occurrence), colors (topical clusters), and sizes (influence or centrality).
  - Recognize the implications of the graph's layout, such as the proximity of nodes, the density of connections, and the overall topology of the network.
  - Explore the various centrality measures (e.g., betweenness, degree) and their interpretations in the context of text analysis.
- Iteratively explore the graph:
  - Begin with a high-level overview of the graph to identify the main clusters, influential nodes, and structural patterns.
  - Use InfraNodus's interactive features to zoom in on specific regions, isolate subgraphs, and trace the connections between nodes of interest.
  - Experiment with different layouts, filters, and thresholds to uncover hidden patterns and test alternative perspectives on the data.
  - Engage in a cyclical process of exploration, interpretation, and refinement, allowing insights from each iteration to inform the next.
- Leverage the full suite of analytical tools:
  - Utilize the main ideas section to identify the dominant themes, influential concepts, and topical diversity of the discourse.
  - Explore the blind spots to uncover potential gaps, bridges, or untapped opportunities within the narrative landscape.
  - Analyze the sentiment distribution to understand the emotional tone and its correlation with specific topics or themes.
  - Track the evolution of ideas over time using the temporal analysis tools to identify trends, shifts, or critical junctures in the discourse.
  - Assess the structural properties of the network, such as its mind-viral immunity, diversity, and propagation dynamics, to gauge its resilience and adaptability.
- Synthesize multiple perspectives:
  - Combine insights from various angles, such as the visual layout, quantitative metrics, sentiment analysis, and temporal dynamics.
  - Identify consistencies, contradictions, or synergies between these different perspectives to develop a holistic understanding of the discourse.
  - Consider how the interplay of main themes, peripheral ideas, and structural properties shapes the overall narrative trajectory and its potential implications.
- Contextualize the findings:
  - Situate the insights derived from the text network analysis within the broader context of the discourse.
  - Evaluate how the patterns, themes, or dynamics revealed by the analysis align with or challenge existing knowledge or assumptions about the subject matter.
  - Consider the limitations and potential biases inherent in the text data and the analytical tools, and acknowledge alternative interpretations or perspectives that may not be captured by the graph.
- Integrate qualitative and quantitative analysis:
  - Complement the quantitative metrics and structural insights with close reading and qualitative interpretation of key passages or clusters.
  - Use the graph as a navigational aid to identify salient regions of the text for deeper exploration and exegesis.
  - Validate or challenge the patterns and connections suggested by the quantitative analysis through careful consideration of the textual evidence and its nuances.
- Iterate and refine:
  - Treat the analysis as an ongoing process of exploration, discovery, and refinement, rather than a one-time exercise.
  - Revise your initial hypotheses or questions based on the insights gained from each iteration, and adjust your analytical approach accordingly.
  - Seek feedback from other experts or stakeholders to validate your findings, challenge your assumptions, and enrich your interpretive framework.
- Communicate and apply the insights:
  - Translate the technical findings into clear, actionable insights that align with your initial objectives and the needs of your target audience.
  - Use visualizations, narratives, or case studies to convey the key takeaways and their implications for research, strategy, or creative endeavors.
  - Develop a plan for applying the insights to inform decision-making, guide further research, or inspire new directions for exploration and innovation.