Analyzing Large Texts and Big Articles: InfraNodus Data Limits and Optimal File Size – Nodus Labs Support Center

InfraNodus is an iterative analysis tool. It is designed to help you gain a different perspective on a text and to read it non-linearly, revealing the patterns and gaps within it.

While InfraNodus can handle large amounts of data, it's not always best to start with as much as possible. We recommend splitting your data into smaller chunks so that you can get insights into each fragment first, and then synthesizing the insights from the different fragments.
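If your source is a single long text, a small script can do the splitting for you. The sketch below is not part of InfraNodus. It is a minimal Python example that assumes paragraphs are separated by blank lines and groups whole paragraphs into chunks of at most a given number of words. The function name split_into_chunks and the default of 10,000 words are our own illustrative choices.

```python
import re


def split_into_chunks(text: str, max_words: int = 10_000) -> list[str]:
    """Group whole paragraphs into chunks of at most max_words words.

    Paragraphs are assumed to be separated by blank lines. A single paragraph
    longer than max_words is kept intact as its own chunk rather than cut.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in re.split(r"\n\s*\n", text):
        n = len(para.split())
        if n == 0:
            continue  # skip empty paragraphs
        # Close the current chunk if this paragraph would push it over the limit
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para.strip())
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries rather than at an exact word count keeps each idea together, so chunk sizes are approximate. Each chunk can then be analyzed on its own before you compare the results.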

The reason is that your own interpretation of the text network is a very important part of the analysis. The longer the text, the more connections there are between its elements. If the data is too big, the network will have so many connections that the graph becomes incomprehensible: only the most obvious terms and topics will stand out, and you will get generic results.
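To see why connections pile up as a text grows, consider a generic word co-occurrence network in which every pair of distinct words appearing within a short sliding window is linked. This is a simplified sketch, not InfraNodus's actual algorithm: the four-word window, the lack of stopword removal and lemmatization, and the function name cooccurrence_edges are assumptions made for illustration.

```python
import re
from itertools import combinations


def cooccurrence_edges(text: str, window: int = 4) -> set[frozenset[str]]:
    """Link every pair of distinct words that appear within `window` consecutive words.

    Simplified illustration: no stopword removal, lemmatization, or edge weights.
    """
    words = re.findall(r"[a-z']+", text.lower())
    edges: set[frozenset[str]] = set()
    for i in range(len(words)):
        # All pairs inside the window that starts at position i
        for a, b in combinations(words[i:i + window], 2):
            if a != b:
                edges.add(frozenset((a, b)))
    return edges
```

Because each additional word can only add links, running this on a longer excerpt of the same text never yields fewer edges, and with natural language it usually yields many more, which is what makes a graph built from a very large text hard to read.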

That's why we recommend first clarifying what objectives you're pursuing and how you could achieve them with the least amount of data.

Normally, the optimal length of a document to analyze in InfraNodus is about 300 KB (0.3 MB), which translates into about 40,000 words. If you want to edit, write, and develop the document further, it's even better to start with something smaller, such as 5,000 to 10,000 words (about 40 to 80 KB).
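To check a document against these guidelines before uploading it, you can measure its size yourself. The Python sketch below is a hypothetical helper, not an InfraNodus feature: it counts words by splitting on whitespace and measures size in UTF-8 bytes, taking 1 KB as 1,000 bytes. The thresholds are the rough figures from this article, not hard limits.

```python
# Rough guidelines from this article, not hard limits
OPTIMAL_MAX_KB = 300        # about 40,000 words
EDITING_MAX_WORDS = 10_000  # upper end of the 5,000 to 10,000 words suggested for editing


def text_stats(text: str) -> tuple[int, float]:
    """Return (word count, size in KB), counting 1 KB as 1,000 bytes of UTF-8."""
    words = len(text.split())
    kb = len(text.encode("utf-8")) / 1000
    return words, kb


def size_advice(text: str) -> str:
    """Compare a text against the size guidelines and suggest what to do."""
    words, kb = text_stats(text)
    summary = f"{words} words, {kb:.1f} KB"
    if kb > OPTIMAL_MAX_KB:
        return f"{summary}: consider splitting it into smaller chunks"
    if words > EDITING_MAX_WORDS:
        return f"{summary}: fine for analysis; use a smaller excerpt if you plan to edit"
    return f"{summary}: a good size for both analysis and editing"
```

To use it, read your file as a string (for example with open(..., encoding="utf-8")) and pass the contents to size_advice.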

If you're analyzing CSV files, we recommend starting with 300 to 1,000 rows.
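If your CSV file is larger than that, you can split it into batches before importing it. This is a minimal sketch using Python's standard csv module, assuming the first row is a header. The function names split_csv and _write_batch, the output folder chunks, the part_001.csv naming, and the default of 500 rows per file are illustrative choices, not InfraNodus requirements.

```python
import csv
from pathlib import Path


def _write_batch(out: Path, index: int, header: list[str], rows: list[list[str]]) -> Path:
    """Write one batch of rows, preceded by the header, to out/part_NNN.csv."""
    target = out / f"part_{index + 1:03d}.csv"
    with open(target, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return target


def split_csv(path: str, rows_per_file: int = 500, out_dir: str = "chunks") -> list[Path]:
    """Split a CSV into files of at most rows_per_file data rows, repeating the header in each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return written  # empty file: nothing to split
        batch: list[list[str]] = []
        for row in reader:
            batch.append(row)
            if len(batch) == rows_per_file:
                written.append(_write_batch(out, len(written), header, batch))
                batch = []
        if batch:  # remaining rows that did not fill a whole batch
            written.append(_write_batch(out, len(written), header, batch))
    return written
```

Repeating the header in every part keeps each file importable on its own, so you can analyze the first batch, then add the next ones as needed.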

Once you get some insights from these smaller chunks of data, you can increase the size gradually and find the point at which interpreting the graph is no longer helpful. If you want to go beyond this point, you will probably need other tools, but keep in mind that they will only give you a generic analysis of the main topics and keywords. To get real insights, you would still need to think of filters and categorizations that can produce meaningful ideas (e.g. comparing the text by sentiment or revealing the main topics by location), which implies splitting the text into smaller parts anyway.
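Splitting by a category is straightforward when your data is in a CSV file. The sketch below, a hypothetical helper rather than an InfraNodus function, collects the text column separately for each value of a category column (for example, a location or a sentiment label), so that each group can be analyzed as its own text. The column names text and location in the demo, and the sample rows, are made-up assumptions about your data.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable


def texts_by_category(
    rows: Iterable[dict[str, str]], text_col: str, category_col: str
) -> dict[str, str]:
    """Join the text column separately for each value of the category column."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for row in rows:
        groups[row[category_col]].append(row[text_col])
    # One text per category, with entries separated by blank lines
    return {category: "\n\n".join(parts) for category, parts in groups.items()}


if __name__ == "__main__":
    # Hypothetical sample data; replace with open("your_file.csv", newline="")
    sample = io.StringIO(
        "text,location\n"
        "Great coffee,Berlin\n"
        "Slow service,Paris\n"
        "Nice view,Berlin\n"
    )
    for location, text in texts_by_category(csv.DictReader(sample), "text", "location").items():
        print(location, len(text.split()), "words")
```

Because the function accepts any iterable of dictionaries, it works directly with csv.DictReader, and each resulting text stays small enough to interpret on its own.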
