
Problems with Japanese

Comments

3 comments

  • Official comment
    Dmitry Paranyushkin

    Hi! Thank you for your feedback.

    I wanted to let you know that your feedback has been taken into account, and Japanese text should now be processed the way you expected.

    Could you please check it and let us know what you think?

  • kazumi kaizuka

    I’ve checked the latest version, and I can confirm that Japanese text is now being tokenized correctly. It’s successfully extracting meaningful concepts like compound words instead of single characters, which is a huge improvement.
    I’m really looking forward to seeing these improvements reflected in the Obsidian plugin as well. Being able to use this level of Japanese analysis directly within my local PKM workflow would be incredibly valuable for my research.
    Thanks for your great work!

  • kazumi kaizuka

    I’ve tested the improved Japanese support, and I can confirm that the tokenization engine itself is working correctly! It is now successfully extracting important compound words like “構造” (Structure) and “観客” (Audience), which is a great improvement.
    However, I noticed that there are still many single-character nodes appearing in the graph, such as “十” (ten), “分” (minute/part), “一” (one), and “見” (see).
    In Japanese morphological analysis, this is not exactly a bug, but rather an issue of stopword optimization. These characters are technically words, but they act as “noise” in a concept graph unless they are part of a compound word.


    To make the default Japanese settings more usable out of the box, I recommend adding single-character numerals and common abstract parts of speech to the default exclusion list.
    I will customize my own list for now, but I thought this feedback might help improve the default Japanese experience for future updates.
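
    For anyone who wants to try the same idea, here is a minimal sketch of the kind of filter I mean. It assumes the tokens arrive as plain strings from the tokenizer. The names `KANJI_NUMERALS`, `EXTRA_SINGLE_CHAR_STOPWORDS`, and `filter_tokens` are hypothetical and only for illustration; they are not the tool's actual API or its default list.

```python
# Hypothetical exclusion list: single-character kanji numerals.
KANJI_NUMERALS = set("〇一二三四五六七八九十百千万億兆")

# Hypothetical extras: the other single-character nodes mentioned above.
EXTRA_SINGLE_CHAR_STOPWORDS = {"分", "見"}


def filter_tokens(tokens, keep_single=frozenset()):
    """Drop single-character tokens that are on the exclusion list.

    Compound words are never dropped, even if they contain a listed
    character. `keep_single` lets a user opt specific characters back in.
    """
    stop = (KANJI_NUMERALS | EXTRA_SINGLE_CHAR_STOPWORDS) - set(keep_single)
    return [t for t in tokens if not (len(t) == 1 and t in stop)]


tokens = ["構造", "十", "分", "観客", "一", "見", "十分"]
print(filter_tokens(tokens))  # ['構造', '観客', '十分']
```

    Note that the compound “十分” (sufficient) survives even though “十” and “分” are both filtered on their own, which is the behavior I would expect from a default list.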
