Custom word segmentation for languages with continuous scripts

I am looking for design suggestions on how to add proper word segmentation support for languages with continuous scripts. Currently, Manticore can only segment Chinese, with the help of ICU or Jieba. I want to be able to segment other languages such as Korean, Japanese, Thai, Tibetan, and other Chinese languages.

I have looked at how this could be done using plugins. Segmentation can be implemented at index time with a custom plugin, because an index-time plugin can output more than one token for a single input token via the xxx_get_extra_token() callback. However, it looks like a query-time plugin can only output a single token for each token produced by the base tokenizer, which prevents doing segmentation at query time.
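
To make the index-time side concrete, here is a minimal, self-contained sketch of the buffering pattern such a plugin can use: push_token hands back the first sub-token and reports how many extras are pending, and get_extra_token drains the rest. The function names, signatures, and the '|'-based segmenter are simplified stand-ins, not the exact Manticore plugin API; a real plugin would call into MeCab, Sudachi, or another analyzer instead.

```c
/* Simplified illustration of the "extra token" buffering pattern an
 * index-time plugin can use. Names and signatures are hypothetical and
 * only approximate the real callbacks (no userdata, error handling, etc.). */
#include <stdio.h>
#include <string.h>

#define MAX_SUBTOKENS 8
#define MAX_TOKEN_LEN 64

static char g_extra[MAX_SUBTOKENS][MAX_TOKEN_LEN]; /* buffered sub-tokens   */
static int  g_extra_count = 0;                     /* how many were produced */
static int  g_extra_pos   = 0;                     /* next one to hand out   */

/* Stand-in segmenter: splits on '|' instead of doing real morphological
 * analysis, just to keep the example self-contained. */
static int segment(const char *token, char out[][MAX_TOKEN_LEN])
{
    int n = 0;
    char buf[256];
    strncpy(buf, token, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (char *p = strtok(buf, "|"); p && n < MAX_SUBTOKENS; p = strtok(NULL, "|")) {
        strncpy(out[n], p, MAX_TOKEN_LEN - 1);
        out[n][MAX_TOKEN_LEN - 1] = '\0';
        n++;
    }
    return n;
}

/* Analogue of xxx_push_token(): returns the first sub-token and reports
 * via *extra how many additional sub-tokens the engine should pull. */
static const char *my_push_token(const char *token, int *extra)
{
    g_extra_count = segment(token, g_extra);
    g_extra_pos = 1; /* the first sub-token is returned directly */
    *extra = g_extra_count > 1 ? g_extra_count - 1 : 0;
    return g_extra_count > 0 ? g_extra[0] : NULL;
}

/* Analogue of xxx_get_extra_token(): drains the buffered sub-tokens. */
static const char *my_get_extra_token(void)
{
    if (g_extra_pos >= g_extra_count)
        return NULL;
    return g_extra[g_extra_pos++];
}

int main(void)
{
    int extra = 0;
    /* "食べ|た" stands in for a token the base tokenizer did not split. */
    const char *tok = my_push_token("食べ|た", &extra);
    printf("first: %s (plus %d extra)\n", tok, extra);
    for (const char *t; (t = my_get_extra_token()) != NULL; )
        printf("extra: %s\n", t);
    return 0;
}
```

The query-time plugin interface has no equivalent of this "pull the remaining sub-tokens" step, which is exactly the gap described above.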

This is a problem because word boundaries in continuous scripts are inherently ambiguous, so a typical search query can easily contain text that the indexer would have split into several tokens, preventing a match. Example in Japanese: 食べた is reasonably considered a single word, but most tokenizers (such as MeCab or Sudachi) split it into two tokens, 食べ and た.

Another requirement I have is to be able to query multiple indexes at once, with each index configured to use a different language-specific tokenizer, something like SELECT * FROM japanese_index, korean_index WHERE MATCH('here could be Japanese or Korean'). However, I cannot do that because the query-time plugin has to be specified per query as OPTION token_filter='mylib.so:mylib'; in other words, it is not index-dependent, so the same filter would be applied to every index in the query.
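
To illustrate the gap: the first statement below is what I can write today (one token_filter option shared by all indexes in the query), and the commented-out part sketches the kind of per-index, query-time setting that would be needed. The query_token_filter name is purely hypothetical; I am only using it to show the shape of the feature.

```sql
-- Today: one token_filter OPTION for the whole query, applied regardless of
-- which index (and therefore which language) is being searched.
SELECT * FROM japanese_index, korean_index
WHERE MATCH('here could be Japanese or Korean')
OPTION token_filter='mylib.so:mylib';

-- What would be needed (hypothetical syntax, does not exist today): the
-- query-time segmenter declared per index, so each index can process the
-- query text with its own language-specific plugin.
-- CREATE TABLE japanese_index (...) query_token_filter='ja_seg.so:ja_seg';
-- CREATE TABLE korean_index  (...) query_token_filter='ko_seg.so:ko_seg';
```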