I've recently had renewed interest in using bigram_index with an index, ultimately so I can find common two-word phrases.
The problem is that it's indexing pairs of words across sentence boundaries.
I tried using phrase_boundary (with phrase_boundary_step=10), but it doesn't seem to work. I think it works without bigram_index (because it manipulates the keyword position).
But when using bigram_index, the two words are entered as a single keyword in the dictionary, so the document still matches.
I guess the question is: should phrase_boundary be affecting a bigram_index index? And if not, is there an alternative way to prevent indexing across sentence boundaries?
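
For reference, a minimal sketch of the kind of index settings described above. The index name, the bigram_index mode, and the boundary characters are assumptions for illustration (the post doesn't state them); only phrase_boundary_step=10 comes from the setup described here.

```
# hypothetical index name; source, path, etc. omitted
index myindex
{
    # assumed mode; other modes exist (first_freq, both_freq)
    bigram_index         = all

    # assumed sentence-ending characters
    phrase_boundary      = ., ?, !

    # value mentioned in the post
    phrase_boundary_step = 10
}
```

With settings along these lines, the observed behaviour is that a two-word pair spanning a sentence end still gets stored as a bigram keyword.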