Wordforms implementation warning: invalid mapping (destination contains blended characters)


#1

I am using a wordforms file and I get the following warning:

WARNING: invalid mapping (destination contains blended characters) (wordform=‘β-apopicropodophyllin > beta-Apopicropodophyllin’). Fix your wordforms file ‘/var/lib/manticore/data/wordforms.txt’.

The ‘-’ character is indeed a blend character.

Can anyone explain the reason for this warning? Is this particular pair simply discarded while the rest of the file is used as normal?

Thanks
Roberto


#2

A blend character produces multiple destination tokens from a single source token, taking the blend_mode index setting into account. Some cases are described in the documentation.
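To illustrate the idea, here is a rough Python sketch of which keywords a blended token yields under the default behaviour, where the token is indexed both whole and split at the blend characters. The function name blended_variants is made up for this post, lowercasing stands in for the default case folding, and the sketch ignores token positions and the other blend_mode options, so treat it as a mental model rather than what the tokenizer actually does.

```python
import re


def blended_variants(token, blend_chars):
    """Approximate the keywords produced for one token that may contain blend chars.

    Hypothetical helper: models only the default case (whole token plus its parts).
    """
    lowered = token.lower()  # assumption: charset_table folds case, as the default does
    if not any(ch in blend_chars for ch in lowered):
        return [lowered]
    # Build a character class from the blend characters and split on it.
    splitter = "[" + re.escape("".join(sorted(blend_chars))) + "]"
    parts = [part for part in re.split(splitter, lowered) if part]
    # The blended form comes first, followed by the separated parts.
    return [lowered] + parts


if __name__ == "__main__":
    print(blended_variants("beta-Apopicropodophyllin", {"-"}))
```

With '-' as the only blend character, the example token yields the blended form plus "beta" and "apopicropodophyllin" as separate keywords.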

However, a wordform produces exactly the tokens given on the right-hand side of the wordform description, after the >; that is, it does not generate multiple variants from the destination token beta-Apopicropodophyllin.

It would be better to write the destination as multiple tokens, beta Apopicropodophyllin, or as a single token without a blend character, such as beta_Apopicropodophyllin.
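If the file is large, the rewrite can be scripted. The following is a minimal Python sketch, assuming each mapping sits on one line as "source > destination" (or with "=>"), that lines starting with # are comments, and that you pass in the same set of characters you configured as blend_chars. The function name sanitize_wordform is hypothetical. Only the destination is changed; the source side is left as it is.

```python
import re

# Matches the mapping separator, ">" or "=>", with any surrounding whitespace.
SEPARATOR = re.compile(r"(\s*=?>\s*)")


def sanitize_wordform(line, blend_chars, replacement=" "):
    """Replace blend characters in the destination part of one wordforms line."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # blank line or comment: keep unchanged
    parts = SEPARATOR.split(line, maxsplit=1)
    if len(parts) != 3:
        return line  # no separator found: keep unchanged
    source, separator, destination = parts
    cleaned = "".join(replacement if ch in blend_chars else ch for ch in destination)
    return source + separator + cleaned


if __name__ == "__main__":
    original = "β-apopicropodophyllin > beta-Apopicropodophyllin"
    print(sanitize_wordform(original, {"-"}))       # destination as two tokens
    print(sanitize_wordform(original, {"-"}, "_"))  # destination as one token
```

Passing a space as the replacement gives the multiple-token form, and passing "_" gives the single-token form.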


#3

I found the best option was to replace the blend characters with blanks.

I am also getting lots of warnings about stopwords. Can I just ignore them? Is the wordform simply skipped, or is it used anyway?


#4

Stopwords get removed from wordforms. However, that might affect your phrase matching, or any matching where term position matters.
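To see which mappings are affected before deciding whether the warnings matter, something like the Python sketch below can list the wordform lines that contain a stopword. It assumes one "source > destination" (or "=>") mapping per line and # for comments, and it splits on whitespace and lowercases, which only approximates the real tokenizer. The function name wordforms_with_stopwords is made up for this post.

```python
import re

# Matches the mapping separator, ">" or "=>", with any surrounding whitespace.
MAPPING_SEPARATOR = re.compile(r"\s*=?>\s*")


def wordforms_with_stopwords(wordform_lines, stopwords):
    """Return (line_number, line, stopwords_found) for each affected mapping."""
    stop = {word.lower() for word in stopwords}
    hits = []
    for number, line in enumerate(wordform_lines, start=1):
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        parts = MAPPING_SEPARATOR.split(text, maxsplit=1)
        if len(parts) != 2:
            continue  # not a mapping line
        # Whitespace tokenization is a simplification of the real tokenizer.
        tokens = parts[0].lower().split() + parts[1].lower().split()
        found = sorted(set(tokens) & stop)
        if found:
            hits.append((number, text, found))
    return hits


if __name__ == "__main__":
    lines = ["# example", "vitamin of c > vitamin c", "walks > walk"]
    for number, text, found in wordforms_with_stopwords(lines, ["of", "the"]):
        print(number, text, found)
```

Multi-word mappings that show up here are the ones to check if you rely on phrase or proximity queries.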