Hi Adrian,
thanks for the reply. Here is some further information:
- I used CALL KEYWORDS as you suggested and this is what I get:
CALL KEYWORDS('carbimazole macleods 20mg tablets', 'myHB_index1');
+------+---------------------+---------------------+
| qpos | tokenized           | normalized          |
+------+---------------------+---------------------+
| 1    | carbimazole         | carbimazole         |
| 2    | macleods20mgtablets | macleods20mgtablets |
+------+---------------------+---------------------+
Obviously the tokenization is incorrect: 'macleods 20mg tablets' ends up as the single token 'macleods20mgtablets'.
I have a regex that standardizes 'dosage dosage_units' terms for medicines, so it converts '20 mg' to '20mg' (my standard format has no space between the number and the unit).
This is the regex:
regexp_filter = (?i)(?:\s|\b)?(\pN*[.,]?\pN*)(?:\s|\b)?(mg/ml|ml/amp|mg|ml|ui|g|units|iu|mcg|µg)(?:\s|\b) => \1\2
I have tested the regex and also use it in other apps, so it should work fine. Could the problem be the handling of the space before and after the dosage term?
Is it possible that the regex parser does not handle non-capturing groups like (?:\s|\b) correctly?
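To make this easier to reproduce outside the search engine, here is a minimal Python sketch of the same substitution. It is only an approximation: Python's re module does not support \pN, so I assume \d is a close enough stand-in, and the function name standardize_dosage is made up for this test. In this sketch the whitespace matched by the (?:\s|\b) groups is not part of \1 or \2, so it is dropped from the replacement, which gives the same merged string as the CALL KEYWORDS output above.

```python
import re

# Assumption: \d replaces \pN, because Python's re has no \p{...} classes.
# Everything else is copied from the regexp_filter pattern.
DOSAGE_PATTERN = re.compile(
    r"(?i)(?:\s|\b)?(\d*[.,]?\d*)(?:\s|\b)?"
    r"(mg/ml|ml/amp|mg|ml|ui|g|units|iu|mcg|µg)(?:\s|\b)"
)


def standardize_dosage(text: str) -> str:
    """Apply the same '=> \\1\\2' replacement as the regexp_filter line."""
    return DOSAGE_PATTERN.sub(r"\1\2", text)


print(standardize_dosage("carbimazole macleods 20mg tablets"))
# prints: carbimazole macleods20mgtablets
```

The engine's regex library may differ from Python's in details, so this is a hint about where the spaces go rather than a proof of what the filter does.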
Any ideas?
Thanks
Roberto