Index-time tokenizer UDFs. XXX_begin_document


#1

I'm using the delivered UDF example, udfexample.c,
which builds the library libudfexample.so (I had to rename it to udfexample.so to get it to work).
I added it to the index test1, which was created by the install process, in
sphinx.conf:

index test1
{
    source = src1
    path = /var/lib/manticore/test1
    index_token_filter = udfexample.so:hideemail
}

I rotate the index via
sudo -u manticore indexer -c /etc/sphinx/sphinx.conf test1 --rotate

The process runs and the UDF's output appears on screen.
However, the expected output from hideemail_begin_document,
UdfLog ( "Called hideemail_begin_document" );
never appears.
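
For readers without the sample at hand, the hook in question looks roughly like the sketch below. This is not the shipped udfexample.c: the parameter list is written from memory of the plugin documentation and should be checked against the real headers, and the UdfLog defined here is a hypothetical stand-in that keeps the last message in a buffer so the snippet compiles on its own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the sample's UdfLog helper: remembers the last
   message and echoes it in the style seen in the indexer output. */
static char last_log[128];

static void UdfLog(const char *msg)
{
    snprintf(last_log, sizeof(last_log), "%s", msg);
    fprintf(stderr, "WARNING: PLUGIN: %s\n", msg);
}

/* Per-document hook. The signature is an assumption based on the plugin
   documentation; returning 0 is assumed to signal success. */
int hideemail_begin_document(void *userdata, const char *options, char *error_message)
{
    (void)userdata;
    (void)options;
    (void)error_message;
    UdfLog("Called hideemail_begin_document");
    return 0;
}
```

If the indexer invoked this hook, the "Called hideemail_begin_document" line would be expected once per document in the output.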

I get all the other outputs, for example:
WARNING: PLUGIN: Called hideemail_init
WARNING: PLUGIN: Called hideemail_deinit
WARNING: PLUGIN: Called hideemail_init
WARNING: PLUGIN: Called hideemail_push_token
WARNING: PLUGIN: Called hideemail_push_token
WARNING: PLUGIN: Called hideemail_get_extra_token
WARNING: PLUGIN: Called hideemail_get_extra_token

total 4 docs, 193 bytes
total 0.021 sec, 8983 bytes/sec, 186.19 docs/sec
WARNING: PLUGIN: Called hideemail_deinit
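
On longer runs it gets tedious to check by eye which hooks fired. A small helper along these lines could tally them from captured indexer output. The names count_calls and sample_log are illustrative and not part of Manticore, and the embedded text is just the plugin lines from the run above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count lines of the form "... Called <callback>" in captured output.
   A hit is counted only when the callback name ends the line, so a name
   is never confused with a longer one that starts the same way. */
static int count_calls(const char *log, const char *callback)
{
    assert(log != NULL && callback != NULL);

    static const char prefix[] = "Called ";
    const size_t plen = sizeof(prefix) - 1;
    const size_t clen = strlen(callback);
    int hits = 0;

    for (const char *p = strstr(log, prefix); p != NULL; p = strstr(p + plen, prefix)) {
        const char *name = p + plen;
        if (strncmp(name, callback, clen) == 0) {
            char end = name[clen];
            if (end == '\n' || end == '\r' || end == '\0')
                hits++;
        }
    }
    return hits;
}

/* Plugin lines copied from the indexer run in this post. */
static const char sample_log[] =
    "WARNING: PLUGIN: Called hideemail_init\n"
    "WARNING: PLUGIN: Called hideemail_deinit\n"
    "WARNING: PLUGIN: Called hideemail_init\n"
    "WARNING: PLUGIN: Called hideemail_push_token\n"
    "WARNING: PLUGIN: Called hideemail_push_token\n"
    "WARNING: PLUGIN: Called hideemail_get_extra_token\n"
    "WARNING: PLUGIN: Called hideemail_get_extra_token\n"
    "WARNING: PLUGIN: Called hideemail_deinit\n";
```

Calling count_calls(sample_log, "hideemail_begin_document") gives 0 for this run, against 2 for hideemail_push_token.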

Does XXX_begin_document still work?
Why is XXX_begin_document not being called?

From the output of the indexer rotate command, it's clear that it finds documents:
"total 4 docs, 193 bytes"

I have dropped the index and recreated it.
I have dropped the data and re-added it.
No matter what I do, there never seems to be a call to XXX_begin_document.

According to the documentation, XXX_begin_document is a mandatory function.
What is the scoop?

Thanks
Brad


#2

It seems it gets called only for RT indexes, when a document is indexed.

You might create a ticket on GitHub to be kept informed of progress on the issue.


#3

Thanks. Opened an issue on GitHub (#276).