Document length (search string)

Hi

I am creating a plain index called products. Some of the indexed strings can be up to 6000 characters long. But when I list a document from the index (via the mysql client):

SELECT * FROM index WHERE id = 1

It looks like the document string ends somewhere at 390 characters. If I search for anything beyond the first 390 characters, nothing is found. The entire document string is not present in the spb file either.

Here is the index configuration:

index products {
path = /var/lib/manticore/products
source = products
min_infix_len = 3
type = plain
ngram_len = 2
charset_table = 0..9, a..z->A..Z, A..Z, .
}

Where am I making a mistake? Or what’s wrong?

Jaroslav

Could you provide your index data source config along with a query that failed?

sure

source products
{
type = mysql
sql_host = mysql
sql_user = *
sql_pass = *
sql_db = *
sql_port = 3306
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, product FROM products
sql_attr_uint = id
sql_field_string = product
}

And I’m sure the query returns strings longer than 390 characters.

Jaroslav
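
One way to confirm what the source query returns, independently of Manticore, is to measure the string lengths on the database side. The sketch below is only an illustration: `field_length_stats` is a hypothetical helper, it assumes a Python DB-API connection (which MySQL driver to use is left open), and an in-memory SQLite table stands in for the real `products` table so that the example runs on its own.

```python
import sqlite3


def field_length_stats(connection, query="SELECT id, product FROM products", threshold=390):
    """Return (row count, rows longer than threshold, longest length).

    Works with any DB-API connection. Lengths are in characters when the
    driver returns str, and in bytes when it returns bytes.
    """
    cursor = connection.cursor()
    cursor.execute(query)
    total = over = longest = 0
    for _doc_id, text in cursor.fetchall():
        length = len(text or "")
        total += 1
        if length > threshold:
            over += 1
        longest = max(longest, length)
    return total, over, longest


def demo():
    """Run the check against a stand-in SQLite table with known contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, product TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [(1, "a" * 100), (2, "b" * 6000)],
    )
    return field_length_stats(conn)
```

If the second number is zero when run against the real database, the truncation happens before the data ever reaches the indexer.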

Could you try the HTTP or API client? It could be a string field length limit in the MySQL client.
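
A minimal sketch of such an HTTP check, using only the Python standard library. Several things here are assumptions rather than facts from this thread: that an HTTP listener is reachable on port 9308, that the `/sql` endpoint accepts a form-encoded `query` parameter, and that the JSON reply lists documents under `hits.hits[]._source`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_sql_request(base_url, sql):
    """Build a POST request carrying the SQL text as a form-encoded `query` field."""
    body = urllib.parse.urlencode({"query": sql}).encode("utf-8")
    return urllib.request.Request(base_url.rstrip("/") + "/sql", data=body, method="POST")


def field_lengths(response, field):
    """Map each returned document id to the length of `field` in characters."""
    hits = response.get("hits", {}).get("hits", [])
    return {hit["_id"]: len(hit["_source"].get(field, "")) for hit in hits}


def fetch_field_lengths(base_url, sql, field):
    """Send the query to the daemon and measure the returned field lengths."""
    request = build_sql_request(base_url, sql)
    with urllib.request.urlopen(request, timeout=10) as reply:
        return field_lengths(json.load(reply), field)


# Example call (needs a running daemon; host and port are assumptions):
# fetch_field_lengths("http://127.0.0.1:9308",
#                     "SELECT * FROM products WHERE id = 1", "product")
```

If the length reported this way is also about 390, the mysql client is not what cuts the string.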

For testing Manticore I am using the mysql client. But the whole string of that product is not even in the products.spb file. And if I search for anything beyond the 390th character, nothing is found.

Jaroslav

Can you provide a full example? E.g. like this:

snikolaev@dev:~$ cat csv_min.conf
searchd {
    listen = 9315:mysql41
    log = searchd.log
    pid_file = searchd.pid
    binlog_path =
}

source src {
    type = csvpipe
    csvpipe_command = echo "1,abc" && echo "2,abc" && echo "3,abc abc"
    csvpipe_field = f
}

index idx {
    type = plain
    source = src
    path = idx
    stored_fields = f
}

My full config is this:

searchd {
listen = 9306:mysql41
listen = /var/run/mysqld/mysqld.sock:mysql41
listen = $ip:9312
listen = 9308:http
listen = $ip:9315-9325:replication
log = /var/log/manticore/searchd.log
max_packet_size = 128M
pid_file = /var/run/manticore/searchd.pid
query_log_format = sphinxql
query_log = /var/log/manticore/query.log
}

source products
{
type = mysql
sql_host = mysql
sql_user = *
sql_pass = *
sql_db = *
sql_port = 3306
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, product FROM products
sql_attr_uint = id
sql_field_string = product
}

index products {
path = /var/lib/manticore/products
source = products
min_infix_len = 3
type = plain
ngram_len = 2
charset_table = 0..9, a..z->A..Z, A..Z, .
}

But still nobody has really answered my question of whether the spb file should contain all the text that is indexed. I keep saying that not everything loaded from the SQL server ends up in it. The plain text from the server is 15305852 bytes, but the spb file is 20 times smaller at 7689807 bytes. This suggests that when saving to the index, something truncates each document to about 390 characters.

Jaroslav
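
A quick arithmetic check of the two sizes quoted in the post above, taking both byte counts exactly as typed:

```python
# Sizes as quoted in the post above (bytes).
source_bytes = 15305852
spb_bytes = 7689807

# How many times larger the source text is than the spb file.
ratio = source_bytes / spb_bytes
print(f"source / spb = {ratio:.2f}")  # prints "source / spb = 1.99"
```

With these figures the spb file is about half the size of the source text rather than one twentieth, so either the "20 times" estimate or one of the byte counts may be mistyped.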

Could you provide a minimal reproducible example with a couple of documents, where you index a long document and a select query to the daemon returns only part of the string?

Pay attention to this:

sql_query = SELECT id, product FROM products

we don’t know what’s in product.

vs

    csvpipe_command = echo "1,abc" && echo "2,abc" && echo "3,abc abc"

we know the values.
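
Along the same lines, long documents with known content can be generated for `csvpipe_command`. The script below is a sketch under stated assumptions: the file name `gen_long_docs.py`, the helper names and the `tailN` marker words are all hypothetical. Each row is `id,text`, and every text ends with a unique marker word, so a full-text search for the marker shows whether the end of a long document was indexed.

```python
import csv
import sys

FILLER = "abc "  # repeated filler word; content is arbitrary


def make_document(length, marker):
    """Return a string of exactly `length` characters that ends with ' ' + marker."""
    prefix_len = length - len(marker) - 1
    if prefix_len < 0:
        raise ValueError("length is too short to hold the marker")
    prefix = (FILLER * (prefix_len // len(FILLER) + 1))[:prefix_len]
    return prefix + " " + marker


def write_rows(out, lengths):
    """Write one CSV row per requested length; document N ends with the word tailN."""
    writer = csv.writer(out, lineterminator="\n")
    for doc_id, length in enumerate(lengths, start=1):
        writer.writerow([doc_id, make_document(length, f"tail{doc_id}")])


if __name__ == "__main__":
    # One document below, one at, and one far beyond the 390-character mark.
    write_rows(sys.stdout, [300, 390, 6000])
```

With `csvpipe_command = python3 gen_long_docs.py` in the source, a search for `tail3` should return document 3 only if the whole 6000-character string was indexed.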

OK, I will create a CSV from the SQL data and import it into searchd.

Good news: when I load the data via csvpipe, all of it is present. And although I did not make any changes to the products index configuration, the products index also loaded in its entirety. The spb file now roughly matches the size of the CSV file. So I don’t understand this at all.

Then please provide a dump of your products table so we can reproduce it on our side, and more details on

document string ends somewhere at 390 characters

i.e. which string exactly seems to end at 390 characters.

I’m sorry, but I can’t provide a dump of the tables. I’m using live data and it contains sensitive business information.

In addition, after a few server restarts and loading data from both the CSV and the SQL server, all data is now present in the index, i.e. the problem somehow resolved itself. I don’t know if a server restart or something else solved the problem. My only clue that the data was not in the index was the size of the spb file. At the time of the error, the spb index file was 20 times smaller.

Jaroslav