My use case starts from a setup where the main DB is MongoDB, whose document ids are 12-byte ObjectIds, which is why I can't use them as ids in Manticore.
For example, given this index:
CREATE TABLE test(id bigint, my_id string, value text);
INSERT INTO test(my_id, value) VALUES("a100b200", "some text for search");
INSERT INTO test(my_id, value) VALUES("a101b200", "some another text for search");
Then I need to update a document with new text for search. But I can't do it in one request, because I don't know its id in Manticore, which I would need to do something like this:
REPLACE INTO test VALUES(id, "a100b200", "new text for search")
I supposed that I could make this request:
REPLACE INTO test SELECT id, "a100b200", "new text for search" FROM test WHERE my_id = "a100b200"
But I got an error like this: ERROR 1064 (42000): sphinxql: syntax error, unexpected SELECT, expecting VALUES
Note: Manticore <= 5.0.2 supports ids in the range from 1 to 2^63 - 1 (i.e. 63 bits).
Versions after 5.0.2 support -2^63 … 2^63 - 1 (i.e. 64 bits signed), except 0.
In the future we are going to change that to 0 … 2^64 - 1 (64 bits unsigned).
Cryptographic hash functions are designed such that you may truncate the output to some size and the truncated hash function remains a secure cryptographic hash function. For example, if you take the first 128 bits (16 bytes) of the output of SHA-512 applied to some input, then those 128 bits are a cryptographic hash as strong as any other 128-bit cryptographic hash.
The solution is to choose some cryptographic hash function - SHA-256, SHA-384, and SHA-512 are good choices - and truncate the output to 128 bits (16 bytes).
So you can just take the first 63 bits of SHA-256 (available on most systems these days, I would think) or whatever, and treat the result as a signed integer.
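A minimal sketch of that truncation, assuming the application side is Python and the ObjectId arrives as a hex string. The helper name `objectid_to_manticore_id` is hypothetical, and remapping 0 to 1 is an assumption made only because 0 is not a valid id in the ranges listed above:

```python
import hashlib

MAX_ID = 2**63 - 1  # upper bound that is valid in every id range listed above


def objectid_to_manticore_id(object_id_hex: str) -> int:
    """Map a hex id string to an integer in 1..2^63-1 (hypothetical helper)."""
    raw = bytes.fromhex(object_id_hex)            # 12 bytes for a real ObjectId
    digest = hashlib.sha256(raw).digest()         # 32-byte SHA-256 digest
    first_64 = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
    value = first_64 >> 1                         # keep only the first 63 bits
    return value or 1                             # 0 is not a valid id; remap it


# The same input always gives the same id, so no lookup is needed before REPLACE.
doc_id = objectid_to_manticore_id("507f1f77bcf86cd799439011")
```

The result is never negative, so it fits the older 63-bit range as well as the newer signed 64-bit one. Collisions remain possible in principle, which this sketch does not detect.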
So in practice, you could probably quite safely just take the first 8 bytes of that (well, 63 bits!).
Unless you were inserting more than about 46k documents/second, it should be quite safe.
First of all, thank you for the attention to my question and for the discussion.
Yes, my first working idea was to use those 5 bytes to make a bigint. I even tested it with 400 million documents in a production DB. I could not find any collisions.
But when I came with this solution to my CTO, he asked me one simple question: “What are you going to do when a collision occurs in the production environment?” I thought about the consequences and answered: “Yes, that would be a bad result. This solution is not acceptable in the project I am working on.”
That is the reason why I started looking for another solution.
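For a sense of scale on the collision question, here is a rough birthday-bound estimate. It assumes ids are spread uniformly over a 63-bit space, as with a truncated cryptographic hash; that assumption may not hold for ids built from raw ObjectId bytes. The function name `collision_probability` is only illustrative:

```python
import math


def collision_probability(n_docs: int, bits: int = 63) -> float:
    """Approximate chance of at least one collision among n_docs uniform ids.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    space = 2.0 ** bits
    exponent = -n_docs * (n_docs - 1) / (2.0 * space)
    return -math.expm1(exponent)  # numerically stable 1 - exp(exponent)


# Probability of any collision across 400 million documents in a 63-bit space.
p = collision_probability(400_000_000)
print(f"{p:.4%}")
```

Under these assumptions the chance of at least one collision among 400 million documents comes out a little under 1%, so a clean test run at that size does not rule collisions out.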
Yes, all correct. This will be my solution if I can't find anything better in the next few weeks. But I very much want to find one, because one request is better than two requests (I am using the HTTP endpoint of Manticore).
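A sketch of the two-request flow, assuming it means a SELECT to look up the id by `my_id` followed by a REPLACE with that id. The transport is left out on purpose: `run_sql` is a hypothetical callable that sends one SQL statement to Manticore (over HTTP or otherwise) and returns the result rows as a list of dicts. The backslash escaping in `sql_quote` is also an assumption and should be checked against the Manticore documentation:

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: takes one SQL string, returns rows as dicts.
RunSql = Callable[[str], List[Dict[str, Any]]]


def sql_quote(value: str) -> str:
    """Wrap a string in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def replace_by_my_id(run_sql: RunSql, my_id: str, value: str) -> Optional[int]:
    """Replace the document whose my_id matches; return its id, or None."""
    # Request 1: find the Manticore id that belongs to this my_id.
    rows = run_sql(
        f"SELECT id FROM test WHERE my_id = {sql_quote(my_id)} LIMIT 1"
    )
    if not rows:
        return None  # nothing to replace; the caller may choose to INSERT
    doc_id = int(rows[0]["id"])
    # Request 2: replace the whole document under the id just found.
    run_sql(
        "REPLACE INTO test (id, my_id, value) "
        f"VALUES ({doc_id}, {sql_quote(my_id)}, {sql_quote(value)})"
    )
    return doc_id
```

The two requests are not atomic, so a concurrent writer could change the document between them.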