My use case begining from situation when the main DB is MongoDB which have id of docs as 12byte ObjectId, and that is reason why can’t use it as id for Manticore.
For example having index
CREATE TABLE test(id bigint, my_id string, value text);
INSERT INTO test(my_id, value) VALUES("a100b200", "some text for search");
INSERT INTO test(my_id, value) VALUES("a101b200", "some another text for search");
then i need to update document with new text for search. But i can’t do it using 1 request, because don’t no it id in Manticore to do like this
REPLACE INTO test VALUES(id, "a100b200", "new text for search")
I purpose that I can make this request
REPLACE INTO test SELECT id, "a100b200", "new text for search" FROM test WHERE my_id = "a100b200"
But got an error like this
ERROR 1064 (42000): sphinxql: syntax error, unexpected SELECT, expecting VALUES
Is it a good way to solve this problem?
One possibility might be to use a ‘hash’ of the real key for the ‘id’.
Find something that gives a good 64bit hash. I think manticore actually uses fnv64 quite a bit.
There is a chance of hash collisions, but in practice should be very rate, probably won’t encounter it!
Note, Manticore <= 5.0.2 supports ids in the range from 1 to 2^63-1 (i.e. 63 bits)
Versions after 5.0.2 support -2^63 … 2^63 - 1 (i.e. 64 bits signed) except 0.
In future we are going to change that to 0 … 2^64 - 1 (64 bits unsigned)
Ah good point.
One point not really considered before, but Is there a hash algorithm that produces a hash size of 64 bits in C#? - Stack Overflow says…
The cryptographic has functions are designed such that you may truncate the output to some size and the truncated hash function remains a secure cryptographic hash function. For example, if you take the first 128 bits (16 bytes) of the output of SHA-512 applied to some input, then the first 128 bits are a cryptographic hash as strong as any other 128-bits cryptographic hash.
The solution is to choose some cryptographic hash function - SHA-256, SHA-384, and SHA-512 are good choices - and truncate the output to 128 bits (16 bytes).
So can just take the first 63 bits, of sha256 (common on most system these days would thing) or whatever - treat as signed.
I real case there is an ObjectId (12 byte) of doc id. That’s mean that i can’t use any 8 byte (bigint) type data without collisions.
Yes, there is a chance of collisions. But in practice, it probably won’t happen.
Birthday attack probability | hash function collisions.
So given 63 bit hash, thats 9223372036854775807 unique ids. sqrt of that is 3037000499.
… so in practice, as long as have less than 3 billion documents, unlikely to run into a collision.
In fact looking at https://www.mongodb.com/docs/manual/reference/method/ObjectId/
The first 4 bytes are timestamp (in seconds), then 5 bytes of random.
So in practice, could probably quite safely, just take the first 8 bytes of that (the well 63bits!.)
Unless was inserting more than about 46k documents/second, should be quite safe
First of all thank you for attention for my question and discussion.
Yes, my first working idea was use those 5 bytes to make it is bigInt. I even tested it with 400 millions documents in production DB. I could not to find collisions.
But when i come with this solution to my CTO he give me one simple question - “What you gona do when a collision is occurred in production environment ?” I thinked about a consequences and answered “Yes, it is a bad result. And this solution not be able in project that o working for”
That is a reason why i started to find another solution.
It a very good idea to use full range of variants of id (2^64). I hope you do it soon! Manticore is a good project
Ok, well DELETE accepts a where condition. It seems pretty easy to just run two queries
DELETE FROM test WHERE my_id = "a100b200";
INSERT INTO test(my_id, value) VALUES("a100b200", "new text for search");
Yes, all correct. This is will be my solution, if in next weeks i can’t find any better. But i very want to find it because one request netter than two request (i am using http endpoint of manticore)
Can make muliple queries in one HTTP request with the /cli endpoint
Thanks! It’s sounds as good. I will try it in next week.