One thing I’d love to be able to do is store several SimHashes per document. I could then XOR the hashes of two documents and take the POPCNT of the result (the Hamming distance) as a quick measure of similarity: the fewer bits differ, the more alike the documents are. Being able to name that value and order on it would then be a quick and easy way to find similar documents.
A query would look something like this:
SELECT *, POPCNT(this.simhash ^ other.simhash) as similarity
WHERE …
AND similarity <= 5
ORDER BY similarity ASC LIMIT 10
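
To make the idea concrete, here is a rough Python sketch of the same computation done outside the database. The names `hamming_distance`, `nearest` and `hashes` are made up for illustration, and it assumes each SimHash is stored as a non-negative integer; it is not how any particular database would implement this.

```python
def hamming_distance(a, b):
    """Count the bits that differ between two SimHash fingerprints.

    Assumes both fingerprints are non-negative integers.
    """
    return bin(a ^ b).count("1")  # XOR, then population count


def nearest(target, hashes, max_distance=5, limit=10):
    """Return (doc_id, distance) pairs within max_distance, closest first.

    `hashes` is a hypothetical mapping of document id -> SimHash.
    """
    scored = ((doc_id, hamming_distance(target, h)) for doc_id, h in hashes.items())
    close = [pair for pair in scored if pair[1] <= max_distance]
    return sorted(close, key=lambda pair: pair[1])[:limit]
```

A smaller distance means a closer match, so the results are sorted in ascending order before the limit is applied.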