Faster KNN search in Manticore: 2-pass HNSW, batched distances, and AVX-512

We reworked the HNSW search loop with four changes: 2-pass neighbor processing, batched distance scoring, compile-time dispatch, and AVX-512. In our tests, throughput improved by up to 29% at large k.
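To illustrate the general idea behind 2-pass neighbor processing and batched distance scoring, here is a minimal C++ sketch. It is not Manticore's implementation. `VectorStore`, `L2Sq`, and `expand_neighbors_two_pass` are hypothetical names, and the layout (a flat row-major float array plus a `visited` bitmap) is an assumption. Pass 1 only filters the adjacency list down to unvisited neighbors. Pass 2 then scores the whole batch in one tight loop. The distance function is a template parameter, so the call is resolved at compile time instead of through a function pointer. A scalar loop stands in where a real build might use an AVX-512 kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat vector storage: `dim` floats per node, row-major.
struct VectorStore {
    std::size_t dim;
    std::vector<float> data;
    const float* row(std::uint32_t id) const {
        return data.data() + static_cast<std::size_t>(id) * dim;
    }
};

// Squared L2 distance as a functor. A SIMD (e.g. AVX-512) variant could be
// swapped in as a different type without changing the search loop.
struct L2Sq {
    float operator()(const float* a, const float* b, std::size_t dim) const {
        float sum = 0.0f;
        for (std::size_t i = 0; i < dim; ++i) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
};

// Expands one graph node's adjacency list in two passes.
// `Dist` is fixed at compile time, so the distance call can be inlined.
// Returns the number of newly scored neighbors; results go to out_ids/out_dists.
template <typename Dist>
std::size_t expand_neighbors_two_pass(const VectorStore& store, const float* query,
                                      const std::vector<std::uint32_t>& neighbors,
                                      std::vector<bool>& visited,
                                      std::vector<std::uint32_t>& out_ids,
                                      std::vector<float>& out_dists, Dist dist) {
    out_ids.clear();
    out_dists.clear();

    // Pass 1: drop already-visited neighbors and mark the new ones.
    // No distance work happens here, so this loop stays cheap and branch-light.
    for (std::uint32_t id : neighbors) {
        assert(id < visited.size());
        if (!visited[id]) {
            visited[id] = true;
            out_ids.push_back(id);
        }
    }

    // Pass 2: score the collected batch back to back.
    out_dists.resize(out_ids.size());
    for (std::size_t i = 0; i < out_ids.size(); ++i) {
        out_dists[i] = dist(query, store.row(out_ids[i]), store.dim);
    }
    return out_ids.size();
}
```

In a full HNSW search, the caller would push each `(out_dists[i], out_ids[i])` pair into its candidate and result heaps. Separating the filtering pass from the scoring pass keeps the distance loop free of visited-set checks, which is what makes it a natural target for batching and SIMD.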