Commit Graph

1 Commits

Author SHA1 Message Date
Salvatore Sanfilippo 27dd3b71ce
Vector Sets fixes against corrupted data in absence of checksum verification (#14102)
Vector Sets deserialization was not designed to resist corrupted data,
assuming that a good checksum would mean everything is fine. However
Redis allows the user to specify extra protection via a specific
configuration option.

This commit makes the implementation more resistant, at the cost of some
slowdown. This also fixes a serialization bug that is unrelated (and has
no memory corruption effects) about the lack of the worst index /
distance serialization, that could lower the quality of a graph after
links are replaced. I'll address the serialization issues in a new PR
that will focus on that aspect alone (already work in progress).

The net result is that loading vector sets is, when the serialization of
worst index/distance is missing (always, for now) 100% slower, that is 2
times the loading time we had before. Instead when the info will be
added it will be just 10/15% slower, that is, just making the new sanity
checks.

It may be worth to export to modules if advanced sanity check if needed
or not. Anyway most of the slowdown in this patch comes from having to
recompute the worst neighbor, since duplicated and non reciprocal links
detection was heavy optimized with probabilistic algorithms.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-06-10 21:55:09 +08:00