Commit Graph

9 Commits

Author SHA1 Message Date
Salvatore Sanfilippo bc71e8fe2d [Vector sets] More rdb loading fixes (#14032)
Hi all, this PR fixes two things:

1. An assertion, that prevented the RDB loading from recovery if there
was a quantization type mismatch (with regression test).
2. Two code paths that just returned NULL without proper cleanup during
RDB loading.
2025-05-27 15:39:18 +03:00
Salvatore Sanfilippo ad9575dd6d [Vector sets] RDB IO errors handling (#13978)
This PR adds support for REDISMODULE_OPTIONS_HANDLE_IO_ERRORS.
and tests for short read and corrupted RESTORE payload.

Please: note that I also removed the comment about async loading support
since we should be already covered. No manipulation of global data
structures in Vector Sets, if not for the unique ID used to create new
vector sets with different IDs.
2025-05-27 15:39:18 +03:00
Salvatore Sanfilippo 11947d8892
[Vector sets] fast JSON filter (#13959)
This PR replaces cJSON with an home-made parser designed for the kind of
access pattern the FILTER option of VSIM performs on JSON objects. The
main points here are:

* cJSON forces us to parse the whole JSON, create a graph of cJSON
objects, then we need to seek in O(N) to find the right field.
* The cJSON object associated with the value is not of the same format
as the expr.c virtual machine. We needed a conversion function doing
more allocation and work.
* Right now we only support top level fields in the JSON object, so a
full parser is not needed.

With all these things in mind, and after carefully profiling the old
code, I realized that a specialized parser able to parse JSON in a
zero-allocation fashion and only actually parse the value associated to
our key would be much more efficient. Moreover, after this change, the
dependencies of Vector Sets to external code drops to zero, and the
count of lines of code is 3000 lines less. The new line count with LOC
is 4200, making Vector Sets easily the smallest full featured
implementation of a Vector store available.

# Speedup achieved

In a dataset with JSON objects with 30 fields, 1 million elements, the
following query shows a 3.5x speedup:

vsim vectors:million ele ele943903 FILTER ".field29 > 1000 and .field15
< 50"
     
Please note that we get **3.5x speedup** in the VSIM command itself.
This means that the actual JSON parsing speedup is significantly greater
than that. However, in Redis land, under my past kingdom of many years
ago, the rule was that an improvement would produce speedups that are
*user facing*. This PR definitely qualifies.

What is interesting is that even with a JSON containing a single element
the speedup is of about 70%, so we are faster even in the worst case.

# Further info

Note that the new skipping parser, may happily process JSON objects that
are not perfectly valid, as soon as they look valid from the POV of
balancing [] and {} and so forth. This should not be an issue. Anyway
invalid JSON produces random results (the element is skipped at all even
if it would pass the filter).

Please feel free to ask me anything about the new implementation before
merging.
2025-05-05 09:52:42 +03:00
Pieter Cailliau d65102861f
Adding AGPLv3 as a license option to Redis! (#13997)
Read more about [the new license option](http://redis.io/blog/agplv3/)
and [the Redis 8 release](http://redis.io/blog/redis-8-ga/).
2025-05-01 14:04:22 +01:00
DvirDukhan 2c66059cde
RED-154147 make vectorset-commands.json coherent with the other command json file (#13980)
Used the augment agent to fix a given commands.json
Agent summary:

I've successfully fixed the `vectorset-commands.json` file to make it
coherent with the standard command files under `src/commands`. Here's a
summary of the changes I made:

1. Changed `type: "enum"` with `enum: ["TOKEN"]` to use the standard
format:
  - For fixed tokens: token: `"TOKEN"` and `type: "pure-token"`
  - For multiple choice options: `type: "oneof"` with nested arguments
2. Added missing fields to each command:
- `arity`: The number of arguments the command takes
- `function`: The C function that implements the command
- `command_flags`: Flags that describe the command's behavior
- Reorganized the structure to match the standard format:
3. Moved `group` and `since` to be consistent with other command files
- Properly structured the arguments with the correct types
4. Fixed the `multiple` attribute for parameters that can accept
multiple values
These changes make the vectorset-commands.json file consistent with the
standard command files under src/commands, while still keeping it as a
single file containing all the vector set commands as requested.
2025-04-27 15:38:18 +03:00
Salvatore Sanfilippo 41ecf7323e
Vector Sets: VISMEMBER and other fixes (#13941)
CI / test-ubuntu-latest (push) Failing after 31s Details
CI / test-sanitizer-address (push) Failing after 31s Details
CI / build-debian-old (push) Failing after 31s Details
CI / build-32bit (push) Failing after 31s Details
CI / build-libc-malloc (push) Failing after 31s Details
CI / build-centos-jemalloc (push) Failing after 31s Details
CI / build-old-chain-jemalloc (push) Failing after 31s Details
Codecov / code-coverage (push) Failing after 31s Details
Spellcheck / Spellcheck (push) Failing after 31s Details
CI / build-macos-latest (push) Has been cancelled Details
CodeQL / Analyze (cpp) (push) Failing after 31s Details
External Server Tests / test-external-standalone (push) Failing after 31s Details
External Server Tests / test-external-nodebug (push) Failing after 32s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-cluster (push) Failing after 1m50s Details
In this PR there is also a VADD leak fixed (when wrong arity is
reported) and improvements to the test.
2025-04-15 22:58:57 +03:00
Salvatore Sanfilippo 96a0cfdea2
Vectror Sets: build fixes for the w2v test (#13919)
CI / test-ubuntu-latest (push) Failing after 32s Details
CI / build-debian-old (push) Failing after 32s Details
CI / build-32bit (push) Failing after 32s Details
CI / build-libc-malloc (push) Failing after 32s Details
CI / build-centos-jemalloc (push) Failing after 32s Details
CI / build-old-chain-jemalloc (push) Failing after 32s Details
Codecov / code-coverage (push) Failing after 33s Details
CI / test-sanitizer-address (push) Failing after 1m16s Details
Spellcheck / Spellcheck (push) Failing after 32s Details
External Server Tests / test-external-standalone (push) Failing after 31s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-cluster (push) Failing after 32s Details
External Server Tests / test-external-nodebug (push) Failing after 32s Details
CI / build-macos-latest (push) Has been cancelled Details
Hi, this fixes building Vector Sets as modules. Right now the module
builds but there are issues with w2v. This PR should fix the problem.
Thanks.
2025-04-09 14:39:33 +03:00
YaacovHazan 41b1b5df18 Add vector-sets module
The vector-sets module is a part of Redis Core and is available by default,
just like any other data type in Redis.

As a result, when building Redis from the source, the vector-sets module
is also compiled as part of the Redis binary and loaded at server start-up.

This new data type added as a preview currently doesn't support
all the capabilities in Redis like:
32-bit OS
C99
Short-read that might end with memory leak
AOF rewirte
defrag
2025-04-02 15:06:24 +00:00
YaacovHazan 78e0d87177 Add 'modules/vector-sets/' from commit 'c6db0a7c20ff5638f3a0c9ce9c106303daeb2f67'
git-subtree-dir: modules/vector-sets
git-subtree-mainline: 8ea8f4220c
git-subtree-split: c6db0a7c20
2025-04-02 16:34:28 +03:00