redis

Commit Graph

Author	SHA1	Message	Date
Salvatore Sanfilippo	8948a5d2b2	[Vector Sets] IN operator for string/string operands (#14122) This PR introduces "IN" overloading for strings in Vector Sets VSIM FILTER expressions. It is now possible to write something like: "foo" IN "foobar". IN continues to work as usual if the second operand is an array, checking for membership of the left operand. Ping @rowantrollope, who requested this feature. I'm evaluating whether to add glob matching via the `=~` operator, but I probably need to do an optimization round on our glob matching function first. Glob matching can be slower; at the same time the complexity of the greedy search in the graph remains unchanged, so it may be a good idea to have it. Case-insensitive search will likely not be added, however, since this would require handling Unicode, which is outside the scope of Redis filters. The user is still able to perform `"foo" in "foobar" || "FOO" in "foobar"` at least.	2025-06-26 10:13:54 +08:00
Salvatore Sanfilippo	11947d8892	[Vector sets] fast JSON filter (#13959) This PR replaces cJSON with a homemade parser designed for the kind of access pattern the FILTER option of VSIM performs on JSON objects. The main points here are: * cJSON forces us to parse the whole JSON, create a graph of cJSON objects, then seek in O(N) to find the right field. * The cJSON object associated with the value is not in the same format as the expr.c virtual machine. We needed a conversion function doing more allocation and work. * Right now we only support top-level fields in the JSON object, so a full parser is not needed. With all these things in mind, and after carefully profiling the old code, I realized that a specialized parser able to parse JSON in a zero-allocation fashion, and to only actually parse the value associated with our key, would be much more efficient. Moreover, after this change, the dependencies of Vector Sets on external code drop to zero, and the code is 3000 lines shorter. The new line count is 4200 LOC, making Vector Sets easily the smallest full-featured implementation of a vector store available. # Speedup achieved In a dataset of 1 million elements whose JSON objects have 30 fields each, the following query shows a 3.5x speedup: vsim vectors:million ele ele943903 FILTER ".field29 > 1000 and .field15 < 50" Please note that we get a 3.5x speedup in the VSIM command itself. This means that the actual JSON parsing speedup is significantly greater than that. However, in Redis land, under my past kingdom of many years ago, the rule was that an improvement must produce speedups that are user-facing. This PR definitely qualifies. What is interesting is that even with a JSON containing a single element the speedup is about 70%, so we are faster even in the worst case. # Further info Note that the new skipping parser may happily process JSON objects that are not perfectly valid, as long as they look valid from the POV of balancing [] and {} and so forth. This should not be an issue. In any case, invalid JSON produces unpredictable results (the element is skipped entirely even if it would pass the filter). Please feel free to ask me anything about the new implementation before merging.	2025-05-05 09:52:42 +03:00
Pieter Cailliau	d65102861f	Adding AGPLv3 as a license option to Redis! (#13997) Read more about [the new license option](http://redis.io/blog/agplv3/) and [the Redis 8 release](http://redis.io/blog/redis-8-ga/).	2025-05-01 14:04:22 +01:00
YaacovHazan	41b1b5df18	Add vector-sets module The vector-sets module is a part of Redis Core and is available by default, just like any other data type in Redis. As a result, when building Redis from source, the vector-sets module is also compiled as part of the Redis binary and loaded at server start-up. This new data type, added as a preview, does not yet support all the capabilities in Redis, such as: 32-bit OS; C99; short reads, which might end with a memory leak; AOF rewrite; defrag.	2025-04-02 15:06:24 +00:00
YaacovHazan	78e0d87177	Add 'modules/vector-sets/' from commit 'c6db0a7c20ff5638f3a0c9ce9c106303daeb2f67' git-subtree-dir: modules/vector-sets git-subtree-mainline: `8ea8f4220c` git-subtree-split: `c6db0a7c20`	2025-04-02 16:34:28 +03:00
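
The overloaded IN semantics described in commit 8948a5d2b2 can be illustrated with a minimal C sketch. It assumes that string IN string means a case-sensitive substring test, as the `"foo" IN "foobar"` example suggests, and that string IN array means exact membership. The helper names `in_string` and `in_array` are hypothetical and are not taken from the Redis source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: string IN string, assumed to be a
 * case-sensitive substring test. */
bool in_string(const char *needle, const char *haystack) {
    assert(needle != NULL && haystack != NULL);
    return strstr(haystack, needle) != NULL;
}

/* Hypothetical helper: string IN array, an exact membership test
 * of the left operand against each array item. */
bool in_array(const char *needle, const char *const *items, size_t count) {
    assert(needle != NULL && items != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(items[i], needle) == 0) return true;
    }
    return false;
}
```

Under these assumptions, the case workaround from the commit message corresponds to `in_string("foo", s) || in_string("FOO", s)`: each test stays byte-wise and case-sensitive, and the caller combines them.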

5 Commits
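
The skipping-parser idea from commit 11947d8892 can be sketched in C as follows. This is an illustrative reconstruction of the approach the message describes (no allocation, only top-level fields, values skipped by balancing brackets), not the code from the PR. All function names are hypothetical, and the sketch assumes keys contain no escape sequences.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Advance past JSON whitespace. */
static const char *skip_ws(const char *p, const char *end) {
    while (p < end && isspace((unsigned char)*p)) p++;
    return p;
}

/* p points at an opening quote. Returns the position just past the
 * closing quote, or NULL if the string is unterminated. */
static const char *skip_string(const char *p, const char *end) {
    p++; /* opening quote */
    while (p < end) {
        if (*p == '\\') {
            if (end - p < 2) return NULL;
            p += 2; /* skip the escaped character */
            continue;
        }
        if (*p == '"') return p + 1;
        p++;
    }
    return NULL;
}

/* Skip one value without interpreting it. Objects and arrays are
 * skipped by balancing brackets only, so slightly invalid JSON may
 * still be accepted. */
static const char *skip_value(const char *p, const char *end) {
    if (p >= end) return NULL;
    if (*p == '"') return skip_string(p, end);
    if (*p == '{' || *p == '[') {
        int depth = 0;
        while (p < end) {
            if (*p == '"') {
                p = skip_string(p, end);
                if (p == NULL) return NULL;
                continue;
            }
            if (*p == '{' || *p == '[') depth++;
            else if ((*p == '}' || *p == ']') && --depth == 0) return p + 1;
            p++;
        }
        return NULL;
    }
    /* Number, true, false, or null: run until a delimiter. */
    const char *start = p;
    while (p < end && *p != ',' && *p != '}' && *p != ']' &&
           !isspace((unsigned char)*p)) p++;
    return p > start ? p : NULL;
}

/* Hypothetical entry point: locate the raw text of a top-level field.
 * Returns a pointer into json and stores the value length in *vlen,
 * or returns NULL if the key is missing or the input looks malformed. */
const char *json_top_level_field(const char *json, size_t len,
                                 const char *key, size_t *vlen) {
    const char *end = json + len;
    size_t klen = strlen(key);
    const char *p = skip_ws(json, end);
    if (p >= end || *p != '{') return NULL;
    p++;
    for (;;) {
        p = skip_ws(p, end);
        if (p >= end || *p != '"') return NULL; /* includes '}' */
        const char *kstart = p + 1;
        const char *kend = skip_string(p, end);
        if (kend == NULL) return NULL;
        size_t this_len = (size_t)(kend - 1 - kstart);
        p = skip_ws(kend, end);
        if (p >= end || *p != ':') return NULL;
        p = skip_ws(p + 1, end);
        const char *vend = skip_value(p, end);
        if (vend == NULL) return NULL;
        if (this_len == klen && memcmp(kstart, key, klen) == 0) {
            *vlen = (size_t)(vend - p);
            return p;
        }
        p = skip_ws(vend, end);
        if (p >= end || *p != ',') return NULL;
        p++;
    }
}
```

A caller would pass the returned slice to whatever converts raw text into a filter operand. Because non-matching values are only skipped, never decoded, the cost of a lookup is a single pass over the bytes that precede the requested field.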