MDEV-38180: Added XXH3 and XXH32 (64 bit) sql functions #4500
PARTITION BY [LINEAR] KEY ALGORITHM={MYSQL51|MYSQL55|BASE31|CRC32C|XXH32|XXH3}
- The BASE31 algorithm uses a base-31 representation of the bytes, see
  Modular hashing in https://algs4.cs.princeton.edu/34hash/. It serves
  as a simple baseline.
- CRC32C uses my_crc32c.
- XXH32 and XXH3 are xxHash algorithms; xxhash.h is copied from the latest
  release (0.8.3) of https://github.com/Cyan4973/xxHash
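As a rough illustration of the modular hashing idea behind BASE31, here is a minimal sketch. The function name `base31_hash`, its optional `seed` parameter, and the implicit modulus of 2^32 (unsigned wrap-around) are assumptions made for this example and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the function from the patch: treats the input
// bytes as digits of a base-31 number and keeps the value modulo 2^32 by
// relying on unsigned 32-bit wrap-around.
inline uint32_t base31_hash(const unsigned char *data, size_t len,
                            uint32_t seed= 0)
{
  assert(data != nullptr || len == 0);
  uint32_t h= seed;
  for (size_t i= 0; i < len; i++)
    h= h * 31u + data[i];        // Horner's rule: h = 31*h + next byte
  return h;
}
```

Under these assumptions, hashing the three bytes of "abc" with a zero seed gives 96354, the same value that the well-known base-31 string hash produces for that input.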
For performance (especially with xxHash), we use one-shot hash functions
in the binary hash_sort, and a streaming hash function otherwise, for
byte-by-byte hashing. XXH is the only stateful hash function. The other
hash algorithms are stateless and homomorphic, so their streaming and
one-shot functions are identical. This is reflected in the fallback
from a NULL m_hash_byte to m_hash_str applied to a single byte.
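The fallback described above might look roughly like the following sketch. Only the member names `m_hash_byte` and `m_hash_str` come from the commit message. The struct `Hasher_sketch`, the function-pointer signatures, and the stand-in `base31_str` are hypothetical and chosen only to show why a stateless hash gives the same result whether it is fed one byte at a time or in one shot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical signatures: both take the running hash value and return
// the updated one.
typedef uint32_t (*hash_str_fn)(uint32_t h, const unsigned char *s,
                                size_t len);
typedef uint32_t (*hash_byte_fn)(uint32_t h, unsigned char c);

// One-shot base-31 hash over a buffer, a stand-in for a stateless algorithm.
inline uint32_t base31_str(uint32_t h, const unsigned char *s, size_t len)
{
  assert(s != nullptr || len == 0);
  for (size_t i= 0; i < len; i++)
    h= h * 31u + s[i];
  return h;
}

struct Hasher_sketch
{
  hash_str_fn m_hash_str;     // one-shot function, always set
  hash_byte_fn m_hash_byte;   // streaming function, may be nullptr

  // Hash one byte; without a dedicated streaming function, fall back to
  // the one-shot function applied to a one-byte buffer.
  uint32_t add_byte(uint32_t h, unsigned char c) const
  {
    if (m_hash_byte)
      return m_hash_byte(h, c);
    return m_hash_str(h, &c, 1);
  }

  // Byte-by-byte hashing of a whole buffer through add_byte().
  uint32_t hash_bytewise(uint32_t h, const unsigned char *s,
                         size_t len) const
  {
    for (size_t i= 0; i < len; i++)
      h= add_byte(h, s[i]);
    return h;
  }
};
```

With `m_hash_byte` left as `nullptr`, `hash_bytewise()` and a single `base31_str()` call over the same buffer return the same value. A stateful algorithm such as XXH would instead need a real streaming function, because hashing its input in one-byte pieces with the one-shot call would not reproduce the one-shot digest.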
Tested with
mtr --suite main --do-test=.*partition
mtr --suite parts
Also ran the above tests with the following patch, which changes the
default algorithm from MYSQL55 to CRC32C, and again with XXH32 (changing
the patch accordingly):
modified sql/ha_partition.cc
@@ -10336,6 +10336,8 @@ uint32 ha_partition::calculate_key_hash_value(Field **field_array)
switch ((*field_array)->table->part_info->key_algorithm)
{
case partition_info::KEY_ALGORITHM_NONE:
+ hasher.set_algorithm(HASH_ALGORITHM_CRC32C);
+ break;
case partition_info::KEY_ALGORITHM_55:
/* Hasher default to mysql55 */
break;
modified sql/partition_info.cc
@@ -2328,7 +2328,7 @@ bool partition_info::fix_parser_data(THD *thd)
if ((thd_sql_command(thd) == SQLCOM_CREATE_TABLE ||
thd_sql_command(thd) == SQLCOM_ALTER_TABLE) &&
key_algorithm == KEY_ALGORITHM_NONE)
- key_algorithm= KEY_ALGORITHM_55;
+ key_algorithm= PARTITION_INFO_DEFAULT_ALGORITHM;
}
DBUG_RETURN(FALSE);
}
@@ -2344,7 +2344,7 @@ bool partition_info::fix_parser_data(THD *thd)
if ((thd_sql_command(thd) == SQLCOM_CREATE_TABLE ||
thd_sql_command(thd) == SQLCOM_ALTER_TABLE) &&
key_algorithm == KEY_ALGORITHM_NONE)
- key_algorithm= KEY_ALGORITHM_55;
+ key_algorithm= PARTITION_INFO_DEFAULT_ALGORITHM;
}
defined_max_value= FALSE; // in case it already set (CREATE TABLE LIKE)
do
modified sql/partition_info.h
@@ -446,6 +446,8 @@ class partition_info : public DDL_LOG_STATE, public Sql_alloc
int gen_part_type(THD *thd, String *str) const;
};
+#define PARTITION_INFO_DEFAULT_ALGORITHM partition_info::KEY_ALGORITHM_CRC32C
+
void part_type_error(THD *thd, partition_info *work_part_info,
const char *part_type, partition_info *tab_part_info);
modified sql/sql_partition.cc
@@ -2471,7 +2471,7 @@ static int add_key_with_algorithm(String *str, const partition_info *part_info)
err+= str->append(STRING_WITH_LEN("KEY "));
if (part_info->key_algorithm != partition_info::KEY_ALGORITHM_NONE &&
- part_info->key_algorithm != partition_info::KEY_ALGORITHM_55)
+ part_info->key_algorithm != PARTITION_INFO_DEFAULT_ALGORITHM)
{
err+= str->append(STRING_WITH_LEN("ALGORITHM = "));
switch (part_info->key_algorithm)
@@ -2479,6 +2479,9 @@ static int add_key_with_algorithm(String *str, const partition_info *part_info)
case partition_info::KEY_ALGORITHM_51:
err+= str->append(STRING_WITH_LEN("MYSQL51"));
break;
+ case partition_info::KEY_ALGORITHM_55:
+ err+= str->append(STRING_WITH_LEN("MYSQL55"));
+ break;
case partition_info::KEY_ALGORITHM_BASE31:
err+= str->append(STRING_WITH_LEN("BASE31"));
break;
@FooBarrior @mariadb-YuchenPei please review 🙂
gkodinov left a comment
Thank you for your contribution!
This is a preliminary review.
Please consider adding a complete specification of the function to the related Jira issue, e.g. how it calculates the hash for various data types, how collation matters for strings, the type and number of arguments, nullability, the expected data type, etc.
Please also find a suggestion for additional optional arguments below.
    return 0;
  }

  uint32_t h= XXH32((const void*) res->ptr(), res->length(), 0);
FWIW, PHP (https://php.watch/versions/8.1/xxHash) supports passing "secret" and "seed" to the XXHASH function. Should you consider the same?
These can be optional too.
e781223 to d366d55
Description
Adds native SQL functions XXH32() and XXH3() exposing xxHash algorithms.
Release Notes
Added XXH32() and XXH3() SQL functions that return 32‑bit and 64‑bit xxHash digests (unsigned integers). NULL or empty input returns SQL NULL.
How can this PR be tested?
./mysql-test/mtr main.func_xxh
Basing the PR against the correct MariaDB version
main branch.
PR quality check