A table is needed to de-duplicate user agent strings in the CheckUser tables. This is because, as described in T305930 and T326379, there is a lot of duplication in these columns. This has become even more of a problem since T295073: <Org-Wide Impact> Google Chrome User-Agent Deprecation Impact. For example, on enwiki there are about on average 200 rows to each distinct user agent string value. Some rough calculations suggest that by de-duplicating the column the cu_changes table on enwiki would be several gigabytes smaller.
We cannot use the comment table, because as described in T305930 Old UA strings should be removed. Therefore, we need a table that CheckUser solely controls and can perform delete operations on.
The schema for this proposed table looks like the following:
MariaDB [my_database]> describe cu_useragent; +-----------+---------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------+---------------------+------+-----+---------+----------------+ | cuua_id | bigint(20) unsigned | NO | PRI | NULL | auto_increment | | cuua_text | varbinary(255) | NO | MUL | NULL | | +-----------+---------------------+------+-----+---------+----------------+
The indexes for this proposed table are as follows:
MariaDB [my_database]> show indexes in cu_useragent; +--------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored | +--------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+ | cu_useragent | 0 | PRIMARY | 1 | cuua_id | A | 0 | NULL | NULL | | BTREE | | | NO | | cu_useragent | 1 | cuua_text | 1 | cuua_text | A | 0 | NULL | NULL | | BTREE | | | NO | +--------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
Acceptance criteria
- Create the cu_useragent table