MySQL - How does InnoDB store records

Where is data stored? How is it stored? How does MySQL access data?

What is InnoDB?

InnoDB is a storage engine that reads data from the disk and stores data into the disk. However, the process of data manipulation actually happens in the RAM. The time consumes the most is the disk I/O, the time that InnoDB accesses the disk that it manages. So when we want to fetch records from a table, does InnoDB need to read the records one by one from the disk? No, that would be extremely slow. InnoDB takes the approach of dividing data into pages, the basic unit of interaction between disk and RAM. The size of a page is typically 16KB. That is to say, at least 16KB of contents is read from the disk into the RAM at a time, and at least 16KB of contents is flushed back to the disk at a time.

InnoDB row format

How does InnoDB store records?

We usually insert data into table as a piece of record. The way records are stored in the disk has various formats. Developers who designed InnoDB have come up with 4 types of row formats, COMPACT, REDUNDANT, DYNAMIC, and COMPRESSED. They are essentially the same in principle.

We can specify the row format by:

1
2
3
CREATE TABLE TABLE_NAME ROW_FORMAT = NAME_OF_THE_ROW_FORMAT
-- or
ALTER TABLE TABLE_NAME ROW_FORMAT = NAME_OF_THE_ROW_FORMAT

or for example:

1
2
3
4
5
6
mysql> CREATE TABLE record_format_demo (
-> c1 VARCHAR(10),
-> c2 VARCHAR(10) NOT NULL,
-> c3 CHAR(10),
-> c4 VARCHAR(10)
-> ) CHARSET=ascii ROW_FORMAT=COMPACT;

Let’s insert some dummy data to better understand what is going on

1
INSERT INTO record_format_demo(c1, c2, c3, c4) VALUES('aaaa', 'bbb', 'cc', 'd'), ('eeee', 'fff', NULL, NULL);

Now the table looks like:

1
2
3
4
5
6
7
mysql> SELECT * FROM record_format_demo;
+------+-----+------+------+
| c1 | c2 | c3 | c4 |
+------+-----+------+------+
| aaaa | bbb | cc | d |
| eeee | fff | NULL | NULL |
+------+-----+------+------+

Now let’s see how the records look like in the COMPACT format.

COMPACT

1
2
3
4
5
6
|                                                                  |                             |
|<-------------- additional information -------------------------->|<-------- actual data ------>|
| | |
+-------------------------------------+-------------------+--------+-------+-------+-----+-------+
| variable-length field length list | null value list | header | row 1 | row 1 | ... | row n |
+-------------------------------------+-------------------+--------+-------+-------+-----+-------+

As you can see from the table above, a record can be divided into 2 parts: additional information about the record and the actual data in the record.

Additional information field describes the actual data: the length of each variable-length string (VARCHAR, TEXT, BLOB, etc), null columns, and a header.

  • Variable-length field length list
    We know that MySQL supports some variable-length data types, such as VARCHAR, VARBINARY, and TEXT. We call them variable-length strings, meaning that the number of bytes it occupies on the disk can be as large as you want.

    In the COMPACT format, the bytes each variable-length string occupied is stored at the beginning of a record, thereby forming a list of lengths. One thing to emphasize is – they are stored in reverse order.

  • Null value list
    We know that some columns in a table may store NULL values. It is not efficient to store these NULL values in the real data set. Therefore, the COMPACT format manages columns with NULL values in one place, and it is stored as a null list. Let’s see how it processes null values:

    1. determine columns that are allowed to store NULL. Primary keys, candidate keys, and columns described by NOT NULL are not allowed to store null values, therefore InnoDB will exclude them during the process.

    2. columns that allow NULL will have a corresponding bit describing whether it currently stores NULL or not. 1 means NULL, 0 otherwise.

  • Header
    In addition to the variable-length string list and the null list, there is also a 40 bytes header that describes the record, such as type of the record, position of the next record, and whether the record has been deleted or not.

Let’s take the first record from record_format_demo for an example. Because c1, c2, and c4 columns are of type VARCHAR(10), which is a variable-length data type, the lengths of the 3 columns are saved at the beginning of the record, so:

1
2
3
4
5
6
7
8
9
+--------+---------+--------------+--------------+
| column | content | length (dec) | length (hex) |
+--------+---------+--------------+--------------+
| c1 | aaaa | 4 | 0x04 |
+--------+---------+--------------+--------------+
| c2 | bbb | 3 | 0x03 |
+--------+---------+--------------+--------------+
| c4 | d | 1 | 0x01 |
+--------+---------+--------------+--------------+

And because these length values need to be stored in reverse order, so they are expressed as:

1
01 03 04

MySQL requires the null list to occupy a whole byte(s). If the number of NULL values are not enough to occupy the entire byte, add 0s to the beginning. In the example, c1, c3, and c4 are allowed to be null, then the field becomes:

1
2
3
          c4 c3 c1
0 0 0 0 0 0 0 0
// in reverse order

Combine them as a whole, the record becomes (in hex):

1
2
3
4
5
6
|                       |                                                                            |
|<-- additional info -->|<----------------------------- actual data -------------------------------->|
| | |
+--------+--------------+-------------+-------------+----------+-------------------------------+-----+
| 010304 | 00 | header | hidden cols | 61 61 61 61 | 62 62 62 | 63 63 20 20 20 20 20 20 20 20 | 64 |
+--------+--------------+-------------+-------------+----------+-------------------------------+-----+

A few things worth pointing out:

  1. InnoDB defines 3 pieces of hidden columns

    • DB_ROW_ID: (Row ID) The strategy InnoDB uses to generate a primary key is to search user-defined primary key first. If user does not define the primary key, InnoBB will select a column that could uniquely identify the rows and use it as the primary key. If none of the columns work out, InnoDB will, by default, adds a hidden column named DB_ROW_ID as the primary key.
    • DB_TRX_ID: (Transaction ID) This is an identifier for the last transaction that InnoDB either inserted or updated a row.
    • DB_ROLL_PTR: (Roll Pointer) A pointer to the roll back data.

      We don’t need to worry about these three rows since InnoDB will handle them internally.

  2. Hex

    • This table uses ASCII character set, so 0x61616161 represents the string aaaa, 0x626262 represents the string bbb, and so on.
    • c3 occupies more spaces because its data type is CHAR(10), meaning that it has a fixed size of 10 bytes. All other 8 bytes besides 0x6363 are filled up with space characters, which is 0x20.

REDUNDANT

The REDUNDANT format is an old format that is widely used until MySQL 5.0. It has the following structure:

1
2
3
4
5
|<-------- additional information ------>|<----- actual data --------->|
| | |
+-------------------------------+----------------+-------+-----+-------+
| offsets | header | row 1 | row 1 | ... | row n |
+-------------------------------+----------------+-------+-----+-------+
  • How it is different from COMPACT

    Note that the COMPACT format begins with a variable-length field length list, while the REDUNDANT format begins with a list of offsets.

  • How to calculate offsets
    Take the first row as an example.

    1
    2
    3
    4
    5
    6
    7
    DB_ROW_ID (row ID) takes (0x06 - 0x00) bytes, 6 bytes
    DB_TRX_ID (transaction ID) takes (0x0c - 0x06) bytes, 6 bytes
    DB_ROLL_PTR (roll pointer) takes (0x13 - 0x0c) bytes, 7 bytes
    c1 takes (0x17 - 0x13), 4 bytes
    c2 takes (0x1A - 0x17), 3 bytes
    c3 takes (0x24 - 0x1A), 10 bytes
    c4 takes (0x25 - 0x24), 1 bytes

    Combines the offsets together:

    1
    25 24 1A 17 13 0C 06

    Since InnoDB stores offsets also in reverse order, the offsets become:

    1
    06 0C 13 17 1A 24 25
  • How to handle NULL values
    Because the REDUNDANT format does not have a null list, it makes use of the first bit of an offset to determine if the corresponding column is null or not. 1 is null, 0 otherwise. This also means one byte holds up to 127 (0x7F, binary 01111111) values.

Summary

  1. Page is the basic unit of interaction between disk and RAM.
  2. InnoDB currently defines 4 types of row formats. They are COMPACT, REDUNDANT, DYNAMIC, and COMPRESSED.

References

  1. High Performace MySQL
  2. MySQL 5.7 Reference Manual
  3. MySQL Internals Manual