Use TDE to Encrypt Data

To meet the demand for protecting user data security, SynxDB introduces the Transparent Data Encryption (TDE) feature. TDE is a technology used by databases to encrypt data files. Literally, it can be broken down into three parts: data, encryption, and transparency. Data refers to database data. The files are ciphertext on disk and plaintext in memory. TDE solves the problem of protecting data at rest, also known as at-rest data encryption. Transparency means that the encryption is transparent to the user, who does not need to change their original operating habits. Both users and applications do not need to be concerned with key management or the encryption/decryption process.

Prerequisites

OpenSSL needs to be installed on the SynxDB nodes. Typically, Linux distributions come with OpenSSL pre-installed.

How it works

Basic concepts

Term

Description

DEK (Data Encryption Key)

A data encryption key that exists in the memory of the data warehouse compute nodes. It is generated by the database’s internal random number function and serves as the password for actual data encryption.

Plaintext DEK

Synonymous with DEK, can only be stored in memory.

Master Key

The master key, used to encrypt the DEK.

Ciphertext DEK

The ciphertext DEK is generated by encrypting the plaintext DEK with the Master Key and is persisted to unified storage.

Encryption algorithms

Cryptosystems can be divided into two categories in principle:

  • Single-key (symmetric) system: The encryption key and decryption key are the same.

  • Dual-key (asymmetric) (public-key) system: A public key is used for encryption and is public, while a private key is used for decryption and is private. Separating the encryption and decryption capabilities allows for one-to-many and many-to-one communication.

There are two ways to encrypt plaintext messages in a single-key system:

  • Stream cipher: The plaintext message is encrypted character by character.

  • Block cipher: The plaintext message is divided into blocks (containing multiple characters), and encrypted block by block.

SynxDB supports two block cipher algorithms: AES and SM4.

AES encryption algorithm

The internationally recognized block cipher algorithm is AES (Advanced Encryption Standard). A block cipher algorithm first needs to group data. The encryption length of AES is 128 bits, which is 16 bytes. AES has 3 key lengths: 128, 192, and 256 bits. AES has 5 encryption modes:

  • ECB mode: Electronic Code Book mode

  • CBC mode: Cipher Block Chaining mode

  • CFB mode: Cipher Feedback mode

  • OFB mode: Output Feedback mode

  • CTR mode: Counter mode

SM algorithm

SM stands for State Cryptography Administration, referring to Chinese domestic cryptographic algorithms. The main ones are SM1, SM2, SM3, and SM4. The key length and block length are both 128 bits.

  • SM1 is for symmetric encryption. Its encryption strength is comparable to AES. This algorithm is not public. To call this algorithm, you need to use the interface of an encryption chip.

  • SM2 is for asymmetric encryption, based on ECC. This algorithm is public. Because this algorithm is based on ECC, its signing speed and key generation speed are faster than RSA. The security strength of ECC 256-bit (SM2 uses a type of ECC 256-bit) is higher than RSA 2048-bit, and its operation speed is also faster than RSA.

  • SM3 is for message digests. You can understand it by comparing it to MD5. This algorithm is public. The verification result is 256 bits.

  • SM4 is a block data algorithm for wireless LAN standards. It is for symmetric encryption, and both the key length and block length are 128 bits.

Implementation of transparent data encryption

The transparent data encryption feature is mainly composed of two major modules: the key management module and the storage layer encryption module.

Key management module

The key management module has a two-tier architecture. We use encryption algorithms to encrypt data, and encryption algorithms require keys. To reduce the overhead of repeated encryption and decryption caused by key changes, the industry mostly adopts a multi-layer key management structure.

Two-tier key structure: The two-tier key structure includes a master key and a data encryption key (DEK). The master key is used to encrypt the DEK and is stored outside the database. Users can also create and manage master keys in an external Key Management System (KMS), and SynxDB uses these keys for storage encryption. The plaintext DEK is used to encrypt the database data. It can be encrypted with the master key to generate a ciphertext DEK, which is stored locally in the database.

How to use

SynxDB provides the transparent data encryption feature. Users only need to enable the TDE feature when deploying SynxDB. After that, the data encryption operations are transparent to the user and require no extra attention.

To enable TDE when deploying SynxDB, the user needs to specify the -T parameter when using gpinitsystem to initialize the database. SynxDB supports two encryption algorithms, AES and SM4. The methods to enable them are as follows:

  • Enable the TDE feature and specify the encryption algorithm as AES:

    gpinitsystem -c gpinitsystem_config -T AES256
    
  • Enable the TDE feature and specify the encryption algorithm as SM4:

    gpinitsystem -c gpinitsystem_config -T SM4
    

How to verify

The transparent data encryption feature means that encryption is transparent to the user. There is no perceptible difference in read and write operations whether the feature is enabled or disabled. The key file is very important data. If the key file is lost, the database will fail to start. This is because after the database starts, it loads the key into memory. Therefore, it is necessary to start and stop the database’s transparent data encryption feature. The key file path is located on the master node.

First, find the data directory of the master node, for example:

COORDINATOR_DATA_DIRECTORY=/home/gpadmin/work/data0/master/gpseg-1

Then find the key file:

[gpadmin@i-uetggb33 gpseg-1]$ pwd
/home/gpadmin/work/data0/master/gpseg-1
[gpadmin@i-uetggb33 gpseg-1]$ ls -l pg_cryptokeys/live/
total 8
-rw------- 1 gpadmin gpadmin 48 Apr 12 10:26 relation.wkey
-rw------- 1 gpadmin gpadmin 48 Apr 12 10:26 wal.wkey

Here, relation.wkey is the key used to encrypt data files, and wal.wkey is the key used to encrypt WAL logs. Currently, only relation.wkey is effective; WAL logs are not yet encrypted.

Verification process

  1. Create a table and insert data:

    postgres=# create table ao2 (id int) with(appendonly=true);
    NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the |product_name| data distribution key for this table.
    HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
    
    CREATE TABLE
    postgres=# insert into ao2 select generate_series(1,10);
    INSERT 0 10
    
  2. First, stop the running database with gpstop -a, then simulate the scenario of a lost key file:

    cd /home/gpadmin/work/data0/master/gpseg-1/pg_cryptokeys/
    # Move the key file to another directory
    mv live backup
    
  3. Try to start the database with gpstart -a.

    Because the key file is missing, the database fails to start and cannot be read, thus protecting the data. By observing the master node’s database log, you will find the following error log, which mentions encryption keys. Losing the key file is a very serious system error, and we should not allow the database to start successfully.

    2024-04-12 10:46:04.703250 CST,,,p24962,th-1601394560,,,,,0,,,seg-1,,,,,"FATAL","XX000","cluster has no data encryption keys",,,,,,,0,,"kmgr.c",298,"Stack trace:
    1    0x7fd399ee8a28 libpostgres.so errstart + 0x3b4
    2    0x7fd399ee8672 libpostgres.so errstart_cold + 0x20
    3    0x7fd399f8eb1e libpostgres.so InitializeKmgr + 0x222
    4    0x7fd399c27bf1 libpostgres.so PostmasterMain + 0x12ef
    5    0x40280b postgres <symbol not found> (main.c:289)
    6    0x7fd398abe555 libc.so.6 __libc_start_main + 0xf5
    7    0x402289 postgres <symbol not found> + 0x402289
    "
    
  4. Restore the key file:

    cd /home/gpadmin/work/data0/master/gpseg-1/pg_cryptokeys/
    # Move the key file back
    mv backup live
    
  5. Try to start the database with gpstart -a. It starts successfully, and data can be read successfully.

    postgres=# select * from ao2 order by id;
    id
    ----
      1
      2
      3
      4
      5
      6
      7
      8
      9
    10
    (10 rows)
    

Performance evaluation

Enabling the TDE encryption feature can improve the security of data at rest, but it will also affect the read and write performance of accessing the encrypted database. Please choose whether to enable the TDE encryption feature based on your actual situation. Below are the performance test data from TPC-H. The SM4 algorithm is a Chinese domestic algorithm and has a greater performance overhead. It is recommended to use AES256.

Test environment

Huawei Cloud ECS host, compute node with 16-core CPU / 32G memory / 200G SSD, with 3 segment nodes deployed.

Test data

Encryption algorithm

Dataset size

Storage type

Compression type

Query duration (s)

Performance overhead

None

5G

AO

None

648

0% (Baseline)

AES

5G

AO

None

658

1.5%

SM4

5G

AO

None

2079

220.8%

Encryption algorithm

Dataset size

Storage type

Compression type

Query duration (s)

Performance overhead

None

5G

AO

zstd

663

0% (Baseline)

AES

5G

AO

zstd

665

0.3%

SM4

5G

AO

zstd

4000

503%

Encryption algorithm

Dataset size

Storage type

Compression type

Query duration (s)

Performance overhead

None

10G

AO

None

1160

0% (Baseline)

AES256

10G

AO

None

1212

4.48%

SM4

10G

AO

None

4000

244%

Encryption algorithm

Dataset size

Storage type

Compression type

Query duration (s)

Performance overhead

None

5G

AOCS

None

552

0%

AES

5G

AOCS

None

570

3.2%

SM4

5G

AOCS

None

3578

548%