Execute Queries in Parallel
This document describes the use cases, methods, limitations, and frequently asked questions for executing queries in parallel in SynxDB. SynxDB introduces the parallel query feature, where the number of computing nodes (including the SeqScan
operator) is dynamically adjusted based on the data volume during query execution. Parallel query aims to improve query performance by utilizing multiple CPU cores to process a single query.
Use cases
Deploy a small number of segments on a single physical machine and improve performance by dynamically adjusting the degree of parallelism instead of deploying a large number of segments.
Enabling operator parallelism provides a performance advantage when the host CPU and disk load are not high.
How to use
SynxDB supports parallel queries on AO/AOCO tables and Heap tables.
Query heap tables in parallel
Before enabling the parallel query feature, you need to disable the GPORCA optimizer.
SET enable_parallel = ON; SET optimizer = OFF;
Set the maximum degree of parallelism.
-- The setting should consider the number of CPU cores and segments. SET max_parallel_workers_per_gather = 4; Query example: .. code:: sql CREATE TABLE t1 (c1 int,c2 int, c3 int, c4 box) distributed by (c1); INSERT INTO t1 SELECT x, 2*x, 3*x, box('6,6,6,6') FROM generate_series(1,1000000) AS x; SELECT count(*) from t1;
Query AO/AOCO tables in parallel
Before enabling the parallel query feature, you need to disable the GPORCA optimizer.
SET enable_parallel = ON; SET optimizer = OFF;
Set the maximum degree of parallelism.
-- The setting should consider the number of CPU cores and segments. SET max_parallel_workers_per_gather = 4; -- Set how many files to insert data into within one transaction. A larger value might decrease performance, especially for AOCO tables. SET gp_appendonly_insert_files = 8; -- Switch to the next file for data insertion every 100,000 rows. This can be adjusted to distribute data as evenly as possible across multiple files. SET gp_appendonly_insert_files_tuples_range = 100000;
Insert and query.
CREATE TABLE ao(x INT) USING ao_row; INSERT INTO ao SELECT i FROM generate_series(1, 1000000) i; -- It is recommended to update statistics first. ANALYZE ao; SELECT COUNT(*) FROM ao;
Parameter descriptions
Parameter |
Description |
Default |
Required |
Example |
---|---|---|---|---|
|
Whether to enable the parallel feature. |
|
Yes |
|
|
Whether to enable the GPORCA optimizer. |
|
Yes |
|
|
The number of files to insert data into within one transaction. |
|
No |
|
|
The number of rows after which to switch to the next file for data insertion. |
|
No |
|
Frequently asked questions
Currently, parallel execution is supported for queries containing the following operators. SynxDB does not currently support queries with other operators.
sequence scan index scan index only scan bitmap heap scan append hash join nestloop join merge join
Parallel query does not always improve query performance. An excessively high degree of parallelism can cause an overload, leading to a decrease in performance.
Enabling parallelism multiplies memory consumption, which might lead to “out of memory” errors.