Feature Overview

SynxDB is one of the most advanced and mature open-source MPP databases available. Built on the PostgreSQL 14.4 kernel, it offers high concurrency and high availability, processes complex workloads quickly and efficiently, and meets the demands of storing and computing over very large data volumes. It is used in a wide range of fields.

This document gives a general introduction to the features of SynxDB.

Efficient queries in different scenarios

  • SynxDB supports efficient queries in both big data analysis environments and distributed environments:

    • Big data analysis environment: SynxDB uses the built-in PostgreSQL optimizer, adapted to better support distributed environments, so it can generate more efficient query plans for big data analysis tasks.

    • Distributed environment: SynxDB also includes a specially adapted version of the open-source GPORCA optimizer to meet query optimization needs in distributed environments.

  • SynxDB uses techniques such as static and dynamic partition pruning, aggregate push-down, and join filtering to return query results as quickly and accurately as possible.

  • Both rule-based and cost-based query optimization methods are provided to help you generate more efficient query execution plans.
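
Static partition pruning, one of the techniques listed above, can be illustrated with a minimal Python sketch. It assumes a hypothetical table range-partitioned by month on a date column; `RangePartition` and `prune` are illustrative names, not SynxDB internals. The sketch keeps only the partitions whose range overlaps the query predicate, which is the test a planner can apply when the bounds are known at planning time. Dynamic pruning applies the same overlap test at execution time, when the bounds come from values such as join results.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RangePartition:
    """A hypothetical range partition covering [start, end)."""
    name: str
    start: date
    end: date


def prune(partitions, lo, hi):
    """Keep only partitions that can hold rows with lo <= key < hi.

    Two half-open ranges overlap exactly when each starts before the other ends.
    """
    return [p for p in partitions if p.start < hi and lo < p.end]


# Hypothetical monthly partitions of a sales table for 2024.
parts = [
    RangePartition(f"sales_2024_{m:02d}", date(2024, m, 1),
                   date(2024 + m // 12, m % 12 + 1, 1))
    for m in range(1, 13)
]

# Predicate: WHERE sale_date >= '2024-03-15' AND sale_date < '2024-05-01'
kept = prune(parts, date(2024, 3, 15), date(2024, 5, 1))
print([p.name for p in kept])
```

With this predicate only the March and April partitions need to be scanned; the other ten are skipped without reading any of their data.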

Polymorphic data storage

For different scenarios, SynxDB offers flexible data distribution, multiple storage formats, partitioning, and compression:

  • Even data distribution: SynxDB distributes data across segments either by hash (on a distribution key) or randomly. Spreading data evenly makes better use of disk performance across the cluster and avoids I/O bottlenecks.

  • Storage types:

    • Row-based storage: Suitable when most columns of a row are queried frequently and when there are many random row accesses.

    • Column-based storage: Suitable when queries read only a few columns, because only those columns are read from disk. This greatly reduces I/O, making it ideal for analytical scans over large amounts of data.

  • Specialized storage modes: SynxDB provides Heap storage, append-optimized (AO) row storage, and append-optimized column-oriented (AOCS) storage to optimize performance for different types of applications. Storage modes can be set at the finest level of partitioning, so a single partitioned table can combine multiple storage modes.

  • Support for partitioned tables: You can partition a table based on specific conditions. At query time, the system automatically skips the partitions (sub-tables) that the query does not need, improving query efficiency.

  • Efficient data compression: SynxDB supports multiple compression algorithms, such as zlib (levels 1 to 9) and Zstandard (levels 1 to 19), so you can balance CPU cost against compression ratio while improving data processing performance.

  • Data lifecycle management: SynxDB supports data lifecycle management, allowing users to define rules for regular data archiving. This feature, combined with data compression, helps manage storage costs and performance over the long term.

  • Optimization for small tables: You can create small tables as replicated tables, which keep a full copy on every segment, and you can specify a custom hash algorithm when creating a table for more flexible control of data distribution.
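
The idea behind hash distribution can be sketched in Python as follows. This is a simplified model that assumes four segments and uses SHA-256 as a stand-in hash function; SynxDB's actual hash functions and segment mapping differ, and `hash_segment` is a hypothetical name. Random distribution would instead place each row on an arbitrary segment, which also balances data but gives up co-location by key.

```python
import hashlib
from collections import Counter

NUM_SEGMENTS = 4  # hypothetical number of primary segments


def hash_segment(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution-key value to a segment index.

    SHA-256 is only a stand-in; SynxDB uses its own internal hash
    functions, so real placements will differ.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


rows = [f"customer_{i}" for i in range(10_000)]
placement = Counter(hash_segment(k) for k in rows)
print(sorted(placement.items()))  # roughly 2,500 rows per segment

# The same key value always maps to the same segment, which is what lets
# joins on the distribution key run locally without moving data.
assert hash_segment("customer_42") == hash_segment("customer_42")
```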

Multi-layer data security

SynxDB focuses on data security. It protects user data with function-level encryption and transparent data encryption (TDE), and provides multi-layer security measures designed for the needs of different database environments:

  • Multi-tenant isolation: SynxDB supports multi-tenant isolation at both the database and schema levels. Data is not shared between databases, ensuring strong isolation in a multi-database environment. Within a database, each tenant can define their own schemas and set access permissions for database objects, providing granular control and security.

  • Internal data organization: Data is logically organized into database objects such as tables, views, indexes, and functions, and data can be accessed across schemas.

  • Data storage security: SynxDB offers different storage modes to support data redundancy and secures stored data with encryption algorithms including AES-128, AES-192, AES-256, DES, and national secret encryption. For authentication, it supports encrypted password methods (SCRAM-SHA-256 and MD5) as well as external authentication through LDAP and RADIUS.

  • Secure session management: To enhance security, SynxDB uses randomly generated session tokens, making sessions less predictable and more resistant to hijacking.

  • User data protection: SynxDB provides comprehensive data protection through function-level encryption and transparent data encryption (TDE). The encryption and decryption processes are handled by the SynxDB kernel, requiring no user interaction, and are supported for Heap tables, AO row storage, and AOCS column storage.

  • Key management: SynxDB protects data in transit (SSL/TLS) and at rest (for example, with the pgcrypto extension). Users can create and manage keys independently in an external or third-party Key Management System (KMS), and SynxDB uses these keys for data encryption. In addition to common encryption algorithms like AES, SynxDB also supports ISO secret algorithms, and you can add your own algorithms to transparent data encryption.

  • Detailed permission settings: SynxDB provides fine-grained permission settings for different users and objects (such as schemas, tables, rows, columns, views, and functions), including SELECT, UPDATE, EXECUTE, and ownership.

  • Customizable disclaimer: SynxDB supports custom disclaimer functions related to data confidentiality to meet enterprise-specific compliance requirements.

  • Compliance with international standards: SynxDB complies with ISO/IEC 27002:2005, the code of practice for information security management, or later versions of that standard.

  • Proactive security assurance: SynxDB undergoes regular vulnerability scanning, security assessments, and proactive vulnerability remediation to ensure the security of the platform.

  • Secure development lifecycle: SynxDB is developed according to established standards and frameworks, including the Software Development Lifecycle (SDLC), Process Model Maturity, ACS, and NIST, to ensure a high level of security and quality. This includes scanning dependency packages to identify and address security vulnerabilities in open-source components.
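
SCRAM-SHA-256, listed above among the supported authentication methods, is a published standard (RFC 5802 and RFC 7677), so the secret a server keeps can be sketched directly. This Python sketch assumes that SynxDB, as a PostgreSQL 14 derivative, stores SCRAM secrets in PostgreSQL's `SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey>` format; the helper names are hypothetical, and the authentication message is a placeholder rather than a real protocol exchange. It shows that the server stores only derived keys, and that the client proves it knows the password without sending it.

```python
import base64
import hashlib
import hmac
import os


def _hmac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def scram_verifier(password: str, salt: bytes, iterations: int = 4096):
    """Derive ClientKey, StoredKey, and ServerKey per RFC 5802 / RFC 7677.

    SASLprep normalization of the password is omitted for brevity.
    """
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = _hmac(salted, b"Client Key")
    stored_key = hashlib.sha256(client_key).digest()
    server_key = _hmac(salted, b"Server Key")
    return client_key, stored_key, server_key


def encode_pg_style(salt: bytes, iterations: int, stored_key: bytes, server_key: bytes) -> str:
    """Format the secret the way PostgreSQL stores it in pg_authid."""
    return f"SCRAM-SHA-256${iterations}:{_b64(salt)}${_b64(stored_key)}:{_b64(server_key)}"


salt = os.urandom(16)
client_key, stored_key, server_key = scram_verifier("s3cret", salt)
print(encode_pg_style(salt, 4096, stored_key, server_key))

# Client: prove knowledge of the password without sending it.
auth_message = b"client-first,server-first,client-final-without-proof"  # placeholder
client_proof = bytes(a ^ b for a, b in zip(client_key, _hmac(stored_key, auth_message)))

# Server: recover ClientKey from the proof and compare its hash with StoredKey.
recovered = bytes(a ^ b for a, b in zip(client_proof, _hmac(stored_key, auth_message)))
verified = hmac.compare_digest(hashlib.sha256(recovered).digest(), stored_key)
print("verified:", verified)
```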

Data loading

SynxDB provides a series of efficient and flexible data loading solutions to meet various data processing needs, including parallel and persistent data loading, support for flexible data sources and file formats, integration of multiple ETL tools, and support for stream data loading and high-performance data access.

  • Parallel and persistent data loading: Using external tables, SynxDB loads massive data volumes in parallel and persistently, and automatically converts between character sets (for example, from GBK to UTF-8) during loading.

  • Flexible data source and file format support: SynxDB supports data sources such as external file servers, Hive, HBase, HDFS, and S3, and data formats such as CSV, Text, JSON, ORC, and Parquet. It can also load compressed data files such as Zip.

  • ETL tool integration: SynxDB integrates with ETL tools such as DataStage, Informatica, and Kettle to simplify data processing.

  • Stream data loading: SynxDB can start multiple parallel read tasks for a subscribed Kafka topic, buffer the records it reads, and load them into the database through gpfdist once a time or record-count threshold is reached. This approach loads data without duplication or loss, making it suitable for stream data collection and real-time analysis. SynxDB supports loading throughput of tens of millions of records per minute.

  • High-performance data access: PXF (Platform Extension Framework), a built-in component of SynxDB, maps external data sources to SynxDB external tables for parallel, high-speed data access. PXF provides unified access to heterogeneous data sources and supports a Data Fabric architecture.
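
The buffer-then-flush behavior described for stream loading can be sketched with a small micro-batcher that flushes on whichever limit is reached first, a record count or a time limit. This is a hypothetical illustration: `MicroBatcher` is not a SynxDB API, and `flush_fn` stands in for handing a batch to a loader such as gpfdist. An injectable clock makes the time-based flush easy to test.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Buffer records and flush when a count or an age threshold is reached."""

    def __init__(self, flush_fn: Callable[[List[str]], None],
                 max_records: int = 1000, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush_fn          # hypothetical hand-off to a loader
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.clock = clock
        self.buffer: List[str] = []
        self.first_ts: Optional[float] = None  # arrival time of oldest buffered record

    def add(self, record: str) -> None:
        if not self.buffer:
            self.first_ts = self.clock()
        self.buffer.append(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if the batch is full or its oldest record is too old."""
        if not self.buffer:
            return
        too_many = len(self.buffer) >= self.max_records
        too_old = self.clock() - self.first_ts >= self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        batch, self.buffer, self.first_ts = self.buffer, [], None
        if batch:
            self.flush_fn(batch)


# Usage with a fake clock: "a", "b", "c" flush by count; "d" flushes by age.
batches = []
now = [0.0]
b = MicroBatcher(batches.append, max_records=3, max_age_s=10.0, clock=lambda: now[0])
for r in ["a", "b", "c", "d"]:
    b.add(r)
now[0] = 11.0
b.maybe_flush()
print(batches)
```

Avoiding loss and duplication also depends on when consumer offsets are committed; a common pattern is to commit Kafka offsets only after a batch has been loaded successfully. This sketch does not model that step.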

Multi-layer fault tolerance

To ensure data security and service continuity, SynxDB provides high availability and disaster recovery through a multi-level fault-tolerance mechanism that spans hardware and software: disk redundancy, data page checksums, segment mirroring, coordinator high availability, and data center failover. The maximum service interruption depends on the failure scenario.

  • Disk redundancy: At the hardware level, disk reliability is ensured through configurations such as RAID 5.

  • Data page checksum: At the storage layer, SynxDB uses checksums to detect corrupted blocks and ensure data integrity.

  • Segment high availability: Each primary segment can be deployed with a mirror. If a primary segment fails, the system automatically fails over to its mirror to keep the service running. The maximum service interruption caused by a segment failure and mirror failover is on the order of seconds.

  • Coordinator high availability: The coordinator node is the entry point to the SynxDB system: it accepts client connections, processes SQL queries, and distributes tasks. To make this critical component highly available, it can be deployed with a standby node to support client failover. The maximum service interruption caused by a coordinator crash and failover to the standby is on the order of minutes. The mechanism includes:

    • Active-standby configuration: The coordinator node has a standby node.

    • Fault monitoring: The system continuously monitors the status of the primary coordinator.

    • Automatic failover with VIP: In case of a failure, the system automatically fails over to the standby node. This process can include a virtual IP (VIP) failover, which typically completes in seconds.

    • Client reconnection: After a failover, client applications must reconnect to the newly promoted standby node (often via the VIP) to continue submitting queries.

  • Disaster recovery: For data center-level failures, the system supports failover to a backup data center. The maximum service interruption caused by a data center outage and failover to the backup data center is on the order of hours.

  • Online operational changes: In SynxDB v4.0, implementation and configuration changes can be made without disrupting business services, ensuring operational continuity.
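
Data page checksums work by storing a checksum with each block when it is written and recomputing it when the block is read back. A minimal Python sketch of that idea follows, using CRC-32 from the standard library as a stand-in; PostgreSQL's own page checksum is a 16-bit value based on a modified FNV-1a hash and stored in the page header, and SynxDB's on-disk format is not modeled here. The function and exception names are hypothetical.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size, in bytes
HEADER = 4        # bytes reserved for the checksum in this sketch


class ChecksumError(Exception):
    """Raised when a page's stored checksum does not match its contents."""


def write_page(payload: bytes) -> bytes:
    """Pad the payload to a fixed-size page and prepend a CRC-32 checksum."""
    if len(payload) > PAGE_SIZE - HEADER:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAGE_SIZE - HEADER, b"\x00")
    return zlib.crc32(body).to_bytes(HEADER, "little") + body


def read_page(page: bytes) -> bytes:
    """Recompute the checksum on read and refuse to return a corrupted page."""
    stored = int.from_bytes(page[:HEADER], "little")
    body = page[HEADER:]
    if zlib.crc32(body) != stored:
        raise ChecksumError("checksum mismatch: page is corrupted")
    return body


page = write_page(b"row data")
assert read_page(page).rstrip(b"\x00") == b"row data"

# Simulate a single bit flipped on disk and confirm the read detects it.
corrupted = bytearray(page)
corrupted[100] ^= 0x01
detected = False
try:
    read_page(bytes(corrupted))
except ChecksumError as exc:
    detected = True
    print("detected:", exc)
```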

Rich data analysis support

SynxDB provides powerful data analysis features that make data processing, querying, and analysis more efficient and meet a wide range of complex requirements.

  • Parallel optimizer and executor: The SynxDB kernel has a built-in parallel optimizer and executor that is compatible with the PostgreSQL ecosystem and supports partition pruning, multiple index types (B-tree, Bitmap, Hash, BRIN, and GIN), and just-in-time (JIT) compilation of expressions.

  • Machine learning with MADlib: SynxDB integrates the MADlib library, providing fully SQL-driven machine learning that brings algorithms, computing power, and data together in the database.

  • Support for multiple programming languages: Developers can write custom functions in R, Python, Perl, Java, and PL/pgSQL. For example, SynxDB can execute Java code inside the database; this feature relies on JVM security policies, which the user must configure.

  • High-performance parallel computing based on the MPP engine: The SynxDB MPP engine performs high-performance parallel computing that is seamlessly integrated with SQL and can quickly compute and analyze SQL query results. Its geospatial capabilities include:

    • Support for object storage: Large geospatial datasets can be loaded directly from object storage (OSS) into the database.

    • Comprehensive spatial data types: geometry, geography, and raster.

    • Spatio-temporal index: Provides spatio-temporal index technology, which can effectively accelerate spatial and temporal queries.

    • Complex spatial and geographic calculations: including spherical length calculations and spatial relationship functions (such as contains, covers, and intersects).

  • Text component: This component uses Elasticsearch to accelerate full-text retrieval, delivering an order-of-magnitude improvement over conventional GIN-indexed text queries. It supports multiple word segmentation methods, natural language processing, and query result rendering.
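
Spherical length calculations treat the Earth as a sphere and sum great-circle distances between consecutive vertices. The following Python sketch uses the haversine formula with an assumed mean Earth radius of 6,371,008.8 m; `sphere_distance_m` and `line_length_m` are hypothetical names, not SynxDB or PostGIS functions, and a spheroid-based calculation would give slightly different results.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters; an assumption of this sketch


def sphere_distance_m(lon1: float, lat1: float, lon2: float, lat2: float,
                      radius: float = EARTH_RADIUS_M) -> float:
    """Great-circle distance between two lon/lat points on a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def line_length_m(coords):
    """Spherical length of a polyline given as [(lon, lat), ...]."""
    return sum(sphere_distance_m(*p, *q) for p, q in zip(coords, coords[1:]))


# One degree of longitude along the equator is about 111.2 km on this sphere.
print(round(sphere_distance_m(0.0, 0.0, 1.0, 0.0)))
print(round(line_length_m([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])))
```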

Flexible workload management

SynxDB provides comprehensive workload management capabilities designed to effectively utilize and optimize database resources to ensure efficient and stable operations. Its workload management includes three levels of control: connection level management, session level management, and SQL level management.

  • Connection pool PgBouncer (connection-level management): Through the connection pool, SynxDB manages user access centrally and limits the number of concurrently active users. This improves efficiency and avoids the resource waste of frequently creating and destroying backend processes. The connection pool has a small memory footprint, supports high numbers of concurrent connections, and uses libevent for efficient socket communication.

  • Resource Group (session-level management): Through resource groups, SynxDB can analyze and categorize typical workloads, and quantify the CPU, memory, concurrency and other resources required by each workload. In this way, according to the actual requirements of the workload, you can set a suitable resource group and dynamically adjust the resource usage to ensure the overall operating efficiency. At the same time, you can use rules to clean up idle sessions and release unnecessary resources.

  • Dynamic resource group allocation (SQL-level management): Through dynamic resource group allocation, SynxDB can flexibly allocate resources before or during the execution of SQL statements, which can give priority to specific queries and shorten the execution time.
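
The connection-level control described above rests on a simple idea: a fixed set of server connections is reused by many clients, and clients wait when all of them are busy. The Python sketch below illustrates only that general pattern (`ConnectionPool` and `fake_connect` are hypothetical names); PgBouncer itself is a separate C program built on libevent and offers additional pooling modes.

```python
import queue
import threading
import time
from contextlib import contextmanager


class ConnectionPool:
    """Reuse a fixed set of connections; callers wait when all are busy."""

    def __init__(self, connect_fn, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect_fn())  # open every connection once, up front

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks until one is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for reuse instead of closing it


# Usage: 10 client threads share 2 connections from a stand-in factory.
created = []


def fake_connect():
    conn = object()
    created.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
stats = {"active": 0, "peak": 0}
lock = threading.Lock()


def client():
    with pool.connection():
        with lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.01)  # simulate query work
        with lock:
            stats["active"] -= 1


threads = [threading.Thread(target=client) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(created), stats["peak"])
```

Only two connections are ever opened, and no more than two clients hold one at the same time, which is the cap on concurrently active work that the pool is meant to enforce.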

Advanced diagnostics and maintainability

To facilitate fault analysis and resolution, SynxDB provides advanced diagnostic capabilities:

  • Runtime log level adjustment: Process log levels can be switched at runtime by sending signals, enabling detailed diagnostics without service interruption.

  • Selective process step execution: Individual process steps can be excluded from execution when analyzing or fixing errors, which helps isolate faults.

  • Configuration and version alignment alarms (planned): To ensure cluster stability and consistency, SynxDB will raise alarms for misaligned configuration data or software versions. This feature will be configurable down to the individual parameter level and will provide timely alerts when inconsistencies are detected across the cluster.
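
Switching log levels through signals is a common POSIX pattern: the process installs a handler, and an operator sends the signal with `kill`. The Python sketch below shows that pattern, assuming SIGUSR1 as the trigger; the signal SynxDB actually uses and its handler are not described here, so treat the names as hypothetical. It runs on POSIX systems only.

```python
import logging
import os
import signal

logging.basicConfig(format="%(levelname)s %(message)s")
logger = logging.getLogger("worker")
logger.setLevel(logging.INFO)


def toggle_debug(signum, frame):
    """Flip this logger between INFO and DEBUG each time the signal arrives."""
    new_level = logging.DEBUG if logger.level != logging.DEBUG else logging.INFO
    logger.setLevel(new_level)
    logger.warning("log level switched to %s", logging.getLevelName(new_level))


# POSIX only: SIGUSR1 does not exist on Windows.
signal.signal(signal.SIGUSR1, toggle_debug)

logger.debug("suppressed while the level is INFO")
# An operator would run `kill -USR1 <pid>` from a shell; here the process signals itself.
os.kill(os.getpid(), signal.SIGUSR1)
logger.debug("emitted once the handler has switched the level to DEBUG")
```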

Multiple compatibility

The compatibility of SynxDB is reflected in multiple aspects such as SQL syntax, components, tools and programs, hardware platforms and operating systems. This makes the database flexible enough to deal with different tools, platforms and languages.

  • SQL compatibility: SynxDB is compatible with PostgreSQL and Greenplum syntax and supports the SQL-92, SQL:1999, and SQL:2003 standards, including SQL:2003 OLAP extensions such as window functions, ROLLUP, and CUBE.

  • Component compatibility: Based on the PostgreSQL 14.4 kernel, SynxDB is compatible with most of the PostgreSQL components and extensions commonly used.

  • Tool and program compatibility: SynxDB connects well with BI tools, data mining and forecasting tools, ETL tools, and J2EE/.NET applications.

  • Hardware platform compatibility: SynxDB runs on a variety of hardware architectures and processors, including x86, ARM, Phytium, Kunpeng, and Haiguang.

  • Operating system compatibility: Compatible with multiple operating system environments, such as CentOS and RHEL.