Migrating Data with cbcopy

You can use the cbcopy utility to transfer data between databases in different SynxDB clusters.

cbcopy is a high-performance utility that can copy metadata and data from one SynxDB database to another SynxDB database. You can migrate the entire contents of a database, or just selected tables. The clusters can have different SynxDB versions. For example, you can use cbcopy to migrate data from a Greenplum version 4.3.26 (or later) system to a SynxDB 1 or 2 system, or from a SynxDB version 1 system to a SynxDB 2 system.

The cbcopy interface includes options to transfer one or more full databases, or one or more database tables. A full database transfer includes the database schema, table data, indexes, views, roles, user-defined functions, resource queues, and resource groups. If a copied table or database does not exist in the destination cluster, cbcopy creates it automatically, along with indexes as necessary.

Configuration files, including postgresql.conf and pg_hba.conf, must be transferred manually by an administrator. Extensions such as MADlib and programming language extensions must be installed in the destination database by an administrator.

cbcopy is a command-line tool that includes these features:

  • cbcopy can migrate data between systems where the source and destination systems are configured with a different number of segment instances.

  • cbcopy provides detailed reporting and summary information about all aspects of the copy operation.

  • cbcopy allows the source table data to change while the data is being copied. A lock is not acquired on the source table when data is copied.

  • The cbcopy utility includes the --truncate option to help migrate data from one system to another on the same hardware, requiring minimal free space available.

How does cbcopy work?

cbcopy_arch

Metadata migration

The metadata migration feature of cbcopy is based on gpbackup. Compared to the built-in pg_dump utility, cbcopy has the advantage of being able to retrieve metadata in batches isntead of only a few rows at a time. This batch processing approach significantly enhances performance, especially when handling large volumes of metadata, making it much faster than pg_dump.

Data migration

SynxDB supports starting programs via SQL commands, and cbcopy utilizes this feature. During data migration, it uses SQL commands to start a program on the target database to receive and load data, while simultaneously using SQL commands to start a program on the source database to unload data and send it to the program on the target database.

Migrating Data with cbcopy

Before migrating data, you need to copy cbcopy_helper to the $GPHOME/bin directory on all nodes of both the source and target databases. Then you need to find a host that can connect to both the source database and the target database, and use the cbcopy command on that host to initiate the migration. Note that database superuser privileges are required for both source and target databases to perform the migration.

By default, both metadata and data are migrated. You can use --metadata-only to migrate only metadata, or --data-only to migrate only data. As a best practice, migrate metadata first using --metadata-only, and then migmigrate data with --data-only. This two-step approach helps ensure a more controlled and reliable migration process.

Database version requirements

cbcopy relies on the “COPY ON SEGMENT” command of the database, so it has specific version requirements for the database.

  • GPDB 4.x - A minimum of GPDB version 4.3.17 or higher is required. If your version does not meet this requirement, you can upgrade to GPDB 4.3.17.
  • GPDB 5.x - A minimum of GPDB version 5.1.0 or higher is required. If your version does not meet this requirement, you can upgrade to GPDB 5.1.0.
  • GPDB 6.x - cbcopy is compatible with all versions of GPDB 6.x.
  • GPDB 7.x - cbcopy is compatible with all versions of GPDB 7.x.
  • CBDB 1.x - cbcopy is compatible with all versions of CBDB 1.x.

Migration Modes

cbcopy supports seven migration modes.

  • --full - Migrate all metadata and data from the source database to the target database.
  • --dbname - Migrate a specific database or multiple databases from the source to the target database.
  • --schema - Migrate a specific schema or multiple schemas from the source database to the target database.
  • --schema-mapping-file - Migrate specific schemas specified in a file from the source database to the target database.
  • --include-table - Migrate specific tables or multiple tables from the source database to the target database.
  • --include-table-file - Migrate specific tables specified in a file from the source database to the target database.
  • --global-metadata-only - Migrate global objects from the source database to the target database.

Data Loading Modes

cbcopy supports two data loading modes.

  • --append - Insert the migrated records into the table directly, regardless of the existing records.
  • --truncate - First, clear the existing records in the table, and then insert the migrated records into the table.

Object dependencies

If the tables you are migrating depend on certain global objects (such as tablespaces), there are two ways to handle this:

  1. Include the --with-global-metadata option (default: false) during migration, which will automatically create these global objects in the target database.

  2. If you choose not to use --with-global-metadata, you must manually create these global objects in the target database before running the migration. For example:

    -- If your tables use custom tablespaces, create them first:
    CREATE TABLESPACE custom_tablespace LOCATION '/path/to/tablespace';
    

If neither option is taken, the creation of dependent tables in the target database will fail with errors like “tablespace ‘custom_tablespace’ does not exist”.

Roles

If you want to change the ownership of the tables during migration without creating identical roles in the target database (by disabling the --with-global-metadata option), you need to:

  1. First create the target roles in the target database
  2. Use the --owner-mapping-file to specify the mapping between source and target roles

For example, if you have a mapping file with:

source_role1,target_role1
source_role2,target_role2

The migration process executes statements like:

ALTER TABLE table_name OWNER TO target_role1;

If the target role doesn’t exist in the target database, these ownership change statements will fail with an error like “role ‘target_role1’ does not exist”.

Tablespaces

cbcopy provides three ways to handle tablespace migration:

  1. Default Mode - When no tablespace options are specified, objects will be created in the same tablespace names as they were in the source database. You have two options to ensure the tablespaces exist in the target database:

    • Use --with-global-metadata to automatically create matching tablespaces
    • Manually create the tablespaces in the target database before migration:
      CREATE TABLESPACE custom_space LOCATION '/path/to/tablespace';
      
  2. Single Target Tablespace (--dest-tablespace) - Migrate all source database objects into a single specified tablespace on the target database, regardless of their original tablespace locations. For example:

    cbcopy --dest-tablespace=new_space ...
    
  3. Tablespace Mapping (--tablespace-mapping-file) - Map source tablespaces to different target tablespaces using a mapping file. This is useful when you want to maintain separate tablespaces or map them to different locations. The mapping file format is:

    source_tablespace1,target_tablespace1
    source_tablespace2,target_tablespace2
    

Note:

  • For the default mode, either use --with-global-metadata or ensure all required tablespaces exist in the target database before migration
  • If you need to migrate objects from different schemas into different tablespaces, you can either:
    1. Use --tablespace-mapping-file to specify all mappings at once
    2. Migrate one schema at a time using --dest-tablespace with different target tablespaces

Parallel Jobs

  • --copy-jobs - The maximum number of tables that concurrently copies.

Validate Migration

During migration, we will compare the number of rows returned by COPY TO from the source database (i.e., the number of records coming out of the source database) with the number of rows returned by COPY FROM in the target database (i.e., the number of records loaded in the target database). If the two counts do not match, the migration of that table will fail.

Copy Strategies

cbcopy internally supports three copy strategies for tables.

  • Copy On Coordinator - If the table’s statistics pg_class->reltuples is less than --on-segment-threshold, cbcopy will enable the Copy On Coordinator strategy for this table, meaning that data migration between the source and target databases can only occur through the coordinator node.
  • Copy On Segment - If the table’s statistics pg_class->reltuples is greater than --on-segment-threshold, and both the source and target databases have the same version and the same number of nodes, cbcopy will enable the Copy On Segment strategy for this table. This means that data migration between the source and target databases will occur in parallel across all segment nodes without data redistribution.
  • Copy on External Table - For tables that do not meet the conditions for the above two strategies, cbcopy will enable the Copy On External Table strategy. This means that data migration between the source and target databases will occur in parallel across all segment nodes with data redistribution.

Log Files and Migration Results

After cbcopy completes its execution, it generates several files in the $USER/gpAdminLogs directory:

  1. Log File

    • cbcopy_$timestamp.log - Contains all execution logs, including:
      • Debug messages
      • Error messages
      • Operation details
  2. Migration Result Files

    • cbcopy_succeed_$timestamp - Lists all successfully migrated tables
    • cbcopy_failed_$timestamp - Lists all tables that failed to migrate

These files are useful for:

  • Monitoring the migration process
  • Troubleshooting any issues
  • Planning retry attempts for failed migrations

Handling Failed Migrations

When a migration fails partially (some tables succeed while others fail), cbcopy generates two files:

  • cbcopy_succeed_$timestamp - Lists all successfully migrated tables
  • cbcopy_failed_$timestamp - Lists all tables that failed to migrate

For retry attempts, you can skip previously successful tables by using the success file:

cbcopy --exclude-table-file=cbcopy_succeed_$timestamp ...

This approach helps you:

  • Save time by not re-migrating successful tables
  • Reduce the risk of data inconsistency
  • Focus only on resolving failed migrations

Examples

Basic Migration

# Migrate specific schemas
cbcopy --with-global-metadata --source-host=127.0.0.1 \
    --source-port=45432 --source-user=gpadmin \
    --dest-host=127.0.0.1 --dest-port=55432 \
    --dest-user=cbdb --schema=source_db.source_schema \
    --dest-schema=target_db.target_schema \
    --truncate

cbcopy reference

See the cbcopy reference page for information about each command-line option.