greenplum_fdw

The greenplum_fdw module is a foreign-data wrapper (FDW) that you can use to run queries between one or more SynxDB clusters.

The SynxDB greenplum_fdw module is an MPP extension of the PostgreSQL postgres_fdw module.

This topic includes the following sections:

Installing and Registering the Module

The greenplum_fdw module is installed when you install SynxDB. Before you can use this FDW, you must register the greenplum_fdw extension in each database in the local SynxDB cluster in which you plan to use it:

CREATE EXTENSION greenplum_fdw;

Refer to Installing Additional Supplied Modules for more information about installing and registering modules in SynxDB.

About Module Dependencies

greenplum_fdw depends on the gp_parallel_retrieve_cursor module.

Note You must register the gp_parallel_retrieve_cursor module in each remote SynxDB database with tables that you plan to access using the greenplum_fdw foreign-data wrapper.

About the greenplum_fdw Module

greenplum_fdw is an MPP version of the postgres_fdw foreign-data wrapper. While it behaves similarly to postgres_fdw in many respects, greenplum_fdw uses a SynxDB parallel retrieve cursor to pull data directly from the segments of a remote SynxDB cluster to the segments in the local SynxDB cluster, in parallel.

By supporting predicate pushdown, greenplum_fdw minimizes the amount of data transferred between the SynxDB clusters by sending a query filter condition to the remote SynxDB server where it is applied there.

Using the greenplum_fdw Module

You will perform the following tasks when you use greenplum_fdw to access data that resides in a remote SynxDB cluster(s):

  1. Create a server to represent each remote SynxDB database to which you want to connect.
  2. Create a user mapping for each (local) SynxDB user that you want to allow to access each server.
  3. Create a foreign table for each remote SynxDB table that you want to access.
  4. Construct and run queries.

Creating a Server

To access a remote SynxDB cluster, you must first create a foreign server object which specifies the host, port, and database connection details. You provide these connection parameters in the OPTIONS clause of the CREATE SERVER command.

A foreign server using the greenplum_fdw foreign-data wrapper accepts and disallows the same options as that of a foreign server using the postgres_fdw FDW; refer to the Connection Options topic in the PostgreSQL postgres_fdw documentation for more information about these options.

To obtain the full benefits of the parallel transfer feature provided by greenplum_fdw, you must also specify:

mpp_execute 'all segments'

and

num_segments '<num>'

in the OPTIONS clause when you create the server. Set num to the number of segments in the the remote SynxDB cluster. If you do not provide the

num_segments

option, the default value is the number of segments on the local SynxDB cluster.

The following example command creates a server named gpc1_testdb that will be used to access tables residing in the database named testdb on the remote 8-segment SynxDB cluster whose master is running on the host gpc1_master, port 5432:

CREATE SERVER gpc1_testdb FOREIGN DATA WRAPPER greenplum_fdw
    OPTIONS (host 'gpc1_master', port '5432', dbname 'testdb', mpp_execute 'all segments', num_segments '8');

Creating a User Mapping

After you identify which users you will allow to access the remote SynxDB cluster, you must create one or more mappings between a local SynxDB user and a user on the remote SynxDB cluster. You create these mappings with the CREATE USER MAPPING command.

User mappings that you create may include the following OPTIONS:

Option NameDescriptionDefault Value
userThe name of the remote SynxDB user to connect as.The name of the current (local) SynxDB user.
passwordThe password for user on the remote SynxDB system.No default value.

Only a SynxDB superuser may connect to a SynxDB foreign server without password authentication. Always specify the password option for user mappings that you create for non-superusers.

The following command creates a default user mapping on the local SynxDB cluster to the user named bill on the remote SynxDB cluster that allows access to the database identified by the gpc1_testdb server. Specifying the PUBLIC user name creates a mapping for all current and future users when no user-specific mapping is applicable.

CREATE USER MAPPING FOR PUBLIC SERVER gpc1_testdb
    OPTIONS (user 'bill', password 'changeme');

The remote user must have the appropriate privileges to access any table(s) of interest in the database identified by the specified SERVER.

If the mapping is used to access a foreign-data wrapper across multiple SynxDB clusters, then the remote user also requires SELECT access to the pg_catalog.gp_endpoints view. For example:

GRANT SELECT ON TABLE pg_catalog.gp_endpoints TO bill;

Creating a Foreign Table

You invoke the CREATE FOREIGN TABLE command to create a foreign table. The column data types that you specify when you create the foreign table should exactly match those in the referenced remote table. It is also recommended that the columns be declared with exactly the same collations, if applicable, as the referenced columns of the remote table.

Because greenplum_fdw matches foreign table columns to the remote table by name, not position, you can create a foreign table with fewer columns, or with a different column order, than the underlying remote table.

Foreign tables that you create may include the following OPTIONS:

Option NameDescriptionDefault Value
schema_nameThe name of the schema in which the remote SynxDB table resides.The name of the schema in which the foreign table resides.
table_nameThe name of the remote SynxDB table.The name of the foreign table.

The following command creates a foreign table named f_gpc1_orders that references a table named orders located in the public schema of the database identified by the gpc1_testdb server (testdb):

CREATE FOREIGN TABLE f_gpc1_orders ( id int, qty int, item text )
    SERVER gpc1_testdb OPTIONS (schema_name 'public', table_name 'orders');

You can additionally specify column name mappings via OPTIONS that you provide in the column declaration of the foreign table. The column_name option identifies the name of the associated column in the remote SynxDB table, and defaults to the foreign table column name when not specified.

Constructing and Running Queries

You SELECT from a foreign table to access the data stored in the underlying remote SynxDB table. By default, you can also modify the remote table using the INSERT command, provided that the remote user specified the user mapping has the privileges to perform these operations. (Refer to About the Updatability Option for information about changing the updatability of foreign tables.)

greenplum_fdw attempts to optimize remote queries to reduce the amount of data transferred from foreign servers. This is achieved by sending query WHERE clauses to the remote SynxDB server for execution, and by not retrieving table columns that are not needed for the current query. To reduce the risk of misexecution of queries, greenplum_fdw does not send WHERE clauses to the remote server unless they use only built-in data types, operators, and functions. Operators and functions in the clauses must be IMMUTABLE as well.

You can run the EXPLAIN VERBOSE command to examine the query that is actually sent to the remote SynxDB server for execution.

Additional Information

For more information about greenplum_fdw updatability and cost estimation options, connection management, and transaction management, refer to the individual topics below.

About the Updatability Option

By default, all foreign tables created with greenplum_fdw are assumed to be updatable. You can override this for a foreign server or a foreign table using the following option:

updatable : Controls whether greenplum_fdw allows foreign tables to be modified using the INSERT command. The default is true.

Setting this option at the foreign table-level overrides a foreign server-level option setting.

About the Cost Estimation Options

greenplum_fdw supports the same cost estimation options as described in the Cost Estimation Options topic in the PostgreSQL postgres_fdw documentation.

About Connection Management

greenplum_fdw establishes a connection to a foreign server during the first query on any foreign table associated with the server. greenplum_fdw retains and reuses this connection for subsequent queries submitted in the same session. However, if multiple user identities (user mappings) are used to access the foreign server, greenplum_fdw establishes a connection for each user mapping.

About Transaction Management

greenplum_fdw manages transactions as described in the Transaction Management topic in the PostgreSQL postgres_fdw documentation.

About Using Resource Groups to Limit Concurrency

You can create a dedicated user and resource group to manage greenplum_fdw concurrency on the remote SynxDB clusters. In the following example scenario, local cluster 2 reads data from remote cluster 1.

Remote cluster (1) configuration:

  1. Create a dedicated SynxDB user/role to represent the greenplum_fdw users on cluster 2 that initiate queries. For example, to create a role named gpcluster2_users:

    CREATE ROLE gpcluster2_users;
    
  2. Create a dedicated resource group to manage resources for these users:

    CREATE RESOURCE GROUP rg_gpcluster2_users with (concurrency=2, cpu_rate_limit=20, memory_limit=10);
    ALTER ROLE gpcluster2_users RESOURCE GROUP rg_gpcluster2_users;
    

    When you configure the remote cluster as described above, the rg_gpcluster2_users resource group manages the resources used by all queries that are initiated by gpcluster2_users.

Local cluster (2) configuration:

  1. Create a greenplum_fdw foreign server to access the remote cluster. For example, to create a server named gpc1_testdb that accesses the testdb database:

    CREATE SERVER gpc1_testdb FOREIGN DATA WRAPPER greenplum_fdw
        OPTIONS (host 'gpc1_master', port '5432', dbname 'testdb', mpp_execute 'all segments', );
    
  2. Map local users of the greenplum_fdw foreign server to the remote role. For example, to map specific users of the gpc1_testdb server on the local cluster to the gpcluster2_users role on the remote cluster:

    CREATE USER MAPPING FOR greenplum_fdw_user1 SERVER gpc1_testdb
        OPTIONS (user ‘gpcluster2_users’, password ‘changeme’);
    CREATE USER MAPPING FOR greenplum_fdw_user2 SERVER gpc1_testdb
        OPTIONS (user ‘gpcluster2_users’, password ‘changeme’);
    
  3. Create a foreign table referencing a table on the remote cluster. For example to create a foreign table that references table t1 on the remote cluster:

    CREATE FOREIGN TABLE table_on_cluster1 ( tc1 int )
      SERVER gpc1_testdb
      OPTIONS (schema_name 'public', table_name 't1', mpp_execute 'all segments');
    

All local queries on foreign table table_on_cluster1 are bounded on the remote cluster by the rg_gpcluster2_users resource group limits.

Known Issues and Limitations

The greenplum_fdw module has the following known issues and limitations:

  • The SynxDB Query Optimizer (GPORCA) does not support queries on foreign tables that you create with the greenplum_fdw foreign-data wrapper.
  • greenplum_fdw does not support UPDATE and DELETE operations on foreign tables.

Compatibility

You can use greenplum_fdw to access other remote SynxDB clusters .

Example

In this example, you query data residing in a database named rdb on the remote 16-segment SynxDB cluster whose master is running on host gpc2_master, port 5432:

  1. Open a psql session to the master host of the remote SynxDB cluster:

    psql -h gpc2_master -d rdb
    
  2. Register the gp_parallel_retrieve_cursor extension in the database if it does not already exist:

    CREATE EXTENSION IF NOT EXISTS gp_parallel_retrieve_cursor;
    
  3. Exit the session.

  4. Initiate a psql session to the database named testdb on the local SynxDB master host:

    $ psql -d testdb
    
  5. Register the greenplum_fdw extension in the database if it does not already exist:

    CREATE EXTENSION IF NOT EXISTS greenplum_fdw;
    
  6. Create a server to access the remote SynxDB cluster:

    CREATE SERVER gpc2_rdb FOREIGN DATA WRAPPER greenplum_fdw
        OPTIONS (host 'gpc2_master', port '5432', dbname 'rdb', mpp_execute 'all segments', num_segments '16');
    
  7. Create a user mapping for a user named jane on the local SynxDB cluster and the user named john on the remote SynxDB cluster and database represented by the server named gpc2_rdb:

    CREATE USER MAPPING FOR jane SERVER gpc2_rdb OPTIONS (user 'john', password 'changeme');
    
  8. Create a foreign table named f_gpc2_emea to reference the table named emea that is resides in the public schema of the database identified by the gpc2_rdb server (rdb):

    CREATE FOREIGN TABLE f_gpc2_emea( bu text, income int )
        SERVER gpcs2_rdb OPTIONS (schema_name 'public', table_name 'emea');
    
  9. Query the foreign table:

    SELECT * FROM f_gpc2_emea;
    
  10. Join the results of a foreign table query with a local table named amer that has similarly-named columns:

    SELECT amer.bu, amer.income as amer_in, f_gpc2_emea.income as emea_in
        FROM amer, f_gpc2_emea
        WHERE amer.bu = f_gpc2_emea.bu;