gpmapreduce

Runs SynxDB MapReduce jobs as defined in a YAML specification document.

Note: SynxDB MapReduce is deprecated and will be removed in a future SynxDB release.

Synopsis

gpmapreduce -f <config.yaml> [dbname [<username>]] 
     [-k <name=value> | --key <name=value>] 
     [-h <hostname> | --host <hostname>] [-p <port> | --port <port>] 
     [-U <username> | --username <username>] [-W] [-v]

gpmapreduce -x | --explain 

gpmapreduce -X | --explain-analyze

gpmapreduce -V | --version 

gpmapreduce -? | --help 

Requirements

The following are required prior to running this program:

  • You must have your MapReduce job defined in a YAML file. See gpmapreduce.yaml for more information about the format of, and keywords supported in, the SynxDB MapReduce YAML configuration file.
  • You must be a SynxDB superuser to run MapReduce jobs written in untrusted Perl or Python.
  • You must be a SynxDB superuser to run MapReduce jobs with EXEC and FILE inputs.
  • You must be a SynxDB superuser to run MapReduce jobs with GPFDIST input, unless you have been granted the appropriate rights.

Description

MapReduce is a programming model developed by Google for processing and generating large data sets on an array of commodity servers. SynxDB MapReduce allows programmers who are familiar with the MapReduce paradigm to write map and reduce functions and submit them to the SynxDB parallel engine for processing.

gpmapreduce is the SynxDB MapReduce program. You configure a SynxDB MapReduce job via a YAML-formatted configuration file that you pass to the program for execution by the SynxDB parallel engine. The SynxDB system distributes the input data, runs the program across a set of machines, handles machine failures, and manages the required inter-machine communication.

Options

-f config.yaml

Required. The YAML file that contains the SynxDB MapReduce job definitions. Refer to gpmapreduce.yaml for the format and content of the parameters that you specify in this file.

-? | --help

Show help, then exit.

-V | --version

Show version information, then exit.

-v | --verbose

Show verbose output.

-x | --explain

Do not run MapReduce jobs, but produce explain plans.

-X | --explain-analyze

Run MapReduce jobs and produce explain-analyze plans.
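
One way to use these two options is to inspect the plan before running the job. The sketch below assumes that -x and -X are combined with -f and a database name, which the synopsis does not spell out; the file and database names are reused from the Examples section. The commands are printed rather than run, so drop the leading echo to run them.

```shell
job_file=my_mrjob.yaml
database=mydatabase

# Plan only: the job is not run (assumes -x is given together with -f).
echo gpmapreduce -f "$job_file" -x "$database"

# Run the job and report the explain-analyze plans.
echo gpmapreduce -f "$job_file" -X "$database"
```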

-k name=value | --key name=value

Sets a YAML variable. A value is required. If no variable name is specified, the name defaults to “key”.
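
As a rough sketch of passing a variable on the command line: the variable name region and the value emea are hypothetical, and how the job file consumes the variable is covered by gpmapreduce.yaml, not here. The command is printed rather than run, so drop the leading echo to run it.

```shell
job_file=my_mrjob.yaml
database=mydatabase
yaml_var='region=emea'    # hypothetical name=value pair; a value is required

# Options are placed before the database name.
echo gpmapreduce -f "$job_file" -k "$yaml_var" "$database"
```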

Connection Options

-h host | --host host

Specifies the host name of the machine on which the SynxDB master database server is running. If not specified, reads from the environment variable PGHOST or defaults to localhost.

-p port | --port port

Specifies the TCP port on which the SynxDB master database server is listening for connections. If not specified, reads from the environment variable PGPORT or defaults to 5432.

-U username | --username username

The database role name to connect as. If not specified, reads from the environment variable PGUSER or defaults to the current system user name.

-W | --password

Force a password prompt.
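
The fallbacks described for -h, -p, and -U can be mirrored in a small shell helper. This is only an illustration of the documented defaults, not how gpmapreduce resolves them internally; the function name effective_conn_opts is hypothetical, and an empty environment variable is treated here the same as an unset one.

```shell
# Print the -h, -p, and -U values that apply when those options are omitted.
effective_conn_opts() {
    host="${PGHOST:-localhost}"      # PGHOST, else localhost
    port="${PGPORT:-5432}"           # PGPORT, else 5432
    user="${PGUSER:-$(id -un)}"      # PGUSER, else the current system user name
    printf '%s %s %s %s %s %s\n' -h "$host" -p "$port" -U "$user"
}

# Spell the defaults out explicitly. The substitution is left unquoted on
# purpose so that it splits into separate arguments. Drop the leading
# "echo" to run the command.
echo gpmapreduce -f my_mrjob.yaml $(effective_conn_opts) mydatabase
```

Setting PGHOST, PGPORT, or PGUSER in the environment changes the printed values; passing the options explicitly makes the connection target visible in scripts and logs.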

Examples

Run a MapReduce job as defined in my_mrjob.yaml and connect to the database mydatabase:

gpmapreduce -f my_mrjob.yaml mydatabase

See Also

gpmapreduce.yaml