Rosetta DB YAML Config

YAML Config File

Rosetta by default expects the YAML config file to be named main.conf and looks for it by default in the current folder. The configuration file can be overridden by using the --config, -c command line argument (see Command Line Arguments below for more details).

Here is the list of available configurations in the main.conf file:

connections:
  # The name of the connection
  - name: bigquery_prod

    # The name of the default database to use
    databaseName: bigquery-public-data

    # The name of the default schema to use
    schemaName: breathe

    # The type of the database
    dbType: bigquery

    # The connection uri for the database
    url: jdbc:bigquery://[Host]:[Port];ProjectId=[Project];OAuthType= [AuthValue];[Property1]=[Value1];[Property2]=[Value2];...

    # The name of the database user
    userName: user

    # The password of the database user
    password: password

    # The name of tables to include which is optional
    tables:
      - table_one
      - table_two

In the YAML config file you can also use environment variables. An example usage of environment variables in config file:

connections:
  - name: snowflake_weather_prod
    databaseName: SNOWFLAKE_SAMPLE_DATA
    schemaName: WEATHER
    dbType: snowflake
    url: jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<connection_params>
    userName: ${USER}
    password: ${PASSWORD}

Example connection string configurations for databases

BigQuery (service-based authentication OAuth 0)

url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<PROJECT-ID>;AdditionalProjects=bigquery-public-data;OAuthType=0;OAuthServiceAcctEmail=<EMAIL>;OAuthPvtKeyPath=<SERVICE-ACCOUNT-PATH>

BigQuery (pre-generated token authentication OAuth 2)

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=2;ProjectId=<PROJECT-ID>;OAuthAccessToken=<ACCESS-TOKEN>;OAuthRefreshToken=<REFRESH-TOKEN>;OAuthClientId=<CLIENT-ID>;OAuthClientSecret=<CLIENT-SECRET>;

BigQuery (application default credentials authentication OAuth 3)

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=3;ProjectId=<PROJECT-ID>;

Snowflake

url: jdbc:snowflake://<HOST>:443/?db=<DATABASE>&user=<USER>&password=<PASSWORD>

PostgreSQL

url: jdbc:postgresql://<HOST>:5432/<DATABASE>?user=<USER>&password=<PASSWORD>

MySQL

url: jdbc:mysql://<USER>:<PASSWORD>@<HOST>:3306/<DATABASE>

Kinetica

url: jdbc:kinetica:URL=http://<HOST>:9191;CombinePrepareAndExecute=1

Google Cloud Spanner

url: jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db;credentials=/path/to/credentials.json

Google CLoud Spanner (Emulator)

url: jdbc:cloudspanner://localhost:9010/projects/test/instances/test/databases/test?autoConfigEmulator=true

SQL Server

url: jdbc:sqlserver://<HOST>:1433;databaseName=<DATABASE>

DB2

url: jdbc:db2://<HOST>:50000;<DATABASE>

ORACLE

url: jdbc:oracle:thin:<HOST>:1521:<SID>

Translation

This module will read the database structure from the source and map it to a target type. For example, source metadata was BigQuery and we want to convert it to Snowflake. This will be done by using a CSV file that contain mappings like in the following example:

344;;bigquery;;string;;snowflake;;string
345;;bigquery;;timestamp;;snowflake;;timestamp
346;;bigquery;;int64;;snowflake;;integer
347;;bigquery;;float64;;snowflake;;float
348;;bigquery;;array;;snowflake;;array
349;;bigquery;;date;;snowflake;;date
350;;bigquery;;datetime;;snowflake;;datetime
351;;bigquery;;boolean;;snowflake;;boolean
352;;bigquery;;time;;snowflake;;time
353;;bigquery;;geography;;snowflake;;geography
354;;bigquery;;numeric;;snowflake;;numeric
355;;bigquery;;bignumeric;;snowflake;;number
356;;bigquery;;bytes;;snowflake;;binary
357;;bigquery;;struct;;snowflake;;object

Using external translator

RosettaDB allows users to use their own translator. For the supported databases you can extend or create your version of translation CSV file. To use an external translator you need to set the EXTERNAL_TRANSLATION_FILE ENV variable to point to the external file.

Set the ENV variable EXTERNAL_TRANSLATION_FILE to point to the location of your custom translator CSV file.

export EXTERNAL_TRANSLATION_FILE=<path_to_csv_translator>

example:

export EXTERNAL_TRANSLATION_FILE=/Users/adaptivescale/translation.csv

Make sure you keep the same format as the CSV example given above.

Translation Attributes

Rosetta uses an additional file to maintain translation specific attributes. It stores translation_id, the attribute_name and attribute_value:

1;;302;;columnDisplaySize;;38
2;;404;;columnDisplaySize;;30
3;;434;;columnDisplaySize;;17

The supported attribute names are: - ordinalPosition - autoincrement - nullable - primaryKey - primaryKeySequenceId - columnDisplaySize - scale - precision

Set the ENV variable EXTERNAL_TRANSLATION_ATTRIBUTE_FILE to point to the location of your custom translation attribute CSV file.

export EXTERNAL_TRANSLATION_ATTRIBUTE_FILE=<path_to_csv_translator>

example:

export EXTERNAL_TRANSLATION_ATTRIBUTE_FILE=/Users/adaptivescale/translation_attributes.csv

Make sure you keep the same format as the CSV example given above.

Indices (Index)

Indices are supported in Google Cloud Spanner. An example on how they are represented in model.yaml

tables:
- name: "ExampleTable"
  type: "TABLE"
  schema: ""
  indices:
  - name: "PRIMARY_KEY"
    schema: ""
    tableName: "ExampleTable"
    columnNames:
    - "Id"
    - "UserId"
    nonUnique: false
    indexQualifier: ""
    type: 1
    ascOrDesc: "A"
    cardinality: -1
  - name: "IDX_ExampleTable_AddressId_299189FB00FDAFA5"
    schema: ""
    tableName: "ExampleTable"
    columnNames:
    - "AddressId"
    nonUnique: true
    indexQualifier: ""
    type: 2
    ascOrDesc: "A"
    cardinality: -1
  - name: "TestIndex"
    schema: ""
    tableName: "ExampleTable"
    columnNames:
    - "ClientId"
    - "DisplayName"
    nonUnique: true
    indexQualifier: ""
    type: 2
    ascOrDesc: "A"
    cardinality: -1