Rosetta DB YAML Config
Rosetta DB YAML Config
YAML Config File
Rosetta by default expects the YAML config file to be named main.conf
and looks for it by default in the current folder. The configuration file can be overridden by using the --config, -c
command line argument (see Command Line Arguments below for more details).
Here is the list of available configurations in the main.conf
file:
connections:
# The name of the connection
- name: bigquery_prod
# The name of the default database to use
databaseName: bigquery-public-data
# The name of the default schema to use
schemaName: breathe
# The type of the database
dbType: bigquery
# The connection uri for the database
url: jdbc:bigquery://[Host]:[Port];ProjectId=[Project];OAuthType= [AuthValue];[Property1]=[Value1];[Property2]=[Value2];...
# The name of the database user
userName: user
# The password of the database user
password: password
# The name of tables to include which is optional
tables:
- table_one
- table_two
In the YAML config file you can also use environment variables. An example usage of environment variables in config file:
connections:
- name: snowflake_weather_prod
databaseName: SNOWFLAKE_SAMPLE_DATA
schemaName: WEATHER
dbType: snowflake
url: jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<connection_params>
userName: ${USER}
password: ${PASSWORD}
Example connection string configurations for databases
BigQuery (service-based authentication OAuth 0)
url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<PROJECT-ID>;AdditionalProjects=bigquery-public-data;OAuthType=0;OAuthServiceAcctEmail=<EMAIL>;OAuthPvtKeyPath=<SERVICE-ACCOUNT-PATH>
BigQuery (pre-generated token authentication OAuth 2)
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=2;ProjectId=<PROJECT-ID>;OAuthAccessToken=<ACCESS-TOKEN>;OAuthRefreshToken=<REFRESH-TOKEN>;OAuthClientId=<CLIENT-ID>;OAuthClientSecret=<CLIENT-SECRET>;
BigQuery (application default credentials authentication OAuth 3)
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=3;ProjectId=<PROJECT-ID>;
Snowflake
url: jdbc:snowflake://<HOST>:443/?db=<DATABASE>&user=<USER>&password=<PASSWORD>
PostgreSQL
url: jdbc:postgresql://<HOST>:5432/<DATABASE>?user=<USER>&password=<PASSWORD>
MySQL
url: jdbc:mysql://<USER>:<PASSWORD>@<HOST>:3306/<DATABASE>
Kinetica
url: jdbc:kinetica:URL=http://<HOST>:9191;CombinePrepareAndExecute=1
Google Cloud Spanner
url: jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db;credentials=/path/to/credentials.json
Google CLoud Spanner (Emulator)
url: jdbc:cloudspanner://localhost:9010/projects/test/instances/test/databases/test?autoConfigEmulator=true
SQL Server
url: jdbc:sqlserver://<HOST>:1433;databaseName=<DATABASE>
DB2
url: jdbc:db2://<HOST>:50000;<DATABASE>
ORACLE
url: jdbc:oracle:thin:<HOST>:1521:<SID>
Translation
This module will read the database structure from the source and map it to a target type. For example, source metadata was BigQuery and we want to convert it to Snowflake. This will be done by using a CSV file that contain mappings like in the following example:
344;;bigquery;;string;;snowflake;;string
345;;bigquery;;timestamp;;snowflake;;timestamp
346;;bigquery;;int64;;snowflake;;integer
347;;bigquery;;float64;;snowflake;;float
348;;bigquery;;array;;snowflake;;array
349;;bigquery;;date;;snowflake;;date
350;;bigquery;;datetime;;snowflake;;datetime
351;;bigquery;;boolean;;snowflake;;boolean
352;;bigquery;;time;;snowflake;;time
353;;bigquery;;geography;;snowflake;;geography
354;;bigquery;;numeric;;snowflake;;numeric
355;;bigquery;;bignumeric;;snowflake;;number
356;;bigquery;;bytes;;snowflake;;binary
357;;bigquery;;struct;;snowflake;;object
Using external translator
RosettaDB allows users to use their own translator. For the supported databases you can extend or create your version
of translation CSV file. To use an external translator you need to set the EXTERNAL_TRANSLATION_FILE
ENV variable
to point to the external file.
Set the ENV variable EXTERNAL_TRANSLATION_FILE
to point to the location of your custom translator CSV file.
export EXTERNAL_TRANSLATION_FILE=<path_to_csv_translator>
example:
export EXTERNAL_TRANSLATION_FILE=/Users/adaptivescale/translation.csv
Make sure you keep the same format as the CSV example given above.
Translation Attributes
Rosetta uses an additional file to maintain translation specific attributes. It stores translation_id, the attribute_name and attribute_value:
1;;302;;columnDisplaySize;;38
2;;404;;columnDisplaySize;;30
3;;434;;columnDisplaySize;;17
The supported attribute names are: - ordinalPosition - autoincrement - nullable - primaryKey - primaryKeySequenceId - columnDisplaySize - scale - precision
Set the ENV variable EXTERNAL_TRANSLATION_ATTRIBUTE_FILE
to point to the location of your custom translation attribute CSV file.
export EXTERNAL_TRANSLATION_ATTRIBUTE_FILE=<path_to_csv_translator>
example:
export EXTERNAL_TRANSLATION_ATTRIBUTE_FILE=/Users/adaptivescale/translation_attributes.csv
Make sure you keep the same format as the CSV example given above.
Indices (Index)
Indices are supported in Google Cloud Spanner. An example on how they are represented in model.yaml
tables:
- name: "ExampleTable"
type: "TABLE"
schema: ""
indices:
- name: "PRIMARY_KEY"
schema: ""
tableName: "ExampleTable"
columnNames:
- "Id"
- "UserId"
nonUnique: false
indexQualifier: ""
type: 1
ascOrDesc: "A"
cardinality: -1
- name: "IDX_ExampleTable_AddressId_299189FB00FDAFA5"
schema: ""
tableName: "ExampleTable"
columnNames:
- "AddressId"
nonUnique: true
indexQualifier: ""
type: 2
ascOrDesc: "A"
cardinality: -1
- name: "TestIndex"
schema: ""
tableName: "ExampleTable"
columnNames:
- "ClientId"
- "DisplayName"
nonUnique: true
indexQualifier: ""
type: 2
ascOrDesc: "A"
cardinality: -1