Skip to content

RosettaDB Commands

Rosetta Commands

Available commands

  • init
  • extract
  • compile
  • dbt
  • diff
  • test
  • apply
  • generate

init

This command will generate a project (directory) if specified, a default configuration file located in the current directory with example connections for bigquery and snowflake, and the model directory.

rosetta init [PROJECT_NAME]
Parameter Description
(Optional) PROJECT_NAME Project name (directory) where the configuration file and model directory will be created.

Example:

#example with 2 connections
connections:
  - name: snowflake_weather_prod
    databaseName: SNOWFLAKE_SAMPLE_DATA
    schemaName: WEATHER
    dbType: snowflake
    url: jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<connection_params>
    userName: bob
    password: bobPassword
  - name: bigquery_prod
    databaseName: bigquery-public-data
    schemaName: breathe
    dbType: bigquery
    url: jdbc:bigquery://[Host]:[Port];ProjectId=[Project];OAuthType= [AuthValue];[Property1]=[Value1];[Property2]=[Value2];...
    userName: user
    password: password
    tables:
      - bigquery_table

extract

This is the command that extracts the schema from a database and generates declarative DBML models that can be used for conversion to alternate database targets.

rosetta [-c, --config CONFIG_FILE] extract [-h, --help] [-s, --source CONNECTION_NAME] [-t, --convert-to CONNECTION_NAME]
Parameter Description
-h, --help Show the help message and exit.
-c, --config CONFIG_FILE YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME The source connection name to extract schema from.
-t, --convert-to CONNECTION_NAME (Optional) The target connection name in which source DBML converts to.

Example:

---
safeMode: false
databaseType: bigquery
operationLevel: database
tables:
- name: "profiles"
  type: "TABLE"
  schema: "breathe"
  columns:
  - name: "id"
    typeName: "INT64"
    jdbcDataType: "4"
    ordinalPosition: 0
    primaryKeySequenceId: 1
    columnDisplaySize: 10
    scale: 0
    precision: 10
    primaryKey: false
    nullable: false
    autoincrement: true
  - name: "name"
    typeName: "STRING"
    jdbcDataType: "12"
    ordinalPosition: 0
    primaryKeySequenceId: 0
    columnDisplaySize: 255
    scale: 0
    precision: 255
    primaryKey: false
    nullable: false
    autoincrement: false

compile

This command generates a DDL for a target database based on the source DBML which was generated by the previous command (extract).

rosetta [-c, --config CONFIG_FILE] compile [-h, --help] [-t, --target CONNECTION_NAME] [-s, --source CONNECTION_NAME]
Parameter Description
-h, --help Show the help message and exit.
-c, --config CONFIG_FILE YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME (Optional) The source connection name where models are generated.
-t, --target CONNECTION_NAME The target connection name in which source DBML converts to.
-d, --with-drop Add query to drop tables when generating ddl.

Example:

CREATE SCHEMA breathe;
CREATE TABLE breathe.profiles(id INTEGER not null AUTO_INCREMENT, name STRING not null);

dbt

This is the command that generates dbt models for a source DBML which was generated by the previous command (extract).

rosetta [-c, --config CONFIG_FILE] dbt [-h, --help] [-s, --source CONNECTION_NAME]
Parameter Description
-h, --help Show the help message and exit.
-c, --config CONFIG_FILE YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME The source connection name where models are generated.

diff

Show the difference between the local model and the database. Check if any table is removed, or added or if any columns have changed.

rosetta [-c, --config CONFIG_FILE] diff [-h, --help] [-s, --source CONNECTION_NAME]
Parameter Description
-h, --help Show the help message and exit.
-c, --config CONFIG_FILE YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME The source connection is used to specify which models and connection to use.
-m, --model MODEL_FILE (Optional) The model file to use for apply. Default is model.yaml

Example:

There are changes between local model and targeted source
Table Changed: Table 'actor' columns changed
Column Changed: Column 'actor_id' in table 'actor' changed 'Precision'. New value: '1', old value: '5'
Column Changed: Column 'actor_id' in table 'actor' changed 'Autoincrement'. New value: 'true', old value: 'false'
Column Changed: Column 'actor_id' in table 'actor' changed 'Primary key'. New value: 'false', old value: 'true'
Column Changed: Column 'actor_id' in table 'actor' changed 'Nullable'. New value: 'true', old value: 'false'
Table Added: Table 'address'

test

This command runs tests for columns using assertions. Then they are translated into query commands, executed, and compared with an expected value. Currently supported assertions are: equals(=), not equals(!=), less than(<), more than(>), less than or equals(<=), more than or equals(>=), contains(in), is null, is not null, like, between. Examples are shown below:

rosetta [-c, --config CONFIG_FILE] test [-h, --help] [-s, --source CONNECTION_NAME]
Parameter Description
-h, --help Show the help message and exit.
-c, --config CONFIG_FILE YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME The source connection is used to specify which models and connections to use.

Note: Value for BigQuery Array columns should be comma separated value ('a,b,c,d,e').

Example:

---
safeMode: false
databaseType: "mysql"
operationLevel: database
tables:
  - name: "actor"
    type: "TABLE"
    columns:
      - name: "actor_id"
        typeName: "SMALLINT UNSIGNED"
        ordinalPosition: 0
        primaryKeySequenceId: 1
        columnDisplaySize: 5
        scale: 0
        precision: 5
        nullable: false
        primaryKey: true
        autoincrement: false
        tests:
          assertion:
            - operator: '='
              value: 16
              expected: 1
      - name: "first_name"
        typeName: "VARCHAR"
        ordinalPosition: 0
        primaryKeySequenceId: 0
        columnDisplaySize: 45
        scale: 0
        precision: 45
        nullable: false
        primaryKey: false
        autoincrement: false
        tests:
          assertion:
            - operator: '!='
              value: 'Michael'
              expected: 1

Output example:

Running tests for mysql. Found: 2

1 of 2, RUNNING test ('=') on column: 'actor_id'                                                    
1 of 2, FINISHED test on column: 'actor_id' (expected: '1' - actual: '1')  ......................... [PASS in 0.288s]
2 of 2, RUNNING test ('!=') on column: 'first_name'                                                 
2 of 2, FINISHED test on column: 'first_name' (expected: '1' - actual: '219')  ..................... [FAIL in 0.091s]

apply

Gets current model and compares with state of database, generates ddl for changes and applies to database.

rosetta [-c, --config CONFIG_FILE] apply [-h, --help] [-s, --source CONNECTION_NAME]
Parameter Description
-h, --help Show the help message and exit.
-c, --config CONFIG_FILE YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME The source connection is used to specify which models and connection to use.
-m, --model MODEL_FILE (Optional) The model file to use for apply. Default is model.yaml

Example:

(Actual database)

---
safeMode: false
databaseType: "mysql"
operationLevel: database
tables:
  - name: "actor"
    type: "TABLE"
    columns:
      - name: "actor_id"
        typeName: "SMALLINT UNSIGNED"
        ordinalPosition: 0
        primaryKeySequenceId: 1
        columnDisplaySize: 5
        scale: 0
        precision: 5
        nullable: false
        primaryKey: true
        autoincrement: false
        tests:
          assertion:
            - operator: '='
              value: 16
              expected: 1

(Expected database)

---
safeMode: false
databaseType: "mysql"
operationLevel: database
tables:
  - name: "actor"
    type: "TABLE"
    columns:
      - name: "actor_id"
        typeName: "SMALLINT UNSIGNED"
        ordinalPosition: 0
        primaryKeySequenceId: 1
        columnDisplaySize: 5
        scale: 0
        precision: 5
        nullable: false
        primaryKey: true
        autoincrement: false
        tests:
          assertion:
            - operator: '='
              value: 16
              expected: 1
      - name: "first_name"
        typeName: "VARCHAR"
        ordinalPosition: 0
        primaryKeySequenceId: 0
        columnDisplaySize: 45
        scale: 0
        precision: 45
        nullable: false
        primaryKey: false
        autoincrement: false
        tests:
          assertion:
            - operator: '!='
              value: 'Michael'
              expected: 1

Description: Our actual database does not contain first_name so we expect it to alter the table and add the column, inside the source directory there will be the executed DDL and a snapshot of the current database.

generate

This command will generate Spark Python (file) or Spark Scala (file), firstly it extracts a schema from a source database and gets connection properties from the source connection, then it creates a python (file) or scala (file) that translates schemas, which is ready to transfer data from source to target.

rosetta [-c, --config CONFIG_FILE] generate [-h, --help] [-s, --source CONNECTION_NAME] [-t, --target CONNECTION_NAME] [--pyspark] [--scala]
Parameter Description
-h, --help Show the help message and exit.
-c, --config CONFIG_FILE YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME The source connection name to extract schema from.
-t, --target CONNECTION_NAME The target connection name where the data will be transfered.
--pyspark Generates the Spark SQL file.
--scala Generates the Scala SQL file.