Prejoining

Prejoins are described as dictionaries with the following keys:

Key Data Type Explanation
extract_columns string | list of strings Single column or list of columns that will be extracted from the prejoin. If no alias is given the columns will have the same name as in the target-relation.
aliases (optional) string | list of strings Optionally, aliases for the extract_columns can be defined. If given, it needs to have the same number of columns as in extract_columns, otherwise a compilation error will be thrown.
this_column_name string | list of strings Specifies one or more columns within the source model of this stage, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in ‘ref_column_name’.
ref_column_name string | list of strings Specifies one or more columns within the referenced object of this prejoin, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in ‘this_column_name’.
ref_model string Name of the other dbt model that should be referred. Either this, or the parameters ‘src_name’ and ‘src_table’ must be defined.
src_name string If a dbt source should be prejoined, use this parameter to set the name of the source, as defined in the sources section of a yml file. Must be used together with ‘src_table’, and instead of ‘ref_model’.
src_table string If a dbt source should be prejoined, use this parameter to set the table of the source ‘src_name’, as defined in the sources section of a yml file. Must be used together with ‘src_name’, and instead of ‘ref_model’.
operator (optional) string Only used when multiple columns are defined for ‘this_col_name’ and ‘ref_col_name’. Influences which logical operator is used to combine multiple JOIN conditions. Default is ‘AND’, use only if other operator is desired.

Prejoining is used to enrich source data by attributes from other database objects. In general, it should only be used when the source data does not hold the Business Key, but the technical Key of an object.

Configuring Prejoins

Within one Stage model, users can setup extraction of one to many columns from other database objects. Per prejoin one dictionary is defined:

prejoined_columns:
    - extract_columns: 
         - <name_of_column_to_be_selected>
      aliases:
         - <extracted_column_alias>
      ref_model: <name_of_other_dbt_model>
      this_column_name: <name_of_col_in_this_object>
      ref_column_name: <name_of_col_in_ref_object>
    - extract_columns: 
         - <name_of_other_column_to_be_selected>
      aliases:
         - <other_extracted_column_alias>
      src_name: <name_of_dbt_source>
      src_table: <name_of_table_within_dbt_source>
      this_column_name: 
          - <name_of_col_1_in_this_object>
          - <name_of_col_2_in_this_object>
      ref_column_name: 
          - <name_of_col_1_in_ref_object>
          - <name_of_col_2_in_ref_object>

Example of a definition of prejoined_columns parameter in an example stage:

prejoined_columns:
    - extract_columns: 
         - id
      aliases:
         - businessid
      ref_model: business_raw
      this_column_name: ContractId
      ref_column_name: ContractId
    - extract_columns: 
         - contractnumber
         - contractkey
      aliases:
         - contractnumber_pj
         - contractkey_pj
      src_name: stg_prod
      src_table: contract
      this_column_name: 
          - contract_id
          - other_column
      ref_column_name: 
          - id
          - other_column
      operator: OR 

 

Legacy Syntax (before v1.9.0)

This is the legacy syntax which was used before datavault4dbt v1.9.0.
It is still usable for backwards compatibility.

The configuration includes the following parameters per prejoined column:

Parameters Data Type Explanation
ref_model string Name of the other dbt model that should be referred. Either this, or the parameters ‘src_name’ and ‘src_table’ must be defined.
src_name string If a dbt source should be prejoined, use this parameter to set the name of the source, as defined in the sources section of a yml file. Must be used together with ‘src_table’, and instead of ‘ref_model’.
src_table string If a dbt source should be prejoined, use this parameter to set the table of the source ‘src_name’, as defined in the sources section of a yml file. Must be used together with ‘src_name’, and instead of ‘ref_model’.
bk string Name of the column inside the referred object that should be extracted.
this_column_name string / list Specifies one or more columns within the source model of this stage, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in ‘ref_column_name’.
ref_column_name string / list Specifies one or more columns within the referenced object of this prejoin, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in ‘this_column_name’.
(optional) operator string Only used when multiple columns are defined for ‘this_col_name’ and ‘ref_col_name’. Influences which logical operator is used to combine multiple JOIN conditions. Default is ‘AND’, use only if other operator is desired.

Defines information about information that needs to be prejoined. Most commonly used to create links, when the source data does not hold the Business Key, but the technical key of the referred object. The values of the dict are the aliases you want to give the prejoined columns. Typically, but not always, this should be the same as the name of the prejoined column inside the prejoined entity. For each prejoined column a few things need to be defined inside another dictionary now. ‘src_name’ holding the name of the source of the prejoined entity, as defined in the .yml file. ‘src_table’ holds the name of the prejoined table, as defined inside the .yml file. ‘bk’ Holds the name of the business key column inside the prejoined table. ‘this_column_name’ holds the name of the column inside the original source data, that refers to the prejoined table. ‘ref_column_name’ holds the name of the column, that is refered by ‘this_column_name’ inside the prejoined table.

Prejoining is used to enrich source data by attributes from other database objects. In general, it should only be used when the source data does not hold the Business Key, but the technical Key of an object.

Configuring Prejoins

Within one Stage model, users can setup extraction of one to many columns from other database objects. Per column, one key-value pair of a dictionary needs to be defined:

prejoined_columns:
    <col_alias_1>:
        ref_model: <name_of_other_dbt_model>
        bk: <name_of_column_to_be_selected>
        this_column_name: <name_of_col_in_this_object>
        ref_column_name: <name_of_col_in_ref_object>
    <col_alias_2>:
        src_name: <name_of_dbt_source>
        src_table: <name_of_table_within_dbt_source>
        bk: <name_of_column_to_be_selected>
        this_column_name: 
            - <name_of_col_1_in_this_object>
            - <name_of_col_2_in_this_object>
        ref_column_name: 
            - <name_of_col_1_in_ref_object>
            - <name_of_col_2_in_ref_object>

This configuration will roughly translate to the following SQL part:

SELECT 
    src.*,
    pj_1.<name_of_column_to_be_selected> AS <col_alias_1>,
    pj_2.<name_of_column_to_be_selected> AS <col_alias_2>
FROM  {{ this }} src
LEFT JOIN {{ ref(<name_of_other_dbt_model>) }} pj_1
    ON src.<name_of_col_in_this_object> = pj_1.<name_of_col_in_ref_object>
LEFT JOIN {{ source(<name_of_dbt_source>, <name_of_table_within_dbt_source>) }} pj_2
    ON src.<name_of_col_1_in_this_object> = pj_2.<name_of_col_1_in_ref_object>
    AND src.<name_of_col_2_in_this_object> = pj_2.<name_of_col_2_in_ref_object>

Example of a definition of prejoined_columns parameter in an example stage:

prejoined_columns:
    businessid:
        ref_model: 'business_raw'
        bk: 'ID'
        this_column_name: 'ContractId'
        ref_column_name: 'ContractId'
    contractnumber:
        src_name: 'source_data'
        src_table: 'contract'
        bk: 'contractnumber'
        this_column_name: 'ContractId'
        ref_column_name: 'Id'
    master_account_key:
        src_name: 'source_data'
        src_table: 'account'
        bk: 'account_key'
        this_column_name: 'master_account_id'
        ref_column_name: 'Id'