Prejoining
Prejoins are described as dictionaries with the following keys:
Key | Data Type | Explanation |
---|---|---|
extract_columns | string | list of strings | Single column or list of columns that will be extracted from the prejoin. If no alias is given the columns will have the same name as in the target-relation. |
aliases (optional) | string | list of strings | Optionally, aliases for the extract_columns can be defined. If given, it needs to have the same number of columns as in extract_columns, otherwise a compilation error will be thrown. |
this_column_name | string | list of strings | Specifies one or more columns within the source model of this stage, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in ‘ref_column_name’. |
ref_column_name | string | list of strings | Specifies one or more columns within the referenced object of this prejoin, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in ‘this_column_name’. |
ref_model | string | Name of the other dbt model that should be referred. Either this, or the parameters ‘src_name’ and ‘src_table’ must be defined. |
src_name | string | If a dbt source should be prejoined, use this parameter to set the name of the source, as defined in the sources section of a yml file. Must be used together with ‘src_table’, and instead of ‘ref_model’. |
src_table | string | If a dbt source should be prejoined, use this parameter to set the table of the source ‘src_name’, as defined in the sources section of a yml file. Must be used together with ‘src_name’, and instead of ‘ref_model’. |
operator (optional) | string | Only used when multiple columns are defined for ‘this_col_name’ and ‘ref_col_name’. Influences which logical operator is used to combine multiple JOIN conditions. Default is ‘AND’, use only if other operator is desired. |
Prejoining is used to enrich source data by attributes from other database objects. In general, it should only be used when the source data does not hold the Business Key, but the technical Key of an object.
Configuring Prejoins
Within one Stage model, users can setup extraction of one to many columns from other database objects. Per prejoin one dictionary is defined:
prejoined_columns:
- extract_columns:
- <name_of_column_to_be_selected>
aliases:
- <extracted_column_alias>
ref_model: <name_of_other_dbt_model>
this_column_name: <name_of_col_in_this_object>
ref_column_name: <name_of_col_in_ref_object>
- extract_columns:
- <name_of_other_column_to_be_selected>
aliases:
- <other_extracted_column_alias>
src_name: <name_of_dbt_source>
src_table: <name_of_table_within_dbt_source>
this_column_name:
- <name_of_col_1_in_this_object>
- <name_of_col_2_in_this_object>
ref_column_name:
- <name_of_col_1_in_ref_object>
- <name_of_col_2_in_ref_object>
Example of a definition of prejoined_columns parameter in an example stage:
prejoined_columns:
- extract_columns:
- id
aliases:
- businessid
ref_model: business_raw
this_column_name: ContractId
ref_column_name: ContractId
- extract_columns:
- contractnumber
- contractkey
aliases:
- contractnumber_pj
- contractkey_pj
src_name: stg_prod
src_table: contract
this_column_name:
- contract_id
- other_column
ref_column_name:
- id
- other_column
operator: OR
Legacy Syntax (before v1.9.0)
This is the legacy syntax which was used before datavault4dbt v1.9.0.
It is still usable for backwards compatibility.
The configuration includes the following parameters per prejoined column:
Parameters | Data Type | Explanation |
---|---|---|
ref_model | string | Name of the other dbt model that should be referred. Either this, or the parameters ‘src_name’ and ‘src_table’ must be defined. |
src_name | string | If a dbt source should be prejoined, use this parameter to set the name of the source, as defined in the sources section of a yml file. Must be used together with ‘src_table’, and instead of ‘ref_model’. |
src_table | string | If a dbt source should be prejoined, use this parameter to set the table of the source ‘src_name’, as defined in the sources section of a yml file. Must be used together with ‘src_name’, and instead of ‘ref_model’. |
bk | string | Name of the column inside the referred object that should be extracted. |
this_column_name | string / list | Specifies one or more columns within the source model of this stage, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in ‘ref_column_name’. |
ref_column_name | string / list | Specifies one or more columns within the referenced object of this prejoin, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in ‘this_column_name’. |
(optional) operator | string | Only used when multiple columns are defined for ‘this_col_name’ and ‘ref_col_name’. Influences which logical operator is used to combine multiple JOIN conditions. Default is ‘AND’, use only if other operator is desired. |
Defines information about information that needs to be prejoined. Most commonly used to create links, when the source data does not hold the Business Key, but the technical key of the referred object. The values of the dict are the aliases you want to give the prejoined columns. Typically, but not always, this should be the same as the name of the prejoined column inside the prejoined entity. For each prejoined column a few things need to be defined inside another dictionary now. ‘src_name’ holding the name of the source of the prejoined entity, as defined in the .yml file. ‘src_table’ holds the name of the prejoined table, as defined inside the .yml file. ‘bk’ Holds the name of the business key column inside the prejoined table. ‘this_column_name’ holds the name of the column inside the original source data, that refers to the prejoined table. ‘ref_column_name’ holds the name of the column, that is refered by ‘this_column_name’ inside the prejoined table.
Prejoining is used to enrich source data by attributes from other database objects. In general, it should only be used when the source data does not hold the Business Key, but the technical Key of an object.
Configuring Prejoins
Within one Stage model, users can setup extraction of one to many columns from other database objects. Per column, one key-value pair of a dictionary needs to be defined:
prejoined_columns:
<col_alias_1>:
ref_model: <name_of_other_dbt_model>
bk: <name_of_column_to_be_selected>
this_column_name: <name_of_col_in_this_object>
ref_column_name: <name_of_col_in_ref_object>
<col_alias_2>:
src_name: <name_of_dbt_source>
src_table: <name_of_table_within_dbt_source>
bk: <name_of_column_to_be_selected>
this_column_name:
- <name_of_col_1_in_this_object>
- <name_of_col_2_in_this_object>
ref_column_name:
- <name_of_col_1_in_ref_object>
- <name_of_col_2_in_ref_object>
This configuration will roughly translate to the following SQL part:
SELECT
src.*,
pj_1.<name_of_column_to_be_selected> AS <col_alias_1>,
pj_2.<name_of_column_to_be_selected> AS <col_alias_2>
FROM {{ this }} src
LEFT JOIN {{ ref(<name_of_other_dbt_model>) }} pj_1
ON src.<name_of_col_in_this_object> = pj_1.<name_of_col_in_ref_object>
LEFT JOIN {{ source(<name_of_dbt_source>, <name_of_table_within_dbt_source>) }} pj_2
ON src.<name_of_col_1_in_this_object> = pj_2.<name_of_col_1_in_ref_object>
AND src.<name_of_col_2_in_this_object> = pj_2.<name_of_col_2_in_ref_object>
Example of a definition of prejoined_columns parameter in an example stage:
prejoined_columns: businessid: ref_model: 'business_raw' bk: 'ID' this_column_name: 'ContractId' ref_column_name: 'ContractId' contractnumber: src_name: 'source_data' src_table: 'contract' bk: 'contractnumber' this_column_name: 'ContractId' ref_column_name: 'Id' master_account_key: src_name: 'source_data' src_table: 'account' bk: 'account_key' this_column_name: 'master_account_id' ref_column_name: 'Id'