From time to time, Stitch will encounter data that can’t be loaded losslessly into the destination table in your destination. When this happens, Stitch may have to alter the structure of the table in order to successfully load the data.


Reasons for table structural changes

Stitch may need to perform table alterations for several reasons, including:

In this guide are examples of how Stitch will behave in each of these scenarios for each currently supported destination type.


Examples in this guide

The examples in this guide will use an example table named customers to demonstrate Stitch’s behavior for each scenario.

Excluding the _sdc columns, Stitch determines this is the structure of the customers table:

Column name Data type
id BIGINT
name STRING
age BIGINT
has_magic BOOLEAN

VARCHAR column widening

To preserve your destination’s performance and reduce disk usage, Stitch uses the smallest possible VARCHAR column when storing string data.

For example: If the maximum width of a string column across all records is currently 127, Stitch will type the destination column as VARCHAR(128).

As string data can vary in width, Stitch will take different actions to accommodate the data, depending on the destination in use. See below for an example.

VARCHAR column widening example: First replication job

During the first replication job, Stitch extracts the following records for the customers table:

id (BIGINT) name (STRING) age (BIGINT) has_magic (BOOLEAN)
1 Finn 15 false
2 Jake 7 true

VARCHAR column widening example: Second replication job

During the next replication job, Stitch extracts the records in the table below. In this example, the name column contains data that exceeds its previously known width:

id (BIGINT) name (STRING) age (BIGINT) has_magic (BOOLEAN)
3 PrincessBubblegumWhoIsTheCurrentIncarnationOfTheCandyElementalAndRulesOverTheCandyKingdomWasBornFromTheMotherGumAfterTheMushroomWarAndHasARivalWhoIsAVampireNamedMarcellineTheVampireQueen 16 true
4 BMO 11 false

VARCHAR column widening example: New table structure

How Stitch loads the data depends on the type destination being used. Click the tabs below to see how accommodating this data works for each destination.

When Stitch detects string data that exceeds its previous maximum width in Azure Synapse Analytics destinations, Stitch will automatically widen the column to accommodate the data.

In this example, the name column would have originally been NVARCHAR(128). Stitch will widen the column to NVARCHAR(256).

The structure of the customers table would be altered to the following:

Column name Data type
id [pk] BIGINT
name NVARCHAR(256)
age BIGINT
has_magic BOOLEAN

When Stitch detects string data that exceeds its previous maximum width in BigQuery destinations, Stitch will not widen the column.

All string data is stored as TEXT in BigQuery destinations, which do not require setting a width.

The structure of the customers would remain unchanged:

Column name Data type
id [pk] BIGINT
name TEXT
age BIGINT
has_magic BOOLEAN

When Stitch detects string data that exceeds its previous maximum width in Panoply and Redshift destinations, Stitch will automatically widen the column to accommodate the data.

In this example, the name column would have originally been VARCHAR(128). Stitch will widen the column to VARCHAR(256), which requires dropping and re-creating the column.

Note: Dropping and re-creating columns will affect views dependent on the column.

The structure of the customers table would be altered to the following:

Column name Data type
id [pk] BIGINT
name VARCHAR(256)
age BIGINT
has_magic BOOLEAN

When Stitch detects string data that exceeds its previous maximum width in PostgreSQL destinations, Stitch will not widen the column.

All string data is stored as TEXT in PostgreSQL destinations, which do not require setting a width.

The structure of the customers would remain unchanged:

Column name Data type
id [pk] BIGINT
name TEXT
age BIGINT
has_magic BOOLEAN

When Stitch detects string data that exceeds its previous maximum width in Snowflake destinations, Stitch will automatically widen the column to accommodate the data.

In this example, the name column would have originally been VARCHAR(128). Stitch will widen the column to VARCHAR(256).

The structure of the customers table would be altered to the following:

Column name Data type
id [pk] BIGINT
name VARCHAR(256)
age BIGINT
has_magic BOOLEAN

Columns with mixed data types

Stitch requires that there only be one data type per column to properly type, load, and store data. If a column contains multiple data types, Stitch will create additional columns and append the data type to the column name. See below for an example.

Mixed data types example: First replication job

During the first replication job, the following rows are extracted for the customers table:

id (BIGINT) name (STRING) age (BIGINT) has_magic (BOOLEAN)
1 Finn 15 false
2 Jake 7 true

Mixed data types example: Second replication job

During the next replication job, the following rows are extracted:

id (BIGINT) name (STRING) age (DOUBLE) has_magic (STRING)
3 Bubblegum 16.0 yes
4 BMO 11 false

Stitch will detect that the data types in these newly replicated rows differ than the ones from the initial replication job. In this case:

  • age was originally a BIGINT, but can sometimes be a decimal
  • has_magic was originally a BOOLEAN, but can sometimes be a string

Mixed data types example: New table structure

To accommodate the data, Stitch will create a new column for the newly detected data type and store the data for that data type in the new column.

How columns are named as a result of “splitting” mixed data types depends on the type of destination being used. Click the tabs below to see how accommodating this data works for each destination.

When mixed data types are detected in the same column in Azure Synapse Analytics destinations:

  • Original columns: The column name remains as-is. The column’s data type will be the first data type Stitch detected for the column. For example: age will only store BIGINT data. If a different data type is detected for age, it will be stored in an additional column and this column will be null.
  • Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__fl will only store float/double data. If a different data type is detected for age__fl, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name Data type
id [pk] BIGINT
name TEXT
age BIGINT
age__fl FLOAT
has_magic BOOLEAN
has_magic__st TEXT

When mixed data types are detected in the same column in BigQuery destinations:

  • Original columns: The column name remains as-is. The column’s data type will be the first data type Stitch detected for the column. For example: age will only store BIGINT data. If a different data type is detected for age, it will be stored in an additional column and this column will be null.
  • Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__nu will only store numeric data. If a different data type is detected for age__nu, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name Data type
id [pk] BIGINT
name TEXT
age BIGINT
age__nu NUMERIC
has_magic BOOLEAN
has_magic__st TEXT

When mixed data types are detected in the same column in Panoply and Redshift destinations:

  • Original columns: The column is re-named to include the original (first detected) data type and only data of the original type will be stored in this column. For example: age__bi will only store BIGINT data. If a different data type is detected for age__bi, it will be stored in an additional column and this column will be null.
  • Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__do will only store double data. If a different data type is detected for age__do, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name Data type
id [pk] BIGINT
name TEXT
age__bi BIGINT
age__do DOUBLE
has_magic__bo BOOLEAN
has_magic__st VARCHAR

When mixed data types are detected in the same column in PostgreSQL destinations:

  • Original columns: The column is re-named to include the original (first detected) data type and only data of the original type will be stored in this column. For example: age__bi will only store BIGINT data. If a different data type is detected for age__bi, it will be stored in an additional column and this column will be null.
  • Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__do will only store double data. If a different data type is detected for age__do, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name Data type
id [pk] BIGINT
name TEXT
age BIGINT
age__do DOUBLE
has_magic BOOLEAN
has_magic__st TEXT

When mixed data types are detected in the same column in Snowflake destinations:

  • Original columns: The column name remains as-is. The column’s data type will be the first data type Stitch detected for the column. For example: age will only store BIGINT data. If a different data type is detected for age, it will be stored in an additional column and this column will be null.
  • Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__fl will only store numeric data. If a different data type is detected for age__fl, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name Data type
ID [pk] BIGINT
name TEXT
age BIGINT
age__fl FLOAT
has_magic BOOLEAN
has_magic__st TEXT

Adding and removing columns

  • Adding columns: When a new column is added to a source table and selected for replication, Stitch will append the column to the end of the destination table.

    For Key-based Incremental tables, data for the column will be replicated onward from the saved Replication Key . Default NULLs will be placed in existing rows unless:

    1. A historical backfill in the source updated the records’ Replication Key values, or
    2. A table-level reset is performed and a full re-replication is queued
  • Removing columns: When a column is removed in the source or de-selected from replication, Stitch will place default NULLs in the column going forward. Columns will not be removed from the destination.



Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.