Insert array into table in Snowflake

Trying to insert an array into a table in Snowflake using the VALUES clause may result in an error similar to this one:

SQL compilation error: Invalid expression [ARRAY_CONSTRUCT('a1', 'a2', 'a3', 'a4', 'b1')] in VALUES clause
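For example, a statement along these lines (the table and column names are just placeholders) triggers the error:

insert into my_table_with_array_column 
values (1, ARRAY_CONSTRUCT('a1','a2','a3','a4','b1'));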

To deal with it you need to use SELECT instead of VALUES:

insert into my_table_with_array_column 
SELECT 1,  ['a1','a2','a3','a4','b1']
UNION ALL 
SELECT 2, [];

Alternatively you can use ARRAY_CONSTRUCT:

insert into my_table_with_array_column 
SELECT 3, ARRAY_CONSTRUCT('a1','b1')
UNION ALL
SELECT 4, ARRAY_CONSTRUCT();
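For reference, the statements above assume a table shaped roughly like this (a sketch; the name and columns are just examples):

create or replace table my_table_with_array_column (
    id   number,
    tags array
);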

Terraform for Snowflake – add boolean column with default true or default false

The documentation of the chanzuckerberg provider might not be very clear about specifying a default value for a column, so let me present a piece of working code:

resource "snowflake_table" "this" {
  for_each        = var.tables
  database        = "${var.db}_${upper(var.env)}"
  schema          = var.raw_data_schema
  name            = each.value
  comment         = "Table for events from ${each.key} topic"

  column {
    name     = "ID"
    type     = "NUMBER(38,0)"
    nullable = true

    identity {
      start_num = 1
      step_num  = 1
    }
  }

  column {
    ...
  }

  column {
    name     = "DELETED"
    type     = "BOOLEAN"
    nullable = true

    default {
      constant = "false"
    }
  }
}
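For a column that should default to true, the same block works with the constant flipped (the column name below is only an example):

  column {
    name     = "ACTIVE"
    type     = "BOOLEAN"
    nullable = true

    default {
      constant = "true"
    }
  }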

Unstable behavior of TRY_TO_TIMESTAMP function in Snowflake

TRY_TO_TIMESTAMP_NTZ will try to determine the format based on the first value it encounters. When the first row contains an integer, e.g. '99997', it assumes it is a number of seconds from the beginning of the epoch and expects all rows to hold the same type of value. But if the first row contains an actual date, the result will be different:

select TRY_TO_TIMESTAMP_NTZ( $1 ) from values 
('99997'), -- will return 1970-01-02T03:46:37Z
('2021-04-05 10:00'); -- will return null


select TRY_TO_TIMESTAMP_NTZ( $1 ) from values 
('2021-04-05 10:00'), 
('99997'); -- will return null
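If you know the expected format up front, one way to avoid this guessing (my own workaround, not something the documentation calls out for this particular quirk) is to pass an explicit format string:

select TRY_TO_TIMESTAMP_NTZ( $1, 'YYYY-MM-DD HH24:MI' ) from values 
('2021-04-05 10:00'), -- parsed as expected
('99997'); -- will return null instead of an epoch-based guess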

Passing current date for Snowflake sub-procedure

Passing parameters between JavaScript and SQL can be problematic in Snowflake.

If you need to pass the current date as a parameter to a sub-procedure, you can use a string parameter and pass Date.now() converted to a string:

Date.now().toString()

In the sub-procedure, use SQL functions to parse the date, e.g. TO_TIMESTAMP_TZ().

Example:

CREATE or replace PROCEDURE log_transformation (
    MSG       string, 
    START_TIMESTAMP    string
)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS $$

    snowflake.createStatement( { sqlText: 
        `insert into ingestion_log (
            message,
            start_dt
        )
        values (:1, TO_TIMESTAMP_TZ(:2))`
        , binds:[
            MSG,
            START_TIMESTAMP
         ] } 
   ).execute();
$$;

create or replace procedure transform_nb_responses()
    returns string null
    language javascript
    as
    $$

    var execution_beginning = Date.now().toString();
    var accumulated_log_messages = "";

    function log_success() {
        snowflake.execute({
            sqlText: `call LOG_TRANSFORMATION(:1, :2)`,
            binds: [accumulated_log_messages, execution_beginning]
        });
    }

    // some transformations

    return 'success';
    $$
    ;
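A quick way to sanity-check the conversion on the SQL side (the literal below is just an example value of Date.now(), i.e. milliseconds since the epoch):

select TO_TIMESTAMP_TZ('1617616800000'); -- a 13-digit value is treated as milliseconds, here 2021-04-05 10:00 UTC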

Change column data type in Snowflake with Terraform

Terraform providers for Snowflake are not perfect, and neither is Snowflake in this case, because it does not allow changing a column's data type in most cases. The solution is a kind of hack and involves multiple steps:

1. Use the get_ddl() function to easily retrieve the create statement:

select get_ddl('table', 'RESPONSES');

2. Copy the result, paste it into a worksheet and IMMEDIATELY delete the "or replace" part and change the table name. Also adjust the data type that needs to be changed.

create TABLE RESPONSES_tmp (ID NUMBER(38,0), col1 NUMBER(12,2));

3. If the table is used by a task, it is time to stop it:

ALTER TASK raw_data.RESPONSES_FROM_RAW_DATA SUSPEND;

4. Copy data into new table:

insert into RESPONSES_tmp select * from RESPONSES;

5. Rename old table:
alter table RESPONSES rename to RESPONSES_old;

6. Rename new table:
alter table RESPONSES_tmp rename to RESPONSES;

7. Task can be resumed now:
alter task raw_data.RESPONSES_FROM_RAW_DATA resume;

8. Time for Terraform: in the snowflake_table resource just adjust the column types and run. The plan should report that the column data type was changed outside of Terraform, and no changes should be required.
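A minimal sketch of such an adjustment, assuming the hypothetical col1 column from step 2:

  column {
    name = "COL1"
    type = "NUMBER(12,2)"
  }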

9. Drop old table.
drop table RESPONSES_old;

Snowflake task that runs exactly once

Strange as it may sound, a task that runs exactly once might sometimes save you. In my situation it is a hack for missing features in the Terraform provider for Snowflake.

According to the Snowflake documentation, a task can only run regularly on a schedule, or not run at all.

To get a task to run exactly once, it needs to suspend (or drop) itself. With a little help from stored procedures it is perfectly doable.

create or replace procedure destroy_task_test()
returns string null
language javascript
as
$$
// 1. do what the task needs to do
// …

// 2. drop the task
var sql_command = `DROP TASK SELF_DESTROY_TASK;`;
var sql_statement = snowflake.createStatement({sqlText: sql_command});
sql_statement.execute();
$$;

Then create the suicidal task:

create task SELF_DESTROY_TASK schedule = '1 minute'
as
call destroy_task_test();

alter task SELF_DESTROY_TASK resume;

Now run SHOW TASKS and watch your task disappear after a minute.

Instead of DROP TASK you may want to use ALTER TASK <task name> SUSPEND.
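In that case the only change inside the procedure is the statement it executes:

var sql_command = `ALTER TASK SELF_DESTROY_TASK SUSPEND;`;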