

The super type became generally available in Redshift recently, making semi-structured data more manageable in-database. Redshift has long provided support for querying and manipulating JSON formatted data, and previously you might have used a varchar type to store it, or accessed and unnested formatted files via Spectrum and external tables, so this functionality is a welcome addition. The super type also offers an advantage over character types as it currently supports up to 1MB of data per field.

The above is an avro schema representation of the recordset. Effectively, for every customer record we are storing their order history as a sub-record / nested table. Within each order record we can have numerous order lines, so two levels of nesting. We also have a map type to store the customer's current contact information.

We'll generate a medium sized dataset first in JSON text format and see how this works with the super type (we'll address avro later). We'll start off by loading the data into a table with a single super type column, and use the 'noshred' option. I have pushed the JSON files to an S3 bucket, so will use the copy command to load the data to a table in the database. Below is the DDL for creating the table and the copy command:

create table cust_ord_hist (load_date timestamp default current_timestamp, order_data super);

credentials 'aws_access_key_id= aws_secret_access_key='
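As a rough sketch of how the copy command fits together with those credentials (the bucket path, the credential placeholders and the column list are illustrative assumptions rather than the actual values used), a full noshred load looks something like this:

copy cust_ord_hist (order_data)        -- column list assumed, so the defaulted load_date column is skipped
from 's3://<your-bucket>/<prefix>/'    -- placeholder S3 location
credentials 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'   -- or IAM_ROLE '<role-arn>'
format json 'noshred';                 -- keep each JSON document intact in the single super column

With 'noshred' the documents are not split into separate columns on load; each one lands whole in the super column, which is what lets us defer all the flattening to query time.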
After loading, here is the info stored in pg_table_def and svv_table_info:

Provided there are no errors we can inspect the data:

So at this point I'm happy that the data has loaded satisfactorily and can be queried as JSON.

Querying the data

First things first – due to the mixed case employed throughout the JSON data, I had to run set enable_case_sensitive_identifier to true in order to get consistent results throughout this exercise. Without it, I would get lots of null / empty columns.
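As a minimal illustration of both steps – the catalogue checks and the case-sensitivity setting – something along these lines works; the attribute paths in the last query are lifted from the view columns further down, so treat them as assumptions about the document layout:

-- confirm the table structure and size after the load
select "column", type, encoding from pg_table_def where tablename = 'cust_ord_hist';
select "table", size, tbl_rows from svv_table_info where "table" = 'cust_ord_hist';

-- resolve quoted identifiers case-sensitively so they match the mixed-case JSON attribute names
set enable_case_sensitive_identifier to true;

-- first look at the data: dot notation navigates straight into the JSON document
select o.order_data."CUSTOMER_ID",
       o.order_data."CONTACT"."EMAIL"
from cust_ord_hist o
limit 10;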
Now that we have some data, let's start flattening and separating it out into some logical views to enable some relational queries, and enriching the content so we can generate some insights.

First the customer attributes (N.B. the source start_date and birth_date elements are in epoch seconds, so we convert them to timestamps):

cast(o."CUSTOMER_ID" as varchar) as CUSTOMER_ID,
cast(o."FIRST_NAME" as varchar) as FIRST_NAME,
cast(o."LAST_NAME" as varchar) as LAST_NAME,
timestamp 'epoch' + cast(o."DATE_OF_BIRTH" as bigint) * interval '1 second' as BIRTH_DATE,
datediff(years, timestamp 'epoch' + cast(o."DATE_OF_BIRTH" as bigint) * interval '1 second', current_date) as CUST_AGE,
cast(o."CONTACT"."PHONE" as varchar) as PHONE,
cast(o."CONTACT"."EMAIL" as varchar) as EMAIL,
timestamp 'epoch' + cast(o."START_DATE" as bigint) * interval '1 second' as START_DATE,

Then the order attributes:

select cast(oh."OID" as varchar) as ORDER_ID,
timestamp 'epoch' + cast(oh."O_TS" as bigint) * interval '1 second' as ORDER_DATE,
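To tie those two fragments together, here is a sketch of how they could be wrapped into complete views. The view names, the table alias, the assumption that each document sits in the order_data super column, and in particular the "ORDER_HISTORY" attribute used to un-nest the order array are illustrative guesses rather than the exact definitions used in this post:

-- sketch only: customer-level view, one row per loaded document
create view cust_v as
select cast(o.order_data."CUSTOMER_ID" as varchar) as CUSTOMER_ID,
       cast(o.order_data."FIRST_NAME" as varchar) as FIRST_NAME,
       cast(o.order_data."LAST_NAME" as varchar) as LAST_NAME,
       timestamp 'epoch' + cast(o.order_data."DATE_OF_BIRTH" as bigint) * interval '1 second' as BIRTH_DATE,
       datediff(years, timestamp 'epoch' + cast(o.order_data."DATE_OF_BIRTH" as bigint) * interval '1 second', current_date) as CUST_AGE,
       cast(o.order_data."CONTACT"."PHONE" as varchar) as PHONE,
       cast(o.order_data."CONTACT"."EMAIL" as varchar) as EMAIL,
       timestamp 'epoch' + cast(o.order_data."START_DATE" as bigint) * interval '1 second' as START_DATE
from cust_ord_hist o;

-- sketch only: order-level view, un-nesting a hypothetical "ORDER_HISTORY" array with PartiQL in the from clause
create view ord_v as
select cast(c.order_data."CUSTOMER_ID" as varchar) as CUSTOMER_ID,
       cast(oh."OID" as varchar) as ORDER_ID,
       timestamp 'epoch' + cast(oh."O_TS" as bigint) * interval '1 second' as ORDER_DATE
from cust_ord_hist c, c.order_data."ORDER_HISTORY" oh;

With views along these lines in place, the relational queries become plain joins – for example, orders per customer age band by joining ord_v to cust_v on CUSTOMER_ID.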
