Get a fast start on your automation journey with Coalesce!

Jakub Jirka
WhereScape Architect
4 minutes to read
November 29, 2023
Big Data
Business Intelligence
Data Warehouse
Coalesce
Data Warehouse Solutions

In previous articles, we’ve looked at automation mostly through WhereScape. This time we’ll look at things from a different angle: for those on the Snowflake platform, I’ll serve up some tips for getting into the world of automation quickly. Today, we’ll talk about Coalesce.

Coalesce is an automation tool that’s easy to start with and simple to understand. Its learning curve is gentle, and it’s customizable in many ways. Here’s a quick list of reasons to try it: it’s easy to set up, it delivers quickly from the outset, and it’s highly customizable.

What is Coalesce?

Coalesce is an automation tool that’s currently focused purely on Snowflake. It can be described as a data transformation tool with a GUI-driven experience and a column-aware architecture.

Thanks to its column-level lineage, working with Coalesce is a real pleasure. Adding a new column from the source to all existing tables? That’s usually neither easy nor user-friendly, but in Coalesce you’ll fare much better. Just right-click the column, choose the option to propagate it downstream through the lineage, and then select all the tables to which you want to add the column. Then you simply regenerate them. It’s all doable with just a few clicks.
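To make the propagation idea concrete, here’s a toy Python sketch of what “propagate a column downstream” means on a lineage graph. It is not how Coalesce implements it: the `Node` class, `propagate_column`, and the table names are hypothetical. The rule it assumes is that a column can only reach a table whose parent already carries it, so an unselected table stops the column on that branch.

```python
from collections import deque


class Node:
    """Hypothetical pipeline node: a table with columns and downstream readers."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = list(columns)
        self.downstream = []


def propagate_column(source, column, selected):
    """Add `column` to the selected nodes downstream of `source`.

    A node only receives the column if it is selected, and the walk only
    continues past nodes that carry the column, so lineage stays consistent.
    Returns the names of the nodes that were changed, in visiting order.
    """
    if column not in source.columns:
        raise ValueError(f"{column!r} is not a column of {source.name}")
    changed, seen = [], set()
    queue = deque(source.downstream)
    while queue:  # breadth-first walk down the lineage
        node = queue.popleft()
        if node.name in seen:
            continue
        seen.add(node.name)
        if column not in node.columns:
            if node.name not in selected:
                continue  # column stops here; nothing below this node gets it
            node.columns.append(column)
            changed.append(node.name)
        queue.extend(node.downstream)
    return changed


# Hypothetical lineage: SRC_ORDERS -> STG_ORDERS -> {HIST_ORDERS, FACT_ORDERS}
src = Node("SRC_ORDERS", ["ORDER_ID", "CUSTOMER_ID", "DISCOUNT"])
stg = Node("STG_ORDERS", ["ORDER_ID", "CUSTOMER_ID"])
hist = Node("HIST_ORDERS", ["ORDER_ID", "CUSTOMER_ID"])
fact = Node("FACT_ORDERS", ["ORDER_ID"])
src.downstream = [stg]
stg.downstream = [hist, fact]

changed = propagate_column(src, "DISCOUNT", {"STG_ORDERS", "HIST_ORDERS"})
print(changed)  # ['STG_ORDERS', 'HIST_ORDERS']
```

The design choice worth noting is that skipping an unselected table also skips everything below it; a real tool may treat that case differently.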

There’s one important thing to state at this point: Coalesce won’t bring data into your Snowflake. The team behind Coalesce focuses on transformation, and they do it really well. Once the data is there, you can do anything that Snowflake supports.

Coalesce is best suited to a data-driven development approach: go ahead, create objects, and add the transformations. All of this is done through the GUI, with convenient bulk operations. This brings us to one reason why I said this is a tool to start with when moving into automation: the GUI helps a lot, both for newcomers and for experienced users. And since it works directly on top of your Snowflake, you can get started fast.

Easy set-up

Setting up Coalesce takes just a few hours, or maybe just minutes for the quickest among us. You need your Snowflake account, a Git repository (cloud-accessible), and a scheduler that will trigger the jobs in production. During development, you can run the jobs directly from the tool, so the scheduler can be set up later.

Basically, if you’ve got your Snowflake and Git access right, you simply connect them to Coalesce, create a workspace, and you’re ready to go.

Fast first delivery

Coalesce has a really powerful, easy-to-navigate GUI. All the SQL options are a few clicks away, and the templated behavior of the nodes is easy to read. Bulk actions are simple, so tasks like creating 50 historical tables take seconds. You can also process your whole pipeline automatically, based on the actual column lineage. This means that, while developing, you can get test data into your newly built object in two clicks, no matter how complicated the upstream path is.
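The “two clicks, no matter how complicated the upstream path” part boils down to running nodes in dependency order. Here’s a minimal Python sketch of that idea using the standard library’s `graphlib.TopologicalSorter` (Python 3.9+). The pipeline dictionary and the function names are hypothetical; this is not Coalesce’s internal scheduler, just an illustration of lineage-ordered execution.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each node maps to the set of nodes it reads from.
PIPELINE = {
    "STG_ORDERS": {"SRC_ORDERS"},
    "STG_CUSTOMERS": {"SRC_CUSTOMERS"},
    "DIM_CUSTOMER": {"STG_CUSTOMERS"},
    "FACT_SALES": {"STG_ORDERS", "DIM_CUSTOMER"},
}


def upstream_of(graph, target):
    """Return the target plus every node it depends on, directly or transitively."""
    needed, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(graph.get(node, ()))
    return needed


def refresh_order(graph, target):
    """Order the nodes needed to refresh `target` so each runs after its inputs."""
    needed = upstream_of(graph, target)
    # Keep only the edges inside the needed set; source tables get no inputs.
    subgraph = {node: graph.get(node, set()) & needed for node in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(refresh_order(PIPELINE, "DIM_CUSTOMER"))
# ['SRC_CUSTOMERS', 'STG_CUSTOMERS', 'DIM_CUSTOMER']
```

Refreshing only the target’s upstream slice, rather than the whole pipeline, is what keeps this fast while you’re iterating on a single new object.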

The first delivery of a small project can be done in a few days of work. In one POC, a data engineer who’d had just a two- or three-hour introductory training delivered a small project within two days, with occasional help from a more experienced colleague. It really is intuitive to use, and I think it’s well worth giving it a go.

The customization

What would an automation tool be without extensive customization options? In Coalesce, everything is based on nodes. These are the basic building blocks of your workflow. Most often a node is a table or a view, but it can also be, for example, a procedure or a stream. Nodes are configured, stored, and versioned in YAML files. You can define not only the structures, types, and behavior, but also custom GUI options for specific nodes!

The actual calls to Snowflake are then generated from the nodes using Jinja2 templating. With this, your options are almost limitless; it’s simply a matter of getting the code right. And the best thing is that if one power user writes the templates, the rest of the team doesn’t need to know Jinja or YAML. Everything they need is shown in the customizable node GUI.

The great thing is that you can do all this, but you don’t have to. You can deliver a lot with the default node types, and a reasonable number of prebuilt ones are available, for example a package with Data Vault 2.0 support.
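To give a feel for what a node template does, here’s a simplified Python sketch that turns node metadata into Snowflake SQL. In Coalesce this is done with YAML node definitions and Jinja2 templates; the sketch swaps those for a plain Python dictionary and f-strings so it runs with the standard library only. The metadata keys, `full_name`, `render_create`, and `render_insert` are hypothetical and do not reflect Coalesce’s actual node schema.

```python
# Hypothetical node metadata, standing in for a YAML node definition.
# The keys are illustrative only, not Coalesce's real schema.
NODE = {
    "database": "ANALYTICS",
    "schema": "STAGE",
    "name": "STG_ORDERS",
    "source": "RAW.SALES.ORDERS",
    "columns": [
        {"name": "ORDER_ID", "type": "NUMBER(38,0)", "transform": None},
        {"name": "CUSTOMER_ID", "type": "NUMBER(38,0)", "transform": None},
        {"name": "ORDER_DATE", "type": "DATE", "transform": "TO_DATE(ORDER_TS)"},
    ],
}


def full_name(node):
    """Fully qualified table name: DATABASE.SCHEMA.NAME."""
    return f"{node['database']}.{node['schema']}.{node['name']}"


def render_create(node):
    """Render a CREATE OR REPLACE TABLE statement from the column metadata."""
    cols = ",\n".join(f"    {c['name']} {c['type']}" for c in node["columns"])
    return f"CREATE OR REPLACE TABLE {full_name(node)} (\n{cols}\n);"


def render_insert(node):
    """Render an INSERT ... SELECT, applying a column's transform if it has one."""
    exprs = ",\n".join(
        f"    {c['transform']} AS {c['name']}" if c["transform"] else f"    {c['name']}"
        for c in node["columns"]
    )
    return f"INSERT INTO {full_name(node)}\nSELECT\n{exprs}\nFROM {node['source']};"


print(render_create(NODE))
print(render_insert(NODE))
```

Because the SQL comes from metadata, applying the same template to 50 node definitions is just a loop, which is what makes bulk operations like the ones mentioned earlier cheap.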

Outro

If you’re on Snowflake and looking for an automation tool, or just a way to make life easier for your engineers and architects, give Coalesce a shot. With its easy setup and gentle learning curve, you can see results pretty quickly.

Yes, it won’t solve all of your problems. You’ll still need to get the data into Snowflake somehow, and you’ll need to set up a scheduler, but usually you already have those in your stack.

Coalesce also doesn’t support a model-driven way of working, but not everybody takes that approach. And with the amount of customization that the node types and templating give you, you can largely bridge that gap.

I didn’t get into the details of how the node system works, or how exactly you create your transformations or workflows. That would take another article, or perhaps a nice video tutorial. Here I wanted to talk a little about why you should try Coalesce. So let’s sum it up in one final sentence: Coalesce gives you more than enough of a toolset to start right out of the box, plus the options to scale well.
