In this article, we will cover the cloud-native platform Snowflake (SF), looking at its key features. We will also compare SaaS products with on-premises products and discuss some of the differences between OLAP-optimized database systems, such as SF, and traditional OLTP databases such as Oracle or PostgreSQL.
SF is essentially a cloud-based data warehousing platform and one of many relational database management systems (RDBMS). It is a cloud-agnostic service, which means it can be hosted on any of the three major cloud providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). The decision about where the data is hosted depends entirely on the firm's preference among these providers.
As SF is a true Software-as-a-Service (SaaS) product, the big advantage compared to an on-premises solution is that there is no hardware or software to install and configure. The firm does not have to spend resources on maintaining on-premises servers: every aspect of software installation and maintenance is handled by Snowflake, with no additional costs incurred. This reduces the cost of IT staff who would otherwise have to set up and maintain the hardware. The effortless scalability of storage and compute resources is another big advantage over on-premises solutions on the market. One disadvantage of a SaaS solution is the potential dependency on the cloud provider; a second potential disadvantage is data security, since the data resides outside the firm's own infrastructure.
One differentiating factor between Snowflake and more traditional databases is that the SF architecture separates storage (storage layer) and compute (compute layer) so that they scale independently of each other. If the company needs to store a high volume of data but load performance is not critical, there is no need to pay for an integrated bundle; the company can simply scale the storage automatically without paying for more compute. Conversely, if certain parts of the load in the firm's data warehouse (DWH) need more processing power because of high data volumes in the pipelines or complex data transformations, it is possible to scale up the compute resources seamlessly, without any downtime.
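To make the independent scaling more concrete, here is a minimal sketch using the snowflake-connector-python package; the account credentials and the warehouse name load_wh are placeholders, not values from this article:

```python
# Minimal sketch: scaling compute independently of storage with the
# snowflake-connector-python package. All identifiers below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical account identifier
    user="my_user",
    password="my_password",
)
cur = conn.cursor()

# Resize only the compute cluster; the data in the storage layer is untouched.
cur.execute("ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'LARGE'")

# Scale back down outside of peak load to save credits.
cur.execute("ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'XSMALL'")

cur.close()
conn.close()
```

Queries that are already running finish on the old size while new queries pick up the new size, which is why such a resize does not require downtime.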
These compute resources are provided by SF virtual warehouses, which are responsible for query processing. Each warehouse (or cluster) provides the resources (CPU, memory) needed to execute all DML and DDL statements. A warehouse can be resized along T-shirt sizes (X-Small to 6X-Large) without affecting the storage, even at runtime, and it is possible to create different warehouses for different purposes, such as separate warehouses for the DWH load, reporting, and data ingestion. The larger the T-shirt size, the more compute resources are used for query processing and, typically, the better the query performance.
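As a sketch of this workload-separation idea, the following hypothetical example provisions two dedicated warehouses with different T-shirt sizes; the warehouse names, sizes, and suspend settings are illustrative assumptions:

```python
# Sketch: one warehouse per workload, each with its own size and auto-suspend
# policy. Names and settings are illustrative, not recommendations.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password"
)
cur = conn.cursor()

# Small warehouse for ad-hoc reporting queries.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS reporting_wh
      WITH WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND = 60      -- suspend after 60 s of inactivity
      AUTO_RESUME = TRUE
""")

# Larger warehouse for the nightly DWH load and data ingestion pipelines.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS load_wh
      WITH WAREHOUSE_SIZE = 'MEDIUM'
      AUTO_SUSPEND = 120
      AUTO_RESUME = TRUE
""")

cur.close()
conn.close()
```

Because each workload runs on its own warehouse, a heavy ingestion job cannot slow down the reporting queries, and each warehouse can be sized and suspended independently.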
Other important features of Snowflake include:
As you can see, Snowflake is a very versatile data warehousing solution that offers modern features and great support. Snowflake constantly releases new features (such as hybrid tables for OLTP workloads or Python procedure support, known as Snowpark for Python) that add more and more usability to the product.
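For illustration, a minimal Snowpark for Python sketch might look like the following; the connection parameters and the SALES table are placeholders, not part of the article:

```python
# Minimal Snowpark for Python sketch: the DataFrame operations below are
# translated into SQL and executed inside a Snowflake virtual warehouse.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

connection_parameters = {
    "account": "my_account",     # hypothetical connection details
    "user": "my_user",
    "password": "my_password",
    "warehouse": "reporting_wh",
    "database": "my_db",
    "schema": "public",
}

session = Session.builder.configs(connection_parameters).create()

# Lazily build a query against an existing table and let Snowflake execute it.
df = session.table("SALES").filter(col("AMOUNT") > 1000)
df.show()

session.close()
```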