Cloud Data Warehouses - Mar 27 2020
The Guide to Snowflake Data Warehouses
Snowflake data warehouses can be a significant advantage for businesses because they allow business analysts, data scientists, and data engineers to work with any data to gain insights. Per the Snowflake website: “It’s designed with a patented new architecture to be the centerpiece for data pipelines, data warehousing, data lakes, data application development, and for building data exchanges to easily and securely share governed data. The result? A platform delivered as a service that’s powerful but simple to use.”
What is Snowflake?
Snowflake is a tool that helps companies share data from one system to another, even if the systems are different and couldn’t “speak” to each other before. Snowflake is a cloud data platform, so it runs on cloud infrastructure: Amazon Web Services (AWS), Microsoft Azure, or Google Cloud.
What is a Snowflake data warehouse?
Snowflake is a data warehouse, simply meaning that it stores data. It uses a multi-cluster, shared data architecture that allows businesses to securely share data sets across multiple platforms and systems. Data warehouses allow businesses to store and share data for sales funnel processes and data analytics. With Snowflake, a large company can create and share a single source of truth that enables self-serve analytics as well as advanced analytics that leverage Snowflake’s computing power. The traditional ETL process for migrating data to Snowflake becomes ELT, where the data is transformed inside Snowflake after it has been loaded. Snowflake also supports a range of partner integrations to enable insight-driven decision making.
Why use a Snowflake data warehouse in the cloud?
Speed, ease, and cost. Snowflake creates a virtual data warehouse on AWS or another cloud platform. Because it’s a virtual data warehouse, it can scale in real time, based on your needs. Its separation of the storage and compute layers, which lets each scale independently, differentiates it from other cloud data warehouses. As a result, a Snowflake data warehouse is “faster, easier to use, and more flexible” than traditional data warehouses. Per Snowflake.com:
“Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). Snowflake provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings.” https://www.snowflake.com/product/
What is Snowflake ETL?
An ETL tool allows different databases to communicate by extracting data, transforming it, and loading it into a destination system. Per Wikipedia:
“In computing, extract, transform, load (ETL) is the general procedure of copying data from one or more sources into a destination system which represents the data differently from the source(s) or in a different context than the source(s). The ETL process became a popular concept in the 1970s and is often used in data warehousing.” https://en.wikipedia.org/wiki/Extract,_transform,_load
Snowflake ELT (unlike traditional ETL) is a multi-step process that extracts data from a data feed or source, creates files that allow that data to be loaded into the new environment, and only then transforms it there. Per the Snowflake website:
“Snowflake eliminates the need for lengthy and sometimes labor intensive extract, transform. load processes by making data easily accessible for internal and external partners via data sharing and Snowflake Secure Data Sharing.” https://www.snowflake.com/trending/etl-tools
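The load-first, transform-later order of ELT can be sketched in a few lines. This is a minimal illustration only: it assumes Python's built-in sqlite3 module as a local stand-in for a warehouse (it is not Snowflake), and the table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system:
# everything is text, with inconsistent spacing and casing.
raw_orders = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "7.50"),
]

conn = sqlite3.connect(":memory:")  # local stand-in for the warehouse

# Extract + Load: land the data untouched in a staging table.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: clean and type the data with SQL, inside the warehouse.
conn.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
""")

totals = conn.execute(
    "SELECT customer, ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

In a traditional ETL pipeline, the cleanup in the second statement would run in a separate tool before the data reached the warehouse. In ELT, the raw table is kept and the cleanup is just another query.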
What are the advantages of snowflake schema in a data warehouse?
Fundamentally, the advantages of a snowflake schema are better data quality and lower disk usage. (A snowflake schema is a data modeling pattern; despite the name, it is not specific to the Snowflake product.) Per Vertabelo:
“There are two main advantages to the snowflake schema: Better data quality (data is more structured, so data integrity problems are reduced) Less disk space is used then in a denormalized model.” https://www.vertabelo.com/blog/data-warehouse-modeling-the-snowflake-schema/
In addition, a snowflake schema splits each dimension into multiple related dimension tables. This makes it better long term, with simpler maintenance and fewer update requirements, and it virtually eliminates redundant data.
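To make the idea of multiple dimension tables concrete, here is a small sketch of a snowflake schema. It assumes Python's built-in sqlite3 module as a stand-in database, and all table and column names (dim_category, dim_product, fact_sales) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sub-dimension: each category name is stored exactly once.
    CREATE TABLE dim_category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL
    );
    -- Dimension: products reference a category instead of repeating its name.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category_id  INTEGER NOT NULL REFERENCES dim_category(category_id)
    );
    -- Fact table: one row per sale, pointing at the product dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        amount     REAL NOT NULL
    );
    INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO dim_product VALUES
        (10, 'Router', 1), (11, 'Switch', 1), (12, 'License', 2);
    INSERT INTO fact_sales VALUES
        (100, 10, 250.0), (101, 11, 400.0), (102, 12, 99.0), (103, 10, 250.0);
""")

# Reporting by category walks fact -> product -> category.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_id  = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(sales_by_category)
```

Renaming a category here is a one-row update in dim_category. In a denormalized model, the same change would touch every product row that repeats the name. The trade-off is the extra join at query time.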
Does Snowflake work with Tableau?
Yes. Snowflake and Tableau allow businesses to scale the size of their data warehouse with demand. Because Snowflake is a virtual data warehouse (built on a cloud like AWS), businesses get massive computing power when demand is high and reduced costs when it is low. Per Tableau:
“Using Snowflake, organizations have the ability to scale their data warehouse up and down as the situation demands. … Snowflake’s fully relational SQL data warehouse is built for the cloud, making it efficient to store and access all your data from one integrated location.” https://www.tableau.com/solutions/snowflake
Possibly the biggest advantage of Snowflake and Tableau is that the historic use of predictive analytics can be replaced with real data, eliminating the guesswork and allowing businesses to make important decisions based on facts, not estimates.
How much does a Snowflake data warehouse cost?
The pricing of a Snowflake virtual data warehouse varies widely, depending on the needs of your company and the size of your customer base. It can range from a couple hundred dollars per month to several thousand. Luckily, there won’t be too many surprises: your real-world costs will be determined before you pay a penny for Snowflake. Snowflake’s pricing page unfortunately doesn’t show any actual dollar numbers, so here’s our general estimate:
- Startup Business $200/month
- Medium Business $800/month
- Large Corporation $2000/month
What are some Snowflake data warehouse best practices?
Snowflake best practices are the same as you might follow with any virtual data warehouse. First and foremost, understanding the basics of Snowflake can have a huge impact on your monthly costs, and help you focus on the data you need, while avoiding storing data you don’t need. HevoData.com has outlined a fantastic list:
“(A) Understanding Credit charges on Snowflake
Snowflake offers pay per second billing. They allow the different size of the warehouse (Large, X-Large, 2X-Large, etc)… Having an understanding of this gives you the flexibility to run as many clusters as you want and suspend them when not in need. This, in turn, would help you act smartly and save costs. Try out combinations of different query types and warehouse sizes. This will help you arrive at the right combination for your workloads.
(B) Impact of Query Composition on Snowflake
Like in most Data Warehouses, the size and the complexity of the query determines the number of servers needed to process the query. For complex queries, the number of rows has less impact in comparison to the overall data size. The number of table joins, filtering using predicates, etc. has an impact on query processing hence utilize them carefully.
(C) Impact of Caching on Queries
Caching is standard practice in Data Warehousing as it improves the performance of the Warehouse as it enables subsequent queries to read from the cache instead of source tables.” https://hevodata.com/blog/snowflake-etl-best-practices-cloud-data-warehouse/
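The credit charges in point (A) come down to simple arithmetic, sketched below. The credits-per-hour figures and the 60-second minimum per start or resume reflect Snowflake's published rates for standard warehouses as we understand them, so verify them against the current documentation. The price per credit is a placeholder assumption, since it varies by edition, region, and contract, and the function name estimate_cost is our own.

```python
# Credits consumed per hour by one cluster of each warehouse size
# (assumed from Snowflake's published rates; verify before relying on them).
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4,
    "Large": 8, "X-Large": 16, "2X-Large": 32,
}

def estimate_cost(size, seconds_running, price_per_credit, clusters=1):
    """Estimate the cost of one continuous run of a warehouse.

    price_per_credit is a placeholder: the real figure depends on your
    Snowflake edition, cloud region, and contract.
    """
    # Billing is per second, with a 60-second minimum each time
    # the warehouse starts or resumes.
    billed_seconds = max(seconds_running, 60)
    credits = CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600
    return credits * price_per_credit

# A Large warehouse running for 30 minutes at a hypothetical $3.00 per credit.
print(estimate_cost("Large", 30 * 60, 3.0))  # → 12.0
```

Because each size step doubles the hourly rate, a larger warehouse only saves money if it finishes the workload in less than half the time. This is why the quoted advice suggests testing combinations of query types and warehouse sizes.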
Bottom line: when we work with clients, we tend to architect their data warehouse to address their primary needs. Then, over time, we pull in additional tables as new questions arise and new data needs are discovered. This avoids wasting disk space and money on storing and analyzing data that proves ineffective over time.
Who are the top Snowflake data warehouse competitors?
Snowflake vs Microsoft Azure SQL Data Warehouse:
Microsoft Azure SQL Data Warehouse is, in effect, a cloud-based version of a traditional SQL server. This has its upsides and downsides. The biggest advantage of a cloud-based warehouse (Azure or Snowflake) is its ability to scale up and down, based on real-time need. Theoretically, you only pay for what you need. Both Azure SQL and Snowflake enable this. However, in our experience, the real-world costs of a Snowflake data warehouse tend to be dramatically lower, and its performance tends to be dramatically better than that of an Azure SQL architecture.
Snowflake vs Oracle Autonomous Data Warehouse:
Snowflake can be deployed from anywhere in minutes. Oracle is still primarily an on-premises solution. Plus, the cost of Snowflake is dramatically lower. While Oracle is still the database market share leader, it is barely in the lead. Just as telling, the demand for Oracle Database Engineers has dropped off a cliff. So, the market would seem to indicate that new virtual data warehouse solutions are more appealing to companies of all sizes than the Oracle Autonomous solution. We tend to avoid Oracle data warehouse architecture, unless the client has an existing investment in Oracle solutions.
Do I need to hire a Snowflake data engineer or specialist?
Larger companies have the funds to consider hiring a full-time Snowflake Data Engineer. Startups and medium-sized businesses should consider a specialist Snowflake consultant to help them implement and get rolling.
At NextPhase.ai, we help all sorts of companies architect Snowflake data warehouses, from global companies to Bay Area seed startups. The common issue we’ve discovered is teaching teams to maintain their virtual data warehouse and set up new dimension tables. We get a lot of follow-up calls and emails asking for tweaks and additions. Our recommendation is to do it exactly this way: get an expert consultant, like NextPhase, to set up your Snowflake data warehouse and get everything functioning perfectly. After you discover the power of this new advanced data analytics system, only then should you consider whether the ROI warrants additional investment, whether that be a full-time Snowflake Data Engineer, training for an existing engineer, or simply continuing to work with a third party like NextPhase.ai.
Do I need a Cloud Migration Service or Company Partner near me?
Working with a cloud migration service or partner near you is an advantage for communication and workflow. Generally, close proximity makes those things more efficient. If your preferred cloud migration service is not nearby, make sure you select one that has an established virtual workflow and that clearly communicates expectations and process. At NextPhase.ai, we work efficiently with local and global clients, and the key is a well-established process and communication plan.
Have questions? Email anytime for a free consultation.
NextPhase.ai is a cloud data management and analytics services provider. For 10 years, we’ve helped global companies harness the power to turn data into insights that drive growth. Whether you’re migrating to the cloud or implementing a cloud data warehouse, contact us to schedule a workshop.
Have questions? We help companies like yours, every day.
Email us at email@example.com