READ
Data News

ProjectiveGroup publishes first app on Snowflake Marketplace

Date:September 20, 2023

We are very proud to announce that we have recently published a Snowflake Native App on the Snowflake Marketplace. In this article we will introduce our data transformation framework, TiPS, and summarise our journey converting it into a Native App and achieving a listing on the Snowflake Marketplace.

Background

Snowflake originally emerged as a revolutionary cloud data warehouse before evolving into ‘The Data Cloud’ in recognition of the six additional workloads supported (see diagram below). One of the capabilities that underpins all workload is the Snowflake Marketplace and now, in addition to buying curated datasets, the marketplace supports the purchase of apps that can interact with the data you hold in Snowflake.

At launch (public preview in June 2023) there were 23 Snowflake Native Apps. As of 30th August  2023, this number has increased to 31 (a complete list can be found here), providing various capabilities such as:

  • Identifying data stored in Snowflake but never accessed.
  • Generating reports on Snowflake credit consumption and performance.
  • Making it easier to access an existing offering (e.g., our partner Coherent.global and their fantastic Microsoft Excel model risk remediation platform ‘Spark’).

Introducing TiPS

TiPS is a metadata driven data transformation framework which can be used to build and execute data pipelines. As TiPS relies only on first class database objects, there is almost no learning curve and so engineers already proficient in SQL can focus on writing transformation logic instead of spending their valuable time learning yet another tool.

TiPS is a metadata driven data transformation framework which can be used to build and execute data pipelines.

Once your data source is created in the database (generally a view encapsulating appropriate business logic) and a target is in place (generally a table), a step of your data pipeline is defined and TiPS will take care of dynamically generating all the DML statements needed to execute that step and move the data across.

TiPS was originally built for a client around two years ago, where their specific requirements precluded other data transformation options.  Its capability has evolved in the intervening period and today it supports all of their Snowflake data transformation needs.  We’ve since used TiPS with other clients and have observed that it works particularly well as an ‘accelerator’ in a Proof of Concept to minimise the number of new vendors or technologies the client needs to get to grips with and/or be reviewed by their InfoSec team.

TiPS works particularly well as an ‘accelerator’ in a Proof of Concept to minimise the number of new vendors or technologies.

Features of TiPS

Some of the salient features of TiPS include:

Security
  • Data never leaves your secure Snowflake environment.
  • The only permission required is execution permission on one TiPS stored procedure, meaning the user (or role) executing the pipeline does not require access to read or write data.
Visibility
  • All data pipelines are defined in common database tables. This supports:
  • Object level lineage can be determined by querying pipeline metadata, so impact assessment of any upcoming changes is very easy to discern.
  • Opportunity to drill down to column level lineage by leveraging custom queries or views utilising data available in the Snowflake information schema in conjunction with TiPS metadata.
Infrastructure
  • TiPS runs entirely within Snowflake and so no additional infrastructure is required to build and execute data pipelines with TiPS.
Data Quality
  • Data quality checks can be written in-line within a data pipeline, with configuration options to abort when a check fails or to continue with a warning.
  • Data quality checks can also be defined as a stand-alone pipeline that can be executed at a defined schedule.
Execution Logs
  • TiPS captures all logging information during the execution of data pipeline (in JSON), storing this in a table once the pipeline execution is complete.  A dedicated view has been created to flatten out the JSON, making it easier to interpret with human eyes.

TiPS: What it isn’t

  1. Scheduling: TiPS does not support scheduling or orchestration. For this we use tools such as Airflow, Control-M, Snowflake tasks or Unix Cron.
  2. Data Ingestion: TiPS supports data transformation (where data is already in accessible staged files or in Snowflake). For data ingestion from external source systems, we use tools such as Fivetran / HVR, Matillion, Rivery.
  3. Notifications: TiPS captures detailed logs for each data pipeline execution, and notifications based upon the content of these logs is left to the scheduling tool.

TiPS was originally written in JavaScript stored procedures to meet our client’s requirement of a tool that would run entirely within Snowflake. The next evolution saw a Python Version of TiPS created where SQL generation took place outside of Snowflake. This was a steppingstone towards a Snowpark (Python) version that restored compliance with the requirement to run entirely within Snowflake.  This exact same Python code, subject to one or two changes as you’ll see below, has now become the basis for our Snowflake Native Apps version of TiPS.

For more detailed information about TiPS including a guide to getting started and our road-map for the future, see our documentation.

TiPS as a Snowflake Native App

Accessing TiPS as a Snowflake Native App means you benefit from the stringent and secure process Snowflake has put in place to review all apps before they are accepted onto the marketplace. This includes checks such as:

  1. Provider Suitability: Every app provider is subject to an assessment process wherein Snowflake makes sure they have a clear vision about the purpose and usability of their app. In our case, a meeting was arranged with a Snowflake Marketplace Specialist before we were approved.
  2. Security Scan: Each app is subject to a rigorous security scan to look for code that could compromise security for anyone using the app. Initially our app was rejected after the scan identified one case of the Python ‘eval’ function, but fortunately refactoring to remove this function was easy to achieve.
  3. Manual end-to-end Test: Finally, the app is tested by a Snowflake Marketplace specialist to ensure it works as intended and in line with the provided documentation. We even received feedback on one or two details within our documentation which we’ve now addressed.

Having passed the above checks, we are delighted that TiPS is now approved and listed in the marketplace, meaning all you need to do is click a button to install it in your Snowflake account. Installation also includes a sample pipeline that can be executed immediately, which checks installation has been successful and shows TiPS in action.  Since everything is hosted on Snowflake, you can rest assured that it is all running in your trusted secure environment and your data is not moving anywhere.

Considerations when developing a Snowflake Native App

Snowflake Native Apps is currently in public preview and still evolving. There are some database capabilities not yet supported but that will hopefully be available in future releases, some that may never be supported (due to security), and some process improvements that would be useful. Our key observations include:

  • Security Scan Notifications: There is currently no notification process, so once you submit your app for review you must periodically check its status manually. We understand that status update notifications will be added soon.
  • Native App as a Virtual Database: Each Native App is installed as a logical entity like a database within Snowflake. Originally TiPS was built on the assumption that it is an entity inside a database with no visibility of objects outside of that database.  Consequently, database objects referenced in data pipelines were qualified with the schema name only (this works seamlessly when cloning). Given the way Snowflake has implemented Native Apps, we had to change TiPS to accept the database name as a parameter to fully qualify all database objects used.
  • Temporary Tables: Support for temporary tables is currently not available in Native Apps. TiPS did use temporary tables to allow concurrent runs of the same data pipelines, each in its own session, and we had to refactor the code with slight loss of capability. This capability will be restored once temporary tables are supported (which we understand could be later this year).
  • Streamlit Custom Components: Native Streamlit components are supported, but not custom Streamlit components. The original TiPS UI relied heavily on custom components (ReactJS), so in our first version of the Native App we had to remove the UI.  We’re currently redeveloping the UI using entirely native Streamlit components which we will release once Snowflake Native Apps support a more recent version of Streamlit; at present only Streamlit 1.22 is supported.

Native Apps are a very exciting new capability in Snowflake. It has been an interesting journey to get to grips with them and we’ve learned a lot along the way.  Aside from Snowflake’s documentation (which is also evolving), there isn’t much information in the public domain right now. However, we would encourage you not to be put off by this as the support offered by the Snowflake Marketplace team is fantastic.   

All you need to do is click a button to install it in your Snowflake account.

If you’d like to know more about TiPS, discuss your own ideas for a Snowflake Native App, or talk about anything related to Snowflake and Data Engineering, please get in touch.

Install TiPS here.

About Projective Group

Established in 2006, Projective Group is a leading Financial Services change specialist. With deep expertise across practices in Data, Payments, Transformation and Risk & Compliance.

We are recognised within the industry as a complete solutions provider, partnering with clients in Financial Services to provide resolutions that are both holistic and pragmatic.  We have evolved to become a trusted partner for companies that want to thrive and prosper in an ever-changing Financial Services landscape.