Add Snowflake Horizon community-contributed connector #30
Conversation
...ectors/snowflake-horizon-connector/custom-connector/snowflake_to_dataplex_metadata_loader.py
**Reviewer:**
Can this file be removed since it doesn't contain any useful info?
**Author:**
Done. Please review.
> **Foreword**
>
> In today's complex data landscape, organizations increasingly recognize data as a critical asset. The ability to effectively discover, understand, and govern this data is paramount for informed decision-making, regulatory compliance, and innovation. As data ecosystems grow, spanning various platforms and technologies, maintaining a holistic view of data assets becomes challenging.
>
> This connector addresses a key need for many enterprises: bridging the gap between their data warehousing capabilities and the comprehensive data governance and discovery features offered by Google Cloud's Dataplex. Dataplex provides a unified data fabric to manage, monitor, and govern data across diverse environments within Google Cloud.
>
> The Snowflake to Dataplex Data Catalog Connector, detailed in this guide, is a testament to the power of seamless integration. It's designed to automate metadata synchronization, bringing the rich context of your data into sataplex. This not only enhances data visibility and accessibility for all stakeholders but also strengthens your data governance by centralizing metadata management, lineage tracking, and data quality initiatives.
**Reviewer:**
Fix typo: sataplex -> dataplex.
**Author:**
Done. Please review.
> `SNOWFLAKE_WAREHOUSE = 'Enter your Warehouse'`
> `SNOWFLAKE_DATABASE = 'Enter your Database'`
> `SNOWFLAKE_SCHEMA = 'Enter your Schema'`
**Reviewer:**
Move all user input to top and mention in README.
**Author:**
Done in both Script and README as well. Please review.
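For illustration, the "all user input at the top" layout the reviewer asked for might look like the sketch below. Only the three `SNOWFLAKE_*` settings come from the diff under review; `GCP_PROJECT_ID` and the placeholder-check helper are assumptions added for this example, not the connector's actual code.

```python
# Hypothetical sketch of a config-at-top layout for
# snowflake_to_dataplex_metadata_loader.py. Only SNOWFLAKE_WAREHOUSE/
# DATABASE/SCHEMA appear in the reviewed diff; the rest is illustrative.

# --- User configuration: edit these before running the script ---
SNOWFLAKE_WAREHOUSE = 'Enter your Warehouse'
SNOWFLAKE_DATABASE = 'Enter your Database'
SNOWFLAKE_SCHEMA = 'Enter your Schema'
GCP_PROJECT_ID = 'Enter your GCP Project ID'   # assumed extra setting


def unconfigured_settings():
    """Return names of settings the user has not replaced yet."""
    settings = {
        'SNOWFLAKE_WAREHOUSE': SNOWFLAKE_WAREHOUSE,
        'SNOWFLAKE_DATABASE': SNOWFLAKE_DATABASE,
        'SNOWFLAKE_SCHEMA': SNOWFLAKE_SCHEMA,
        'GCP_PROJECT_ID': GCP_PROJECT_ID,
    }
    return [name for name, value in settings.items()
            if value.startswith('Enter your')]
```

A guard like `unconfigured_settings()` lets the script fail fast with a clear message instead of sending placeholder text to Snowflake.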
...ectors/snowflake-horizon-connector/custom-connector/snowflake_to_dataplex_metadata_loader.py
> `print(entry_id)`
>
> `entry_group_id = "snowflakehorizongrp"  #enter your entry group id for snowflake`
> `entry_type_id = "snowhorizondb"  #enter your entry type id for snowflake dbs`
**Reviewer:**
These are also user configs? Please move to top and elaborate.
**Author:**
These are part of a one-time setup activity performed by the user (Step 3 in the README). I have now documented this in detail in Step 3 of the README as well. Also, these cannot be moved to the top because they are section-specific: e.g., for Snowflake Databases it is "snowhorizondb", while for Snowflake Tags it is "snowhorizontag". Hence, I have added a detailed comment in the script so the user knows these values come from the one-time setup steps in the README.
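As a purely illustrative alternative, the section-specific ids the author describes could still be gathered in one visible mapping even if they cannot be a single top-level setting. The ids "snowflakehorizongrp", "snowhorizondb" and "snowhorizontag" come from this discussion; the section keys and the assumption that both sections share one entry group are mine.

```python
# Hypothetical grouping of section-specific Dataplex ids; not the
# connector's actual code. Whether the entry group is shared across
# sections is an assumption for this sketch.
SECTION_IDS = {
    'databases': {'entry_group_id': 'snowflakehorizongrp',
                  'entry_type_id': 'snowhorizondb'},
    'tags':      {'entry_group_id': 'snowflakehorizongrp',
                  'entry_type_id': 'snowhorizontag'},
}


def ids_for(section):
    """Look up the entry group/type ids for one metadata section."""
    cfg = SECTION_IDS[section]
    return cfg['entry_group_id'], cfg['entry_type_id']
```

This keeps every id a reader might need to customize in one place while still letting each section of the script use its own values.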
> #### Step 2: Storing the connecting details in Secret Manager.
>
> So, In **Google Cloud Console** -> Navigate to **Secret Manager** -> **Create Secret** ->
**Author:**
Done. Please Review.
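For context on what Step 2 enables in the script: once the connection details are stored as secrets, they can be read back with the `google-cloud-secret-manager` client. The project and secret ids below are placeholders, and the helper function is a sketch, not the connector's actual implementation.

```python
def secret_version_name(project_id, secret_id, version='latest'):
    # Resource-name format used by Secret Manager's access API.
    return f'projects/{project_id}/secrets/{secret_id}/versions/{version}'


def fetch_secret(project_id, secret_id):
    """Read one secret value.

    Requires the third-party google-cloud-secret-manager package and
    ambient GCP credentials; imported lazily so the rest of the module
    stays importable without it.
    """
    from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={'name': secret_version_name(project_id, secret_id)})
    return response.payload.data.decode('UTF-8')
```

For example, `fetch_secret('my-project', 'snowflake-password')` (placeholder names) would return the stored Snowflake password as a string.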
> `* SNOWFLAKE_DATABASE = 'Enter your Database'`
> `* SNOWFLAKE_SCHEMA = 'Enter your Schema'`
>
> This process involves several steps:
**Reviewer:**
This can be removed since the steps are detailed above.
**Author:**
Done. Please Review.
> * Then, retrieving Snowflake connection details from the secret manager.
> * Following this, a connection to Snowflake is established to extract Horizon catalog data, which is subsequently loaded directly into Dataplex.
>
> #### Step 5: Validate everything in Dataplex.
**Reviewer:**
Please add details on how to validate, like navigate to the EntryGroup you specified or search using a particular syntax etc.
**Author:**
Done. Please Review.
> This connector addresses a key need for many enterprises: bridging the gap between their data warehousing capabilities and the comprehensive data governance and discovery features offered by Google Cloud's Dataplex. Dataplex provides a unified data fabric to manage, monitor, and govern data across diverse environments within Google Cloud.
>
> The **Snowflake to Dataplex Data Catalog Connector**, detailed here, is a testament to the power of seamless integration. It's designed to automate metadata synchronization, bringing the rich context of your data into dataplex. This not only enhances data visibility and accessibility for all stakeholders but also strengthens your data governance by centralizing metadata management, lineage tracking, and data quality initiatives.
**Reviewer:**
Please use Dataplex Universal Catalog instead of Dataplex Data Catalog everywhere to avoid confusion with the legacy Data Catalog product.
**Author:**
Done. Have checked it everywhere now. Thanks!
> #### Step 1: Setting up Snowflake Environment from where you have to load the metadata.
> To access the Horizon catalog in Snowflake, you will need to use the **ACCOUNT_USAGE** views located under the **SNOWFLAKE** database.
>
> ![snowhorizon](images/snowhorizon-account-usage.png)
**Reviewer:**
Images are not loading correctly. Please use syntax: <img alt="create workflow" src="images/create_workflow.png" width="600">. Please see example here: https://github.com/GoogleCloudPlatform/cloud-dataplex/blob/main/managed-connectivity/cloud-workflows/README.md?plain=1
**Author:**
Done. Have used the above mentioned syntax for all the images. Please Review.
|  | ||
| * Similarly, add all the fields(metadata that you are trying to bring from Snowflake Horizon) and click on "Save":- | ||
|  | ||
| Similarly, create all the aspect types mentioned above one by one. Required Fields are mentioned in the python script for each aspect type(if you plan to execute the script as it is). |
**Reviewer:**
It might be better to list all the aspectType templates and entryType definitions here, rather than asking users to find them in the script. Better yet, if you can create a bash script to create these entryTypes, aspectTypes and entryGroups, it might provide a more seamless experience. For example: https://github.com/GoogleCloudPlatform/cloud-dataplex/blob/main/managed-connectivity/cloud-workflows/samples/scripts/gcloud/execute.sh
**Author:**
Thank you for the feedback. I have updated the README.md to explicitly list all AspectType and EntryType definitions with screenshots so users don't have to search the script.
Regarding the suggestion for a bash script: while I agree automation is generally beneficial, I purposefully opted for a manual setup guide for this specific connector to prioritize customer flexibility:
- Flexibility & Customization: Snowflake environments and business metadata requirements vary widely across organizations. By providing clear visual templates in the README, customers can easily adapt the naming conventions, field types, and descriptions to align with their specific business use cases before creation.
- Avoiding "Locked" Schemas: A bash script often creates a black box experience. If customers execute a script and later realize the schema doesn't fit their needs, modifying the established Dataplex types can be more complex than getting it right the first time via a guided manual process.
- Empowerment: This approach guides the user through the "why" of the setup, ensuring they maintain full control over their metadata governance from the start, making it easier for them to troubleshoot or extend the connector in the future.
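For readers who do want scripted setup despite the trade-offs discussed above, a minimal sketch of the kind of bash wrapper the reviewer links to might look like the following. The entry group/type ids match the examples in this thread; the project, location, and the exact `gcloud dataplex` flags are assumptions — check `gcloud dataplex entry-groups create --help` (and the equivalents for entry types and aspect types) before running. `DRY_RUN` defaults to printing commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical one-time-setup sketch, not part of the connector.
# Ids come from this review thread; flags are illustrative and may
# not match the exact arguments these gcloud commands require.
set -euo pipefail

PROJECT_ID="your-project-id"   # placeholder
LOCATION="us-central1"         # placeholder

# With DRY_RUN=1 (the default here) commands are echoed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run gcloud dataplex entry-groups create snowflakehorizongrp \
  --project="$PROJECT_ID" --location="$LOCATION"
run gcloud dataplex entry-types create snowhorizondb \
  --project="$PROJECT_ID" --location="$LOCATION"
run gcloud dataplex entry-types create snowhorizontag \
  --project="$PROJECT_ID" --location="$LOCATION"
```

The dry-run default addresses the "black box" concern: users can inspect exactly which resources would be created before setting `DRY_RUN=0`.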
Force-pushed from 6ef778a to 5f74ee6 (compare).