Unlocking the Power of Public Data: Leveraging Bright Data and Snowflake

Anthony Alteirac
3 min read · Sep 26, 2023


As a partner solution engineer, I became intrigued by the capabilities of the Bright Data platform when it recently became part of the Powered by Snowflake program. In a compelling demonstration, the techTFQ YouTube channel, led by instructor Thoufiq Mohammed, showed how Bright Data and Snowflake can be used together to analyze the Polish rental market. techTFQ, known for its educational content on SQL, Python, and database concepts in data analytics and data science, walked through a practical use case that answered key questions about the rental market in Poland.

Use Case: Analyzing the Polish Rental Market

Under the guidance of Thoufiq, Bright Data played a pivotal role in gathering data from the popular Polish real estate website, Otodom. Subsequently, Snowflake was employed to cleanse, transform, and analyze this data. The use case provided valuable insights into two critical questions:

1. Determining Average Rent in Polish Cities
— What is the average rent in various cities across Poland?

2. Finding the Best Value for Rental Properties
— What is the ideal district or city in Poland to secure the best value for renting a flat within a fixed budget?
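
To make the two questions concrete, here is a minimal Python sketch of the calculations involved. It is not the code from the video: the sample rows, the field layout, and the choice to define "best value" as the largest average surface area among flats within the budget are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (city, district, monthly_rent_pln, surface_m2).
# These values are made up for illustration and are not Otodom data.
LISTINGS = [
    ("Warszawa", "Mokotow", 4000, 50),
    ("Warszawa", "Wola", 3600, 40),
    ("Krakow", "Podgorze", 3000, 50),
    ("Krakow", "Stare Miasto", 3500, 35),
    ("Gdansk", "Wrzeszcz", 2800, 40),
]


def average_rent_by_city(listings):
    """Question 1: average monthly rent for each city."""
    rents = defaultdict(list)
    for city, _district, rent, _surface in listings:
        rents[city].append(rent)
    return {city: mean(values) for city, values in rents.items()}


def best_value_within_budget(listings, budget):
    """Question 2: the (city, district) offering the largest average
    surface among flats whose rent fits the budget, or None if none fit.

    "Best value" is an assumed definition, chosen only for this sketch.
    """
    surfaces = defaultdict(list)
    for city, district, rent, surface in listings:
        if rent <= budget:
            surfaces[(city, district)].append(surface)
    if not surfaces:
        return None
    return max(surfaces, key=lambda key: mean(surfaces[key]))


if __name__ == "__main__":
    print(average_rent_by_city(LISTINGS))
    print(best_value_within_budget(LISTINGS, budget=3000))
```

In the actual project these aggregations run in Snowflake over the cleansed Otodom data; the sketch only shows the shape of the logic.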

The Journey of Data: From Extraction to Analysis

Bright Data is a web data platform that lets businesses leverage public web data. That makes it a convenient way to acquire customized public data for a range of use cases, including data enrichment and AI/ML applications. Notably, Bright Data has integrated Snowflake as its backend engine and provides a connector that publishes data directly into your Snowflake account.

Thoufiq’s approach to this data problem was straightforward: acquire the data from the Otodom website with Bright Data, import it into Snowflake, and then work locally with Python, which means moving the data again from Snowflake to his development environment. This workflow suits individual analysis, but it can become problematic for enterprise projects as data volumes grow.

Introducing Snowpark: The Enterprise Solution

Enter Snowpark, the enterprise-grade solution that addresses the challenges of industrializing data pipelines. Snowpark offers a unified programming model that enables the development of data processing applications in Python, Java, or Scala, directly within the Snowflake environment. This eliminates the need for data movement between external systems, resulting in time savings and improved performance.

Key Benefits of Snowpark:
1. Performance: Snowpark code executes within Snowflake, eliminating the need for data transfer. This substantially enhances performance, particularly with large datasets, while also bolstering data governance and reducing risk.

2. Scalability: Designed to scale alongside your data requirements, Snowpark allows for seamless resource expansion as data volumes increase.

3. Flexibility: Snowpark supports a wide array of data processing operations, including data cleaning, transformation, and machine learning. It serves as a versatile tool for diverse data processing tasks.

4. Ease of Use: If you are proficient in Python, Java, or Scala, learning and utilizing Snowpark is a straightforward process, enabling swift development of data processing applications.
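
As a rough illustration of the "no data movement" point, here is a minimal Snowpark for Python sketch of the average-rent question. It assumes `session` is an already created `snowflake.snowpark.Session`; the table name `OTODOM_LISTINGS` and the `CITY` and `RENT` column names are hypothetical placeholders, not names taken from the video.

```python
def avg_rent_by_city_snowpark(session, table_name="OTODOM_LISTINGS"):
    """Build a lazy Snowpark DataFrame of average rent per city.

    `session` is expected to be a snowflake.snowpark.Session.
    The table name and the CITY / RENT columns are hypothetical.
    """
    # session.table() only creates a reference; no rows are fetched here.
    listings = session.table(table_name)
    # The grouping and averaging are translated to SQL and executed in
    # Snowflake once an action such as .show() or .collect() is called.
    return listings.group_by("CITY").avg("RENT")
```

With a real session, calling `avg_rent_by_city_snowpark(session).show()` would trigger execution in Snowflake and return only the aggregated rows, so the raw listings never need to be copied to a local development environment.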

A message to Thoufiq: I’d be delighted to collaborate with you on creating a Snowpark video.

Additional Tips and Resources

In conclusion, here are two valuable tips and resources for enhancing your data analytics endeavors:

1. Geolocation Integration: For enterprise projects, consider sourcing geolocation data within Snowflake through the Snowflake Marketplace.

2. Streamlined File Uploads: Thoufiq used SnowCLI to upload files to stages. For tactical file uploads, this can now be done directly in Snowsight.

Explore these resources for further insights into data analytics and Snowpark:
SQL Data Analytics Project (PART 1) | Data Analyst Portfolio Project

SQL Data Analytics Project (PART 2) | Data Analyst Portfolio Project

SQL Data Analytics Project (PART 3) | Data Analyst Portfolio Project

Snowpark for Python

Bright Data Platform

Bright Data Thoufiq Mohammed
