A modular Python pipeline for scraping, cleaning, and exporting real estate listing data for analysis and modeling.

🏠 Zillow Housing Data Scraper & Cleaner

This project is a Python-based web scraping and data processing pipeline for collecting and cleaning real estate listing data from Zillow-style property sites. It handles session management, parses structured listing data, and produces analysis-ready CSV files for downstream modeling or analytics.

It is released here as a standalone, reusable data collection tool.

🚀 Features

  • Scrapes housing listings from Zillow-style endpoints

  • Handles session setup and request flow

  • Parses and normalizes listing data

  • Outputs clean CSV files for analysis

  • Includes backup + recovery JSON snapshots

  • Modular architecture for reuse in other projects

πŸ“ Project Structure

.
├── data/
│   ├── raw/               # Raw scraped CSV output
│   ├── processed/         # Cleaned / analysis-ready CSVs
│   └── tmp/               # Backup JSON + intermediate files
├── notebooks/
│   └── sanity_check.ipynb # Quick inspection & validation
├── src/
│   ├── scraper.py         # Core scraping logic
│   ├── session.py         # Session & request handling
│   ├── parser.py          # Parsing + field normalization
│   ├── build.py           # Data pipeline orchestration
│   └── main.py            # Entry point / CLI-style runner
├── .env                   # Local environment variables
├── .gitignore
└── README.md

βš™οΈ Setup

  1. Create a virtual environment

    python -m venv .venv

    source .venv/bin/activate

  2. Install dependencies

    pip install -r requirements.txt

▶️ Usage

Run the main pipeline:

python -m src.main

This will:

  • Initialize a session
  • Scrape housing data
  • Parse data
  • Clean data
  • Save results to data/raw/ and data/processed/
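The cleaning step typically means coercing text fields to numbers and dropping unusable or duplicate rows. A hedged illustration (field names like `price` and `address` are hypothetical, not necessarily the pipeline's actual schema):

```python
def clean_rows(rows: list[dict]) -> list[dict]:
    """Keep rows with a parseable price, coerce it to int, dedupe by address."""
    seen: set = set()
    cleaned = []
    for row in rows:
        price = str(row.get("price", "")).replace("$", "").replace(",", "")
        if not price.isdigit():
            continue  # drop listings without a usable price
        addr = row.get("address")
        if addr in seen:
            continue  # drop duplicate listings for the same address
        seen.add(addr)
        cleaned.append({**row, "price": int(price)})
    return cleaned
```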

📊 Output

data/raw/irving_tx_housing.csv → raw scraped data

data/processed/irving_tx_housing_clean.csv → cleaned dataset

data/tmp/*.json → backup snapshots for recovery / debugging
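Downstream analysis can start straight from the processed CSV. For example (column names here are illustrative; check the actual header of the cleaned file):

```python
import csv
import io

# In practice you would open("data/processed/irving_tx_housing_clean.csv");
# a small in-memory sample is used here so the snippet is self-contained.
sample = io.StringIO(
    "address,price,sqft\n"
    "123 Main St,350000,1800\n"
    "456 Oak Ave,425000,2400\n"
)
rows = list(csv.DictReader(sample))
avg_price = sum(int(r["price"]) for r in rows) / len(rows)
```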

🧠 Design Philosophy

This project is structured to be:

  • Modular
  • Reusable
  • Analysis-friendly
  • Easy to extend for other cities or data sources

It separates concerns between:

  • Session handling

  • Scraping

  • Parsing

  • Pipeline orchestration

⚠️ Legal & Ethical Note

This project is for educational and research purposes only. Users are responsible for complying with the terms of service of any website they access and for respecting robots.txt, rate limits, and local laws.

🔧 Future Ideas

  • CLI arguments for city, state, and filters
  • Support for additional listing platforms
  • Database export (Postgres / DuckDB)
  • Geospatial enrichment (school zones, crime, transit, etc.)
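For the database-export idea, the shape would be something like the following sketch. It uses stdlib `sqlite3` as a dependency-free stand-in for Postgres/DuckDB, and the `listings` table and its columns are hypothetical:

```python
import sqlite3


def export_listings(rows: list[dict], db_path: str = ":memory:") -> int:
    """Insert cleaned listing rows into a `listings` table; returns row count."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings (address TEXT, price INTEGER, sqft INTEGER)"
    )
    # Named placeholders let us insert the cleaned dicts directly.
    conn.executemany(
        "INSERT INTO listings VALUES (:address, :price, :sqft)", rows
    )
    conn.commit()
    (count,) = conn.execute("SELECT COUNT(*) FROM listings").fetchone()
    conn.close()
    return count
```

Swapping in Postgres or DuckDB would mostly mean changing the connection call, since both accept similar parameterized inserts through their Python drivers.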

👤 Author

Built by Bilal Haroon

Data Science · ML · Systems · Open Source
