Chapter 3 — Clipping and Extracting Spatial Data
--
Learning Objectives:
- Use one vector feature to clip another vector feature.
- Filter features based on their attributes.
- Filter features based on their spatial location.
Subsetting and extracting data lets you select and analyze part of a dataset based on each feature’s location, attributes, or spatial relationship with another dataset.
This chapter covers three methods for subsetting and extracting data from a GeoDataFrame: clipping, selecting features by attribute, and selecting features by spatial criteria.
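Before turning to the real data, the three methods can be sketched on a toy dataset. The points, the `depth_m` column, and the square study area below are hypothetical and chosen only for illustration; they are not part of the Bay Area data.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data: four points with a made-up attribute column
points = gpd.GeoDataFrame(
    {"depth_m": [10, 80, 35, 120]},
    geometry=[Point(1, 1), Point(3, 3), Point(6, 6), Point(9, 9)],
)

# Hypothetical square study area spanning 0 to 5 in both x and y
area = gpd.GeoDataFrame({"name": ["study_area"]}, geometry=[box(0, 0, 5, 5)])

# 1. Clip: keep only the features of `points` that fall inside `area`
clipped = gpd.clip(points, area)

# 2. Select by attribute: filter rows with a boolean mask on a column
deep = points[points["depth_m"] > 50]

# 3. Select by location: filter rows with a boolean mask from a spatial predicate
inside = points[points.within(area.geometry.iloc[0])]
```

For point data, clipping and a `within` selection keep the same features. The two differ for lines and polygons, where clipping cuts geometries at the boundary while a spatial selection keeps or drops whole features.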
Import libraries:
# Import modules
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from shapely.geometry import Polygon
We will use two shapefiles: one with the county boundaries of the San Francisco Bay Area, and one with wells located within the Bay Area and a surrounding 50 km buffer. To begin, we load both datasets and reproject them to the same projected coordinate reference system (CRS).
# Load data
# County boundaries
# Source: https://opendata.mtc.ca.gov/datasets/san-francisco-bay-region-counties-clipped?geometry=-125.590%2C37.123%2C-119.152%2C38.640
counties = gpd.read_file("../_static/e_vector_shapefiles/sf_bay_counties/sf_bay_counties.shp")
# Well locations
# Source: https://gis.data.ca.gov/datasets/3a3e681b894644a9a95f9815aeeeb57f_0?geometry=-123.143%2C36.405%2C-119.230%2C37.175
# Modified by author so that only the well locations within the counties and the surrounding 50 km were kept
wells = gpd.read_file("../_static/e_vector_shapefiles/sf_bay_wells_50km/sf_bay_wells_50km.shp")
# Reproject data to NAD83(HARN) / California Zone 3
# https://spatialreference.org/ref/epsg/2768/
proj = 2768
counties = counties.to_crs(proj)
wells = wells.to_crs(proj)
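After reprojecting, it can help to confirm that the data ended up in the intended CRS and to check its linear unit, since distances and coordinates are interpreted in that unit. The following is a minimal sketch that assumes a hypothetical sample point in longitude/latitude rather than the shapefiles above; the same checks could be run on `counties.crs` and `wells.crs`.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point near San Francisco in longitude/latitude (EPSG:4326)
sample = gpd.GeoDataFrame(
    {"name": ["sample_point"]},
    geometry=[Point(-122.42, 37.77)],
    crs="EPSG:4326",
)

# Reproject to the same EPSG code used for the counties and wells
sample_proj = sample.to_crs(2768)

# Read back the EPSG code and the unit of the first coordinate axis
epsg_code = sample_proj.crs.to_epsg()
unit_name = sample_proj.crs.axis_info[0].unit_name
print(epsg_code, unit_name)
```

Two layers should share the same CRS before they are clipped or compared spatially, so a check like this is a cheap safeguard against mismatched inputs.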
We will also create a rectangle that covers part of the Bay Area. Its corner coordinates, expressed in projected coordinates, have already been identified.
# Create list of coordinate pairs
coordinates = [[1790787, 736108], [1929652, 736108], [1929652, 598414], [1790787, 598414]]
# Create a Shapely polygon from the list of coordinate pairs
poly_shapely = Polygon(coordinates)
# Create a dictionary with needed attributes and required geometry column
attributes_df = {'Attribute': ['name1'], 'geometry': poly_shapely}
# Convert shapely object to a GeoDataFrame
poly = gpd.GeoDataFrame(attributes_df…