Remark
Please be aware that these lecture notes are accessible online in an ‘early access’ format. They are actively being developed, and certain sections will be further enriched to provide a comprehensive understanding of the subject matter.
3.6. Lab: Handling Vector Data Models using Python
3.6.1. Setting Up the Environment
Before we begin our analysis, we need to import the necessary libraries. These libraries are essential for handling geospatial data and performing GIS operations in Python:
import pandas as pd
import shapely
import geopandas as gpd
import folium
pandas: This powerful data manipulation library provides data structures for efficiently storing and analyzing large datasets. It offers tools for reading and writing data in various formats, data cleaning, merging, and aggregation. In GIS analysis, pandas is often used for handling attribute data associated with spatial features.
shapely: This library is specifically designed for manipulating and analyzing geometric objects in the Cartesian plane. It provides a set of geometric operations like intersections, unions, and buffering. In GIS work, shapely is crucial for working with the geometry of spatial features, such as points, lines, and polygons.
geopandas: This library extends the capabilities of pandas to work with geospatial data. It combines the data analysis tools of pandas with the geometric operations of shapely. GeoPandas introduces the GeoDataFrame, which is like a pandas DataFrame but with a special column for storing geometry objects. This allows for easy integration of spatial operations with attribute-based analysis.
folium: This library wraps the Leaflet.js mapping library so that data prepared in Python can be displayed on an interactive Leaflet map with only a few lines of code.
These libraries work together seamlessly:
pandas handles the tabular data aspect
shapely provides the geometric operations
geopandas combines both to create a powerful toolkit for geospatial analysis
folium allows for interactive visualization of geospatial data
| Package | Description |
|---|---|
| pandas | Data manipulation and analysis library |
| shapely | Library for manipulation and analysis of geometric objects |
| geopandas | Extension of pandas for geospatial data |
| folium | Library for creating interactive maps |
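To make this division of labour concrete, here is a minimal sketch of a plain pandas table being promoted to a GeoDataFrame. The column names `lat` and `lon` are assumptions chosen for illustration; `gpd.points_from_xy` builds the shapely points from them.

```python
import pandas as pd
import geopandas as gpd

# Plain attribute table: pandas knows nothing about geometry yet.
# The column names "lat" and "lon" are illustrative assumptions.
df = pd.DataFrame(
    {
        "City": ["Calgary", "Edmonton"],
        "lat": [51.049999, 53.545883],
        "lon": [-114.066666, -113.490112],
    }
)

# points_from_xy creates shapely Points; x is longitude, y is latitude.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",
)

print(type(gdf.geometry.iloc[0]))   # each geometry is a shapely object
print(gdf[["City", "geometry"]])    # attributes and geometry side by side
```

The attribute columns remain ordinary pandas columns, so filtering, grouping, and merging work as usual while the geometry column carries the spatial part.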
3.6.2. Geospatial Data Models
In Geographic Information Systems (GIS), vector data is represented using three fundamental geometric primitives: points, lines, and polygons. These primitives form the basis of the Simple Features standard, which defines a common way to store and manipulate spatial data.
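As a quick orientation, the three primitives can be built directly with shapely. This is a minimal sketch using made-up planar coordinates (not real locations) so that the lengths and areas are easy to check by hand.

```python
from shapely.geometry import Point, LineString, Polygon

# Toy planar coordinates, chosen so the results are easy to verify by hand.
point = Point(0, 0)
line = LineString([(0, 0), (3, 4)])                   # a 3-4-5 triangle hypotenuse
polygon = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])   # a 4 x 3 rectangle

for geom in (point, line, polygon):
    # For a polygon, .length is the perimeter of its boundary.
    print(geom.geom_type, "| length:", geom.length, "| area:", geom.area)
```

A point has neither length nor area, a line has length only, and a polygon has both a perimeter and an area.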
3.6.2.1. Point
A point represents a single location in space, defined by its coordinates. In this example, we’ll create and visualize points representing major cities in Alberta. The coordinates for these cities were obtained from LatLong.net, a reliable source for geographical coordinates.
import geopandas as gpd
from shapely.geometry import Point
import folium
# Create a list of major Alberta cities as (name, latitude, longitude)
cities = [
    ("Calgary", 51.049999, -114.066666),
    ("Edmonton", 53.545883, -113.490112),
    ("Red Deer", 52.268112, -113.811241),
    ("Lethbridge", 49.694168, -112.832779),
    ("Fort McMurray", 56.732014, -111.375961),
    ("Medicine Hat", 50.040278, -110.679167)
]
# Create a GeoDataFrame; shapely expects Point(x, y), i.e. Point(lon, lat)
gdf = gpd.GeoDataFrame(
    [{"City": city, "geometry": Point(lon, lat)} for city, lat, lon in cities],
    crs="EPSG:4326"
)
# Display the GeoDataFrame
display(gdf)
# Create an interactive map
m = gdf.explore(
    column="City",
    tooltip="City",
    marker_kwds={"radius": 8},
    cmap="Set1",
    legend=False,
    categorical=True,
    tiles="CartoDB positron"
)
# Add city labels; folium expects locations as [latitude, longitude]
for idx, row in gdf.iterrows():
    folium.Marker(
        location=[row.geometry.y, row.geometry.x],
        popup=row["City"],
        icon=folium.DivIcon(
            html=f'<div style="font-size: 10pt">{row["City"]}</div>')
    ).add_to(m)
# Display the map
display(m)
| | City | geometry |
|---|---|---|
| 0 | Calgary | POINT (-114.06667 51.05) |
| 1 | Edmonton | POINT (-113.49011 53.54588) |
| 2 | Red Deer | POINT (-113.81124 52.26811) |
| 3 | Lethbridge | POINT (-112.83278 49.69417) |
| 4 | Fort McMurray | POINT (-111.37596 56.73201) |
| 5 | Medicine Hat | POINT (-110.67917 50.04028) |
This example demonstrates several key concepts:
Creating Point geometries: We use the Point class from shapely to create a point geometry for each city. Shapely expects coordinates in (x, y) order, so each point is built as Point(lon, lat).
Building a GeoDataFrame: We construct a GeoDataFrame, which is a pandas DataFrame with a special column (‘geometry’) for spatial data.
Coordinate Reference System (CRS): We specify the CRS as EPSG:4326, the WGS 84 geographic coordinate system, in which coordinates are longitude and latitude in decimal degrees.
Interactive Visualization: We use geopandas’ explore method to create an interactive map, and enhance it with folium to add custom labels.
Attribute Data: Each point (city) has associated attribute data (its name), demonstrating how GIS combines spatial and non-spatial information.
3.6.2.2. LineString
A LineString represents a sequence of points connected by straight line segments. It’s a fundamental geometric object used to model linear features or connections between points in space.
A simple example connecting Edmonton, Red Deer, and Calgary:
import geopandas as gpd
from shapely.geometry import Point, LineString
import folium
# Define cities as (name, latitude, longitude), ordered north to south
cities = [
    ("Edmonton", 53.545883, -113.490112),
    ("Red Deer", 52.268112, -113.811241),
    ("Calgary", 51.049999, -114.066666)
]
# Create a GeoDataFrame for cities
city_gdf = gpd.GeoDataFrame(
    [{"City": city, "geometry": Point(lon, lat)} for city, lat, lon in cities],
    crs="EPSG:4326"
)
# Create a LineString connecting the cities in list order
city_connection = LineString([(lon, lat) for _, lat, lon in cities])
# Create a GeoDataFrame for the connection
connection_gdf = gpd.GeoDataFrame(
    [{"Name": "City Connection", "geometry": city_connection}],
    crs="EPSG:4326"
)
# Create a base map
m = folium.Map(location=[53, -113.5],
               zoom_start=6,
               tiles="CartoDB positron")
# Add the connection line to the map
folium.GeoJson(connection_gdf, style_function=lambda x: {
    'color': 'blue', 'weight': 2}).add_to(m)
# Add city markers
for idx, row in city_gdf.iterrows():
    folium.Marker(
        location=[row.geometry.y, row.geometry.x],
        popup=row["City"],
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(m)
# Display the map
display(m)
# Calculate and print the total length of the connection.
# EPSG:4326 is in degrees, so reproject to UTM zone 11 (metres) first.
length_km = connection_gdf.to_crs(
    '+proj=utm +zone=11 +datum=WGS84').length.values[0] / 1000
print(f"Total length of connection: {length_km:.2f} km")
Total length of connection: 280.56 km
This example demonstrates key concepts of LineString:
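Creating a LineString: We use the LineString class to create a geometry connecting the cities in sequence.
Visualization: The map shows how a LineString represents a direct connection between points.
Length Calculation: We calculate the total length of the LineString, a basic spatial analysis function. Because EPSG:4326 coordinates are in degrees, the geometry is first reprojected to UTM zone 11, whose units are metres, and the result is divided by 1000 to obtain kilometres. The three cities lie close to the 114°W boundary between UTM zones 11 and 12, so the figure should be read as an approximation.
Simplicity and Abstraction: This representation reduces complex real-world connections to their essence: a series of connected points.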
LineStrings are versatile in GIS:
They can model any sequence of connected points in space.
Useful for representing paths, boundaries, or connections between locations.
Allow for length calculations and other spatial analyses.
Can be combined to form more complex linear networks.
This example illustrates the basic use of LineString to connect points in space, a fundamental concept in GIS and spatial analysis.
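As an independent cross-check on the projected length, the same path can be measured with the haversine formula, which gives great-circle distances on a sphere. This is a sketch using only the standard library; the mean Earth radius of 6371.0088 km is an assumption of the spherical model, so the result will differ slightly from the UTM-based figure.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Same (name, latitude, longitude) tuples as in the LineString example.
cities = [
    ("Edmonton", 53.545883, -113.490112),
    ("Red Deer", 52.268112, -113.811241),
    ("Calgary", 51.049999, -114.066666),
]

# Sum the distance of each consecutive pair of cities along the path.
total_km = sum(
    haversine_km(a[1], a[2], b[1], b[2])
    for a, b in zip(cities, cities[1:])
)
print(f"Great-circle length: {total_km:.2f} km")
```

The two methods should agree to well within one percent; a large gap would usually point to swapped coordinates or an unsuitable projection.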
3.6.2.3. Polygon
A Polygon represents a closed area defined by an outer ring and optional inner rings (holes). Polygons are used to model features with area and perimeter, such as administrative boundaries, lakes, or land parcels.
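Inner rings are easiest to understand on a toy shape. The following sketch uses made-up planar coordinates to build a square with a square hole, passing the hole through the `holes` argument of shapely's Polygon.

```python
from shapely.geometry import Point, Polygon

# Toy planar coordinates: a 4 x 4 square with a 1 x 1 hole.
outer_ring = [(0, 0), (4, 0), (4, 4), (0, 4)]
inner_ring = [(1, 1), (2, 1), (2, 2), (1, 2)]

parcel = Polygon(outer_ring, holes=[inner_ring])

print(parcel.area)                         # outer area minus the hole
print(len(parcel.interiors))               # number of inner rings
print(parcel.contains(Point(1.5, 1.5)))    # a point inside the hole
print(parcel.contains(Point(3, 3)))        # a point in the solid part
```

A point that falls inside a hole is not contained by the polygon, which is what makes holes suitable for features such as an island in a lake.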
An example creating a simplified polygon for Banff National Park:
import geopandas as gpd
from shapely.geometry import Polygon, Point
import folium
# Define approximate (longitude, latitude) corners for Banff National Park
banff_coordinates = [
    (-116.5, 51.2),  # Northwest
    (-115.5, 51.2),  # Northeast
    (-115.5, 50.5),  # Southeast
    (-116.5, 50.5),  # Southwest
    (-116.5, 51.2)   # Close the polygon
]
# Create a Polygon
banff_polygon = Polygon(banff_coordinates)
# Create a GeoDataFrame for Banff National Park
park_gdf = gpd.GeoDataFrame(
    [{"Name": "Banff National Park", "geometry": banff_polygon}],
    crs="EPSG:4326"
)
# Define some points of interest within and around Banff as (name, lat, lon)
poi = [
    ("Banff Town", 51.1784, -115.5708),
    ("Lake Louise", 51.4254, -116.1773),
    ("Canmore", 51.0892, -115.3441)
]
# Create a GeoDataFrame for points of interest
poi_gdf = gpd.GeoDataFrame(
    [{"Name": name, "geometry": Point(lon, lat)} for name, lat, lon in poi],
    crs="EPSG:4326"
)
# Create a base map
m = folium.Map(location=[51.0, -115.9],
               zoom_start=8,
               tiles="CartoDB positron")
# Add the park polygon to the map
folium.GeoJson(
    park_gdf,
    style_function=lambda x: {'fillColor': 'green', 'color': 'black'}
).add_to(m)
# Add point of interest markers
for idx, row in poi_gdf.iterrows():
    folium.Marker(
        location=[row.geometry.y, row.geometry.x],
        popup=row["Name"],
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(m)
# Display the map
display(m)
# Calculate and print the area of the park.
# Reproject to UTM zone 11 so the area is in square metres, then convert to km².
area_km2 = park_gdf.to_crs(
    '+proj=utm +zone=11 +datum=WGS84').area.values[0] / 1e6
print(f"Approximate area of Banff National Park: {area_km2:.2f} km²")
Approximate area of Banff National Park: 5480.15 km²
This example demonstrates several key concepts:
Creating a Polygon: We use the Polygon class from shapely to create a geometry representing a simplified boundary of Banff National Park.
GeoDataFrames for different geometries: We create separate GeoDataFrames for the park (polygon) and points of interest (points), showing how different geometric types can be used together.
Visualization: We use folium to create an interactive map, adding both the park polygon and point markers.
Styling: We apply a style to the polygon to make it visually distinct on the map.
Area Calculation: We calculate the area of the polygon, a common operation with this geometry type. As with length, the polygon is first reprojected to UTM zone 11 so that the area is returned in square metres, then divided by 1e6 to obtain km².
Real-world application: This example models an actual geographic feature, showing how polygons can represent bounded areas in GIS.
Polygons are versatile in GIS applications:
They represent areas with defined boundaries, such as parks, lakes, administrative regions, or land use zones.
Polygons allow for area and perimeter calculations.
They can be used for spatial queries like containment (e.g., which points fall within the polygon).
More complex shapes can be represented using MultiPolygon objects or polygons with holes.
This example illustrates how Polygons can be used to model and visualize bounded areas, a fundamental concept in many GIS applications. The simplification of Banff’s boundaries is for demonstration purposes; real GIS data would use more detailed and accurate polygon representations.
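The containment query mentioned above can be sketched with shapely's `contains` method, reusing the simplified rectangle and the three points of interest from the example. Because the rectangle is only a rough stand-in for the park boundary, the result reflects the simplification rather than the real park.

```python
from shapely.geometry import Point, Polygon

# Simplified park rectangle, as (longitude, latitude) corners.
banff_polygon = Polygon(
    [(-116.5, 51.2), (-115.5, 51.2), (-115.5, 50.5), (-116.5, 50.5)]
)

# Points of interest as (name, latitude, longitude).
poi = [
    ("Banff Town", 51.1784, -115.5708),
    ("Lake Louise", 51.4254, -116.1773),
    ("Canmore", 51.0892, -115.3441),
]

# Keep only the names whose point lies strictly inside the polygon.
inside = [
    name for name, lat, lon in poi
    if banff_polygon.contains(Point(lon, lat))
]
print(inside)
```

With this rectangle only Banff Town is reported as inside: Lake Louise lies north of the 51.2° edge and Canmore lies east of the -115.5° edge. With a detailed boundary the answer for Lake Louise would differ, which illustrates why boundary accuracy matters for spatial queries.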
3.6.3. Multi-part Geometries
The Simple Features standard also defines multi-part versions of the basic geometric primitives. These multi-part geometries allow for the representation of more complex spatial features that consist of multiple geometries of the same type. Let’s explore each of these with examples:
3.6.3.1. MultiPoint
A MultiPoint is a collection of Point geometries. It’s useful for representing multiple discrete locations that are part of a single feature.
from shapely.geometry import MultiPoint
import geopandas as gpd
# Create a MultiPoint representing multiple cities, as (longitude, latitude)
cities = MultiPoint([
    (-114.066666, 51.049999),  # Calgary
    (-113.490112, 53.545883),  # Edmonton
    (-113.811241, 52.268112)   # Red Deer
])
# Create a GeoDataFrame with a single row holding all three points
multipoint_gdf = gpd.GeoDataFrame(
    [{'name': 'Major Cities', 'geometry': cities}], crs="EPSG:4326")
display(multipoint_gdf)
# Plot the MultiPoint
multipoint_gdf.explore(marker_kwds={'radius': 6},
                       tiles="CartoDB positron")
| | name | geometry |
|---|---|---|
| 0 | Major Cities | MULTIPOINT ((-114.06667 51.05), (-113.49011 53... |
Examples of use cases for MultiPoint include:
Representing a cluster of oil wells in a field
Mapping multiple store locations for a retail chain
Showing sampling points in an environmental study
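Although a MultiPoint is stored as one feature, its members remain accessible. A minimal sketch, using the same three cities, of reading the parts through the `geoms` attribute and the overall extent through `bounds`:

```python
from shapely.geometry import MultiPoint

# Same three cities as above, as (longitude, latitude) pairs.
cities = MultiPoint([
    (-114.066666, 51.049999),  # Calgary
    (-113.490112, 53.545883),  # Edmonton
    (-113.811241, 52.268112),  # Red Deer
])

# .geoms exposes the individual Point members of the collection.
print(len(cities.geoms))
for part in cities.geoms:
    print(part.geom_type, part.x, part.y)

# .bounds is (min x, min y, max x, max y) over all members.
print(cities.bounds)
```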
3.6.3.2. MultiLineString
A MultiLineString is a collection of LineString geometries. It’s useful for representing features that consist of multiple linear elements.
from shapely.geometry import MultiLineString
import geopandas as gpd
# Create a MultiLineString representing multiple highways
highways = MultiLineString([
    # Highway 2 (simplified): Calgary to Edmonton
    [(-114.066666, 51.049999), (-113.490112, 53.545883)],
    # Trans-Canada Highway (simplified): Calgary to Medicine Hat
    [(-114.066666, 51.049999), (-110.679167, 50.040278)]
])
# Create a GeoDataFrame
multiline_gdf = gpd.GeoDataFrame(
    [{'name': 'Major Highways', 'geometry': highways}], crs="EPSG:4326")
# Plot the MultiLineString
multiline_gdf.explore(color='red',
                      tiles="CartoDB positron")
Examples of use cases for MultiLineString include:
Representing a river system with multiple tributaries
Mapping a network of hiking trails in a park
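Measurements on a multi-part geometry aggregate over its parts. The sketch below uses made-up planar coordinates (a hypothetical pair of trails, not real data) to show that the length of a MultiLineString is the sum of the lengths of its member LineStrings.

```python
from math import isclose

from shapely.geometry import MultiLineString

# Two disconnected toy trails in planar units.
trails = MultiLineString([
    [(0, 0), (3, 4)],              # one segment of length 5
    [(10, 0), (10, 2), (12, 2)],   # two segments of length 2 each
])

# Length of each member LineString, then the total for the whole feature.
part_lengths = [part.length for part in trails.geoms]
print(part_lengths)
print(trails.length)
print(isclose(trails.length, sum(part_lengths)))
```

For geographic data the same rule holds, but the geometry should be reprojected to a metric CRS first, as in the LineString example.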
3.6.3.3. MultiPolygon
A MultiPolygon is a collection of Polygon geometries. It’s useful for representing features that consist of multiple separate areas.
from shapely.geometry import MultiPolygon, Polygon
import geopandas as gpd
# Create a MultiPolygon representing multiple parks (simplified boundaries).
# shapely closes each ring automatically, so the first point is not repeated.
parks = MultiPolygon([
    Polygon([(-116.5, 51.2), (-115.5, 51.2), (-115.5, 50.5),
             (-116.5, 50.5)]),  # Banff (simplified)
    Polygon([(-117.0, 52.5), (-116.0, 52.5), (-116.0, 51.8),
             (-117.0, 51.8)])   # Jasper (simplified)
])
# Create a GeoDataFrame
multipolygon_gdf = gpd.GeoDataFrame(
    [{'name': 'National Parks', 'geometry': parks}], crs="EPSG:4326")
# Plot the MultiPolygon
multipolygon_gdf.explore(color='green',
                         tiles="CartoDB positron")
Examples of use cases for MultiPolygon include:
Representing a group of islands
Mapping discontinuous areas of a particular land use type
Showing multiple administrative regions as a single feature
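A MultiPolygon behaves as a single feature for queries and measurements. The following sketch, with two made-up planar "islands", shows that its area is the sum of its parts and that a containment test succeeds if the point falls in any one of them.

```python
from shapely.geometry import MultiPolygon, Point, Polygon

# Two separate toy islands in planar units.
islands = MultiPolygon([
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),   # 2 x 2 square
    Polygon([(5, 5), (8, 5), (8, 6), (5, 6)]),   # 3 x 1 rectangle
])

# Area of each member Polygon and of the feature as a whole.
part_areas = [part.area for part in islands.geoms]
print(part_areas)
print(islands.area)

# Containment is evaluated against the whole collection.
print(islands.contains(Point(1, 1)))      # on the first island
print(islands.contains(Point(6, 5.5)))    # on the second island
print(islands.contains(Point(3, 3)))      # in the water between them
```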