18 Reading files with Geopandas

Before doing anything else, we need to import the geopandas package!

The geopandas team don’t recommend using an alias (i.e. we’re not going to shorten the way we refer to geopandas).

import geopandas

18.1 Importing GeoJSON files, GeoPackages or shapefiles

Prepackaged geographic data will usually be stored in the GeoJSON format, the GeoPackage (.gpkg) format, or as a shapefile (.shp).

Warning

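Shapefiles are a little more complex, because a shapefile is really a set of files with different extensions (at minimum .shp, .shx and .dbf, plus usually a .prj describing the coordinate reference system) that all need to be distributed together, even though it's only the file with the extension '.shp' that we read in.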

GeoJSON files and GeoPackages are single files, so they are often easier to distribute and download!

18.1.1 Files stored locally

You can read any of these file types stored locally by passing the file path to geopandas.read_file.

countries_gdf = geopandas.read_file("package.gpkg")

18.1.2 Files stored on the web

You can also directly refer to files stored on the web.

df = geopandas.read_file("http://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_land.geojson")

18.1.3 Zipped files

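You can also directly read files stored in zip archives by prefixing the file path with zip://. In the example below, the third slash of zip:/// is simply the start of the absolute path /Users/name/....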

states = geopandas.read_file("zip:///Users/name/Downloads/cb_2017_us_state_500k.zip")

You can read more about file imports in the geopandas documentation.

19 Exploring geopandas dataframes

Once you've read a file in, the result looks a lot like a pandas dataframe!

Even better, you can use all of your normal pandas commands on it, like head() to view the first 5 rows.

import geopandas

crime_figures = geopandas.read_file("https://github.com/hsma-programme/h6_3b_advanced_qgis_mapping_python/raw/main/h6_3b_advanced_qgis_and_mapping_in_python/example_code/lsoa_2011_sw5forces_crime_figures.gpkg")

crime_figures.head()
| | LSOA11CD | LSOA11NM | LSOA11NMW | Area | sw_5forces_street_by_lsoa_Anti-social behaviour | sw_5forces_street_by_lsoa_Bicycle theft | sw_5forces_street_by_lsoa_Burglary | sw_5forces_street_by_lsoa_Criminal damage and arson | sw_5forces_street_by_lsoa_Drugs | sw_5forces_street_by_lsoa_Other crime | sw_5forces_street_by_lsoa_Other theft | sw_5forces_street_by_lsoa_Possession of weapons | sw_5forces_street_by_lsoa_Public order | sw_5forces_street_by_lsoa_Robbery | sw_5forces_street_by_lsoa_Shoplifting | sw_5forces_street_by_lsoa_Theft from the person | sw_5forces_street_by_lsoa_Vehicle crime | sw_5forces_street_by_lsoa_Violence and sexual offences | sw_5forces_street_by_lsoa_Total number crimes | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E01014370 | Bath and North East Somerset 007A | Bath and North East Somerset 007A | 374339 | 1476 | 75 | 172 | 181 | 45 | 21 | 445 | 12 | 541 | 34 | 975 | 107 | 34 | 917 | 5035 | MULTIPOLYGON (((375207.458 165659.881, 375312.... |
| 1 | E01014371 | Bath and North East Somerset 007B | Bath and North East Somerset 007B | 410758 | 742 | 69 | 57 | 85 | 19 | 18 | 176 | 8 | 288 | 14 | 314 | 44 | 22 | 455 | 2311 | MULTIPOLYGON (((375613.903 165217.218, 375635.... |
| 2 | E01014372 | Bath and North East Somerset 007C | Bath and North East Somerset 007C | 164486 | 270 | 15 | 60 | 68 | 13 | 7 | 98 | 1 | 129 | 12 | 8 | 15 | 14 | 330 | 1040 | MULTIPOLYGON (((375273.457 165743.254, 375335.... |
| 3 | E01014373 | Bath and North East Somerset 010A | Bath and North East Somerset 010A | 1060797 | 57 | 3 | 15 | 27 | 3 | 1 | 8 | 0 | 24 | 1 | 1 | 1 | 15 | 55 | 211 | MULTIPOLYGON (((377835.224 168339.576, 377910.... |
| 4 | E01014374 | Bath and North East Somerset 010B | Bath and North East Somerset 010B | 4972916 | 28 | 0 | 21 | 10 | 1 | 1 | 10 | 0 | 8 | 0 | 0 | 0 | 5 | 24 | 108 | MULTIPOLYGON (((378721.671 167591.617, 378472.... |

When we check the type of the dataframe, we can see that it has come through as a GeoDataFrame.

type(crime_figures)
geopandas.geodataframe.GeoDataFrame
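To illustrate that ordinary pandas operations carry over, the sketch below builds a tiny, made-up GeoDataFrame of three square areas (the codes A, B and C and the crime counts are hypothetical) in British National Grid (EPSG:27700). It filters and sorts the data as you would in pandas, then uses the geometry-aware area attribute.

```python
import geopandas
from shapely.geometry import Polygon

# A made-up GeoDataFrame of three square areas, coordinates in metres
areas = geopandas.GeoDataFrame(
    {
        "area_code": ["A", "B", "C"],
        "total_crimes": [120, 45, 300],
    },
    geometry=[
        Polygon([(0, 0), (100, 0), (100, 100), (0, 100)]),
        Polygon([(0, 0), (200, 0), (200, 200), (0, 200)]),
        Polygon([(0, 0), (50, 0), (50, 50), (0, 50)]),
    ],
    crs="EPSG:27700",
)

# Normal pandas filtering and sorting still work...
busy_areas = areas[areas["total_crimes"] > 100].sort_values("total_crimes", ascending=False)
print(type(busy_areas).__name__)  # still a GeoDataFrame

# ...and we also get spatial attributes, such as area in the CRS's units (square metres here)
areas["area_m2"] = areas.area
print(areas[["area_code", "area_m2"]])
```

Because filtering keeps the result as a GeoDataFrame, you can go straight on to plotting or spatial operations on the subset.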

19.1 Turning existing data into a GeoDataFrame

However, a lot of the time you will be extracting data from your data warehouse and turning it into a geodataframe yourself.

Let’s go back to our crime dataset from the QGIS section.

import pandas as pd

sw_5forces_stop_and_search_df = pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_3b_advanced_qgis_mapping_python/main/h6_3b_advanced_qgis_and_mapping_in_python/example_code/sw_5forces_stop_and_search.csv")

# view the first row
sw_5forces_stop_and_search_df.head(1)
| | Type | Date | Part of a policing operation | Policing operation | Latitude | Longitude | Gender | Age range | Self-defined ethnicity | Officer-defined ethnicity | Legislation | Object of search | Outcome | Outcome linked to object of search | Removal of more than just outer clothing | Unnamed: 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Person search | 2019-06-01T00:02:00+00:00 | NaN | NaN | 51.496817 | -2.580971 | Male | 25-34 | White - English/Welsh/Scottish/Northern Irish/... | White | Police and Criminal Evidence Act 1984 (section 1) | Articles for use in criminal damage | Arrest | False | False | NaN |

Here we've imported it from a csv, but if we'd extracted data from a database into a pandas dataframe, the following steps would be the same!

So let’s just check the type first.

type(sw_5forces_stop_and_search_df)
pandas.core.frame.DataFrame
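If your data does come from a database, pd.read_sql will hand you the same kind of pandas DataFrame. In this minimal sketch, an in-memory SQLite database (via Python's built-in sqlite3 module) stands in for a data warehouse; the table name stop_search and its columns are hypothetical, with a couple of coordinate values copied from the stop and search data. For a real warehouse you would use its own connection instead.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database standing in for a data warehouse (hypothetical table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stop_search (Type TEXT, Latitude REAL, Longitude REAL)")
conn.executemany(
    "INSERT INTO stop_search VALUES (?, ?, ?)",
    [("Person search", 51.496817, -2.580971), ("Person search", 51.454085, -2.599742)],
)

# pd.read_sql gives us a plain pandas DataFrame, just like pd.read_csv
warehouse_df = pd.read_sql("SELECT * FROM stop_search", conn)
conn.close()

print(type(warehouse_df))
```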

First, we need to know which columns hold the location of each row.

In this case, they are 'Latitude' and 'Longitude'.

We can now construct a geopandas geodataframe from this dataframe.

sw_5forces_stop_and_search_gdf = geopandas.GeoDataFrame(
    sw_5forces_stop_and_search_df, # Our pandas dataframe
    geometry = geopandas.points_from_xy(
        sw_5forces_stop_and_search_df['Longitude'], # Our 'x' column (horizontal position of points)
        sw_5forces_stop_and_search_df['Latitude'] # Our 'y' column (vertical position of points)
        ),
    crs = 'EPSG:4326' # the coordinate reference system of the data - EPSG:4326 (WGS 84) is the one for latitude/longitude in degrees
    )

Let’s view this new object.

sw_5forces_stop_and_search_gdf.head()
| | Type | Date | Part of a policing operation | Policing operation | Latitude | Longitude | Gender | Age range | Self-defined ethnicity | Officer-defined ethnicity | Legislation | Object of search | Outcome | Outcome linked to object of search | Removal of more than just outer clothing | Unnamed: 15 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Person search | 2019-06-01T00:02:00+00:00 | NaN | NaN | 51.496817 | -2.580971 | Male | 25-34 | White - English/Welsh/Scottish/Northern Irish/... | White | Police and Criminal Evidence Act 1984 (section 1) | Articles for use in criminal damage | Arrest | False | False | NaN | POINT (-2.58097 51.49682) |
| 1 | Person search | 2019-06-01T01:15:00+00:00 | NaN | NaN | 51.454085 | -2.599742 | Male | 25-34 | Other ethnic group - Not stated | White | Misuse of Drugs Act 1971 (section 23) | Controlled drugs | A no further action disposal | True | False | NaN | POINT (-2.59974 51.45408) |
| 2 | Person search | 2019-06-01T01:27:00+00:00 | NaN | NaN | 50.983714 | -3.219592 | Male | 25-34 | White - English/Welsh/Scottish/Northern Irish/... | White | Misuse of Drugs Act 1971 (section 23) | Controlled drugs | A no further action disposal | NaN | False | NaN | POINT (-3.21959 50.98371) |
| 3 | Person search | 2019-06-01T01:27:00+00:00 | NaN | NaN | 50.983714 | -3.219592 | Male | over 34 | White - English/Welsh/Scottish/Northern Irish/... | White | Misuse of Drugs Act 1971 (section 23) | Controlled drugs | A no further action disposal | NaN | False | NaN | POINT (-3.21959 50.98371) |
| 4 | Person search | 2019-06-01T01:27:00+00:00 | NaN | NaN | 50.983714 | -3.219592 | Male | over 34 | White - English/Welsh/Scottish/Northern Irish/... | White | Misuse of Drugs Act 1971 (section 23) | Controlled drugs | A no further action disposal | NaN | False | NaN | POINT (-3.21959 50.98371) |

And let’s view the type of object it is.

type(sw_5forces_stop_and_search_gdf)
geopandas.geodataframe.GeoDataFrame
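The same recipe works on any dataframe with coordinate columns. The sketch below uses a small, made-up dataframe (event_id and its values are hypothetical), drops a row with missing coordinates before building the points, and then reprojects to British National Grid (EPSG:27700) with to_crs, which is handy when you want distances in metres. Note that points_from_xy takes x (longitude) first, then y (latitude).

```python
import geopandas
import numpy as np
import pandas as pd

# A small, made-up dataframe with one row missing its coordinates (hypothetical data)
events_df = pd.DataFrame(
    {
        "event_id": [1, 2, 3],
        "Latitude": [51.496817, np.nan, 50.983714],
        "Longitude": [-2.580971, np.nan, -3.219592],
    }
)

# Drop rows without a location before building point geometries
events_df = events_df.dropna(subset=["Latitude", "Longitude"])

events_gdf = geopandas.GeoDataFrame(
    events_df,
    geometry=geopandas.points_from_xy(events_df["Longitude"], events_df["Latitude"]),  # x first, then y
    crs="EPSG:4326",  # latitude/longitude in degrees (WGS 84)
)

# Reproject to British National Grid so coordinates are eastings/northings in metres
events_bng = events_gdf.to_crs("EPSG:27700")
print(events_bng.geometry)
```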

19.2 Joining area data to boundary data

We can also combine pandas dataframes with geopandas dataframes.

When might we want to do this?

Imagine we have a dataset of patients who are using a particular type of service.

We can use pandas to count the number of patients per LSOA.

However, the LSOA code alone won't let us plot this dataset, because it doesn't contain any geometry.

Instead, we:

  • import a shapefile, geoJSON or geopackage of boundaries
  • join it to our pandas dataframe using a common column (like LSOA code)

If we join our dataframe onto our geodataframe, keeping the geodataframe as the left argument, the result will be a geodataframe, so you can make use of all the useful features of geodataframes.
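The two steps above can be sketched end to end with made-up data: count patients per LSOA with pandas, then merge those counts onto a boundary GeoDataFrame. The patient table, the LSOA codes and the square "boundaries" are all hypothetical stand-ins for a real boundary file.

```python
import geopandas
import pandas as pd
from shapely.geometry import box

# Hypothetical boundary data: three square "LSOAs"
boundaries_gdf = geopandas.GeoDataFrame(
    {"LSOA11CD": ["E0000001", "E0000002", "E0000003"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:27700",
)

# Hypothetical patient-level data: one row per patient, no geometry
patients_df = pd.DataFrame(
    {
        "patient_id": [101, 102, 103, 104],
        "LSOA": ["E0000001", "E0000001", "E0000001", "E0000003"],
    }
)

# Step 1: count patients per LSOA with pandas
counts_df = patients_df.groupby("LSOA").size().reset_index(name="patient_count")

# Step 2: join the counts onto the boundaries, keeping the GeoDataFrame on the left
lsoa_counts_gdf = pd.merge(
    left=boundaries_gdf,
    right=counts_df,
    left_on="LSOA11CD",
    right_on="LSOA",
    how="left",
)

# LSOAs with no patients have no match in counts_df, so give them a count of 0
lsoa_counts_gdf["patient_count"] = lsoa_counts_gdf["patient_count"].fillna(0).astype(int)
print(lsoa_counts_gdf[["LSOA11CD", "patient_count"]])
```

Filling the unmatched LSOAs with 0 is a choice that suits counts; for other measures, leaving them missing may be more honest.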

my_lsoa_boundary_gdf = geopandas.read_file("lsoa_boundaries.gpkg")

my_count_df = pd.read_csv("counts_by_lsoa.csv")

Let's imagine the geodataframe has a column called 'LSOA11CD'.

The count dataframe has a column called 'LSOA'.

my_final_df = pd.merge(
    left=my_lsoa_boundary_gdf,
    right=my_count_df,
    left_on="LSOA11CD",
    right_on="LSOA",
    how="right"
)
Warning

We need to be careful about the order we join things in to ensure we end up with the right type of object at the end.

“The stand-alone merge function will work if the GeoDataFrame is in the left argument; if a DataFrame is in the left argument and a GeoDataFrame is in the right position, the result will no longer be a GeoDataFrame.” (geopandas documentation, https://geopandas.org/en/v0.8.0/mergingdata.html)

This would result in a geodataframe:

my_final_df = pd.merge(
    left=my_lsoa_boundary_gdf,
    right=my_count_df,
    left_on="LSOA11CD",
    right_on="LSOA",
    how="right"
    )

But this would not, because the plain dataframe is in the left position.

my_final_df = pd.merge(
    left=my_count_df,
    right=my_lsoa_boundary_gdf,
    left_on="LSOA",
    right_on="LSOA11CD",
    how="left"
)
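To check the warning for yourself, this sketch runs both orderings on tiny, made-up data (the LSOA codes and counts are hypothetical) and prints the type of each result. It also shows two ways around the problem: calling merge as a method on the GeoDataFrame, and wrapping a plain DataFrame back into a GeoDataFrame by naming its geometry column.

```python
import geopandas
import pandas as pd
from shapely.geometry import box

# Hypothetical boundaries and counts whose key columns have different names
boundaries_gdf = geopandas.GeoDataFrame(
    {"LSOA11CD": ["E0000001", "E0000002"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:27700",
)
counts_df = pd.DataFrame({"LSOA": ["E0000001", "E0000002"], "patient_count": [5, 2]})

# GeoDataFrame in the left position: the result is a GeoDataFrame
gdf_left = pd.merge(left=boundaries_gdf, right=counts_df, left_on="LSOA11CD", right_on="LSOA")

# Plain DataFrame in the left position: the result is a plain DataFrame
df_left = pd.merge(left=counts_df, right=boundaries_gdf, left_on="LSOA", right_on="LSOA11CD")

# Calling .merge() on the GeoDataFrame itself always puts it on the left
method_gdf = boundaries_gdf.merge(counts_df, left_on="LSOA11CD", right_on="LSOA")

# A plain DataFrame can be turned back into a GeoDataFrame
# by naming the column that holds the geometry
recovered_gdf = geopandas.GeoDataFrame(df_left, geometry="geometry", crs=boundaries_gdf.crs)

for name, result in [
    ("gdf_left", gdf_left),
    ("df_left", df_left),
    ("method_gdf", method_gdf),
    ("recovered_gdf", recovered_gdf),
]:
    print(name, type(result).__name__)
```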
The ‘how’ argument

If you set how = ‘left’, all of the rows from the left argument (here, the geodataframe) will be kept, even if there is no matching value in your dataframe of counts.

If you set how = ‘right’, all of the rows from the right argument (here, the counts dataframe) will be kept, even if there is no matching boundary in your geodataframe. Check you have no missing values in the ‘geometry’ column after this!

If you set how = ‘outer’, all of the rows from both dataframes will be kept, so you may end up with empty geometry in some rows and/or empty counts in others. (pandas does not accept how = ‘full’; ‘outer’ is its name for a full join.)
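A quick way to see the effect of how is pandas' indicator=True option, which adds a _merge column recording whether each row came from the left table, the right table or both. In this sketch all codes and counts are made up, with E9999999 standing in for a mistyped code; an outer join keeps everything, and the row that exists only in the counts table ends up with missing geometry.

```python
import geopandas
import pandas as pd
from shapely.geometry import box

# Hypothetical boundaries: LSOA E0000003 has no counts
boundaries_gdf = geopandas.GeoDataFrame(
    {"LSOA11CD": ["E0000001", "E0000002", "E0000003"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:27700",
)
# Hypothetical counts: E9999999 has no matching boundary (e.g. a mistyped code)
counts_df = pd.DataFrame({"LSOA": ["E0000001", "E0000002", "E9999999"], "patient_count": [5, 2, 7]})

# indicator=True adds a '_merge' column saying where each row came from
outer_gdf = pd.merge(
    left=boundaries_gdf,
    right=counts_df,
    left_on="LSOA11CD",
    right_on="LSOA",
    how="outer",
    indicator=True,
)
print(outer_gdf["_merge"].value_counts())

# Rows that exist only in the counts table have no geometry to plot
missing_geometry = outer_gdf[outer_gdf.geometry.isna()]
print(missing_geometry[["LSOA", "patient_count"]])
```

Checking the 'right_only' and 'left_only' rows like this is a cheap way to catch mismatched codes before you plot anything.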