18 Reading files with Geopandas
Before doing anything else, we need to import the geopandas package!
The geopandas team don’t recommend using an alias (i.e. we’re not going to shorten the way we refer to geopandas).
import geopandas
18.1 Importing geojsons, geopackages or shape files
Prepackaged geographic data is usually stored in the GeoJSON format, the GeoPackage (.gpkg) format, or as a shapefile (.shp).
Shapefiles are a little more complex because a shapefile is really a set of files with different extensions that all need to be distributed together, even though it’s only the file with the extension ‘.shp’ that we read in.
GeoJSON and GeoPackage files are often easier to distribute and download!
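Because a shapefile only works when its companion files sit next to the ‘.shp’ file, it can be handy to check for them before trying to read it. The sketch below is one possible way to do that with the standard library; the function name `missing_shapefile_parts` and the file name `boundaries.shp` are made up for illustration, and it assumes the companion files use lowercase extensions.

```python
import tempfile
from pathlib import Path

# Companion files that must travel with the .shp file.
REQUIRED_SIDECARS = (".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required companion extensions that are absent next to shp_path."""
    shp_path = Path(shp_path)
    return [
        ext for ext in REQUIRED_SIDECARS
        if not shp_path.with_suffix(ext).exists()
    ]


# Demo in a temporary folder: we create the .shp and .shx but "forget" the .dbf.
with tempfile.TemporaryDirectory() as tmp:
    shp = Path(tmp) / "boundaries.shp"  # hypothetical file name
    shp.touch()
    shp.with_suffix(".shx").touch()
    demo_missing = missing_shapefile_parts(shp)

print(demo_missing)
```

An empty list means the required parts are all present. A ‘.prj’ file, which holds the coordinate reference system, is optional but worth looking for too.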
18.1.1 Files stored locally
You can refer to a range of geographic data file types stored locally.
countries_gdf = geopandas.read_file("package.gpkg")
18.1.2 Files stored on the web
You can also directly refer to files stored on the web.
df = geopandas.read_file("http://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_land.geojson")
18.1.3 Zipped files
You can also directly refer to files stored as zip files by prefixing an absolute file path with zip://, so the path starts with zip:///.
states = geopandas.read_file("zip:///Users/name/Downloads/cb_2017_us_state_500k.zip")
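If you are building the path in code rather than typing it out, the following sketch shows where the three slashes come from: two belong to the zip:// prefix and the third is the leading slash of the absolute path. The helper name `zip_uri` is hypothetical, and the sketch assumes a macOS/Linux-style path; Windows paths are laid out differently.

```python
from pathlib import PurePosixPath


def zip_uri(absolute_path):
    """Prefix an absolute POSIX-style path with the zip:// scheme."""
    path = PurePosixPath(absolute_path)
    if not path.is_absolute():
        raise ValueError("expected an absolute path such as /Users/name/file.zip")
    return "zip://" + str(path)


uri = zip_uri("/Users/name/Downloads/cb_2017_us_state_500k.zip")
print(uri)  # zip:///Users/name/Downloads/cb_2017_us_state_500k.zip
```

The resulting string can then be passed to `geopandas.read_file` in the same way as the hand-written path.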
You can read more about file imports in the geopandas documentation.
19 Exploring geopandas dataframes
Once you’ve read a file in, the result looks a lot like a pandas dataframe!
Even better, you can use all of your normal pandas commands with it, like ‘head’ to view the first 5 rows.
import geopandas
crime_figures = geopandas.read_file("https://github.com/hsma-programme/h6_3b_advanced_qgis_mapping_python/raw/main/h6_3b_advanced_qgis_and_mapping_in_python/example_code/lsoa_2011_sw5forces_crime_figures.gpkg")
crime_figures.head()
| | LSOA11CD | LSOA11NM | LSOA11NMW | Area | sw_5forces_street_by_lsoa_Anti-social behaviour | sw_5forces_street_by_lsoa_Bicycle theft | sw_5forces_street_by_lsoa_Burglary | sw_5forces_street_by_lsoa_Criminal damage and arson | sw_5forces_street_by_lsoa_Drugs | sw_5forces_street_by_lsoa_Other crime | sw_5forces_street_by_lsoa_Other theft | sw_5forces_street_by_lsoa_Possession of weapons | sw_5forces_street_by_lsoa_Public order | sw_5forces_street_by_lsoa_Robbery | sw_5forces_street_by_lsoa_Shoplifting | sw_5forces_street_by_lsoa_Theft from the person | sw_5forces_street_by_lsoa_Vehicle crime | sw_5forces_street_by_lsoa_Violence and sexual offences | sw_5forces_street_by_lsoa_Total number crimes | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | E01014370 | Bath and North East Somerset 007A | Bath and North East Somerset 007A | 374339 | 1476 | 75 | 172 | 181 | 45 | 21 | 445 | 12 | 541 | 34 | 975 | 107 | 34 | 917 | 5035 | MULTIPOLYGON (((375207.458 165659.881, 375312.... |
1 | E01014371 | Bath and North East Somerset 007B | Bath and North East Somerset 007B | 410758 | 742 | 69 | 57 | 85 | 19 | 18 | 176 | 8 | 288 | 14 | 314 | 44 | 22 | 455 | 2311 | MULTIPOLYGON (((375613.903 165217.218, 375635.... |
2 | E01014372 | Bath and North East Somerset 007C | Bath and North East Somerset 007C | 164486 | 270 | 15 | 60 | 68 | 13 | 7 | 98 | 1 | 129 | 12 | 8 | 15 | 14 | 330 | 1040 | MULTIPOLYGON (((375273.457 165743.254, 375335.... |
3 | E01014373 | Bath and North East Somerset 010A | Bath and North East Somerset 010A | 1060797 | 57 | 3 | 15 | 27 | 3 | 1 | 8 | 0 | 24 | 1 | 1 | 1 | 15 | 55 | 211 | MULTIPOLYGON (((377835.224 168339.576, 377910.... |
4 | E01014374 | Bath and North East Somerset 010B | Bath and North East Somerset 010B | 4972916 | 28 | 0 | 21 | 10 | 1 | 1 | 10 | 0 | 8 | 0 | 0 | 0 | 5 | 24 | 108 | MULTIPOLYGON (((378721.671 167591.617, 378472.... |
When we check the type of the dataframe, we see that it has come through as a GeoDataFrame.
type(crime_figures)
geopandas.geodataframe.GeoDataFrame
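To see the “normal pandas commands” idea in action without downloading anything, here is a small sketch using a made-up GeoDataFrame. The column names and values are invented for illustration, and points stand in for the real LSOA polygons.

```python
import geopandas
from shapely.geometry import Point

# A tiny made-up GeoDataFrame (points used in place of real boundaries).
toy_gdf = geopandas.GeoDataFrame(
    {
        "area_name": ["A", "B", "C"],
        "total_crimes": [5035, 2311, 1040],
    },
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
)

# Ordinary pandas operations: sorting, taking the top rows, and filtering.
busiest = toy_gdf.sort_values("total_crimes", ascending=False).head(2)
over_2000 = toy_gdf[toy_gdf["total_crimes"] > 2000]

print(type(busiest).__name__)                  # still a GeoDataFrame
print(type(toy_gdf["total_crimes"]).__name__)  # an ordinary pandas Series
print(type(toy_gdf.geometry).__name__)         # the geometry column is a GeoSeries
```

Row-wise operations like these keep the geometry column, so the result stays a GeoDataFrame. Selecting a single non-geometry column gives you a plain pandas Series.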
19.1 Turning existing data into a GeoDataFrame
However - a lot of the time you may be extracting data from your data warehouse and turning this into a geodataframe.
Let’s go back to our crime dataset from the QGIS section.
import pandas as pd
sw_5forces_stop_and_search_df = pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_3b_advanced_qgis_mapping_python/main/h6_3b_advanced_qgis_and_mapping_in_python/example_code/sw_5forces_stop_and_search.csv")
# view the first row
sw_5forces_stop_and_search_df.head(1)
| | Type | Date | Part of a policing operation | Policing operation | Latitude | Longitude | Gender | Age range | Self-defined ethnicity | Officer-defined ethnicity | Legislation | Object of search | Outcome | Outcome linked to object of search | Removal of more than just outer clothing | Unnamed: 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Person search | 2019-06-01T00:02:00+00:00 | NaN | NaN | 51.496817 | -2.580971 | Male | 25-34 | White - English/Welsh/Scottish/Northern Irish/... | White | Police and Criminal Evidence Act 1984 (section 1) | Articles for use in criminal damage | Arrest | False | False | NaN |
Here we’ve imported it as a csv - but if we’d extracted data from a database and saved it as a pandas dataframe, the following steps would be the same!
So let’s just check the type first.
type(sw_5forces_stop_and_search_df)
pandas.core.frame.DataFrame
First, we need to know which columns identify the geometry.
In this case, they are ‘Latitude’ and ‘Longitude’.
We can now construct a geopandas geodataframe from this pandas dataframe.
sw_5forces_stop_and_search_gdf = geopandas.GeoDataFrame(
    sw_5forces_stop_and_search_df, # Our pandas dataframe
    geometry = geopandas.points_from_xy(
        sw_5forces_stop_and_search_df['Longitude'], # Our 'x' column (horizontal position of points)
        sw_5forces_stop_and_search_df['Latitude'] # Our 'y' column (vertical position of points)
    ),
    crs = 'EPSG:4326' # the coordinate reference system of the data - use EPSG:4326 if you are unsure
)
Let’s view this new object.
sw_5forces_stop_and_search_gdf.head()
| | Type | Date | Part of a policing operation | Policing operation | Latitude | Longitude | Gender | Age range | Self-defined ethnicity | Officer-defined ethnicity | Legislation | Object of search | Outcome | Outcome linked to object of search | Removal of more than just outer clothing | Unnamed: 15 | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Person search | 2019-06-01T00:02:00+00:00 | NaN | NaN | 51.496817 | -2.580971 | Male | 25-34 | White - English/Welsh/Scottish/Northern Irish/... | White | Police and Criminal Evidence Act 1984 (section 1) | Articles for use in criminal damage | Arrest | False | False | NaN | POINT (-2.58097 51.49682) |
1 | Person search | 2019-06-01T01:15:00+00:00 | NaN | NaN | 51.454085 | -2.599742 | Male | 25-34 | Other ethnic group - Not stated | White | Misuse of Drugs Act 1971 (section 23) | Controlled drugs | A no further action disposal | True | False | NaN | POINT (-2.59974 51.45408) |
2 | Person search | 2019-06-01T01:27:00+00:00 | NaN | NaN | 50.983714 | -3.219592 | Male | 25-34 | White - English/Welsh/Scottish/Northern Irish/... | White | Misuse of Drugs Act 1971 (section 23) | Controlled drugs | A no further action disposal | NaN | False | NaN | POINT (-3.21959 50.98371) |
3 | Person search | 2019-06-01T01:27:00+00:00 | NaN | NaN | 50.983714 | -3.219592 | Male | over 34 | White - English/Welsh/Scottish/Northern Irish/... | White | Misuse of Drugs Act 1971 (section 23) | Controlled drugs | A no further action disposal | NaN | False | NaN | POINT (-3.21959 50.98371) |
4 | Person search | 2019-06-01T01:27:00+00:00 | NaN | NaN | 50.983714 | -3.219592 | Male | over 34 | White - English/Welsh/Scottish/Northern Irish/... | White | Misuse of Drugs Act 1971 (section 23) | Controlled drugs | A no further action disposal | NaN | False | NaN | POINT (-3.21959 50.98371) |
And let’s view the type of object it is.
type(sw_5forces_stop_and_search_gdf)
geopandas.geodataframe.GeoDataFrame
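A common slip when building points is to swap the two columns: longitude is the ‘x’ value and latitude is the ‘y’ value. The sketch below repeats the same steps on a tiny made-up dataframe and checks the ordering. Dropping rows that have no coordinates first is an extra step added here as an assumption; whether you need it depends on your own data.

```python
import geopandas
import pandas as pd

# Made-up rows in the same shape as the stop and search data.
toy_df = pd.DataFrame({
    "Outcome": ["Arrest", "A no further action disposal", "Arrest"],
    "Latitude": [51.496817, 51.454085, None],
    "Longitude": [-2.580971, -2.599742, None],
})

# Assumption: rows without coordinates can't be mapped, so we drop them.
located_df = toy_df.dropna(subset=["Latitude", "Longitude"])

toy_gdf = geopandas.GeoDataFrame(
    located_df,
    geometry=geopandas.points_from_xy(
        located_df["Longitude"],  # x = horizontal position
        located_df["Latitude"],   # y = vertical position
    ),
    crs="EPSG:4326",
)

# The x coordinate of each point should match the Longitude column.
print(toy_gdf.geometry.x.tolist())
print(toy_gdf.crs)
```

If the x values printed here looked like latitudes instead, the two columns would have been passed in the wrong order.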
19.2 Joining area data to boundary data
We can also combine pandas dataframes with geopandas dataframes.
When might we want to do this?
Imagine we have a dataset of patients who are using a particular type of service.
We can use pandas to count the number of patients per LSOA.
However - the LSOA code alone isn’t going to allow us to plot this dataset - it doesn’t contain the geometry.
Instead, we
- import a shapefile, geoJSON or geopackage of boundaries
- join it to our pandas dataframe using a common column (like LSOA code)
If we join our dataframe to our geodataframe, the result will be a geodataframe - so you can make use of all the useful features of geodataframes.
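The counting step mentioned above could look something like the sketch below. The patient data, the column name ‘patient_count’ and the dataframe names are all made up for illustration.

```python
import pandas as pd

# Made-up patient-level data: one row per patient, with their LSOA code.
patients_df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "LSOA": ["E01014370", "E01014370", "E01014371", "E01014372", "E01014370"],
})

# Count the rows in each LSOA and turn the result back into a dataframe.
counts_by_lsoa = (
    patients_df
    .groupby("LSOA")
    .size()
    .reset_index(name="patient_count")
)

print(counts_by_lsoa)
```

The result has one row per LSOA code, which is the shape we need for joining to a boundary file.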
my_lsoa_boundary_gdf = geopandas.read_file("lsoa_boundaries.gpkg")
my_count_df = pd.read_csv("counts_by_lsoa.csv")
Let’s imagine the geodataframe has a column called ‘LSOA11CD’.
The count dataframe has a column called ‘LSOA’.
my_final_df = pd.merge(
    left=my_lsoa_boundary_gdf,
    right=my_count_df,
    left_on="LSOA11CD",
    right_on="LSOA",
    how="right"
)
We need to be careful about the order we join things in to ensure we end up with the right type of object at the end.
“The stand-alone merge function will work if the GeoDataFrame is in the left argument; if a DataFrame is in the left argument and a GeoDataFrame is in the right position, the result will no longer be a GeoDataFrame.” - https://geopandas.org/en/v0.8.0/mergingdata.html
This would result in a geodataframe:
my_final_df = pd.merge(
    left=my_lsoa_boundary_gdf,
    right=my_count_df,
    left_on="LSOA11CD",
    right_on="LSOA",
    how="right"
)
But this would not.
my_final_df = pd.merge(
    left=my_count_df,
    right=my_lsoa_boundary_gdf,
    left_on="LSOA",
    right_on="LSOA11CD",
    how="left"
)
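You can check the effect of the join order for yourself with a couple of tiny made-up objects. In this sketch the dataframe names and values are invented, and points stand in for real LSOA boundaries. The last line shows one way to recover a geodataframe if you can’t change the order of the join.

```python
import geopandas
import pandas as pd
from shapely.geometry import Point

# Made-up boundary data (points used in place of real polygons).
boundary_gdf = geopandas.GeoDataFrame(
    {"LSOA11CD": ["E01014370", "E01014371"]},
    geometry=[Point(0, 0), Point(1, 1)],
)
count_df = pd.DataFrame({"LSOA": ["E01014370", "E01014371"], "count": [3, 1]})

# GeoDataFrame on the left: the result is a GeoDataFrame.
gdf_first = pd.merge(
    left=boundary_gdf, right=count_df,
    left_on="LSOA11CD", right_on="LSOA", how="left",
)

# Plain DataFrame on the left: the result is a plain DataFrame.
df_first = pd.merge(
    left=count_df, right=boundary_gdf,
    left_on="LSOA", right_on="LSOA11CD", how="left",
)

print(isinstance(gdf_first, geopandas.GeoDataFrame))
print(isinstance(df_first, geopandas.GeoDataFrame))

# Converting the plain dataframe back, telling geopandas which column holds the geometry.
restored_gdf = geopandas.GeoDataFrame(df_first, geometry="geometry")
```

Checking `type()` or `isinstance()` straight after a merge is a quick way to catch this before you try to plot.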
The ‘how’ argument
If you set how = ‘left’, all of the rows from the geodataframe (the left-hand object) will be kept, even if there is no matching value in your dataframe of counts.
If you set how = ‘right’, all of the rows from the counts dataframe will be kept, even if there is no matching row in your geodataframe. Check you have no missing values in the ‘geometry’ column after this!
If you set how = ‘outer’, all of the rows from both dataframes will be kept - so you may end up with empty geometry in some cases and/or empty counts in others.
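To make the difference concrete, the sketch below runs the same join three times on made-up data where the two objects only partly overlap, then counts the gaps left behind. The names and values are invented for illustration, with points standing in for real boundaries.

```python
import geopandas
import pandas as pd
from shapely.geometry import Point

# Boundaries exist for ...370 and ...371; counts exist for ...371 and ...372.
boundary_gdf = geopandas.GeoDataFrame(
    {"LSOA11CD": ["E01014370", "E01014371"]},
    geometry=[Point(0, 0), Point(1, 1)],
)
count_df = pd.DataFrame({"LSOA": ["E01014371", "E01014372"], "count": [4, 7]})

summary = {}
for how in ("left", "right", "outer"):
    merged = pd.merge(
        left=boundary_gdf, right=count_df,
        left_on="LSOA11CD", right_on="LSOA", how=how,
    )
    summary[how] = {
        "rows": len(merged),
        "missing_geometry": int(merged["geometry"].isna().sum()),
        "missing_count": int(merged["count"].isna().sum()),
    }

for how, result in summary.items():
    print(how, result)
```

Rows with missing geometry can’t be drawn on a map, so it is worth deciding whether to drop them or to track down the missing boundaries.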