Update geojson #67

Open

huntermills707 wants to merge 2 commits into plotly:master from huntermills707:update-geojson

Conversation

@huntermills707

Adding data files for updated geojson and fips-unemp.

The original boundaries/names have changed for several counties.

It would be beneficial to update these files and subsequently update the choropleth map examples to use this version.
#57 stems from this.

I have opted to add these files rather than replace them, to avoid breaking any downstream code that relies on the existing files.

Here is my reasoning. Feel free to comment:

  • Replacing fips-unemp-16.csv with data from 2023 does not make sense.
    • Instead, adding fips-unemp-23.csv.
  • Replacing geojson-counties-fips.json would break any applications relying on the older US Census data.
    • Instead, adding geojson-counties-fips-2024.json ('2024' for the year).

Once these changes (and any downstream changes) are made, I will happily update the plotly examples to use the new files (it should only require changing the URLs).

Below is the script used to generate these two files. The input data comes from updated US Census files.
The data is manipulated with Pandas and GeoPandas to match the original formatting.
GeoPandas is used to convert from Shapefile to GeoJSON, and the polygons are simplified to reduce file size.

import json

import geopandas as gpd
import pandas as pd

# Census Unemployment
# * Data: 2023/acs/acs5
# * Unemployment Rate: S2301_C04_001E
# * County Aggregator: pseudo(0100000US$0500000)
df = pd.read_csv('unemp.csv')
df = df.rename(columns={'ucgid': 'GEO_ID'})

# US Census Shape file
gdf = gpd.read_file("cb_2024_us_county_within_cd119_500k/cb_2024_us_county_within_cd119_500k.shp")

# *** Get County Polygons ***
gdf = gdf.to_crs(epsg=4326)

# merge counties that are split across multiple polygons
gb = gdf[['STATEFP', 'COUNTYFP', 'geometry']].groupby(['STATEFP', 'COUNTYFP'])
rows = [[s, c, sub['geometry'].union_all()] for (s, c), sub in gb]
geo = gpd.GeoDataFrame(rows, columns=['STATEFP', 'COUNTYFP', 'geometry'])

# simplify polygons (tolerance in degrees) to reduce file size
geo['geometry'] = geo['geometry'].simplify(0.005)

# *** Get Census Area ***

# convert land area (ALAND, square meters) to square miles
gdf['CENSUSAREA'] = gdf['ALAND'] * 3.86102e-7

# sum areas for counties with multiple entries
area = gdf[['STATEFP', 'COUNTYFP', 'CENSUSAREA']].groupby(['STATEFP', 'COUNTYFP']).sum()

# *** Merge Calculated Results ***
out = geo.merge(area, on=['STATEFP', 'COUNTYFP'])

# *** Make GEO_ID, LSAD, NAME ***
# Make GEO_ID
out['GEO_ID'] = '0500000US' + out['STATEFP'] + out['COUNTYFP']

# US Census formats County Name as f'{county_name} {lsad}, {state_name}'

# Deal with multi word LSAD
# City and Borough, Census Area, Planning Region
def split_name(s):
    name, _state = s.split(',', 1)
    for lsad in ('City and Borough', 'Census Area', 'Planning Region'):
        if name.endswith(lsad):
            name = name[:-len(lsad)].rstrip()
            break
    else:
        # single-word LSAD, e.g. 'County', 'Parish', 'Municipality'
        parts = name.split(' ')
        name = ' '.join(parts[:-1])
        lsad = parts[-1]
    return name, lsad

# e.g. split_name('Hoonah-Angoon Census Area, Alaska') -> ('Hoonah-Angoon', 'Census Area')

pairs = [split_name(s) for s in df['NAME']]

# get LSAD -- remove state and county
df['LSAD'] = [lsad for _, lsad in pairs]
# get county name -- remove state and LSAD
df['NAME'] = [name for name, _ in pairs]

# *** Finalize GeoJSON ***
# Merge, harmonize names, and match order from original
out = out.merge(df[['GEO_ID', 'NAME', 'LSAD']], on='GEO_ID')
out = out.rename(columns={'STATEFP': 'STATE', 'COUNTYFP': 'COUNTY'})
out = out[[
    'GEO_ID',
    'STATE',
    'COUNTY',
    'NAME',
    'LSAD',
    'CENSUSAREA',
    'geometry',
]]

# to_file will not add the feature-level 'id' present in the original
out.to_file("temp.geojson", driver='GeoJSON')
with open("temp.geojson") as fp:
    d = json.load(fp)

# add id
for e in d['features']:
    e['id'] = e['properties']['STATE'] + e['properties']['COUNTY']

# dump geojson
with open('geojson-counties-fips-post-2024.json', 'w') as fp:
    json.dump(d, fp)

# *** Unemployment CSV ***
df['fips'] = [e.split('US')[1] for e in df['GEO_ID']]
df = df.rename(columns={'S2301_C04_001E': 'unemp'})
df[['fips', 'unemp']].to_csv('fips-unemp-23.csv', index=False)
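As a quick, self-contained check of the CSV step above, the same GEO_ID-to-fips transformation can be exercised on a toy frame. The column names match the script; the unemployment values here are made up for illustration:

```python
import pandas as pd

# Toy ACS-style frame; values are fabricated for illustration.
df = pd.DataFrame({
    'GEO_ID': ['0500000US01001', '0500000US06075'],
    'S2301_C04_001E': [3.4, 4.1],
})

# Same transformation as in the script above.
df['fips'] = [e.split('US')[1] for e in df['GEO_ID']]
df = df.rename(columns={'S2301_C04_001E': 'unemp'})
out = df[['fips', 'unemp']]
```

Since the fips codes are sliced from strings, the leading zeros survive (e.g. '01001'), which matters for matching against the GeoJSON feature ids.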
