ARCHIVED: COVID-19 Cases and Deaths Summarized by Geography

data.sfgov.org | Last Updated 11 Dec 2023

<strong>A. SUMMARY</strong> Medical provider confirmed COVID-19 cases and confirmed COVID-19 related deaths in San Francisco, CA aggregated by several different geographic areas and normalized by 2016-2020 American Community Survey (ACS) 5-year estimates for population data to calculate rate per 10,000 residents. On September 12, 2021, a new case definition of COVID-19 was introduced that includes criteria for enumerating new infections after previous probable or confirmed infections (also known as reinfections). A reinfection is defined as a confirmed positive PCR lab test more than 90 days after a positive PCR or antigen test. The first reinfection case was identified on December 7, 2021. Cases and deaths are both mapped to the residence of the individual, not to where they were infected or died. For example, if one was infected in San Francisco at work but lives in the East Bay, those are not counted as SF Cases or if one dies in Zuckerberg San Francisco General but is from another county, that is also not counted in this dataset. Dataset is cumulative and covers cases going back to 3/2/2020 when testing began. Geographic areas summarized are: 1. <a href="https://data.sfgov.org/Geographic-Locations-and-Boundaries/Analysis-Neighborhoods-2020-census-tracts-assigned/sevw-6tgi">Analysis Neighborhoods</a> 2. <a href="https://data.sfgov.org/Geographic-Locations-and-Boundaries/Census-2020-Tracts-for-San-Francisco/tmph-tgz9">Census Tracts</a> 3. <a href="https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html">Census Zip Code Tabulation Areas</a> <strong>B. HOW THE DATASET IS CREATED</strong> Addresses from medical data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area. The 2016-2020 American Community Survey (ACS) population estimates provided by the Census are used to create a rate which is equal to <i>([count] / [acs_population]) * 10000)</i> representing the number of cases per 10,000 residents. <strong>C. UPDATE PROCESS</strong> Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 7:30 Pacific Time. <strong>D. HOW TO USE THIS DATASET</strong> San Francisco population estimates for geographic regions can be found in a <a href="https://data.sfgov.org/d/35v5-seg9">view based on the San Francisco Population and Demographic Census dataset</a>. These population estimates are from the 2016-2020 5-year American Community Survey (ACS). <em>Privacy rules in effect</em> To protect privacy, certain rules are in effect: 1. Case counts greater than 0 and less than 10 are dropped - these will be null (blank) values 2. Death counts greater than 0 and less than 10 are dropped - these will be null (blank) values 3. Cases and deaths dropped altogether for areas where acs_population < 1000 <em>Rate suppression in effect where counts lower than 20</em> Rates are not calculated unless the case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach as this is best practice in epidemiology. <em>A note on Census ZIP Code Tabulation Areas (ZCTAs)</em> ZIP Code Tabulation Areas are special boundaries created by the U.S. Census based on ZIP Codes developed by the USPS. They are not, however, the same thing. ZCTAs are areal representations of routes. <a href="https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html">Read how the Census develops ZCTAs on their website</a>. <em>Row included for Citywide case counts, incidence rate, and deaths</em> A single row is included that has the Citywide case counts and incidence rate. This can be used for comparisons. Citywide will capture all cases regardless of address quality. While some cases cannot be mapped to sub-areas like Census Tracts, ongoing data quality efforts result in improved mapping on a rolling basis. <strong>E. CHANGE LOG</strong> <UL><LI>9/11/2023 - data on COVID-19 cases and deaths summarized by geography are no longer being updated. This data is currently through 9/6/2023 and will not include any new data after this date. <LI>4/6/2023 - the State implemented system updates to improve the integrity of historical data. <LI>2/21/2023 - system updates to improve reliability and accuracy of cases data were implemented. <LI>1/31/2023 - updated “acs_population” column to reflect the 2020 Census Bureau American Community Survey (ACS) San Francisco Population estimates. <LI>1/31/2023 - implemented system updates to streamline and improve our geo-coded data, resulting in small shifts in our case and death data by geography. <LI>1/31/2023 - renamed column “last_updated_at” to “data_as_of”. <LI>2/23/2022 - the New Cases Map dashboard began pulling from this dataset. To access Cases by Geography Over Time, please refer to <a href="https://data.sfgov.org/d/d2ef-idww">this dataset</a>. <LI>1/22/2022 - system updates to improve timeliness and accuracy of cases and deaths data were implemented. <LI>7/15/2022 - reinfections added to cases dataset. See section SUMMARY for more information on how reinfections are identified. <LI>4/16/2021 - dataset updated to refresh with a five-day data lag.</UL>

This dataset has the following 11 columns:

Column NameAPI Column NameData TypeDescriptionSample Values
area_typearea_typetextType of geographic area, one of: Citywide, Census Tract, Analysis Neighborhood, or ZCTA (ZIP Code Tabulation Area)
ididtextThe identifier for the area type
acs_populationacs_populationnumber2016-2020 5-year American Community Survey (ACS) population estimate for a given geographic region
countcountnumberThe count of cases in the area, null when not zero and less than 10
count_last_60_dayscount_last_60_daysnumberThe count of cases in the area between max_specimen_collection_date and 60 days prior, null when total count is not zero and less than 10
rateratenumberThe rate of cases in the area, calculated as (count/acs_population) * 10000 which is a rate per 10,000 residents
deathsdeathsnumberNumber of deaths, null when not zero and less than 10
max_specimen_collection_datemax_specimen_collection_datecalendar_dateThe most recent date through which data is populated for the dataset. Will be 5 days before current date—due to the implemented five-day lag—barring any unforeseen data issues
data_as_ofdata_as_ofcalendar_dateWhen the dataset was last compiled by scripts, representing how current the data is
data_loaded_atdata_loaded_atcalendar_dateTimestamp when the record (row) was most recently updated here in the Open Data Portal
multipolygonmultipolygonmultipolygonThe geometry in multipolygon format stored in EPSG:4326 coordinate system