Population Estimates - Advanced (Researchers/Analysts)
The Arizona Department of Health Services (ADHS) releases bridged-race population estimates (denominators) of the resident population of Arizona for use in calculating public health statistics. These estimates yield population counts that match the United States Office of Management and Budget (OMB) standards for the collection of data on race and ethnicity. The ADHS population estimates were produced in collaboration with the Arizona State Demographer’s Population Estimates Program. As required under Arizona Executive Order 2011-04, state agencies must use the official population estimates from the State Demographer, and the ADHS population denominators meet this requirement. Therefore, stratifications of data from ADHS match the mid-year population estimates officially published by the State Demographer’s Office.
The denominator datasets for researchers are stratified at the census tract, community (community statistical area), and county levels. Within each dataset are variables that estimate annual population counts for Arizona residents by single year of age (e.g., 0-85+ years), race, ethnicity, and sex. These combinations allow for highly customizable queries.
To calculate public health rates, race and ethnicity must be standardized for both the vital events (in the birth, death, and fetal death data) and the population denominators. In these data sources, information on race and ethnicity is collected and categorized in a number of different ways, requiring a standard method of classifying race and ethnicity. To create frequency counts of race and ethnicity that were adequate to compute statistically reliable rates, race was “bridged,” or essentially collapsed, into 5 categories: White non-Hispanic, Hispanic or Latino, Black or African American, American Indian or Alaska Native, and Asian or Pacific Islander. When an individual was identified as both Hispanic and any other race, that person was included in the racial/ethnic group with the lowest population. For example, a person identified as both White and Hispanic would be coded as Hispanic, whereas a person identified as American Indian and Hispanic would be coded as American Indian.
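The assignment rule for individuals identified as both Hispanic and another race can be sketched as follows. This is a minimal illustration, not ADHS production code: the function name, the category labels used as dictionary keys, and the population figures are all hypothetical and chosen only to reproduce the two examples given above.

```python
HISPANIC = "Hispanic or Latino"

def bridged_category(race, is_hispanic, group_population):
    """Return one bridged race/ethnicity category for a person.

    Assumption: `group_population` maps each group label to its total
    population, and a person reported as both Hispanic and another race
    is placed in whichever of the two groups is smaller.
    """
    if not is_hispanic:
        return "White non-Hispanic" if race == "White" else race
    return min((race, HISPANIC), key=lambda group: group_population[group])

# Made-up population sizes for illustration only (not ADHS figures).
population = {
    "White": 4_000_000,
    HISPANIC: 2_000_000,
    "American Indian or Alaska Native": 300_000,
}

print(bridged_category("White", True, population))
print(bridged_category("American Indian or Alaska Native", True, population))
```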
Impact of U.S. Office of Management and Budget Statistical Policy Directive 15
ADHS is currently evaluating the impact of the U.S. Office of Management and Budget Statistical Policy Directive 15, which aims to update race and ethnicity categories in survey collection instruments and reporting. These new standards will be implemented in federal data sources by 2029. The update to the standard may have an impact on public health data. At this time, ADHS population estimates use the federal standards outlined in OMB's 1997 Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity.
Updated open source population denominators were developed to help identify and focus on underlying health issues and significant overarching health disparities faced by Arizonans and support the Arizona Health Improvement Plan Pandemic Recovery and Resiliency Initiative. New population denominator datasets and methods were developed in collaboration with Arizona State University, Center for Health Information and Research (CHiR) and the State of Arizona Demographer’s Office. These methods were first applied for evaluation of 2023 data and finalized for the 2024 data year release.
The population projection process uses two complementary methodologies (Hamilton-Perry and regression modeling) to estimate future population counts by age, race, ethnicity, sex, and geography. The projections from these models are validated against the Arizona State Demographer’s county-level data, adjusted to match those benchmarks, combined, and then broken into single-year age categories. The overall workflow consists of three main phases: data preprocessing, projection and validation, and post-processing/delivery.
In the data preprocessing phase, population data are extracted from the 2010 and 2020 Decennial Census and from the American Community Survey (ACS) 5-year datasets spanning the past 10 to 15 years. These data are accessed via the Census API and stratified by census tract, age, sex, and race or ethnicity. Because Hispanic individuals are sometimes double-counted in the Census and ACS race categories, they are treated as a separate category to ensure consistency. The data are then cleaned and aggregated into standardized 5-year age groups, with ACS age bands such as 15-17 and 18-19 combined into a single 15-19 group. Between 2010 and 2020, the number of census tracts changed from 1,526 to 1,765, requiring the use of a crosswalk prepared by the ADHS GIS team. This crosswalk converts 2010 tract data into 2020 boundaries by proportionally allocating population counts based on overlapping geographic areas. Additional preprocessing corrects the Hispanic categories: the Hispanic breakout removes double-counting by distinguishing White Hispanic individuals and redistributing non-White Hispanic counts across other racial groups. The result of this phase is a harmonized dataset aligned to 2020 tract boundaries and ready for projection modeling.
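The proportional allocation performed by the tract crosswalk can be illustrated with a short sketch. The crosswalk layout assumed here (one row per overlapping 2010/2020 tract pair, with a weight giving the share of the 2010 tract assigned to the 2020 tract) is a hypothetical format for illustration; the tract identifiers and weights are made up and do not come from the ADHS GIS crosswalk.

```python
from collections import defaultdict

def apply_crosswalk(pop_2010, crosswalk):
    """Re-express 2010 tract counts on 2020 tract boundaries.

    pop_2010:  {tract_2010: population}
    crosswalk: iterable of (tract_2010, tract_2020, weight), where the
               weights for each 2010 tract are assumed to sum to 1.
    """
    pop_2020 = defaultdict(float)
    for tract_2010, tract_2020, weight in crosswalk:
        pop_2020[tract_2020] += pop_2010.get(tract_2010, 0.0) * weight
    return dict(pop_2020)

# Hypothetical example: tract A is split, and part of B merges into A2.
pop_2010 = {"A": 1000, "B": 500}
crosswalk = [
    ("A", "A1", 0.6),
    ("A", "A2", 0.4),
    ("B", "A2", 0.5),
    ("B", "B1", 0.5),
]
pop_on_2020_tracts = apply_crosswalk(pop_2010, crosswalk)
```

Because each 2010 tract's weights sum to 1, the statewide total is preserved by the reallocation.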
The projection and validation phase applies both the Hamilton-Perry and regression methods to forecast population changes. The Hamilton-Perry method uses demographic ratios—specifically, Cohort-Change Ratios (CCR) and Child-Woman Ratios (CWR)—calculated from the two most recent decennial censuses to project each age cohort forward ten years. CCR values, representing how populations move from one age group to the next across decades, are capped between 0.8 and 1.6 to prevent extreme estimates, while CWR values are used to estimate the youngest age groups based on the number of women of childbearing age. Where CCRs are missing, they are imputed from broader geographic aggregations such as tract-by-sex or county-by-sex levels. The regression method, by contrast, uses ACS data to build a time series for each unique combination of tract, sex, race, ethnicity and age group, and then applies a set of competing statistical models—including Quadratic Regression, Negative Binomial, Exponential Smoothing, ARIMA, K-Nearest Neighbors, and XGBoost—to project one step ahead into a year with no ACS data. Each model is first tested by predicting a known year, and whichever produces the smallest error is selected to generate the final projection. Once both sets of projections are complete, they are aggregated to the county level and compared with the State Demographer’s estimates to validate accuracy. Any discrepancies are adjusted by scaling the results to match the county totals and the race-wise projections, and then the final projections from the two methods are averaged together to form a single, unified population estimate.
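The Cohort-Change Ratio step of the Hamilton-Perry method can be sketched as below. This is a simplified illustration under stated assumptions, not the ADHS/CHiR implementation: populations are lists of closed 5-year age groups, a cohort therefore advances two groups per decade, and the open-ended oldest group, the Child-Woman Ratio step for the youngest groups, and the imputation of missing ratios are left out. Function names and the example counts are hypothetical.

```python
def cohort_change_ratios(pop_earlier, pop_later, step=2, lower=0.8, upper=1.6):
    """CCR for each 5-year age group, capped to [lower, upper].

    pop_earlier / pop_later: counts by 5-year age group at the two
    censuses. With 5-year groups and a 10-year interval, step=2.
    A ratio is None when the earlier count is zero (missing CCR, which
    the text says would be imputed from a broader geography).
    """
    ratios = []
    for i in range(len(pop_earlier) - step):
        base = pop_earlier[i]
        if base == 0:
            ratios.append(None)
        else:
            ratios.append(min(upper, max(lower, pop_later[i + step] / base)))
    return ratios

def project_forward(pop_later, ratios, step=2):
    """Project ten years ahead; the youngest `step` groups stay None
    because they would come from Child-Woman Ratios (not shown)."""
    projected = [None] * step
    for i, ratio in enumerate(ratios):
        projected.append(None if ratio is None else pop_later[i] * ratio)
    return projected

# Hypothetical counts for four 5-year age groups in one tract.
pop_2010 = [100, 100, 100, 100]
pop_2020 = [90, 110, 120, 50]
ccr = cohort_change_ratios(pop_2010, pop_2020)   # second ratio (0.5) is capped at 0.8
pop_2030 = project_forward(pop_2020, ccr)
```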
In the post-processing and delivery phase, the combined projections, which are in five- and ten-year age groups, are expanded into single-year age estimates using proportions derived from the 2020 single-year census data. These proportions are applied separately for each census tract, race, and sex category to ensure that the age distribution mirrors the demographic structure observed in the most recent census. The Cox-Ernst method is then applied to the dataset to round the population values to integers (controlled rounding). The final dataset includes population counts for census tracts, community statistical areas (CSAs), and county levels, separated by male and female, and by race or ethnicity categories: American Indian/Alaska Native, Asian, Black, Hispanic or Latino, Native Hawaiian or Other Pacific Islander, Other, and White.
Because this methodology was first implemented for the 2024 data year, data for years prior to 2024 are scaled to match the published ADHS data for the corresponding year. This is done after the single-year age estimates are complete. The ‘Other’ race from the calculations is redistributed among the other races. The totals are then scaled to match the published records grouped by county, race, age, and gender. The population values are then rounded using the Cox-Ernst method and made available at the Tract, CSA, and County levels.
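The expansion from an age group to single years of age can be sketched as follows. The function names and example numbers are hypothetical. The rounding function shown is a simple largest-remainder rounding that preserves a single total; it is included only as a one-dimensional stand-in to show what "controlled" rounding means and is not the Cox-Ernst method, which controls row and column totals of a table simultaneously.

```python
import math

def expand_to_single_years(group_total, census_counts_by_age):
    """Split an age-group total across single years of age using the
    shares observed in single-year census counts for the same tract,
    race, and sex. Falls back to equal shares (an assumption) when the
    census counts are all zero."""
    census_total = sum(census_counts_by_age.values())
    if census_total == 0:
        share = 1 / len(census_counts_by_age)
        return {age: group_total * share for age in census_counts_by_age}
    return {age: group_total * count / census_total
            for age, count in census_counts_by_age.items()}

def round_preserving_total(values):
    """Largest-remainder rounding: integers whose sum equals the
    rounded sum of the inputs (NOT the Cox-Ernst method)."""
    rounded = [math.floor(v) for v in values]
    shortfall = round(sum(values)) - sum(rounded)
    by_fraction = sorted(range(len(values)),
                         key=lambda i: values[i] - rounded[i], reverse=True)
    for i in by_fraction[:shortfall]:
        rounded[i] += 1
    return rounded

# Hypothetical 15-19 group: projected total 200, census counts by age.
single_years = expand_to_single_years(200, {15: 10, 16: 20, 17: 30, 18: 20, 19: 20})
integers = round_preserving_total([20.4, 39.7, 60.2, 39.9, 39.8])
```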
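Scaling estimates to published totals within each county, race, age, and gender group amounts to multiplying every record in a group by the ratio of the published total to the current group sum. The sketch below illustrates this under assumed record and key layouts; the field names are hypothetical, and leaving groups without a published total unchanged is an assumption made for the example.

```python
from collections import defaultdict

GROUP_KEYS = ("county", "race", "age", "sex")

def scale_to_published(rows, published_totals):
    """Scale tract-level rows so each (county, race, age, sex) group
    sums to its published total.

    rows:             list of dicts with GROUP_KEYS, 'tract', 'population'
    published_totals: {(county, race, age, sex): published count}
    """
    group_sums = defaultdict(float)
    for row in rows:
        group_sums[tuple(row[k] for k in GROUP_KEYS)] += row["population"]

    scaled = []
    for row in rows:
        key = tuple(row[k] for k in GROUP_KEYS)
        target = published_totals.get(key)
        if target is None or group_sums[key] == 0:
            factor = 1.0  # assumption: leave the group as estimated
        else:
            factor = target / group_sums[key]
        scaled.append({**row, "population": row["population"] * factor})
    return scaled

# Hypothetical rows: two tracts in the same county/race/age/sex group.
rows = [
    {"county": "X", "race": "White", "age": 30, "sex": "F", "tract": "T1", "population": 40.0},
    {"county": "X", "race": "White", "age": 30, "sex": "F", "tract": "T2", "population": 60.0},
]
scaled_rows = scale_to_published(rows, {("X", "White", 30, "F"): 150})
```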
Limitations and Improvements Based on the Updated Methodology
Due to missing information regarding the inputs of the Census population data for granular race category responses, the ‘Two or More’ and ‘Some Other Race’ categories have been summed into a new category called ‘Other/Unknown’. These counts are still part of the total population, but unlike the other categories (‘American Indian or Alaska Native’, ‘Asian or Pacific Islander’, ‘Black or African American’, and ‘Hispanic or Latino’), they did not have historical race-bridging methods applied. Mutually exclusive counts are still accounted for, but it should be noted that the Other/Unknown category does not exist in the Historical CDC Based County Denominators developed by the department.
Compared to the historical methodology using the CDC Based inputs, the new 2024-based methodology adds an Expanded Age variable that extends single year of age beyond 85 (0-100+ years of age, with 100-104, 105-109, and 110 or greater). Population counts can be collapsed to the previous stratification of single year of age 0-85+ using the Age variable.
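Collapsing expanded ages to the 0-85+ stratification can be sketched as below for analysts who work from the Expanded Age variable. The label formats handled here (for example "84", "100-104", "110+") are assumptions for illustration; the data dictionary should be consulted for the actual coding, and the Age variable already supplied in the datasets serves the same purpose.

```python
import re
from collections import defaultdict

def collapse_to_85_plus(counts_by_expanded_age, top_code=85):
    """Sum counts so that every age at or above `top_code` falls into a
    single open-ended group, keyed by `top_code` (85 means 85+).

    Assumption: each label starts with the lower bound of its age range,
    e.g. '84', '100-104', '110+'.
    """
    collapsed = defaultdict(int)
    for label, count in counts_by_expanded_age.items():
        lower_bound = int(re.match(r"\d+", str(label)).group())
        collapsed[min(lower_bound, top_code)] += count
    return dict(collapsed)

# Hypothetical counts keyed by expanded-age label.
collapsed = collapse_to_85_plus({"84": 10, "85": 5, "99": 2, "100-104": 1, "110+": 1})
```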
As an additional improvement, mutually exclusive counts by ethnicity (‘Hispanic Or Latino’ and ‘Non Hispanic or Latino’) were added as a new column named ‘Ethnicity’.
Lastly, in order to advance efforts to match the new minimum categories in the U.S. Office of Management and Budget Statistical Policy Directive 15, a race expanded field was developed with the categories 'AMERICAN_INDIAN_AND_ALASKA_NATIVE_ALONE', 'ASIAN_ALONE', 'NATIVE_HAWAIIAN_AND_OTHER_PACIFIC_ISLANDER_ALONE', 'BLACK_OR_AFRICAN_AMERICAN_ALONE', 'SOME_OTHER_RACE_ALONE', 'TWO_OR_MORE_RACES', and 'WHITE'. At this time, a Middle Eastern or North African category could not be derived from source inputs, but Native Hawaiian and Other Pacific Islander was available and therefore was split out.
Historical Methodology: Centers for Disease Control and Prevention Based County Denominators (2013-2023)
Through the 2023 data year, the ADHS annual population estimates sourced their advanced stratifications of age, race, and ethnicity from a joint initiative between the U.S. Census Bureau and the Centers for Disease Control and Prevention, which provided comprehensive single-year-of-age and race-bridging stratifications. This allowed ADHS to stratify county population totals from the State Demographer. The CDC has since stopped producing the race-bridging dataset. ADHS saw this as an opportunity to develop new methods to maintain standardized denominator data for public health statistics in Arizona. Stratifications of data at smaller geographic scales, such as census tract, were not available from these historical methods. ADHS stopped producing estimates using this historical method, which was based on the CDC bridged-race population estimates, with the 2023 data year.
Note: The CDC Based County Denominator dataset differs from the TRACT, CSA, and COUNTY datasets, which use the new methodology developed in collaboration with Arizona State University and the State Demographer’s Office. The CDC Based County Denominator dataset should not be compared to the other datasets.
Age-Adjustment
Because mortality from most causes of death occurs predominantly among the elderly, a population group with a larger proportion of older persons would have a higher mortality rate. Age-adjustment removes the effect of age differences among sub-populations (or in the same population over time) by placing them all in a population with a standard age distribution. One approach for calculating age-adjusted mortality rates is the direct method, that is, weighting the age-specific rates for a given year by the age distribution of a standard population. The weighted age-specific rates are then added to produce the summary rate for all ages combined. Beginning with the 2000 data year, a new population standard for the age adjustment of mortality rates replaced the standard based on the 1940 population, which had been used since 1943. The new standard uses the age composition of the 2000 U.S. projected population. The standard is expressed in terms of a “standard million”: the relative distribution of the 2000 population of the United States, totaling 1 million, across standard age groups such as 10-year age groups. These standard weights were developed by the U.S. National Center for Health Statistics and are used in ADHS public health statistics.
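The direct method can be sketched in a few lines. The three age groups, death counts, populations, and weights below are made up for illustration and are not the NCHS standard million; in practice the weights would come from the U.S. National Center for Health Statistics file and the denominators from the ADHS datasets. The function name is hypothetical.

```python
def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Direct age adjustment: weight each age-specific rate by the
    standard population's share for that age group and sum.

    deaths, population, standard_weights: dicts keyed by age group.
    Weights may be counts (e.g. a standard million) or proportions;
    they are normalized by their sum.
    """
    weight_total = sum(standard_weights.values())
    adjusted = 0.0
    for group, weight in standard_weights.items():
        age_specific_rate = deaths[group] / population[group]
        adjusted += age_specific_rate * (weight / weight_total)
    return adjusted * per

# Illustrative inputs only (not real data or real standard weights).
deaths = {"0-39": 2, "40-64": 10, "65+": 50}
population = {"0-39": 10_000, "40-64": 10_000, "65+": 5_000}
weights = {"0-39": 500_000, "40-64": 300_000, "65+": 200_000}

adjusted = age_adjusted_rate(deaths, population, weights)
crude = sum(deaths.values()) / sum(population.values()) * 100_000
```

In this example the crude rate is 248 per 100,000, while the age-adjusted rate is about 240 per 100,000, because the illustrative standard population is younger than the observed one.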
| Dataset Name | Ordinal Position | Column Name | Field Description | Data Type |
|---|---|---|---|---|
Data Dictionary
- Download - Data Dictionary
- The data dictionary provides accurate variable descriptions and standardized data elements that are consistent throughout the dataset.
Note: These datasets are meant to be opened in statistical software. The number of records in some of the datasets below may exceed the row limit for Microsoft Excel 2007 and newer (1,048,576 rows).
Description: The variables included in the CDC Based AZ County Population Count (estimate) dataset include Age (0-85+ years), Gender (M/F), County, Race (Black or African American, American Indian or Alaska Native, Asian or Pacific Islander, White Non-Hispanic, and Hispanic or Latino), Population, State, and Year. State and County Totals for the dataset in this section were modeled to match official population estimates from the State Demographer. Note: Single year age estimates may differ from the ADHS based dataset below due to different methodology.
Files are CSV file type within zip file
- CDC Based AZ County Population Counts (Denominators/Estimate)
Description: The variables included in the ADHS Based Population Count (estimate) datasets include Age (0-85+ years), Gender (M/F), County, Population, State, Year, Census Tract, Community Statistical Area (CSA) Name, CSA ID, Age Expanded (0-100+ years), Race ('American Indian or Alaska Native', 'Asian or Pacific Islander', 'Black or African American', 'Hispanic or Latino', 'Other/Unknown', 'White non-Hispanic'), Ethnicity (Hispanic/Latino or not Hispanic/Latino), and Race Expanded ('American Indian and Alaska Native Alone', 'Asian Alone', 'Native Hawaiian and Other Pacific Islander Alone', 'Black or African American Alone', 'Some Other Race Alone', 'Two or More Races', 'White'). The 'Other/Unknown' category includes 'Two or More races' and 'Some other race alone'; 2024 is the only year with the "OTHER" response category. Please note that "Refused" in the numerator should not be matched to the Other/Unknown race option, as they are fundamentally different. State and County Totals for the datasets in this section were modeled to match official population estimates from the State Demographer. Note: Single year age estimates may differ from the CDC based dataset above due to different methodology.
Files are CSV file type within zip file
Description: These standard weights used in age-adjusted rate calculations were developed by the U.S. National Center for Health Statistics and are used in ADHS public health statistics.
Files are CSV file type
- U.S. National Center for Health Statistics Age-Adjusted Weights
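Because the files are distributed as CSVs inside zip archives and may be too large for a spreadsheet, one option is to stream a file row by row and aggregate it as it is read. The sketch below uses only the Python standard library. The archive name, member file name, and column names shown are hypothetical; the actual names should be taken from the downloaded files and the data dictionary.

```python
import csv
import io
import zipfile
from collections import defaultdict

def sum_population(zip_source, member, group_cols, value_col="Population"):
    """Stream one CSV inside a zip archive and total `value_col` by the
    columns in `group_cols`, without loading the whole file in memory.

    zip_source: path to the .zip file or a binary file-like object.
    member:     name of the CSV file inside the archive.
    """
    totals = defaultdict(int)
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                key = tuple(row[col] for col in group_cols)
                totals[key] += int(row[value_col])
    return dict(totals)

# Hypothetical usage (file and column names are placeholders):
# county_totals = sum_population("denominators.zip", "county.csv",
#                                ["County", "Year"])
```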