25 changes: 25 additions & 0 deletions docs/source/change-log.rst
@@ -10,6 +10,31 @@ This project adheres to `Semantic Versioning`_.

.. _Semantic Versioning: http://semver.org/


0.7.0
-----
2025-06-09

* New Features

* Support for IPUMS IHGIS extract API, including

* Support for IPUMS IHGIS has been added to :py:class:`~ipumspy.api.extract.AggregateDataExtract`
* :py:class:`~ipumspy.api.extract.IhgisDataset` added for use when constructing IPUMS IHGIS extract requests.

* Support for IPUMS IHGIS metadata API, including

* :py:class:`~ipumspy.api.metadata.IhgisDatasetMetadata`
* :py:class:`~ipumspy.api.metadata.IhgisDataTableMetadata`

* Support for the Monetary Value Adjustment feature on select variables from IPUMS USA and IPUMS CPS

* Breaking Changes

* :py:class:`~ipumspy.api.extract.Dataset` has been renamed :py:class:`~ipumspy.api.extract.NhgisDataset`
* :py:class:`~ipumspy.api.metadata.DatasetMetadata` has been renamed :py:class:`~ipumspy.api.metadata.NhgisDatasetMetadata`
* :py:class:`~ipumspy.api.metadata.DataTableMetadata` has been renamed :py:class:`~ipumspy.api.metadata.NhgisDataTableMetadata`
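
For code written against earlier releases, these renames amount to a simple name mapping (``Dataset`` to ``NhgisDataset``, ``DatasetMetadata`` to ``NhgisDatasetMetadata``, ``DataTableMetadata`` to ``NhgisDataTableMetadata``). The sketch below is a hypothetical helper, not part of ipumspy, that applies the mapping to source text with a naive whole-word regular expression. It does not parse Python, so it would also rewrite unrelated identifiers named ``Dataset``; review its output by hand.

```python
import re

# Old class name -> new class name, following the renames in this release.
RENAMES = {
    "Dataset": "NhgisDataset",
    "DatasetMetadata": "NhgisDatasetMetadata",
    "DataTableMetadata": "NhgisDataTableMetadata",
}

# Whole-word match, so names that are already prefixed (e.g. NhgisDataset,
# IhgisDataset) are left untouched.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(RENAMES, key=len, reverse=True)) + r")\b"
)


def migrate_source(text: str) -> str:
    """Replace pre-0.7.0 class names in ``text`` with their new names."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], text)
```

For example, ``migrate_source("from ipumspy import Dataset")`` yields an import of ``NhgisDataset`` instead.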

0.6.2
-----
2025-04-18
50 changes: 35 additions & 15 deletions docs/source/ipums_api/index.rst
@@ -88,6 +88,11 @@ available features for all collections currently supported by the API:
- ``nhgis``
- **X**
- **X**
* - `IPUMS IHGIS <https://ihgis.ipums.org/>`__
- Aggregate data
- ``ihgis``
- **X**
- **X**

Note that ipumspy may not necessarily support all the functionality currently supported by
the IPUMS API. See the `API documentation <https://developer.ipums.org/>`__ for more information
@@ -195,29 +200,44 @@ metadata obtained for the requested data source:
tst.description
#> 'Total Population'

The following table summarizes the currently available metadata endpoints:
The following table summarizes the currently available metadata endpoints. Endpoints listed
in the **Metadata type** column can be used with the indicated collection in :py:meth:`.get_metadata_catalog`.
Classes listed in the **Detailed metadata class** column can be used to obtain detailed metadata for
individual data sources of that type.

.. _metadata support table:

.. list-table:: Supported metadata endpoints
:widths: 2 3 5
:widths: 3 2 5
:header-rows: 1
:align: center

* - Metadata type
- Supported collections
- Detailed metadata class analog
* - ``datasets``
- IPUMS NHGIS
- :py:class:`~ipumspy.api.metadata.DatasetMetadata`
* - ``data_tables``
- IPUMS NHGIS
- :py:class:`~ipumspy.api.metadata.DataTableMetadata`
* - ``time_series_tables``
- IPUMS NHGIS
* - Collection
- Metadata type
- Detailed metadata class
* - NHGIS
- ``datasets``
- :py:class:`~ipumspy.api.metadata.NhgisDatasetMetadata`
* - NHGIS
- ``data_tables``
- :py:class:`~ipumspy.api.metadata.NhgisDataTableMetadata`
* - NHGIS
- ``time_series_tables``
- :py:class:`~ipumspy.api.metadata.TimeSeriesTableMetadata`
* - ``shapefiles``
- IPUMS NHGIS
* - NHGIS
- ``shapefiles``
-
* -
-
-
* - IHGIS
- ``datasets``
- :py:class:`~ipumspy.api.metadata.IhgisDatasetMetadata`
* - IHGIS
- ``data_tables``
- :py:class:`~ipumspy.api.metadata.IhgisDataTableMetadata`
* - IHGIS
- ``tabulation_geographies``
-
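
As a rough illustration of browsing one of these endpoints, the sketch below filters catalog records by keyword. It assumes that :py:meth:`.get_metadata_catalog` yields pages shaped like dictionaries with a ``"data"`` list, and that each record carries a ``"description"`` field. Both are assumptions made for illustration and should be checked against the responses you actually receive. The helper name ``find_catalog_records`` is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def find_catalog_records(
    pages: Iterable[Dict[str, Any]], keyword: str
) -> List[Dict[str, Any]]:
    """Collect catalog records whose description mentions ``keyword``.

    ``pages`` is assumed to be the iterable of result pages produced by
    ``IpumsApiClient.get_metadata_catalog()``, each a dict with a "data" list.
    """
    needle = keyword.lower()
    matches = []
    for page in pages:
        for record in page.get("data", []):
            # "description" is an assumed field name; records lacking it
            # are treated as non-matching rather than raising an error.
            if needle in str(record.get("description", "")).lower():
                matches.append(record)
    return matches
```

With a configured client, this might be called as ``find_catalog_records(ipums.get_metadata_catalog("ihgis", metadata_type="datasets"), "population")``.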

.. _submit-extract:
119 changes: 82 additions & 37 deletions docs/source/ipums_api/ipums_api_aggregate/index.rst
@@ -6,36 +6,46 @@ Aggregate Data Extracts
=======================

IPUMS aggregate data collections distribute aggregated statistics for a set of geographic units.
IPUMS offers two aggregate data collections, both of which are supported by the IPUMS API:

Currently, `IPUMS NHGIS <https://www.nhgis.org/>`__ is the only aggregate data collection
supported by the IPUMS API.
- `IPUMS NHGIS <https://www.nhgis.org/>`__
- `IPUMS IHGIS <https://ihgis.ipums.org/>`__

IPUMS NHGIS provides 3 different types of data sources:

- Datasets/data tables
- Time series tables
- Shapefiles

IPUMS IHGIS provides 1 type of data source:

- Datasets/data tables

.. note::
IHGIS does provide boundary shapefiles, but these are not available
via the IPUMS API. Shapefiles from IHGIS can be downloaded directly from the
`IHGIS website <https://ihgis.ipums.org/geography-gis>`__.

Extract Objects
---------------

Construct an extract for an IPUMS aggregate data collection using the
:class:`AggregateDataExtract<ipumspy.api.extract.AggregateDataExtract>` class.
An ``AggregateDataExtract`` must contain an IPUMS collection ID
and at least one data source. IPUMS NHGIS provides 3 different types of data sources:

- Datasets/data tables
- Time series tables
- Shapefiles

We also recommend providing an extract description to make it easier to identify and
retrieve your extract in the future.
and at least one data source. We also recommend providing an extract description
to make it easier to identify and retrieve your extract in the future.

For example:

.. code:: python

from ipumspy import AggregateDataExtract, Dataset
from ipumspy import AggregateDataExtract, NhgisDataset

extract = AggregateDataExtract(
collection="nhgis",
description="An NHGIS extract example",
datasets=[
Dataset(name="1990_STF1", data_tables=["NP1", "NP2"], geog_levels=["county"])
NhgisDataset(name="1990_STF1", data_tables=["NP1", "NP2"], geog_levels=["county"])
]
)

@@ -48,9 +58,7 @@ After instantiation, an ``AggregateDataExtract`` object can be

.. note::
The IPUMS API provides a set of metadata endpoints for aggregate data collections that allow you
to browse available data sources and identify their associated API codes.

You can also browse metadata interactively through the `NHGIS Data Finder <https://data2.nhgis.org/main>`_.
to browse available data sources and identify their associated API codes (see below for examples).

Datasets + Data Tables
----------------------
@@ -59,20 +67,25 @@ An IPUMS **dataset** contains a collection of **data tables** that each correspo
A dataset is distinguished by the years, geographic levels, and topics that it covers. For instance, 2021 1-year data from the
American Community Survey (ACS) is encapsulated in a single dataset. In other cases, a single census product will be split into
multiple datasets, typically based on the lowest-level geography for which a set of tables is available. See the
`NHGIS documentation <https://www.nhgis.org/overview-nhgis-datasets>`_ for more details.
`NHGIS <https://www.nhgis.org/overview-nhgis-datasets>`_ and `IHGIS <https://ihgis.ipums.org/dataset-descriptions>`__ documentation
for more details.

To request data contained in an IPUMS dataset, you need to specify the name of the dataset, name of the data table(s) to request
from that dataset, and the geographic level at which those tables should be aggregated.
from that dataset, and the geographic level at which those tables should be aggregated.

NHGIS Datasets
++++++++++++++

Use the :class:`Dataset <ipumspy.api.extract.Dataset>` class to specify these parameters.
For NHGIS extracts, use the
:class:`NhgisDataset <ipumspy.api.extract.NhgisDataset>` class to specify these parameters:

.. code:: python

extract = AggregateDataExtract(
collection="nhgis",
description="An NHGIS example extract",
datasets=[
Dataset(name="2000_SF1a", data_tables=["NP001A", "NP031A"], geog_levels=["state"])
NhgisDataset(name="2000_SF1a", data_tables=["NP001A", "NP031A"], geog_levels=["state"])
],
)

@@ -84,7 +97,7 @@ Some datasets span multiple years and require a selection of ``years``:
collection="nhgis",
description="An NHGIS example extract",
datasets=[
Dataset(
NhgisDataset(
name="1988_1997_CBPa",
data_tables=["NT004"],
geog_levels=["county"],
@@ -105,7 +118,7 @@ for a dataset with the ``breakdown_values`` keyword argument:
collection="nhgis",
description="An NHGIS example extract",
datasets=[
Dataset(
NhgisDataset(
name="2000_SF1a",
data_tables=["NP001A", "NP031A"],
geog_levels=["state"],
@@ -122,23 +135,49 @@ For datasets with multiple breakdowns or data types (e.g., the American Communit
and margins of error), you can request that the data for each be provided in separate files or together in a
single file using the ``breakdown_and_data_type_layout`` argument.

IHGIS Datasets
++++++++++++++

For IHGIS, each dataset must be associated with a selection of data tables and tabulation geographies
(the level of geographic aggregation for the requested data). These are the only available parameters
for IHGIS dataset requests.

.. code:: python

extract = AggregateDataExtract(
collection="ihgis",
description="An IHGIS example extract",
datasets=[
IhgisDataset(
"KZ2009pop",
data_tables=["KZ2009pop.AAA"],
tabulation_geographies=["KZ2009pop.g0"]
)
]
)

.. caution::
IHGIS extract requests only accept input for ``description`` and ``datasets``. Other ``AggregateDataExtract``
arguments do not apply to IHGIS extracts and will be omitted from the extract request if included.
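
Because arguments that do not apply are dropped from the request, it can help to surface them before submitting. The sketch below is a hypothetical helper, not part of ipumspy, that separates the two arguments named in the caution from everything else, so ignored input can be logged or rejected explicitly.

```python
from typing import Any, Dict, List, Tuple

# Per the caution above, only these AggregateDataExtract arguments
# are used for IHGIS extract requests.
IHGIS_ARGUMENTS = frozenset({"description", "datasets"})


def split_ihgis_kwargs(**kwargs: Any) -> Tuple[Dict[str, Any], List[str]]:
    """Split keyword arguments into those IHGIS uses and the names it ignores."""
    used = {key: value for key, value in kwargs.items() if key in IHGIS_ARGUMENTS}
    ignored = sorted(set(kwargs) - IHGIS_ARGUMENTS)
    return used, ignored
```

A caller could then pass ``used`` on as ``AggregateDataExtract(collection="ihgis", **used)`` and report ``ignored`` to the user.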

Dataset + Data Table Metadata
+++++++++++++++++++++++++++++

You can obtain a listing of datasets and data tables as well as detailed information about individual
datasets and data tables via the :ref:`IPUMS Metadata API <ipums-metadata>`.

Use the :class:`DatasetMetadata <ipumspy.api.metadata.DatasetMetadata>` data class to browse the available
specification options for a particular dataset and identify the codes to use when
requesting data from the API:
Use the :class:`NhgisDatasetMetadata <ipumspy.api.metadata.NhgisDatasetMetadata>` and
:class:`IhgisDatasetMetadata <ipumspy.api.metadata.IhgisDatasetMetadata>` data classes
to browse the available specification options for a particular dataset and identify
the codes to use when requesting data from the API:

.. code:: python

from ipumspy import IpumsApiClient, DatasetMetadata
from ipumspy import IpumsApiClient, NhgisDatasetMetadata

ipums = IpumsApiClient(os.environ.get("IPUMS_API_KEY"))

ds = ipums.get_metadata(DatasetMetadata("nhgis", "2000_SF1a"))
ds = ipums.get_metadata(NhgisDatasetMetadata("2000_SF1a"))

The returned object will contain the metadata for the requested dataset. For example:

@@ -152,7 +191,9 @@ The returned object will contain the metadata for the requested dataset. For exa

# etc...

You can also request metadata for individual data tables using the same workflow with the :class:`DataTableMetadata <ipumspy.api.metadata.DataTableMetadata>` data class.
You can also request metadata for individual data tables using the same workflow with the
:class:`NhgisDataTableMetadata <ipumspy.api.metadata.NhgisDataTableMetadata>` and
:class:`IhgisDataTableMetadata <ipumspy.api.metadata.IhgisDataTableMetadata>` data classes.

Time Series Tables
------------------
@@ -162,7 +203,7 @@ U.S. censuses in a single package. A table is comprised of one or more related t
of which describes a single summary statistic measured at multiple times for a given geographic level.

Use the :class:`TimeSeriesTable<ipumspy.api.extract.TimeSeriesTable>` class to add time series tables
to your extract request.
to your NHGIS extract request.

Time series tables are already associated with a specific summary statistic, so they don't require an additional
selection of data tables as is required for NHGIS datasets. However, you will need to specify the geographic
@@ -218,7 +259,7 @@ Geographic Extent Selection

When working with small geographies it can be computationally intensive to work with
nationwide data. To avoid this problem, you can request data from a specific geographic area
using the ``geographic_extents`` argument
using the ``geographic_extents`` argument. This argument is only available for NHGIS extracts.

The following extract requests ACS 5-year sex-by-age counts at the census block group level, but
only includes block groups that fall within Alabama and Arkansas (identified by their FIPS codes with
@@ -230,15 +271,15 @@ a trailing 0):
collection="nhgis",
description="Extent selection example",
datasets=[
Dataset(name="2018_2022_ACS5a", data_tables=["B01001"], geog_levels=["blck_grp"]),
Dataset(name="2017_2021_ACS5a", data_tables=["B01001"], geog_levels=["blck_grp"])
NhgisDataset(name="2018_2022_ACS5a", data_tables=["B01001"], geog_levels=["blck_grp"]),
NhgisDataset(name="2017_2021_ACS5a", data_tables=["B01001"], geog_levels=["blck_grp"])
],
geographic_extents=["010", "050"]
)

.. tip::
You can see available extent selection API codes, if any, in the ``geographic_instances`` attribute of
a submitted :class:`DatasetMetadata <ipumspy.api.metadata.DatasetMetadata>` or
a submitted :class:`NhgisDatasetMetadata <ipumspy.api.metadata.NhgisDatasetMetadata>` or
:class:`TimeSeriesTableMetadata <ipumspy.api.metadata.TimeSeriesTableMetadata>` object. The
``geog_levels`` attribute indicates whether a given geographic level supports extent selection.
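
The state extent codes in the example above follow the pattern described there: the two-digit state FIPS code with a trailing 0. A minimal sketch of that conversion is shown below. The function name ``state_extent_code`` is hypothetical, and the result should still be checked against the ``geographic_instances`` metadata, since not every dataset supports extent selection.

```python
from typing import Union


def state_extent_code(state_fips: Union[str, int]) -> str:
    """Build an extent code from a state FIPS code by appending a trailing 0."""
    digits = str(state_fips).strip()
    if not digits.isdigit() or not 1 <= len(digits) <= 2:
        raise ValueError(f"Expected a 1- or 2-digit state FIPS code, got {state_fips!r}")
    # Zero-pad to two digits (e.g. "1" -> "01"), then add the trailing 0.
    return digits.zfill(2) + "0"


# Alabama (FIPS 01) and Arkansas (FIPS 05), as in the example above.
extents = [state_extent_code("01"), state_extent_code("05")]
```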

@@ -261,6 +302,8 @@ simply by specifying their names:
shapefiles=["us_county_2021_tl2021", "us_county_2020_tl2020"]
)

As mentioned above, IHGIS shapefiles must be downloaded directly from the `IHGIS website <https://ihgis.ipums.org/geography-gis>`__.

Shapefile Metadata
++++++++++++++++++

@@ -282,8 +325,8 @@ datasets:
collection="nhgis",
description="An NHGIS example extract",
datasets=[
Dataset(name="2000_SF1a", data_tables=["NP001A"], geog_levels=["state"]),
Dataset(name="2010_SF1a", data_tables=["P1"], geog_levels=["state"])
NhgisDataset(name="2000_SF1a", data_tables=["NP001A"], geog_levels=["state"]),
NhgisDataset(name="2010_SF1a", data_tables=["P1"], geog_levels=["state"])
],
shapefiles=["us_state_2000_tl2010", "us_state_2010_tl2010"]
)
@@ -296,7 +339,7 @@ several ACS years at once using list comprehensions. For instance:

acs1_names = ["2017_ACS1", "2018_ACS1", "2019_ACS1"]
acs1_specs = [
Dataset(name, data_tables=["B01001"], geog_levels=["state"]) for name in acs1_names
NhgisDataset(name, data_tables=["B01001"], geog_levels=["state"]) for name in acs1_names
]

# Total state-level population from 2017-2019 ACS 1-year estimates
@@ -313,8 +356,10 @@ By default, NHGIS extracts are provided in CSV format with only a single header
If you like, you can request that your CSV data include a second header row containing
a description of each column's contents by setting ``data_format="csv_header"``.

You can also request your data in
fixed-width format if so desired. Note that unlike for microdata projects, NHGIS does
While you can also request your data in
fixed-width format, NHGIS is likely to phase out support for this format in the
future. We therefore suggest that you request data in CSV format.
Also note that unlike for microdata projects, NHGIS does
not provide DDI codebook files (in XML format), which allow ipumspy to parse
microdata fixed-width files. Thus, loading an NHGIS fixed width file will require
manual work to parse the file correctly.
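
When a file is requested with ``data_format="csv_header"``, the second header row has to be separated from the data before analysis. The sketch below shows one way to do that with the standard library. The column names and values in the sample are made up for illustration and do not come from a real NHGIS extract.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_csv_with_descriptions(
    handle: TextIO,
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Read a CSV whose first two rows are column codes and descriptions."""
    reader = csv.reader(handle)
    columns = next(reader)       # first header row: column codes
    descriptions = next(reader)  # second header row: column descriptions
    labels = dict(zip(columns, descriptions))
    rows = [dict(zip(columns, row)) for row in reader]
    return labels, rows


# Made-up sample standing in for a downloaded file.
sample = io.StringIO(
    "GISJOIN,STATE,AAA001\n"
    "GIS Join Match Code,State Name,Total\n"
    "G010,Alabama,100\n"
)
labels, rows = read_csv_with_descriptions(sample)
```

All values are read as strings here; convert numeric columns as needed for your analysis.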