Exporting results

You can save the catalogs produced by your workflow to disk, in Parquet format, by calling to_hipscat.

You must provide a base_catalog_path, the output path for your catalog directory, and you may optionally provide a catalog_name. The catalog_name is the catalog’s internal name, so it may differ from the name of the catalog’s base directory. If the directory already exists and you want to overwrite its contents, set the overwrite flag to True. When exporting to protected remote storage, remember to pass the necessary credentials as storage_options.

For example, to save a catalog that contains the results of crossmatching Gaia with ZTF to "./my_catalogs/gaia_x_ztf" one could run:

gaia_x_ztf_catalog.to_hipscat(base_catalog_path="./my_catalogs/gaia_x_ztf", catalog_name="gaia_x_ztf")
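The optional arguments described above can be combined in a single call. The sketch below is illustrative only: export_catalog is a hypothetical helper, not part of the library, and it assumes that the catalog object accepts the keyword arguments named in this section (base_catalog_path, catalog_name, overwrite, storage_options). The credential keys shown in the comment are placeholders, since the keys actually required depend on the storage backend.

```python
def export_catalog(catalog, path, name=None, overwrite=False, storage_options=None):
    """Hypothetical helper: forward export settings to catalog.to_hipscat.

    Optional arguments are only passed along when they were actually given,
    so the library's own defaults apply otherwise.
    """
    kwargs = {"base_catalog_path": path, "overwrite": overwrite}
    if name is not None:
        kwargs["catalog_name"] = name
    if storage_options is not None:
        kwargs["storage_options"] = storage_options
    catalog.to_hipscat(**kwargs)
    return kwargs


# Example usage (assumes gaia_x_ztf_catalog is an existing catalog object):
#
# export_catalog(
#     gaia_x_ztf_catalog,
#     "./my_catalogs/gaia_x_ztf",
#     name="gaia_x_ztf",
#     overwrite=True,  # replace the directory contents if it already exists
# )
#
# For protected remote storage, credentials would go in storage_options, e.g.
# storage_options={"<credential_key>": "<credential_value>"}  # placeholder keys
```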

The HiPSCat catalogs on disk follow a well-defined directory structure:

gaia_x_ztf/
├── Norder={}/
│   ├── Dir={}/
│   │   ├── Npix={}.parquet
│   │   └── ...
│   └── ...
├── Norder={}/
│   ├── Dir={}/
│   │   ├── Npix={}.parquet
│   │   └── ...
│   └── ...
├── _metadata
├── _common_metadata
├── catalog_info.json
├── partition_info.csv
└── provenance_info.json
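As a rough illustration of this layout, the partition files of an exported catalog can be enumerated with the Python standard library alone. This is a minimal sketch, not a library function: list_partitions is a hypothetical helper, and it assumes only the directory naming shown in the tree above (it accepts the order directory spelled either Norder or N_order).

```python
import re
from pathlib import Path

# Matches relative paths such as "Norder=2/Dir=0/Npix=37.parquet".
# The optional underscore tolerates the "N_order" spelling as well.
PARTITION_RE = re.compile(r"N_?order=(\d+)/Dir=(\d+)/Npix=(\d+)\.parquet$")


def list_partitions(catalog_dir):
    """Return sorted (order, pixel) pairs, one per partition file found."""
    root = Path(catalog_dir)
    found = []
    for path in root.rglob("Npix=*.parquet"):
        match = PARTITION_RE.search(path.relative_to(root).as_posix())
        if match:
            order, _directory, pixel = (int(group) for group in match.groups())
            found.append((order, pixel))
    return sorted(found)


# Example usage, assuming the catalog was exported to this path:
# for order, pixel in list_partitions("./my_catalogs/gaia_x_ztf"):
#     print(f"order {order}, pixel {pixel}")
```

The metadata files at the root of the catalog are skipped, because only files named Npix=....parquet are matched.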

The data is partitioned spatially and stored in Parquet format, with each Parquet file holding one partition of the sky. The higher the Norder of a partition, the smaller its area. Because partitions contain (approximately) the same number of points, directories with a larger Norder hold data for denser regions of the sky. This partitioning information is recorded in the metadata files at the root of the catalog.
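To make the relation between order and area concrete, the sketch below computes the area of a single pixel at a few orders. It assumes that the partitions are HEALPix pixels, as in the HiPSCat format, where the sky is divided into 12 × 4^order equal-area pixels. The function name is chosen here for illustration.

```python
import math

# Area of the whole sky in square degrees: 4*pi steradians converted to deg^2.
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2


def healpix_pixel_area_sq_deg(order):
    """Area, in square degrees, of one HEALPix pixel at the given order."""
    n_pixels = 12 * 4 ** order  # each order splits every pixel into four
    return FULL_SKY_SQ_DEG / n_pixels


for order in range(4):
    print(f"order {order}: {healpix_pixel_area_sq_deg(order):.2f} sq. deg. per pixel")
```

Each increase in order divides the pixel area by four, so a region dense enough to need a partition two orders deeper is covered by pixels sixteen times smaller.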