lsdb.io.to_hipscat

Module Contents

Functions

perform_write(→ int)

Performs a write of a pandas dataframe to a single parquet file, following the hipscat structure.

to_hipscat(catalog, base_catalog_path[, catalog_name, ...])

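Writes a catalog to disk, in HiPSCat format. The output catalog comprises partition parquet files and respective metadata, as well as JSON files detailing partition, catalog and provenance info.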

write_partitions(...)

Saves catalog partitions as parquet to disk

_get_partition_info_dict(...)

Creates the partition info dictionary

create_modified_catalog_structure(...)

Creates a modified version of the HiPSCat catalog structure

_get_provenance_info(→ dict)

Fill all known information in a dictionary for provenance tracking.

perform_write(df: pandas.DataFrame, hp_pixel: hipscat.pixel_math.HealpixPixel, base_catalog_dir: hipscat.io.FilePointer, storage_options: dict | None = None, **kwargs) → int

Performs a write of a pandas dataframe to a single parquet file, following the hipscat structure.

To be used as a dask delayed method as part of a dask task graph.

Parameters:
  • df (pd.DataFrame) – dataframe to write to file

  • hp_pixel (HealpixPixel) – HEALPix pixel of file to be written

  • base_catalog_dir (FilePointer) – Location of the base catalog directory to write to

  • storage_options (dict) – fsspec storage options

  • **kwargs – other keyword arguments to pass to the pandas DataFrame.to_parquet method

Returns:

number of rows written to disk
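As an illustration of what "following the hipscat structure" can mean for a single partition file, here is a minimal sketch that builds a per-pixel parquet path. It assumes the Norder=/Dir=/Npix= directory layout with pixels grouped into directories of 10,000; HealpixPixel below is a stand-in for hipscat.pixel_math.HealpixPixel, and partition_file_path is a hypothetical helper, not part of lsdb or hipscat.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class HealpixPixel(NamedTuple):
    """Stand-in for hipscat.pixel_math.HealpixPixel (assumption)."""

    order: int
    pixel: int


def partition_file_path(base_catalog_dir: str, hp_pixel: HealpixPixel) -> str:
    """Hypothetical helper: build the parquet path for one HEALPix pixel.

    Assumes the Norder=/Dir=/Npix= layout, where pixels are grouped
    into directories of 10,000.
    """
    directory = (hp_pixel.pixel // 10_000) * 10_000
    return str(
        PurePosixPath(base_catalog_dir)
        / f"Norder={hp_pixel.order}"
        / f"Dir={directory}"
        / f"Npix={hp_pixel.pixel}.parquet"
    )


path = partition_file_path("/data/my_catalog", HealpixPixel(order=7, pixel=12345))
print(path)  # /data/my_catalog/Norder=7/Dir=10000/Npix=12345.parquet
```

The real function also writes the dataframe and returns its row count; this sketch only covers where the file would land.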

to_hipscat(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, base_catalog_path: str, catalog_name: str | None = None, overwrite: bool = False, storage_options: dict | None = None, **kwargs)

Writes a catalog to disk, in HiPSCat format. The output catalog comprises partition parquet files and respective metadata, as well as JSON files detailing partition, catalog and provenance info.

Parameters:
  • catalog (HealpixDataset) – A catalog to export

  • base_catalog_path (str) – Location where catalog is saved to

  • catalog_name (str) – The name of the output catalog

  • overwrite (bool) – If True, an existing catalog at the same location is overwritten

  • storage_options (dict) – Dictionary that contains abstract filesystem credentials

  • **kwargs – Arguments to pass to the parquet write operations
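The effect of the overwrite flag can be sketched with the standard library alone. The example below assumes that writing to an existing location without overwrite=True raises an error, which this page does not state explicitly; prepare_catalog_dir is a hypothetical helper and only models the directory check, not the removal or replacement of existing files.

```python
import os
import tempfile


def prepare_catalog_dir(base_catalog_path: str, overwrite: bool = False) -> str:
    """Hypothetical helper: create the output directory.

    Assumption: an existing directory is only accepted when overwrite is True.
    """
    os.makedirs(base_catalog_path, exist_ok=overwrite)
    return base_catalog_path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my_catalog")

    # First write: the directory does not exist yet, so it is created.
    prepare_catalog_dir(target)

    # Second write without overwrite: rejected in this sketch.
    try:
        prepare_catalog_dir(target)
        second_write_failed = False
    except FileExistsError:
        second_write_failed = True

    # Second write with overwrite=True: accepted.
    prepare_catalog_dir(target, overwrite=True)
    overwrite_ok = os.path.isdir(target)
```

Going by the signature above, an actual export would look like `to_hipscat(catalog, "/data/my_catalog", catalog_name="my_catalog", overwrite=True)`, with the path and name chosen here purely for illustration.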

write_partitions(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, base_catalog_dir_fp: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None, **kwargs) → Dict[hipscat.pixel_math.HealpixPixel, int]

Saves catalog partitions as parquet to disk

Parameters:
  • catalog (HealpixDataset) – A catalog to export

  • base_catalog_dir_fp (FilePointer) – Path to the base directory of the catalog

  • storage_options (dict) – Dictionary that contains abstract filesystem credentials

  • **kwargs – Arguments to pass to the parquet write operations

Returns:

A dictionary mapping each HEALPix pixel to the number of data points in it.
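The shape of the return value can be illustrated by running one write task per partition and collecting the row counts. This sketch swaps dask for the standard library's ThreadPoolExecutor so that it runs without extra dependencies; HealpixPixel, fake_perform_write and write_partitions_sketch are hypothetical stand-ins, and partitions are plain lists of rows rather than dataframes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, NamedTuple


class HealpixPixel(NamedTuple):
    """Stand-in for hipscat.pixel_math.HealpixPixel (assumption)."""

    order: int
    pixel: int


def fake_perform_write(rows: List[dict], hp_pixel: HealpixPixel) -> int:
    """Stand-in for perform_write: writes nothing, returns the row count."""
    return len(rows)


def write_partitions_sketch(
    partitions: Dict[HealpixPixel, List[dict]],
) -> Dict[HealpixPixel, int]:
    """Run one write task per partition and map each pixel to its row count."""
    pixels = list(partitions)
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so counts line up with pixels.
        counts = list(
            pool.map(lambda p: fake_perform_write(partitions[p], p), pixels)
        )
    return dict(zip(pixels, counts))


partitions = {
    HealpixPixel(1, 44): [{"ra": 10.0, "dec": -5.0}, {"ra": 10.2, "dec": -5.1}],
    HealpixPixel(1, 45): [{"ra": 50.0, "dec": 12.0}],
}
points_map = write_partitions_sketch(partitions)
print(points_map)
```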

_get_partition_info_dict(ddf_points_map: Dict[hipscat.pixel_math.HealpixPixel, int]) → Dict[hipscat.pixel_math.HealpixPixel, lsdb.types.HealpixInfo]

Creates the partition info dictionary

Parameters:

ddf_points_map (Dict[HealpixPixel, int]) – Dictionary mapping each HealpixPixel to the respective number of points inside its partition

Returns:

A partition info dictionary, where the keys are the HEALPix pixels and the values are pairs where the first element is the number of points inside the pixel, and the second is the list of destination pixel numbers.
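A minimal sketch of the described mapping follows. It assumes that each partition's only destination pixel is the pixel itself, which this page does not spell out; HealpixPixel is a stand-in type and partition_info_sketch is a hypothetical name.

```python
from typing import Dict, List, NamedTuple, Tuple


class HealpixPixel(NamedTuple):
    """Stand-in for hipscat.pixel_math.HealpixPixel (assumption)."""

    order: int
    pixel: int


def partition_info_sketch(
    points_map: Dict[HealpixPixel, int],
) -> Dict[HealpixPixel, Tuple[int, List[int]]]:
    """Map each pixel to (number of points, list of destination pixel numbers).

    Assumption: the destination list holds just the pixel's own number.
    """
    return {pixel: (count, [pixel.pixel]) for pixel, count in points_map.items()}


info = partition_info_sketch({HealpixPixel(1, 44): 2, HealpixPixel(1, 45): 1})
print(info[HealpixPixel(1, 44)])  # (2, [44])
```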

create_modified_catalog_structure(catalog_structure: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset, catalog_base_dir: str, catalog_name: str, **kwargs) → hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset

Creates a modified version of the HiPSCat catalog structure

Parameters:
  • catalog_structure (HCHealpixDataset) – HiPSCat catalog structure

  • catalog_base_dir (str) – Base location for the catalog

  • catalog_name (str) – The name of the catalog to be saved

  • **kwargs – The remaining parameters to be updated in the catalog info object

Returns:

A HiPSCat structure, modified with the parameters provided.
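One common way to produce a modified copy of an info object from keyword arguments is dataclasses.replace. The sketch below assumes the catalog info behaves like a dataclass; CatalogInfoSketch and its fields, as well as with_updates, are hypothetical and chosen only to show how **kwargs could override selected fields while leaving the original untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CatalogInfoSketch:
    """Hypothetical stand-in for a catalog info object."""

    catalog_name: str
    total_rows: Optional[int] = None
    ra_column: str = "ra"
    dec_column: str = "dec"


def with_updates(
    info: CatalogInfoSketch, catalog_name: str, **kwargs
) -> CatalogInfoSketch:
    """Return a copy of info with a new name and any other updated fields."""
    return replace(info, catalog_name=catalog_name, **kwargs)


original = CatalogInfoSketch(catalog_name="source_catalog", total_rows=100)
modified = with_updates(original, "filtered_catalog", total_rows=42)

print(modified.catalog_name, modified.total_rows)  # filtered_catalog 42
print(original.catalog_name, original.total_rows)  # source_catalog 100
```

Returning a copy rather than mutating in place keeps the in-memory catalog unchanged while the exported one gets its new name and metadata.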

_get_provenance_info(catalog_structure: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset) → dict

Fill all known information in a dictionary for provenance tracking.

Parameters:

catalog_structure (HCHealpixDataset) – The catalog structure

Returns:

A dictionary mapping each argument name to its value.