lsdb.io.to_hipscat
Module Contents
Functions
| Function | Description |
| perform_write | Performs a write of a pandas dataframe to a single parquet file, following the hipscat structure. |
| to_hipscat | Writes a catalog to disk, in HiPSCat format. |
| write_partitions | Saves catalog partitions as parquet to disk. |
| _get_partition_info_dict | Creates the partition info dictionary. |
| create_modified_catalog_structure | Creates a modified version of the HiPSCat catalog structure. |
| _get_provenance_info | Fills a dictionary with all known information, for provenance tracking. |
- perform_write(df: pandas.DataFrame, hp_pixel: hipscat.pixel_math.HealpixPixel, base_catalog_dir: hipscat.io.FilePointer, storage_options: dict | None = None, **kwargs) -> int
Performs a write of a pandas dataframe to a single parquet file, following the hipscat structure.
Intended to be called as a dask delayed function, as one task of a dask task graph.
- Parameters:
df (pd.DataFrame) – dataframe to write to file
hp_pixel (HealpixPixel) – HEALPix pixel of file to be written
base_catalog_dir (FilePointer) – Location of the base catalog directory to write to
storage_options (dict) – fsspec storage options
**kwargs – other keyword arguments to pass to the pandas DataFrame.to_parquet method
- Returns:
number of rows written to disk
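perform_write places each partition at its own per-pixel path inside base_catalog_dir. The sketch below shows how such a path can be derived, assuming the usual HiPSCat layout of Norder=/Dir=/Npix= components, where pixels are grouped into directories of 10,000. `pixel_parquet_path` is a hypothetical helper for illustration only. It is not an lsdb or hipscat function.

```python
from pathlib import PurePosixPath


def pixel_parquet_path(base_catalog_dir: str, order: int, pixel: int) -> str:
    """Hypothetical helper: build the leaf parquet path for one HEALPix pixel.

    Assumes the HiPSCat convention of grouping pixel files into directories
    of 10,000 pixels each: Dir = floor(pixel / 10000) * 10000.
    """
    directory = (pixel // 10_000) * 10_000
    return str(
        PurePosixPath(base_catalog_dir)
        / f"Norder={order}"
        / f"Dir={directory}"
        / f"Npix={pixel}.parquet"
    )


# Pixel 12345 at order 5 falls into the Dir=10000 bucket.
print(pixel_parquet_path("/data/my_catalog", 5, 12345))
```

Grouping pixels into fixed-size directory buckets keeps any single directory from holding an unbounded number of files at high orders.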
- to_hipscat(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, base_catalog_path: str, catalog_name: str | None = None, overwrite: bool = False, storage_options: dict | None = None, **kwargs)
Writes a catalog to disk, in HiPSCat format. The output catalog comprises partition parquet files and respective metadata, as well as JSON files detailing partition, catalog and provenance info.
- Parameters:
catalog (HealpixDataset) – A catalog to export
base_catalog_path (str) – Location where the catalog is saved
catalog_name (str) – The name of the output catalog
overwrite (bool) – If True, an existing catalog at base_catalog_path is overwritten
storage_options (dict) – Dictionary that contains abstract filesystem credentials
**kwargs – Arguments to pass to the parquet write operations
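The overwrite flag decides what happens when the target location already holds data. The sketch below shows one plausible guard, assuming that with overwrite=False the write refuses to proceed into a non-empty directory. That behaviour is an assumption, so check the lsdb source for the exact check it performs. `check_output_location` is a hypothetical name, and it only handles local paths, not fsspec storage_options.

```python
import tempfile
from pathlib import Path


def check_output_location(base_catalog_path: str, overwrite: bool = False) -> None:
    """Hypothetical guard: refuse to write into a non-empty directory unless overwrite=True."""
    path = Path(base_catalog_path)
    has_contents = path.is_dir() and any(path.iterdir())
    if has_contents and not overwrite:
        raise ValueError(
            f"{base_catalog_path} already contains files; "
            "choose another location or pass overwrite=True"
        )


with tempfile.TemporaryDirectory() as tmp:
    check_output_location(tmp)  # empty directory: allowed
    (Path(tmp) / "existing.parquet").touch()
    check_output_location(tmp, overwrite=True)  # non-empty, but overwriting: allowed
    try:
        check_output_location(tmp)  # non-empty, no overwrite: refused
    except ValueError as err:
        print("refused:", err)
```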
- write_partitions(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset, base_catalog_dir_fp: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None, **kwargs) -> Dict[hipscat.pixel_math.HealpixPixel, int]
Saves the catalog partitions to disk as parquet files.
- Parameters:
catalog (HealpixDataset) – A catalog to export
base_catalog_dir_fp (FilePointer) – Path to the base directory of the catalog
storage_options (dict) – Dictionary that contains abstract filesystem credentials
**kwargs – Arguments to pass to the parquet write operations
- Returns:
A dictionary mapping each HEALPix pixel to the number of data points in it.
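write_partitions runs one write per partition and pairs each pixel with the row count its write returns. The sketch below mirrors that shape with the standard library, using concurrent.futures in place of dask delayed tasks. `HealpixPixel` is a local stand-in for hipscat.pixel_math.HealpixPixel. `fake_write` and `write_all` are hypothetical names, and nothing is actually written to disk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, NamedTuple


class HealpixPixel(NamedTuple):
    """Local stand-in for hipscat.pixel_math.HealpixPixel."""
    order: int
    pixel: int


def fake_write(rows: List[dict], pixel: HealpixPixel) -> int:
    """Hypothetical per-partition writer; returns the number of rows it 'wrote'."""
    return len(rows)


def write_all(partitions: Dict[HealpixPixel, List[dict]]) -> Dict[HealpixPixel, int]:
    pixels = list(partitions)
    # One independent task per partition, as in a dask task graph.
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(lambda p: fake_write(partitions[p], p), pixels))
    # pool.map preserves input order, so pixels and counts line up.
    return dict(zip(pixels, counts))


partitions = {
    HealpixPixel(1, 44): [{"ra": 10.0}, {"ra": 11.0}],
    HealpixPixel(1, 45): [{"ra": 20.0}],
}
print(write_all(partitions))
```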
- _get_partition_info_dict(ddf_points_map: Dict[hipscat.pixel_math.HealpixPixel, int]) -> Dict[hipscat.pixel_math.HealpixPixel, lsdb.types.HealpixInfo]
Creates the partition info dictionary
- Parameters:
ddf_points_map (Dict[HealpixPixel, int]) – Dictionary mapping each HealpixPixel to the respective number of points inside its partition
- Returns:
A partition info dictionary, where the keys are the HEALPix pixels and the values are pairs where the first element is the number of points inside the pixel, and the second is the list of destination pixel numbers.
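The transformation described above can be sketched as a single dictionary comprehension. This assumes that each written partition is its own destination, so the destination list holds just that pixel's number, and that HealpixInfo is a (count, list of pixel numbers) pair. Both are assumptions drawn from the description above. `partition_info_dict` is a hypothetical name, and `HealpixPixel` is a local stand-in.

```python
from typing import Dict, List, NamedTuple, Tuple


class HealpixPixel(NamedTuple):
    """Local stand-in for hipscat.pixel_math.HealpixPixel."""
    order: int
    pixel: int


# Assumed shape of lsdb.types.HealpixInfo: (number of points, destination pixel numbers).
HealpixInfo = Tuple[int, List[int]]


def partition_info_dict(points_map: Dict[HealpixPixel, int]) -> Dict[HealpixPixel, HealpixInfo]:
    # Assumption: each partition maps only to itself, so the list has one entry.
    return {pixel: (count, [pixel.pixel]) for pixel, count in points_map.items()}


info = partition_info_dict({HealpixPixel(2, 100): 7, HealpixPixel(2, 101): 3})
print(info)
```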
- create_modified_catalog_structure(catalog_structure: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset, catalog_base_dir: str, catalog_name: str, **kwargs) -> hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset
Creates a modified version of the HiPSCat catalog structure
- Parameters:
catalog_structure (hc.catalog.Catalog) – HiPSCat catalog structure
catalog_base_dir (str) – Base location for the catalog
catalog_name (str) – The name of the catalog to be saved
**kwargs – The remaining parameters to be updated in the catalog info object
- Returns:
A HiPSCat structure, modified with the parameters provided.
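The idea of copying a structure while overriding its location, name and selected catalog info fields can be sketched with frozen dataclasses and dataclasses.replace. `CatalogInfo`, `CatalogStructure` and `modified_structure` are hypothetical, trimmed-down stand-ins, not the hipscat classes. The real objects carry more fields and may be copied differently.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CatalogInfo:
    """Hypothetical, trimmed-down stand-in for a HiPSCat catalog info object."""
    catalog_name: str
    catalog_type: str
    total_rows: int


@dataclass(frozen=True)
class CatalogStructure:
    """Hypothetical stand-in for the HiPSCat catalog structure."""
    catalog_path: str
    catalog_info: CatalogInfo


def modified_structure(
    structure: CatalogStructure, catalog_base_dir: str, catalog_name: str, **kwargs
) -> CatalogStructure:
    # Copy the info object, overriding the name plus any extra fields given in kwargs.
    new_info = replace(structure.catalog_info, catalog_name=catalog_name, **kwargs)
    # Copy the structure, pointing it at the new base directory; the original is untouched.
    return replace(structure, catalog_path=catalog_base_dir, catalog_info=new_info)


original = CatalogStructure("/in/my_catalog", CatalogInfo("my_catalog", "object", 1000))
updated = modified_structure(original, "/out/my_subset", "my_subset", total_rows=250)
print(updated)
```

Because replace builds new objects, the input structure stays valid and can still describe the source catalog after the copy is made.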
- _get_provenance_info(catalog_structure: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset) -> dict
Fills a dictionary with all known information, for provenance tracking.
- Parameters:
catalog_structure (HCHealpixDataset) – The catalog structure
- Returns:
A dictionary of argument_name -> argument_value pairs.
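A provenance record of argument_name -> argument_value pairs can be sketched by flattening the known catalog attributes into a plain dictionary that serializes cleanly to JSON. `CatalogInfo` and `provenance_info` are hypothetical stand-ins, and the chosen fields are illustrative. The keys lsdb actually records may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CatalogInfo:
    """Hypothetical stand-in for the catalog info attached to a catalog structure."""
    catalog_name: str
    catalog_type: str
    epoch: str
    ra_column: str
    dec_column: str


def provenance_info(catalog_path: str, info: CatalogInfo) -> dict:
    # Start from every field the info object knows about...
    args = asdict(info)
    # ...and record where the catalog lives, so the entry is self-describing.
    args["catalog_path"] = catalog_path
    return args


record = provenance_info("/out/my_catalog", CatalogInfo("my_catalog", "object", "J2000", "ra", "dec"))
print(json.dumps(record, indent=2))
```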