lsdb.dask.merge_catalog_functions

Module Contents

Functions

concat_partition_and_margin(→ pandas.DataFrame)

Concatenates a partition and margin dataframe together

align_catalogs(→ hipscat.pixel_tree.PixelAlignment)

Aligns two catalogs, also using the right catalog's margin if it exists

align_and_apply(→ List[dask.delayed.Delayed])

Aligns catalogs to a given ordering of pixels and applies a function to each set of aligned partitions

filter_by_hipscat_index_to_pixel(→ pandas.DataFrame)

Filters a catalog dataframe to the points within a specified HEALPix pixel using the hipscat index

construct_catalog_args(...)

Constructs the arguments needed to create a catalog from a list of delayed partitions

get_healpix_pixels_from_alignment(...)

Gets the list of primary and join pixels as the HealpixPixel class from a PixelAlignment

generate_meta_df_for_joined_tables(→ pandas.DataFrame)

Generates a Dask meta DataFrame that would result from joining two catalogs

get_partition_map_from_alignment_pixels(...)

Gets a dictionary mapping HEALPix pixel to index of pixel in the pixel_mapping of a PixelAlignment

align_catalog_to_partitions(...)

Aligns the partitions of a Catalog to a list of HEALPix pixels

concat_partition_and_margin(partition: pandas.DataFrame, margin: pandas.DataFrame | None, right_columns: List[str]) → pandas.DataFrame

Concatenates a partition and margin dataframe together

Parameters:
  • partition (pd.DataFrame) – The partition dataframe

  • margin (pd.DataFrame | None) – The margin dataframe, which may be None

Returns:

The concatenated dataframe with the partition on top and the margin on the bottom

align_catalogs(left: lsdb.catalog.catalog.Catalog, right: lsdb.catalog.catalog.Catalog) → hipscat.pixel_tree.PixelAlignment

Aligns two catalogs, also using the right catalog’s margin if it exists

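Parameters:
  • left (Catalog) – The left catalog to align

  • right (Catalog) – The right catalog to align; its margin is used in the alignment if it exists

Returns: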

The PixelAlignment object from aligning the catalogs

align_and_apply(catalog_mappings: List[Tuple[lsdb.catalog.dataset.healpix_dataset.HealpixDataset | None, List[hipscat.pixel_math.HealpixPixel]]], func: Callable, *args, **kwargs) → List[dask.delayed.Delayed]

Aligns catalogs to a given ordering of pixels and applies a function to each set of aligned partitions

Parameters:
  • catalog_mappings (List[Tuple[HealpixDataset, List[HealpixPixel]]]) – The catalogs and their corresponding ordering of pixels to align the partitions to. A catalog can be None, in which case None will be passed to the function for each partition. Each list of pixels should be the same length. Example input: [(catalog, pixels), (catalog2, pixels2), …]

  • func (Callable) –

    The function to apply to the aligned catalogs. The function should take the aligned partitions of the catalogs as dataframes as the first arguments, followed by the healpix pixel of each partition, the hc_structures of the catalogs, and any additional arguments and keyword arguments. For example:

    def func(
        cat1_partition_df,
        cat2_partition_df,
        cat1_pixel,
        cat2_pixel,
        cat1_hc_structure,
        cat2_hc_structure,
        *args,
        **kwargs
    ):
        ...
    

  • *args – Additional arguments to pass to the function

  • **kwargs – Additional keyword arguments to pass to the function

Returns:

A list of delayed objects, each one representing the result of the function applied to the aligned partitions of the catalogs
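
The argument order that func receives can be easier to see in a small eager sketch. The block below is not the library's implementation: it uses no Dask, and the dict-based "catalogs", the apply_to_aligned helper, and the describe function are hypothetical stand-ins. It only illustrates the calling convention described above, under the assumption that a None catalog contributes None for both its partition and its structure.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical stand-ins: a "catalog" is a dict holding a structure label and a
# pixel -> partition lookup. Real LSDB catalogs hold lazily evaluated partitions.
Pixel = Tuple[int, int]  # (order, pixel), standing in for HealpixPixel


def apply_to_aligned(
    catalog_mappings: List[Tuple[Optional[dict], List[Pixel]]],
    func: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> List[Any]:
    """Eagerly call func once per row of aligned pixels (no Dask involved)."""
    lengths = {len(pixels) for _, pixels in catalog_mappings}
    if len(lengths) > 1:
        raise ValueError("each list of pixels should be the same length")
    n_rows = lengths.pop() if lengths else 0

    # Assumption: a None catalog contributes None as its structure.
    structures = [None if cat is None else cat["structure"] for cat, _ in catalog_mappings]

    results = []
    for i in range(n_rows):
        partitions = [
            None if cat is None else cat["partitions"].get(pixels[i])
            for cat, pixels in catalog_mappings
        ]
        row_pixels = [pixels[i] for _, pixels in catalog_mappings]
        # Order: partitions first, then pixels, then structures, then extras.
        results.append(func(*partitions, *row_pixels, *structures, *args, **kwargs))
    return results


def describe(left_df, right_df, left_pixel, right_pixel, left_info, right_info, sep="|"):
    """Example func: report which partitions and pixels were paired."""
    return sep.join([str(left_df), str(right_df), str(left_pixel), str(right_pixel)])


left = {"structure": "left_info", "partitions": {(0, 1): "L1", (0, 2): "L2"}}
outputs = apply_to_aligned(
    [(left, [(0, 1), (0, 2)]), (None, [(1, 4), (1, 8)])],
    describe,
    sep=" ",
)
```

In the real function each call is wrapped as a delayed object rather than executed immediately, so the returned list holds deferred results instead of values.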

filter_by_hipscat_index_to_pixel(dataframe: pandas.DataFrame, order: int, pixel: int) → pandas.DataFrame

Filters a catalog dataframe to the points within a specified HEALPix pixel using the hipscat index

Parameters:
  • dataframe (pd.DataFrame) – The dataframe to filter

  • order (int) – The order of the HEALPix pixel to filter to

  • pixel (int) – The pixel number in NESTED numbering of the HEALPix pixel to filter to

Returns:

The filtered dataframe with only the rows that are within the specified HEALPix pixel
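
Filtering by index works because all points inside one NESTED pixel occupy a contiguous range of index values. The sketch below shows that range computation on plain integers. It rests on an assumption not stated on this page: that the hipscat index is a 64-bit value whose high bits encode the order-19 NESTED pixel, leaving 22 low bits for a counter. The helper names are hypothetical, and a list of integers stands in for the dataframe index.

```python
# Assumed layout (not confirmed by this page): 4 bits for the base pixel,
# 2 bits per order up to order 19, and the remaining low bits for a counter.
ASSUMED_INDEX_ORDER = 19
ASSUMED_LOW_BITS = 64 - (4 + 2 * ASSUMED_INDEX_ORDER)  # 22 under this assumption


def index_range_for_pixel(order: int, pixel: int) -> tuple:
    """Half-open [lower, upper) range of index values inside one NESTED pixel."""
    # Each step down in order splits a pixel in 4, which is 2 more bits.
    shift = 2 * (ASSUMED_INDEX_ORDER - order) + ASSUMED_LOW_BITS
    return pixel << shift, (pixel + 1) << shift


def filter_indexes_to_pixel(indexes, order, pixel):
    """Keep only the index values that fall inside the given pixel."""
    lower, upper = index_range_for_pixel(order, pixel)
    return [value for value in indexes if lower <= value < upper]


kept = filter_indexes_to_pixel([0, 1 << 60, (1 << 60) + 5, 2 << 60], order=0, pixel=1)
```

With a pandas dataframe the same bounds would be applied as a boolean mask on the index, keeping rows where the index is at least the lower bound and strictly below the upper bound.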

construct_catalog_args(partitions: List[dask.delayed.Delayed], meta_df: pandas.DataFrame, alignment: hipscat.pixel_tree.PixelAlignment) → Tuple[dask.dataframe.core.DataFrame, lsdb.types.DaskDFPixelMap, hipscat.pixel_tree.PixelAlignment]

Constructs the arguments needed to create a catalog from a list of delayed partitions

Parameters:
  • partitions (List[Delayed]) – The list of delayed partitions to create the catalog from

  • meta_df (pd.DataFrame) – The dask meta schema for the partitions

  • alignment (PixelAlignment) – The alignment used to create the delayed partitions

Returns:

A tuple of (ddf, partition_map, alignment) with the dask dataframe, the partition map, and the alignment needed to create the catalog

get_healpix_pixels_from_alignment(alignment: hipscat.pixel_tree.PixelAlignment) → Tuple[List[hipscat.pixel_math.HealpixPixel], List[hipscat.pixel_math.HealpixPixel]]

Gets the list of primary and join pixels as the HealpixPixel class from a PixelAlignment

Parameters:

alignment (PixelAlignment) – The PixelAlignment to get pixels from

Returns:

A tuple of (primary_pixels, join_pixels) with lists of HealpixPixel objects

generate_meta_df_for_joined_tables(catalogs: Sequence[lsdb.catalog.catalog.Catalog], suffixes: Sequence[str], extra_columns: pandas.DataFrame | None = None, index_name: str = HIPSCAT_ID_COLUMN, index_type: numpy.typing.DTypeLike = np.uint64) → pandas.DataFrame

Generates a Dask meta DataFrame that would result from joining two catalogs

Creates an empty dataframe with the columns of each catalog appended with a suffix. Allows specifying extra columns that should also be added, and the name of the index of the resulting dataframe.

Parameters:
  • catalogs (Sequence[lsdb.Catalog]) – The catalogs to merge together

  • suffixes (Sequence[str]) – The column suffixes to apply to each catalog

  • extra_columns (pd.DataFrame) – Any additional columns to add to the merged catalogs

  • index_name (str) – The name of the index in the resulting DataFrame

  • index_type (npt.DTypeLike) – The type of the index in the resulting DataFrame

Returns:

An empty dataframe with the columns of each catalog with their respective suffix, and any extra columns specified, with the index name set.
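
As a rough illustration of what such a meta DataFrame looks like, the sketch below builds one from ordinary pandas DataFrames. These are stand-ins: the real function takes Catalog objects, and sketch_meta_for_join, the example columns, and the "_hipscat_index" default are assumptions made for this example rather than the library's code.

```python
import pandas as pd


def sketch_meta_for_join(frames, suffixes, extra_columns=None,
                         index_name="_hipscat_index", index_type="uint64"):
    """Build an empty DataFrame with suffixed columns from each input frame."""
    meta = {}
    for frame, suffix in zip(frames, suffixes):
        for name, dtype in frame.dtypes.items():
            # An empty Series keeps the dtype without holding any rows.
            meta[name + suffix] = pd.Series(dtype=dtype)
    if extra_columns is not None:
        for name in extra_columns.columns:
            meta[name] = pd.Series(dtype=extra_columns[name].dtype)
    index = pd.Index(pd.Series(dtype=index_type), name=index_name)
    return pd.DataFrame(meta, index=index)


# Hypothetical inputs: two small tables and one extra (unsuffixed) column.
left = pd.DataFrame({"ra": [1.0], "id": [1]})
right = pd.DataFrame({"ra": [2.0]})
extra = pd.DataFrame({"_dist_arcsec": pd.Series(dtype="float64")})

meta = sketch_meta_for_join([left, right], ["_a", "_b"], extra)
```

The result has no rows. Dask uses a frame like this only to learn the column names and dtypes of partitions that have not been computed yet.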

get_partition_map_from_alignment_pixels(join_pixels: pandas.DataFrame) → lsdb.types.DaskDFPixelMap

Gets a dictionary mapping HEALPix pixel to index of pixel in the pixel_mapping of a PixelAlignment

Parameters:

join_pixels (pd.DataFrame) – The pixel_mapping from a PixelAlignment object

Returns:

A dictionary mapping HEALPix pixel to the index that the pixel occurs in the pixel_mapping table
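
A minimal sketch of the mapping follows, using a list of dicts in place of the pixel_mapping DataFrame. The column names "aligned_Norder" and "aligned_Npix", the choice of the aligned pixel as the key, and the Pixel tuple are assumptions for illustration. The page itself does not say which columns are read.

```python
from typing import Dict, List, NamedTuple


class Pixel(NamedTuple):
    """Hypothetical stand-in for HealpixPixel."""
    order: int
    pixel: int


def partition_map_from_rows(rows: List[dict]) -> Dict[Pixel, int]:
    """Map each row's pixel to the position of that row in the table."""
    # Assumed column names; the real table may label them differently.
    return {
        Pixel(row["aligned_Norder"], row["aligned_Npix"]): position
        for position, row in enumerate(rows)
    }


rows = [
    {"aligned_Norder": 1, "aligned_Npix": 4},
    {"aligned_Norder": 1, "aligned_Npix": 5},
    {"aligned_Norder": 0, "aligned_Npix": 2},
]
partition_map = partition_map_from_rows(rows)
```

Because the position doubles as the partition number, a lookup such as partition_map[Pixel(1, 5)] tells the caller which partition holds that pixel's data.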

align_catalog_to_partitions(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset | None, pixels: List[hipscat.pixel_math.HealpixPixel]) → List[dask.delayed.Delayed | None]

Aligns the partitions of a Catalog to a list of HEALPix pixels

Parameters:
  • catalog (HealpixDataset | None) – The catalog to align

  • pixels (List[HealpixPixel]) – The list of HealpixPixels specifying the order of partitions

Returns:

A list of dask delayed objects, each one representing the data in a HEALPix pixel, in the order the pixels appear in the input list
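
The reordering can be pictured with a plain dictionary lookup. In this sketch a dict from pixel to partition stands in for the catalog and strings stand in for delayed objects. Two behaviors are assumptions suggested by the None entries in the return type rather than confirmed here: a None catalog yields a list of None, and a pixel absent from the catalog yields None.

```python
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int]  # (order, pixel), standing in for HealpixPixel


def align_to_pixels(
    partitions_by_pixel: Optional[Dict[Pixel, str]],
    pixels: List[Pixel],
) -> List[Optional[str]]:
    """Return one partition per requested pixel, in the requested order."""
    if partitions_by_pixel is None:
        # Assumed: no catalog means a None placeholder for every pixel.
        return [None] * len(pixels)
    # Assumed: pixels the catalog does not contain also map to None.
    return [partitions_by_pixel.get(pixel) for pixel in pixels]


catalog_partitions = {(0, 1): "p1", (0, 2): "p2"}
aligned = align_to_pixels(catalog_partitions, [(0, 2), (0, 3), (0, 1)])
```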