lsdb.dask.merge_catalog_functions
Module Contents
Functions
concat_partition_and_margin – Concatenates a partition and margin dataframe together
align_catalogs – Aligns two catalogs, also using the right catalog's margin if it exists
align_and_apply – Aligns catalogs to a given ordering of pixels and applies a function to each set of aligned partitions
filter_by_hipscat_index_to_pixel – Filters a catalog dataframe to the points within a specified HEALPix pixel using the hipscat index
construct_catalog_args – Constructs the arguments needed to create a catalog from a list of delayed partitions
get_healpix_pixels_from_alignment – Gets the list of primary and join pixels as the HealpixPixel class from a PixelAlignment
generate_meta_df_for_joined_tables – Generates a Dask meta DataFrame that would result from joining two catalogs
get_partition_map_from_alignment_pixels – Gets a dictionary mapping HEALPix pixel to index of pixel in the pixel_mapping of a PixelAlignment
align_catalog_to_partitions – Aligns the partitions of a Catalog to a dataframe with HEALPix pixels in each row
- concat_partition_and_margin(partition: pandas.DataFrame, margin: pandas.DataFrame | None, right_columns: List[str]) → pandas.DataFrame
Concatenates a partition and margin dataframe together
- Parameters:
partition (pd.DataFrame) – The partition dataframe
margin (pd.DataFrame | None) – The margin dataframe
- Returns:
The concatenated dataframe with the partition on top and the margin on the bottom
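The stacking order described above can be illustrated with plain pandas. This is a minimal sketch, not the lsdb implementation: the function name `concat_partition_and_margin_sketch` is hypothetical, and the way `right_columns` is used here (selecting which margin columns to keep before concatenating) is an assumption, since the parameter is not documented on this page.

```python
import pandas as pd


def concat_partition_and_margin_sketch(partition, margin, right_columns):
    """Stack the margin rows underneath the partition rows.

    Assumption: right_columns lists the margin columns to keep.
    """
    # With no margin (or an empty one) there is nothing to append.
    if margin is None or len(margin) == 0:
        return partition
    # Partition rows come first, margin rows follow.
    return pd.concat([partition, margin[right_columns]])


partition = pd.DataFrame({"ra": [10.0, 10.5], "dec": [-5.0, -5.2]}, index=[100, 101])
# The extra "margin_Norder" column is made up for illustration.
margin = pd.DataFrame(
    {"ra": [11.0], "dec": [-5.4], "margin_Norder": [3]}, index=[205]
)

combined = concat_partition_and_margin_sketch(partition, margin, ["ra", "dec"])
print(combined)
```

Passing `None` as the margin returns the partition unchanged in this sketch, which mirrors the optional `margin` argument in the signature.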
- align_catalogs(left: lsdb.catalog.catalog.Catalog, right: lsdb.catalog.catalog.Catalog) → hipscat.pixel_tree.PixelAlignment
Aligns two catalogs, also using the right catalog’s margin if it exists
- Parameters:
left (lsdb.Catalog) – The left catalog to align
right (lsdb.Catalog) – The right catalog to align
- Returns:
The PixelAlignment object from aligning the catalogs
- align_and_apply(catalog_mappings: List[Tuple[lsdb.catalog.dataset.healpix_dataset.HealpixDataset | None, List[hipscat.pixel_math.HealpixPixel]]], func: Callable, *args, **kwargs) → List[dask.delayed.Delayed]
Aligns catalogs to a given ordering of pixels and applies a function to each set of aligned partitions
- Parameters:
catalog_mappings (List[Tuple[HealpixDataset | None, List[HealpixPixel]]]) – The catalogs and their corresponding ordering of pixels to align the partitions to. A catalog can be None, in which case None will be passed to the function for each partition. All lists of pixels should be the same length. Example input: [(catalog, pixels), (catalog2, pixels2), …]
func (Callable) –
The function to apply to the aligned catalogs. The function should take the aligned partitions of the catalogs as dataframes as the first arguments, followed by the healpix pixel of each partition, the hc_structures of the catalogs, and any additional arguments and keyword arguments. For example:
def func(
    cat1_partition_df,
    cat2_partition_df,
    cat1_pixel,
    cat2_pixel,
    cat1_hc_structure,
    cat2_hc_structure,
    *args,
    **kwargs
):
    ...
*args – Additional arguments to pass to the function
**kwargs – Additional keyword arguments to pass to the function
- Returns:
A list of delayed objects, each one representing the result of the function applied to the aligned partitions of the catalogs
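The argument order passed to `func` can be easier to see in a small stand-in. The sketch below uses plain dictionaries and eager calls instead of lsdb catalogs and Dask delayed objects, so it only illustrates the calling convention described above. The name `align_and_apply_sketch`, the dictionary keys `"partitions"` and `"hc_structure"`, and passing None as the structure of a None catalog are all assumptions made for illustration.

```python
def align_and_apply_sketch(catalog_mappings, func, *args, **kwargs):
    """Call func once per aligned set of partitions (eagerly, no Dask)."""
    partition_lists = []
    pixel_lists = []
    structures = []
    for catalog, pixels in catalog_mappings:
        if catalog is None:
            # A missing catalog contributes None for every partition.
            partition_lists.append([None] * len(pixels))
            structures.append(None)
        else:
            partition_lists.append([catalog["partitions"][p] for p in pixels])
            structures.append(catalog["hc_structure"])
        pixel_lists.append(pixels)

    results = []
    # Each row holds the i-th partition of every catalog, then the i-th pixel
    # of every catalog; the structures and extra arguments follow.
    for row in zip(*partition_lists, *pixel_lists):
        results.append(func(*row, *structures, *args, **kwargs))
    return results


# Stand-in catalogs: pixels are (order, pixel) tuples, partitions are lists.
cat1 = {"partitions": {(1, 4): [1, 2], (1, 5): [3]}, "hc_structure": "info1"}
cat2 = {"partitions": {(0, 1): [10, 20, 30]}, "hc_structure": "info2"}


def count_rows(df1, df2, pix1, pix2, hc1, hc2, scale=1):
    return (pix1, pix2, scale * (len(df1) + len(df2)))


results = align_and_apply_sketch(
    [(cat1, [(1, 4), (1, 5)]), (cat2, [(0, 1), (0, 1)])],
    count_rows,
    scale=2,
)
print(results)
```

In the real function each call is wrapped as a delayed object rather than executed immediately, per the Returns description.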
- filter_by_hipscat_index_to_pixel(dataframe: pandas.DataFrame, order: int, pixel: int) → pandas.DataFrame
Filters a catalog dataframe to the points within a specified HEALPix pixel using the hipscat index
- Parameters:
dataframe (pd.DataFrame) – The dataframe to filter
order (int) – The order of the HEALPix pixel to filter to
pixel (int) – The pixel number in NESTED numbering of the HEALPix pixel to filter to
- Returns:
The filtered dataframe with only the rows that are within the specified HEALPix pixel
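Filtering by index works because every HEALPix pixel corresponds to one contiguous range of index values. The sketch below assumes the hipscat index stores the order-19 NESTED pixel number in the high bits of a 64-bit integer, leaving 22 low bits as a counter; that layout and the helper names are assumptions for illustration, and the sketch operates on plain integers rather than a dataframe index.

```python
# Assumed layout of the hipscat index (not stated on this page).
HIPSCAT_ID_HEALPIX_ORDER = 19
COUNTER_BITS = 64 - (4 + 2 * HIPSCAT_ID_HEALPIX_ORDER)  # 22 low bits


def hipscat_index_bounds(order, pixel):
    """Return the half-open [lower, upper) index range covered by a pixel."""
    # Each step down in order splits a NESTED pixel in four, i.e. adds 2 bits.
    shift = 2 * (HIPSCAT_ID_HEALPIX_ORDER - order) + COUNTER_BITS
    return pixel << shift, (pixel + 1) << shift


def filter_ids_to_pixel(ids, order, pixel):
    """Keep only the index values that fall inside the given pixel."""
    lower, upper = hipscat_index_bounds(order, pixel)
    return [i for i in ids if lower <= i < upper]


lower, upper = hipscat_index_bounds(order=1, pixel=5)
ids = [lower, lower + 123, upper, 4 << 58]
inside = filter_ids_to_pixel(ids, order=1, pixel=5)
print(inside)
```

On a dataframe the same comparison would be applied to the index column, keeping rows whose index lies in `[lower, upper)`.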
- construct_catalog_args(partitions: List[dask.delayed.Delayed], meta_df: pandas.DataFrame, alignment: hipscat.pixel_tree.PixelAlignment) → Tuple[dask.dataframe.core.DataFrame, lsdb.types.DaskDFPixelMap, hipscat.pixel_tree.PixelAlignment]
Constructs the arguments needed to create a catalog from a list of delayed partitions
- Parameters:
partitions (List[Delayed]) – The list of delayed partitions to create the catalog from
meta_df (pd.DataFrame) – The dask meta schema for the partitions
alignment (PixelAlignment) – The alignment used to create the delayed partitions
- Returns:
A tuple of (ddf, partition_map, alignment) with the dask dataframe, the partition map, and the alignment needed to create the catalog
- get_healpix_pixels_from_alignment(alignment: hipscat.pixel_tree.PixelAlignment) → Tuple[List[hipscat.pixel_math.HealpixPixel], List[hipscat.pixel_math.HealpixPixel]]
Gets the list of primary and join pixels as the HealpixPixel class from a PixelAlignment
- Parameters:
alignment (PixelAlignment) – The PixelAlignment to get pixels from
- Returns:
A tuple of (primary_pixels, join_pixels) with lists of HealpixPixel objects
- generate_meta_df_for_joined_tables(catalogs: Sequence[lsdb.catalog.catalog.Catalog], suffixes: Sequence[str], extra_columns: pandas.DataFrame | None = None, index_name: str = HIPSCAT_ID_COLUMN, index_type: numpy.typing.DTypeLike = np.uint64) → pandas.DataFrame
Generates a Dask meta DataFrame that would result from joining two catalogs
Creates an empty dataframe with the columns of each catalog appended with a suffix. Allows specifying extra columns that should also be added, and the name of the index of the resulting dataframe.
- Parameters:
catalogs (Sequence[lsdb.Catalog]) – The catalogs to merge together
suffixes (Sequence[str]) – The column suffixes to apply to each catalog
extra_columns (pd.DataFrame) – Any additional columns to add to the merged catalogs
index_name (str) – The name of the index in the resulting DataFrame
index_type (npt.DTypeLike) – The type of the index in the resulting DataFrame
- Returns:
An empty dataframe with the columns of each catalog with their respective suffix, and any extra columns specified, with the index name set.
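The shape of such a meta dataframe can be sketched with pandas alone. In this hedged example the catalogs are replaced by plain `{column: dtype}` dictionaries, the function name `meta_df_sketch` is hypothetical, and the default index name `"_hipscat_index"` is an assumed value for `HIPSCAT_ID_COLUMN`.

```python
import numpy as np
import pandas as pd


def meta_df_sketch(schemas, suffixes, extra_columns=None,
                   index_name="_hipscat_index", index_type=np.uint64):
    """Build an empty dataframe describing the columns of a joined table."""
    meta = {}
    # Every column of every input schema gets that schema's suffix.
    for schema, suffix in zip(schemas, suffixes):
        for name, dtype in schema.items():
            meta[name + suffix] = pd.Series(dtype=dtype)
    # Extra columns keep their own names and dtypes.
    if extra_columns is not None:
        for name in extra_columns.columns:
            meta[name] = pd.Series(dtype=extra_columns[name].dtype)
    index = pd.Index(pd.Series(dtype=index_type), name=index_name)
    return pd.DataFrame(meta, index=index)


left_schema = {"ra": np.float64, "dec": np.float64}
right_schema = {"ra": np.float64, "mag": np.float32}
extra = pd.DataFrame({"_dist_arcsec": pd.Series(dtype=np.float64)})

meta_df = meta_df_sketch([left_schema, right_schema], ["_left", "_right"], extra)
print(meta_df.dtypes)
```

Because the result has no rows, it only carries column names, dtypes, and the index name, which is what Dask needs as a meta object.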
- get_partition_map_from_alignment_pixels(join_pixels: pandas.DataFrame) → lsdb.types.DaskDFPixelMap
Gets a dictionary mapping HEALPix pixel to index of pixel in the pixel_mapping of a PixelAlignment
- Parameters:
join_pixels (pd.DataFrame) – The pixel_mapping from a PixelAlignment object
- Returns:
A dictionary mapping HEALPix pixel to the index that the pixel occurs in the pixel_mapping table
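The mapping described here is essentially an enumeration of the aligned pixels. A minimal sketch, assuming the pixel_mapping rows have already been reduced to `(order, pixel)` tuples (the real input is a dataframe and the real keys are HealpixPixel objects; the name `partition_map_sketch` is hypothetical):

```python
def partition_map_sketch(join_pixels):
    """Map each (order, pixel) tuple to its row position in the mapping."""
    return {
        (order, pixel): position
        for position, (order, pixel) in enumerate(join_pixels)
    }


# Rows of a made-up pixel mapping, in table order.
join_pixels = [(1, 4), (1, 5), (2, 24)]
partition_map = partition_map_sketch(join_pixels)
print(partition_map)
```

The position is what links a pixel to the corresponding partition of the Dask dataframe built from the same alignment.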
- align_catalog_to_partitions(catalog: lsdb.catalog.dataset.healpix_dataset.HealpixDataset | None, pixels: List[hipscat.pixel_math.HealpixPixel]) → List[dask.delayed.Delayed | None]
Aligns the partitions of a Catalog to a dataframe with HEALPix pixels in each row
- Parameters:
catalog (HealpixDataset | None) – The catalog to align
pixels (List[HealpixPixel]) – The list of HealpixPixels specifying the order of partitions
- Returns:
A list of dask delayed objects, each one representing the data in a HEALPix pixel, in the order the pixels appear in the input list