API Reference

STMiner.SPFinder

class STMiner.SPFinder(adata: AnnData | None = None)[source]

Bases: object

SPFinder is a class for discovering and analyzing spatial patterns in spatial transcriptomics data. It provides methods for reading and preprocessing data, handling the gene expression matrix, spatial binning, pattern extraction using Gaussian Mixture Models (GMMs), distance calculations (e.g., optimal transport, cosine similarity, mean squared error), clustering, and visualization of spatial gene expression patterns.

  • This class is designed for spatial transcriptomics data analysis and requires AnnData and related dependencies.

  • Some methods rely on external utility functions and classes (e.g., Plot, fit_gmms, calculate_ot_distance).

  • Multiprocessing is supported for some distance calculations.

build_distance_array(method='gmm', gene_list=None)[source]

Build a distance array for genes based on the specified method.

This function supports four distance calculation methods: “gmm” (Gaussian Mixture Model), “mse” (Mean Squared Error), “cs” (Cosine Similarity), and “ot” (Optimal Transport). If no gene list is provided, all genes are used.

Parameters:
  • method (str) – The distance calculation method to use. Default is “gmm”.

  • gene_list (list) – A list of specific genes to use. If not provided, all genes are used.

Returns:

None. The calculated distances are stored in the self.genes_distance_array attribute.
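To make two of the listed methods concrete, the sketch below computes a mean squared error and a cosine-based distance between flattened expression vectors. It is an illustration only and not STMiner's implementation: the helper names, the example vectors, and the choice to turn cosine similarity into a distance as 1 minus the similarity are all assumptions.

```python
import math


def mse_distance(a, b):
    """Mean squared error between two equally sized expression vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_distance(a, b):
    """1 - cosine similarity (assumed conversion from similarity to distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Made-up expression values for three hypothetical genes over four spots.
gene_a = [0.0, 1.0, 2.0, 3.0]
gene_b = [0.0, 2.0, 4.0, 6.0]  # same spatial shape as gene_a, twice the magnitude
gene_c = [3.0, 2.0, 1.0, 0.0]  # mirrored shape

print(mse_distance(gene_a, gene_b))     # sensitive to magnitude
print(cosine_distance(gene_a, gene_b))  # close to 0: same shape
print(cosine_distance(gene_a, gene_c))  # larger: different shape
```

The example shows why the choice of method matters: the first two genes are far apart under MSE but nearly identical under the cosine-based distance, which ignores overall magnitude.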

compare_gene_to_genes(gene_name)[source]

Compares the Gaussian Mixture Model (GMM) of a specified gene to the GMMs of all genes in the current patterns.

Parameters:

gene_name (str) – The name of the gene whose GMM will be compared to others.

Returns:

A dictionary containing the distances between the specified gene’s GMM and the GMMs of all genes in the patterns.

Return type:

dict
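A common follow-up is to rank the returned dictionary to find the genes closest to the query gene. The sketch below assumes the dictionary maps gene names to numeric distances and that smaller values mean more similar patterns; the helper name and the example values are made up for illustration.

```python
def closest_genes(distances, top_n=3):
    """Return the top_n (gene, distance) pairs with the smallest distances."""
    return sorted(distances.items(), key=lambda item: item[1])[:top_n]


# Made-up values standing in for the dict returned by compare_gene_to_genes.
example = {"GeneA": 0.42, "GeneB": 0.07, "GeneC": 0.91, "GeneD": 0.15}
print(closest_genes(example, top_n=2))
```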

compare_image_to_genes()[source]

Compares the GMM of the marked image with the GMMs of the genes in the gene expression matrix.

Returns:

pd.DataFrame

fit_pattern(n_top_genes: int = -1, n_comp: int = 20, normalize: bool = False, exclude_highly_expressed: bool = False, log1p: bool = False, min_cells: int = 20, gene_list: list | None = None, remove_low_exp_spots: bool = False)[source]

Fits gene expression patterns using Gaussian Mixture Models (GMMs) on selected genes. This method preprocesses the data and fits GMMs to the expression profiles of specified genes, allowing for the identification of spatial patterns in gene expression.

Parameters:
  • n_top_genes (int, optional) – Number of top highly variable genes to use. If -1, all genes are used. Defaults to -1.

  • n_comp (int, optional) – Number of GMM components to fit for each gene. Defaults to 20.

  • normalize (bool, optional) – Whether to normalize the data before fitting. Defaults to False.

  • exclude_highly_expressed (bool, optional) – Whether to exclude highly expressed genes during preprocessing. Defaults to False.

  • log1p (bool, optional) – Whether to apply log1p transformation to the data. Defaults to False.

  • min_cells (int, optional) – Minimum number of cells a gene must be expressed in to be included. Defaults to 20.

  • gene_list (list, optional) – List of gene names to fit patterns for. If None, the top genes or all genes are used. Defaults to None.

  • remove_low_exp_spots (bool, optional) – Whether to remove spots with low expression before fitting. Defaults to False.

Notes

The fitted patterns are stored in the self.patterns attribute.
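The sketch below chains fit_pattern with build_distance_array, using only the signatures documented on this page. The ordering (fit patterns first, then build the distance array with the “gmm” method) is an assumption based on the descriptions here, the helper name fit_and_compare is hypothetical, and the file path in the comments is a placeholder.

```python
def fit_and_compare(finder, genes=None, n_comp=20):
    """Fit GMM patterns, then build the gene-to-gene distance array.

    `finder` is expected to behave like an SPFinder instance whose data
    has already been loaded (for example via read_h5ad or set_adata).
    """
    # Fit one GMM per gene; results are stored in finder.patterns.
    finder.fit_pattern(n_comp=n_comp, gene_list=genes)
    # Compare the fitted GMMs; results are stored in finder.genes_distance_array.
    finder.build_distance_array(method="gmm", gene_list=genes)
    return finder.genes_distance_array


# Possible usage, assuming STMiner is installed and "sample.h5ad" exists:
#
#     from STMiner import SPFinder
#     sp = SPFinder()
#     sp.read_h5ad(file="sample.h5ad")
#     distances = fit_and_compare(sp, n_comp=20)
```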

get_custom_pattern(gene_list, n_components=20, vote_rate: int = 0, mode: str = 'vote')[source]

Generates a custom pattern model based on a list of genes using either a voting mechanism or a test mode.

Parameters:
  • gene_list (list) – List of gene identifiers to be used for pattern extraction.

  • n_components (int, optional) – Number of components for the Gaussian Mixture Model (GMM). Defaults to 20.

  • vote_rate (int, optional) – Threshold for voting mechanism in pattern extraction. Defaults to 0.

  • mode (str, optional) – Mode of operation, either “vote” for GMM-based pattern extraction or “test” for statistical testing. Defaults to “vote”.

Raises:

ValueError – If the mode is not “vote” or “test”.

Notes

  • In “vote” mode, fits a Gaussian Mixture Model to the gene pattern data.

  • In “test” mode, statistical testing is intended but not yet implemented.

get_genes_csr_array(min_cells: int, min_genes: int = 1, normalize: bool = True, exclude_highly_expressed: bool = False, log1p: bool = False, vmax: int = 100, gene_list: list | None = None)[source]

Generates a dictionary of compressed sparse row (CSR) matrices for gene expression data. This method processes the AnnData object (self.adata) to extract gene expression arrays for each gene, optionally normalizing, excluding highly expressed genes, and applying log1p transformation. The resulting matrices are stored in self.csr_dict with gene names as keys.

Parameters:
  • min_cells (int) – Minimum number of cells a gene must be expressed in to be included.

  • min_genes (int, optional) – Minimum number of genes a cell must express to be included. Defaults to 1.

  • normalize (bool, optional) – Whether to normalize the data before processing. Defaults to True.

  • exclude_highly_expressed (bool, optional) – Whether to exclude highly expressed genes. Defaults to False.

  • log1p (bool, optional) – Whether to apply log1p transformation to the data. Defaults to False.

  • vmax (int, optional) – Percentile value at which gene expression values are capped. Defaults to 100.

  • gene_list (list, optional) – List of gene names to process. If None, all genes in self.adata are used. Defaults to None.

Returns:

None
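The vmax parameter is described as a percentile cap on expression values. One plausible reading is sketched below: values above the vmax-th percentile are clipped down to that percentile. Whether STMiner clips in exactly this way, and which percentile interpolation it uses, are assumptions; both helper names are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def cap_at_percentile(values, vmax=100):
    """Clip every value above the vmax-th percentile down to that percentile."""
    limit = percentile(values, vmax)
    return [min(v, limit) for v in values]


counts = [0, 1, 1, 2, 3, 50]  # made-up counts with one outlier spot
print(cap_at_percentile(counts, vmax=100))  # the cap equals the maximum, so nothing changes
print(cap_at_percentile(counts, vmax=80))   # the outlier is pulled down to the 80th percentile
```

Under this reading, the default of 100 leaves the data untouched, and lower values limit the influence of a few very bright spots.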

merge_bin(bin_width)[source]

Merge spatial coordinates into bins of a specified width.

This method updates the ‘x’ and ‘y’ columns in self.adata.obs by grouping their values into bins of size bin_width. The binning is performed using the merge_bin_coordinate function, starting from the minimum value of each coordinate.

Parameters:

bin_width (int or float) – The width of each bin to merge coordinates into.

Returns:

None

Notes

  • Assumes self.adata.obs is a pandas DataFrame with ‘x’ and ‘y’ columns.

  • The merge_bin_coordinate function should accept a coordinate array, a minimum value, and a bin size.
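The binning described above can be sketched as an integer division of each coordinate's offset from the minimum. This is a stand-in for illustration, not the library's merge_bin_coordinate: the name bin_coordinates is hypothetical, and returning bin indices (rather than, say, bin start coordinates) is an assumption.

```python
def bin_coordinates(coords, coord_min, bin_size):
    """Map each coordinate to the index of the bin it falls in.

    Bins start at coord_min and are bin_size wide, so values in
    [coord_min, coord_min + bin_size) map to 0, the next interval to 1, etc.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [int((c - coord_min) // bin_size) for c in coords]


x = [100, 104, 105, 109, 110, 125]  # made-up spot x-coordinates
print(bin_coordinates(x, min(x), 5))
```

Spots that land in the same bin on both axes would then share one (x, y) position, which reduces spatial resolution in exchange for denser counts per position.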

plot_gmm(gene_name, cmap=None)[source]

Plots the Gaussian Mixture Model (GMM) for a specified gene.

Parameters:
  • gene_name (str) – The name of the gene whose GMM is to be plotted.

  • cmap (str or matplotlib.colors.Colormap, optional) – Colormap to use for plotting. Defaults to None.

Returns:

None

read_h5ad(file, amplification=1, bin_size=1, merge_bin=False)[source]

Reads an h5ad file and sets the object’s adata attribute with the loaded data.

Parameters:
  • file (str) – Path to the h5ad file to be read.

  • amplification (int, optional) – Amplification factor to apply to the data. Defaults to 1.

  • bin_size (int, optional) – Size of the bin for binning the data. Defaults to 1.

  • merge_bin (bool, optional) – Whether to merge bins during reading. Defaults to False.

Returns:

None

set_adata(adata)[source]

Assigns the provided AnnData object to the instance.

Parameters:

adata (AnnData) – The annotated data matrix to be set for the instance.

spatial_high_variable_genes(vmax: int = 100, thread: int = 1)[source]

Identifies spatially highly variable genes by comparing each gene’s spatial expression pattern to a global expression matrix using optimal transport (OT) distance. The method builds a global sparse matrix from the spatial transcriptomics data, then computes the OT distance between that matrix and each gene-specific matrix. The results are stored in a DataFrame with gene names, distances, and z-scores of the log-transformed distances. Multiprocessing can optionally be used to speed up the computation.

Parameters:
  • vmax (int, optional) – The upper percentile threshold for capping expression values in the global matrix. Defaults to 100.

  • thread (int, optional) – The number of threads to use for multiprocessing. If thread <= 1, computation is done serially. Defaults to 1.

Returns:

None

Notes

  • Uses self.csr_dict; if it has not been populated, it is generated first.

  • Uses tqdm for progress display and multiprocessing for parallel computation if thread > 1.

  • Handles exceptions during OT distance calculation and prints the gene key and error.

  • The results are stored in the self.global_distance attribute as a pandas DataFrame with three columns: “Gene” (gene names), “Distance” (OT distance to the global matrix), and “z-score” (z-score of the log-transformed distances).
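The “z-score” column is described as the z-score of the log-transformed distances. A minimal sketch of that calculation follows, assuming the natural logarithm and the population standard deviation; the library may use a different base or estimator. The helper name and the example distances are made up.

```python
import math
import statistics


def log_zscores(distances):
    """Z-scores of log-transformed distances (natural log, population std)."""
    logs = [math.log(d) for d in distances]
    mean = statistics.fmean(logs)
    std = statistics.pstdev(logs)
    if std == 0:
        raise ValueError("all distances are identical; z-scores are undefined")
    return [(value - mean) / std for value in logs]


# Made-up OT distances for three hypothetical genes.
distances = {"GeneA": 1.0, "GeneB": 10.0, "GeneC": 100.0}
scores = dict(zip(distances, log_zscores(list(distances.values()))))
for gene, z in scores.items():
    print(gene, round(z, 3))
```

Working on the log scale keeps a few very large distances from dominating the mean and standard deviation, so the z-scores rank genes by relative rather than absolute distance.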

STMiner.Plot

class STMiner.Plot.Plot(sp)[source]

Bases: object

plot_gene(gene, cmap='Spectral_r', reverse_y=False, reverse_x=False, rotate=False, figsize=(8, 6), s=5, log1p=False, save_path='', format='eps', dpi=400, vmax=99)[source]

Plots the spatial expression of a given gene on a scatter plot.

Parameters:
  • gene (str) – The name of the gene to plot.

  • cmap (str, optional) – Colormap to use for gene expression. Default is ‘Spectral_r’.

  • reverse_y (bool, optional) – If True, reverse the y-axis. Default is False.

  • reverse_x (bool, optional) – If True, reverse the x-axis. Default is False.

  • rotate (bool, optional) – If True, rotate the plot by 90 degrees. Default is False.

  • figsize (tuple, optional) – Figure size in inches (width, height). Default is (8, 6).

  • s (int or None, optional) – Size of the scatter plot points. If None, uses default size. Default is 5.

  • log1p (bool, optional) – If True, apply log1p transformation to expression values. Default is False.

  • save_path (str, optional) – Directory path to save the plot. If empty, the plot is not saved. Default is ‘’.

  • format (str, optional) – File format for saving the plot (e.g., ‘eps’, ‘png’). Default is ‘eps’.

  • dpi (int, optional) – Dots per inch for saved figure. Default is 400.

  • vmax (float, optional) – Percentile value to set the upper limit of the color scale. Default is 99.

Returns:

None

Side Effects:
  • Displays the plot using matplotlib.

  • Saves the plot to the specified path if save_path is provided.

plot_intersection(pattern_list, cmap=None, s=None, rotate=False, reverse_x=False, reverse_y=False, figsize=(12, 8), image_path=None, rotate_img=False, plot_bg=True, k=1, bgs=5, aspect=1)[source]

Plots the intersection of multiple patterns as a scatter plot, optionally overlaying a background image and global pattern matrix.

Parameters:
  • pattern_list (list) – List of pattern names/keys to intersect and plot.

  • cmap (matplotlib.colors.Colormap, optional) – Colormap to use for the intersection points. Defaults to a preset colormap.

  • s (float or array-like, optional) – Size of the scatter plot points. If None, a size of 10 is used.

  • rotate (bool, optional) – Whether to rotate the intersection matrix by 90 degrees. Defaults to False.

  • reverse_x (bool, optional) – Whether to reverse the x-axis. Defaults to False.

  • reverse_y (bool, optional) – Whether to reverse the y-axis. Defaults to False.

  • figsize (tuple, optional) – Figure size for the plot. Defaults to (12, 8).

  • image_path (str, optional) – Path to a background image to overlay. If None, no image is shown.

  • rotate_img (bool, optional) – Whether to rotate the background image by 90 degrees. Defaults to False.

  • plot_bg (bool, optional) – Whether to plot the global pattern matrix as a background. Defaults to True.

  • k (int, optional) – Number of times to rotate the background image by 90 degrees. Defaults to 1.

  • bgs (float, optional) – Size of the background scatter plot points. Defaults to 5.

  • aspect (float, optional) – Aspect ratio for the background image. Defaults to 1.

Returns:

Displays the plot using matplotlib.

Return type:

None

plot_pattern(cmap=None, vmax=99, num_cols=4, rotate=False, reverse_y=False, reverse_x=False, heatmap=False, s=1, image_path=None, rotate_img=False, k=1, aspect=1, output_path=None, plot_bg=False)[source]

Plots spatial patterns for each label in the dataset as either heatmaps or scatter plots.

Parameters:
  • cmap (str or matplotlib colormap, optional) – Colormap to use for plotting. Defaults to “viridis” if None.

  • vmax (float, optional) – Percentile value to use as the maximum value for color scaling. Default is 99.

  • num_cols (int, optional) – Number of columns in the subplot grid. Default is 4.

  • rotate (bool, optional) – Whether to rotate the pattern matrices. Default is False.

  • reverse_y (bool, optional) – Whether to reverse the y-axis of the pattern matrices. Default is False.

  • reverse_x (bool, optional) – Whether to reverse the x-axis of the pattern matrices. Default is False.

  • heatmap (bool, optional) – If True, plot patterns as heatmaps; otherwise, use scatter plots. Default is False.

  • s (int or float, optional) – Size of scatter plot points. Default is 1.

  • image_path (str, optional) – Path to a background image to display under the scatter plot. Default is None.

  • rotate_img (bool, optional) – Whether to rotate the background image. Default is False.

  • k (int, optional) – Number of 90-degree rotations to apply to the background image if rotate_img is True. Default is 1.

  • aspect (float, optional) – Aspect ratio for the background image. Default is 1.

  • output_path (str, optional) – If provided, saves the plot to this path in EPS format. Default is None.

  • plot_bg (bool, optional) – Whether to plot the global background points in gray. Default is False.

Returns:

None. Displays the generated plots and optionally saves them to a file.