face_rhythm package

face_rhythm.alignment module

Image alignment pipeline and video frame ingestion.

Image_preparation_pipeline builds a clean reference image for registration from a sequence of frames by downsampling, masking with a VQT spectrogram to keep only low-spectral-variance (non-behavior) frames, and applying CLAHE. Also provides SFTP / local video frame extractors used to seed alignment.

class face_rhythm.alignment.Image_preparation_pipeline(ds_factor: int = 20, ptile_specVar_keep: float = 10, ptile_intensity_keep: float = 90, params_vqt: Dict = {'F_max': 60, 'F_min': 0.5, 'Fs_sample': 250, 'Q_highF': 20, 'Q_lowF': 3.5, 'downsample_factor': 10, 'fft_conv': True, 'n_freq_bins': 50, 'plot_pref': False, 'window_type': 'hann'}, clip_limit: float = 2.0, grid_size: int = 20, verbose: bool = True)

Bases: object

Builds a clean reference image for registration by downsampling frames, selecting frames with low spectral variance via a VQT spectrogram, and applying CLAHE contrast enhancement. RH 2023

Parameters:
  • ds_factor (int) – Spatial downsampling factor applied before spectral analysis. (Default is 20)

  • ptile_specVar_keep (float) – Percentile cutoff for the per-frame mean spectral magnitude; frames at or below this percentile are kept as low-variance (non-behavior) frames. (Default is 10)

  • ptile_intensity_keep (float) – Percentile cutoff used when normalizing pixel intensities (kept for backwards compatibility; current implementation no longer applies this clip). (Default is 90)

  • params_vqt (Dict) – Keyword arguments forwarded to vqt.VQT for the spectral analysis. Recognized keys include Fs_sample (sample rate), Q_lowF, Q_highF (quality factors at the low and high frequency bounds), F_min, F_max (frequency range), n_freq_bins (number of frequency bins), window_type, downsample_factor, fft_conv (use FFT-based convolution), and plot_pref. (Default is the dictionary shown in the signature)

  • clip_limit (float) – clipLimit argument forwarded to OpenCV CLAHE. (Default is 2.0)

  • grid_size (int) – Tile grid size forwarded to OpenCV CLAHE. (Default is 20)

  • verbose (bool) – If True, prints progress messages and shows intermediate plots. (Default is True)

downsample(images: numpy.ndarray, ds_factor: int | None = None) → numpy.ndarray

Spatially downsamples a stack of images by ds_factor using bilinear interpolation, collapsing any color channel by mean.

Parameters:
  • images (np.ndarray) – Input image stack. shape: (n_frames, H, W) or (n_frames, H, W, C).

  • ds_factor (Optional[int]) – Integer downsampling factor. If None, self.ds_factor is used. (Default is None)

Returns:

images_ds (np.ndarray):

Downsampled image stack. shape: (n_frames, H // ds_factor, W // ds_factor), dtype: float32.

Return type:

(np.ndarray)

find_low_spectral_variance_idx(images_ds: numpy.ndarray, images: numpy.ndarray, ptile_specVar_keep: float | None = 10, ptile_intensity_keep: float | None = 90, params_vqt: Dict | None = {'F_max': 60, 'F_min': 0.5, 'Fs_sample': 250, 'Q_highF': 20, 'Q_lowF': 3.5, 'downsample_factor': 10, 'fft_conv': True, 'n_freq_bins': 50, 'plot_pref': False, 'window_type': 'hann'})

Selects frames whose mean VQT spectral magnitude lies in the lowest ptile_specVar_keep percentile and returns a normalized mean image over those frames for use as a registration reference.

Parameters:
  • images_ds (np.ndarray) – Downsampled image stack used to compute spectrograms. shape: (n_frames, H_ds, W_ds).

  • images (np.ndarray) – Full-resolution image stack used to build the reference image. shape: (n_frames, H, W) or (n_frames, H, W, C).

  • ptile_specVar_keep (Optional[float]) – Percentile cutoff for the mean spectral magnitude; frames at or below this percentile are kept. If None, self.ptile_specVar_keep is used. (Default is 10)

  • ptile_intensity_keep (Optional[float]) – Percentile cutoff for intensity normalization (currently unused in the active code path; retained for backwards compatibility). If None, self.ptile_intensity_keep is used. (Default is 90)

  • params_vqt (Optional[Dict]) – Keyword arguments forwarded to vqt.VQT. If None, self.params_vqt is used. (Default is the dictionary shown in the signature)

Returns:

im (np.ndarray):

Square-rooted, max-normalized mean of the kept full-resolution frames. shape: (H, W), dtype: float32 (or matching the dtype of images.mean(0)).

Return type:

(np.ndarray)
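The selection and normalization steps described above can be sketched in NumPy. This is a minimal illustration, not the package's implementation: it assumes a per-frame score has already been derived from the VQT spectrogram, and the helper name reference_from_low_score_frames, the channel-mean handling of color frames, and the toy data are all hypothetical.

```python
import numpy as np

def reference_from_low_score_frames(images, score_per_frame, ptile_keep=10):
    """Hypothetical helper: average the frames whose score is at or below a percentile."""
    # Frames at or below the percentile cutoff are treated as low-variance frames.
    cutoff = np.percentile(score_per_frame, ptile_keep)
    idx_keep = np.nonzero(score_per_frame <= cutoff)[0]
    frames = images[idx_keep].astype(np.float32)
    # Assumption: an optional trailing color channel is collapsed by its mean.
    if frames.ndim == 4:
        frames = frames.mean(-1)
    im = frames.mean(0)
    # Square-root, then normalize by the maximum.
    im = np.sqrt(im)
    return im / im.max(), idx_keep

rng = np.random.default_rng(0)
images = rng.uniform(1, 255, size=(100, 8, 6)).astype(np.float32)
score = np.arange(100, dtype=np.float32)  # toy per-frame score
im_ref, idx_keep = reference_from_low_score_frames(images, score, ptile_keep=10)
```

With a 10th-percentile cutoff on 100 distinct scores, the ten lowest-scoring frames are averaged into the reference image.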

apply_clahe(image: numpy.ndarray, clip_limit: float | None = 2.0, grid_size: int | None = 20) → numpy.ndarray

Applies CLAHE contrast enhancement to a single image via rois.Image_Aligner.augment_images.

Parameters:
  • image (np.ndarray) – Input image. shape: (H, W).

  • clip_limit (Optional[float]) – CLAHE clipLimit argument. If None, self.clip_limit is used. (Default is 2.0)

  • grid_size (Optional[int]) – CLAHE tile grid size. If None, self.grid_size is used. (Default is 20)

Returns:

im_aug (np.ndarray):

CLAHE-enhanced image. shape: (H, W).

Return type:

(np.ndarray)

apply_pipeline(images: numpy.ndarray)

Runs the full reference-image pipeline: downsample, select low-spectral-variance frames, then apply CLAHE.

Parameters:

images (np.ndarray) – Input image stack. shape: (n_frames, H, W) or (n_frames, H, W, C).

Returns:

im_aug (np.ndarray):

CLAHE-enhanced reference image. shape: (H, W).

Return type:

(np.ndarray)

class face_rhythm.alignment.SFTPVideoFrameExtractor(host: str, username: str, password: str, port: int = 22, verbose: bool = True)

Bases: object

Extracts frames from remote video files over SFTP and returns them as a NumPy array. The password is held base64-encoded with a random salt and only decoded transiently when constructing the SFTP URL; frame extraction is streamed through ffmpeg so the full file is never downloaded. RH 2023

Parameters:
  • host (str) – Hostname or IP address of the remote server.

  • username (str) – Username for authenticating to the remote server.

  • password (str) – Password for authenticating to the remote server. Stored internally in base64-encoded form with a random salt.

  • port (int) – TCP port for the SFTP connection. (Default is 22)

  • verbose (bool) – If True, prints progress messages during probing and frame retrieval. (Default is True)

extract_frames(remote_video_path: str, time_start: float, duration: int, fps: float | None = None) → numpy.ndarray

Streams a window of frames from a remote video over SFTP using ffmpeg and returns them as a stacked NumPy array. The number of frames returned is int(fps * duration).

Parameters:
  • remote_video_path (str) – Path to the video file on the remote server, e.g. "/path/to/video.mp4".

  • time_start (float) – Start time, in seconds, of the extraction window.

  • duration (int) – Length of the extraction window, in seconds. The total number of frames returned is int(fps * duration).

  • fps (Optional[float]) – Frame rate of the video. If None, it is probed from the video metadata via ffmpeg.probe. (Default is None)

Returns:

frames (np.ndarray):

Decoded frames stacked along axis 0. shape: (n_frames, H, W, 3), dtype: uint8.

Return type:

(np.ndarray)

Raises:

RuntimeError – If ffmpeg fails while extracting frames from the SFTP stream.
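The frame count and output shape can be illustrated with a small NumPy sketch. It assumes the decoder emits tightly packed raw RGB bytes (three uint8 values per pixel), which this page does not state; the helper name frames_from_raw_rgb and the zero-filled buffer are hypothetical stand-ins for real decoder output.

```python
import numpy as np

def frames_from_raw_rgb(buffer, height, width, fps, duration):
    """Hypothetical helper: reshape a raw RGB byte buffer into (n_frames, H, W, 3)."""
    n_frames = int(fps * duration)  # frame count stated in the docstring
    n_bytes = n_frames * height * width * 3
    if len(buffer) < n_bytes:
        raise RuntimeError(f"Expected {n_bytes} bytes, got {len(buffer)}")
    frames = np.frombuffer(buffer[:n_bytes], dtype=np.uint8)
    return frames.reshape(n_frames, height, width, 3)

# Toy stand-in for decoder output: 29.97 fps * 2 s -> int(59.94) = 59 frames.
raw = bytes(59 * 4 * 6 * 3)
frames = frames_from_raw_rgb(raw, height=4, width=6, fps=29.97, duration=2)
```

Because the count is truncated with int(), a non-integer fps can yield one frame fewer than a rounded estimate would suggest.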

face_rhythm.alignment.get_frames(path, time_start, time_end, verbose=False)

Reads a contiguous range of frames from a local video using OpenCV by seeking with cv2.CAP_PROP_POS_FRAMES. Stops early if the requested range extends past EOF rather than raising.

Parameters:
  • path (str) – Path to the local video file.

  • time_start (float) – Start time, in seconds, of the read window.

  • time_end (float) – End time, in seconds, of the read window. The number of frames requested is int((time_end - time_start) * fps).

  • verbose (bool) – If True, displays a tqdm progress bar over the seek loop. (Default is False)

Returns:

ims (np.ndarray):

Decoded frames stacked along axis 0. shape: (n_frames, H, W, 3), dtype matches the dtype returned by cv2.VideoCapture.read (typically uint8).

Return type:

(np.ndarray)

Raises:

ValueError – If no frames could be read in the requested interval.
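The read-window arithmetic can be sketched without OpenCV. This is an illustration under stated assumptions: the seek target is taken to be int(time_start * fps), which the page does not specify, and frame_window is a hypothetical helper rather than part of the module.

```python
def frame_window(time_start, time_end, fps, n_frames_total):
    """Hypothetical helper: frame indices for a read window, truncated at EOF."""
    idx_start = int(time_start * fps)  # assumption: seek target is int(time_start * fps)
    n_requested = int((time_end - time_start) * fps)  # count stated in the docstring
    idx_stop = min(idx_start + n_requested, n_frames_total)  # stop early at EOF
    if idx_stop <= idx_start:
        raise ValueError("No frames could be read in the requested interval.")
    return range(idx_start, idx_stop)

# A 2 s window at 30 fps requests 60 frames, but the toy video ends at frame 75.
window = frame_window(time_start=1.0, time_end=3.0, fps=30.0, n_frames_total=75)
```

Here the window starts at frame 30 and is cut short at EOF, so 45 frames are returned instead of the 60 requested.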

face_rhythm.alignment_multisession module

Multi-session (cross-session) image alignment.

Ported from ROICaT (https://github.com/RichieHakim/ROICaT, roicat/tracking/alignment.py and roicat/helpers.py). Both projects (C) Rich Hakim — released under the face-rhythm LICENSE alongside the rest of the package. The ROICaT source is GPL-3.0-only; because Rich is the sole author of both packages, there is no license conflict; this module re-licenses the ported portions under face-rhythm’s terms for face-rhythm users.

This module provides Aligner, which registers a list of FOV images to a template using one of several geometric-registration backends. The public API and call-shape intentionally match ROICaT’s tracking.alignment.Aligner so notebooks and scripts that used the ROICaT entry-point can swap roicat.tracking.alignment for face_rhythm.alignment_multisession unchanged.

Backends ported:
  • 'RoMa' (optional — requires pip install face-rhythm[multisession])

  • 'ECC_cv2' (OpenCV-only, always available)

  • 'PhaseCorrelation' (torch-FFT only, always available)

  • 'NullRegistration' (identity, always available)

Backends deliberately NOT ported because they pull in heavy dependencies that face-rhythm users don’t need: LoFTR, DISK_LightGlue, DeepFlow, OpticalFlowFarneback, SIFT, ORB. If you need them, install and use ROICaT directly.

Low-level helpers (warp_matrix_to_remappingIdx, remap_images, compose_transform_matrices, cv2RemappingIdx_to_pytorchFlowField, find_geometric_transformation, make_batches, hash_file) are imported from face_rhythm.helpers, which already carries their ROICaT-ported equivalents. Only the helpers unique to alignment (ImageAlignmentChecker, phase_correlation, 2-D Butterworth bandpass filter construction, Dijkstra path reconstruction) are re-implemented here.

face_rhythm.alignment_multisession.make_distance_grid(shape: Tuple[int, int] = (512, 512), p: int = 2, idx_center: Tuple[int, int] | None = None, use_fftshift_center: bool = False) → numpy.ndarray

Creates an (H, W) array of Minkowski-p distances to a reference index. Ported from roicat.helpers.make_distance_grid.

Parameters:
  • shape (Tuple[int, int]) – Grid shape (H, W). (Default is (512, 512))

  • p (int) – Minkowski order. Use 1 for Manhattan and 2 for Euclidean distance; large values (e.g. 100) approximate the Chebyshev (max) norm, which is the limit as p goes to infinity. (Default is 2)

  • idx_center (Optional[Tuple[int, int]]) – Center index for the distances. If None, uses the geometric middle of the array (between two pixels on even shapes). (Default is None)

  • use_fftshift_center (bool) – If True, uses the index where np.fft.fftshift(np.fft.fftfreq(N)) is zero as the center (the correct reference for fftshifted 2-D FFTs). (Default is False)

Returns:

grid_dist (np.ndarray):

Minkowski-p distances to the center. shape: shape.

Return type:

(np.ndarray)
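A Minkowski-p distance grid of this kind can be sketched in a few lines of NumPy. The function below is an illustration only, not the ported implementation: the name distance_grid is hypothetical, the use_fftshift_center option is omitted, and the default center is assumed to be (N - 1) / 2 along each axis.

```python
import numpy as np

def distance_grid(shape=(512, 512), p=2, idx_center=None):
    """Sketch of a Minkowski-p distance grid; not the ported implementation."""
    if idx_center is None:
        # Geometric middle: falls between two pixels when a dimension is even.
        idx_center = ((shape[0] - 1) / 2, (shape[1] - 1) / 2)
    yy, xx = np.meshgrid(
        np.arange(shape[0]) - idx_center[0],
        np.arange(shape[1]) - idx_center[1],
        indexing="ij",
    )
    if np.isinf(p):
        return np.maximum(np.abs(yy), np.abs(xx))  # Chebyshev limit
    return (np.abs(yy) ** p + np.abs(xx) ** p) ** (1.0 / p)

grid_l2 = distance_grid((5, 5), p=2)
grid_l1 = distance_grid((5, 5), p=1)
grid_l100 = distance_grid((5, 5), p=100)
```

At the corner of the 5x5 grid the Manhattan distance is 4, while p=100 gives a value just above 2, close to the Chebyshev distance.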

face_rhythm.alignment_multisession.design_butter_bandpass(lowcut: float, highcut: float, fs: float, order: int = 5) → Tuple[numpy.ndarray, numpy.ndarray]

Designs a Butterworth bandpass filter, with low/highpass edge cases. Ported from roicat.helpers.design_butter_bandpass.

Parameters:
  • lowcut (float) – Low-cutoff frequency. If <= 0, a lowpass is used instead.

  • highcut (float) – High-cutoff frequency. If >= fs / 2, a highpass is used instead.

  • fs (float) – Sample rate.

  • order (int) – Butterworth filter order. (Default is 5)

Returns:

tuple containing:
b (np.ndarray):

Numerator polynomial of the IIR filter.

a (np.ndarray):

Denominator polynomial of the IIR filter.

Return type:

(Tuple[np.ndarray, np.ndarray])
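The edge-case handling described above might look like the following sketch built on scipy.signal.butter. It is an assumption that the ported function is structured this way; the name butter_band is hypothetical, and the case where both edge conditions hold at once is not handled here.

```python
from scipy.signal import butter

def butter_band(lowcut, highcut, fs, order=5):
    """Sketch of the low/high/bandpass selection, using scipy.signal.butter."""
    nyq = 0.5 * fs
    if lowcut <= 0:
        # No lower edge: fall back to a lowpass at highcut.
        return butter(order, highcut / nyq, btype="low")
    if highcut >= nyq:
        # No usable upper edge: fall back to a highpass at lowcut.
        return butter(order, lowcut / nyq, btype="high")
    return butter(order, [lowcut / nyq, highcut / nyq], btype="band")

b, a = butter_band(lowcut=5.0, highcut=20.0, fs=100.0, order=3)
b_lp, a_lp = butter_band(lowcut=0.0, highcut=20.0, fs=100.0, order=3)
```

A bandpass of order 3 yields 7 coefficients per polynomial because the band transform doubles the filter order, whereas the lowpass fallback yields 4.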

face_rhythm.alignment_multisession.make_2D_frequency_filter(hw: Tuple[int, int], low: float = 5.0, high: float = 6.0, order: int = 3, distance_p: int = 100) → numpy.ndarray

Builds a 2-D fftshifted bandpass mask for phase-correlation scoring. Ported from roicat.helpers.make_2D_frequency_filter. The filter is the 1-D Butterworth magnitude response from design_butter_bandpass() evaluated on a Minkowski-distance_p distance grid produced by make_distance_grid().

Parameters:
  • hw (Tuple[int, int]) – Output height and width.

  • low (float) – Low cutoff in pixel units. (Default is 5.0)

  • high (float) – High cutoff in pixel units. (Default is 6.0)

  • order (int) – Butterworth filter order. (Default is 3)

  • distance_p (int) – Minkowski norm for the distance grid (100 is approximately Chebyshev). (Default is 100)

Returns:

filt (np.ndarray):

2-D bandpass mask with values in [0, 1]. shape: hw.

Return type:

(np.ndarray)

face_rhythm.alignment_multisession.phase_correlation(im_template: numpy.ndarray | torch.Tensor, im_moving: numpy.ndarray | torch.Tensor, mask_fft: numpy.ndarray | torch.Tensor | None = None, return_filtered_images: bool = False, eps: float = 1e-08) → numpy.ndarray | torch.Tensor | Tuple

Computes the phase-correlation of two images along the last two axes. Ported from roicat.helpers.phase_correlation.

Parameters:
  • im_template (Union[np.ndarray, torch.Tensor]) – Template image(s). shape: (…, H, W). Leading dims broadcast.

  • im_moving (Union[np.ndarray, torch.Tensor]) – Moving image(s). shape: (…, H, W). Broadcasts against the template.

  • mask_fft (Optional[Union[np.ndarray, torch.Tensor]]) – Optional 2-D bandpass mask. Assumed to already be fftshifted; this function un-shifts it so that it lines up with the raw FFT output. (Default is None)

  • return_filtered_images (bool) – If True, additionally returns the mask-filtered template and moving images in the image domain. (Default is False)

  • eps (float) – Floor used to avoid division by zero in the phase-correlation normalization. (Default is 1e-8)

Returns:

cc (Union[np.ndarray, torch.Tensor]):

Phase-correlation response with a shape that matches the broadcast of the inputs. Returned as np.ndarray when im_template is numpy, otherwise as torch.Tensor. When return_filtered_images is True, a 3-tuple (cc, filtered_template, filtered_moving) is returned instead, with the filtered images in the image domain.

Return type:

(Union[np.ndarray, torch.Tensor, Tuple])
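For orientation, phase correlation itself can be sketched with NumPy FFTs. This is a generic textbook version, not the ported function: it omits mask_fft, the name phase_corr is hypothetical, and fftshifting the output so that zero shift sits at (H // 2, W // 2) is an assumption made here for readability.

```python
import numpy as np

def phase_corr(im_template, im_moving, eps=1e-8):
    """Sketch of phase correlation over the last two axes (NumPy only, no mask)."""
    f_t = np.fft.fft2(im_template)
    f_m = np.fft.fft2(im_moving)
    cross = f_t * np.conj(f_m)
    # Normalize to unit magnitude so only phase differences remain.
    cc = np.fft.ifft2(cross / (np.abs(cross) + eps)).real
    return np.fft.fftshift(cc, axes=(-2, -1))

rng = np.random.default_rng(0)
im = rng.normal(size=(32, 32))
im_shifted = np.roll(im, shift=(3, -5), axis=(0, 1))
cc = phase_corr(im, im_shifted)
peak = np.unravel_index(np.argmax(cc), cc.shape)
# The peak offset from the fftshift center (16, 16) encodes the shift that
# maps the moving image back onto the template.
shift_yx = (peak[0] - 16, peak[1] - 16)
```

In this toy case the moving image was rolled by (3, -5), so the recovered correction is (-3, 5). The sign convention depends on which image is conjugated, so it should be checked against the actual function before use.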

face_rhythm.alignment_multisession.get_path_between_nodes(idx_start: int, idx_end: int, predecessors: numpy.ndarray, max_length: int = 9999) → List[int]

Reconstructs a shortest path from a predecessor matrix. Ported from roicat.helpers.get_path_between_nodes. The predecessor matrix is the one returned by scipy.sparse.csgraph.shortest_path() with return_predecessors=True, so predecessors[idx_end, idx_current] is the node that immediately precedes idx_current on the shortest path from idx_end to idx_current; repeatedly following it from idx_start walks the path back to idx_end.

Parameters:
  • idx_start (int) – Index of the first node on the path.

  • idx_end (int) – Index of the destination node.

  • predecessors (np.ndarray) – Square predecessor matrix returned by scipy.sparse.csgraph.shortest_path().

  • max_length (int) – Safety cap on path length to avoid infinite loops. (Default is 9999)

Returns:

path (List[int]):

Node indices along the shortest path, in the form [idx_start, ..., idx_end].

Return type:

(List[int])

Raises:
  • AssertionError – Input validation failed (shapes, integer types, or the no-path placeholder -9999).

  • ValueError – Reconstructed path length exceeds max_length.
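The reconstruction can be sketched against a real scipy predecessor matrix. The walk below follows the indexing described above, but it is an illustration rather than the ported code; path_between is a hypothetical name and the four-node graph is toy data.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def path_between(idx_start, idx_end, predecessors, max_length=9999):
    """Sketch: walk a scipy predecessor matrix from idx_start to idx_end."""
    path = [idx_start]
    idx_current = idx_start
    while idx_current != idx_end:
        idx_current = int(predecessors[idx_end, idx_current])
        if idx_current == -9999:  # scipy's no-path placeholder
            raise AssertionError("No path between the requested nodes.")
        path.append(idx_current)
        if len(path) > max_length:
            raise ValueError("Path length exceeds max_length.")
    return path

# Undirected chain 0 - 1 - 2 - 3 plus an expensive shortcut 0 - 3.
weights = np.array([
    [0, 1, 0, 10],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [10, 0, 1, 0],
], dtype=float)
dist, predecessors = shortest_path(
    csr_matrix(weights), directed=False, return_predecessors=True
)
path = path_between(0, 3, predecessors)
```

The chain through nodes 1 and 2 costs 3, which beats the direct edge of weight 10, so the reconstructed path visits every node in order.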

class face_rhythm.alignment_multisession.ImageAlignmentChecker(hw: Tuple[int, int], radius_in: float | Tuple[float, float], radius_out: float | Tuple[float, float], order: int = 5, device: str = 'cpu')

Bases: object

Scores whether a set of images is spatially aligned via phase correlation. Ported from roicat.helpers.ImageAlignmentChecker.

The class constructs two band-selectable 2-D filters in the phase-correlation domain: an “in” filter over the center (within radius_in) and an “out” filter away from the center. Statistics of the phase-correlation peak under each filter are compared to produce an alignment z-score.

Parameters:
  • hw (Tuple[int, int]) – Image height and width. All scored images must match this shape.

  • radius_in (Union[float, Tuple[float, float]]) – Either the upper bound of the “in” bandpass (lower bound is 0) or an explicit (low, high) tuple.

  • radius_out (Union[float, Tuple[float, float]]) – Either the lower bound of the “out” bandpass (upper bound is min(H, W) / 2) or an explicit (low, high) tuple.

  • order (int) – Butterworth order shared by both filters. Values above 5 may cause the filters to collapse numerically. (Default is 5)

  • device (str) – Torch device string (e.g. 'cpu' or 'cuda:0') on which the precomputed filters live. (Default is 'cpu')

hw

Image height and width.

Type:

Tuple[int, int]

order

Butterworth order used for both filters.

Type:

int

device

Torch device string the filters were placed on.

Type:

str

filt_in

Precomputed in-band 2-D bandpass filter. shape: hw, dtype: float32.

Type:

torch.Tensor

filt_out

Precomputed out-band 2-D bandpass filter. shape: hw, dtype: float32.

Type:

torch.Tensor

score_alignment(images: numpy.ndarray | torch.Tensor | List | Tuple, images_ref: numpy.ndarray | torch.Tensor | List | Tuple | None = None) → Dict[str, Any]

Computes per-pair alignment statistics for a stack of images.

Parameters:
  • images (Union[np.ndarray, torch.Tensor, List, Tuple]) – Stack of images. shape: (N, H, W), or (H, W) for a single image (which is broadcast).

  • images_ref (Optional[Union[np.ndarray, torch.Tensor, List, Tuple]]) – Reference images. If None, images is compared against itself (N x N scoring). (Default is None)

Returns:

stats (Dict[str, Any]):

Per-pair statistics keyed by name. Contains 'pc' (the phase-correlation array), 'mean_in', 'mean_out', 'ptile95_out', 'max_in', 'std_in', 'std_out', 'max_diff', 'z_in' (the primary alignment score), and 'r_in'.

Return type:

(Dict[str, Any])

class face_rhythm.alignment_multisession.ImageRegistrationMethod(device: str = 'cpu', verbose: bool | int = False)

Bases: object

Base class for image-to-image registration backends. Subclasses either implement _forward_rigid() (to emit keypoint pairs for the RANSAC pipeline in fit_rigid()) or override fit_rigid() directly.

Parameters:
  • device (str) – Torch device string used by the backend (e.g. 'cpu' or 'cuda:0'). (Default is 'cpu')

  • verbose (Union[bool, int]) – Verbosity flag or integer level. (Default is False)

device

Torch device string used by the backend.

Type:

str

verbose

Verbosity flag or integer level.

Type:

Union[bool, int]

fit_rigid(im_template: numpy.ndarray | torch.Tensor, im_moving: numpy.ndarray | torch.Tensor, inl_thresh: float = 2.0, max_iter: int = 10, confidence: float = 0.99, constraint: str = 'homography', **kwargs) → numpy.ndarray

Estimates a constrained 3x3 warp between two images via RANSAC. Subclasses that emit keypoint pairs use this default implementation; the estimator branches on constraint.

Parameters:
  • im_template (Union[np.ndarray, torch.Tensor]) – Template image. shape: (H, W).

  • im_moving (Union[np.ndarray, torch.Tensor]) – Moving image. shape: (H, W).

  • inl_thresh (float) – RANSAC inlier threshold in pixels. (Default is 2.0)

  • max_iter (int) – Maximum RANSAC iterations. (Default is 10)

  • confidence (float) – RANSAC confidence level. (Default is 0.99)

  • constraint (str) –

    Warp family to fit. Either

    • 'rigid': Procrustes (rotation + translation).

    • 'euclidean': skimage.measure.ransac() with skimage.transform.EuclideanTransform.

    • 'similarity': cv2.estimateAffinePartial2D().

    • 'affine': cv2.estimateAffine2D().

    • 'homography': cv2.findHomography() with MAGSAC.

    (Default is 'homography')

  • **kwargs – Additional keyword arguments forwarded to _forward_rigid() for keypoint detection.

Returns:

warp_matrix (np.ndarray):

3x3 warp matrix. Affine rows are padded with [0, 0, 1] where appropriate. dtype: float32.

Return type:

(np.ndarray)

Raises:
  • RuntimeError – A fitting branch failed (e.g. RANSAC returned None).

  • ValueError – constraint is not one of the supported values.

class face_rhythm.alignment_multisession.RoMa(model_type: str = 'outdoor', n_points: int = 10000, batch_size: int = 1000, device: str = 'cpu', weight_urls: Dict | None = None, fallback_weight_urls: Dict | None = None, verbose: bool = False)

Bases: ImageRegistrationMethod

Feature-matching registration backend that uses the RoMa model.

Requires the optional dependency romatch-roicat, installed via pip install face-rhythm[multisession]. The package imports as romatch regardless of which PyPI distribution was installed.

On first use the constructor downloads ~1.5 GB of weights via torch.hub.load_state_dict_from_url() into Path(torch.hub.get_dir()) / "checkpoints". Set the TORCH_HOME environment variable before import to redirect the cache.

Parameters:
  • model_type (str) –

    RoMa model variant. Either

    • 'outdoor': Outdoor-trained RoMa weights.

    • 'indoor': Indoor-trained RoMa weights.

    (Default is 'outdoor')

  • n_points (int) – Number of matched points to sample per image pair. (Default is 10000)

  • batch_size (int) – Sub-batch size used by the matching sampler. (Default is 1000)

  • device (str) – Torch device string for the RoMa model. (Default is 'cpu')

  • weight_urls (Optional[Dict]) – Primary download URLs and MD5 hashes for the RoMa and DINOv2 weights. If None, uses DEFAULT_WEIGHT_URLS. (Default is None)

  • fallback_weight_urls (Optional[Dict]) – OSF mirror URLs and matching hashes used if the primary downloads fail. If None, uses DEFAULT_FALLBACK_WEIGHT_URLS. (Default is None)

  • verbose (bool) – Verbosity flag. (Default is False)

roma_model_type

RoMa variant in use ('outdoor' or 'indoor').

Type:

str

n_points

Number of matched points to sample per pair.

Type:

int

batch_size

Sub-batch size for the matching sampler.

Type:

int

weight_urls

Primary URLs and hashes for the model weights.

Type:

Dict

fallback_weight_urls

Fallback (mirror) URLs and hashes for the model weights.

Type:

Dict

model

Initialized RoMa model instance.

Type:

object

DEFAULT_WEIGHT_URLS = {'dinov2': {'filename': 'dinov2_vitl14_pretrain.pth', 'hash': '19a02c10947ed50096ce382b46b15662', 'url': 'https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth'}, 'romatch': {'indoor': {'filename': 'roma_indoor.pth', 'hash': '349a17aaa21883bb164b1a5884febb21', 'url': 'https://github.com/Parskatt/storage/releases/download/roma/roma_indoor.pth'}, 'outdoor': {'filename': 'roma_outdoor.pth', 'hash': '9a451dfb65745e777bf916db6ea84933', 'url': 'https://github.com/Parskatt/storage/releases/download/roma/roma_outdoor.pth'}}}

DEFAULT_FALLBACK_WEIGHT_URLS = {'dinov2': {'filename': 'dinov2_vitl14_pretrain.pth', 'hash': '19a02c10947ed50096ce382b46b15662', 'url': 'https://osf.io/tmj5c/download'}, 'romatch': {'indoor': {'filename': 'roma_indoor.pth', 'hash': '349a17aaa21883bb164b1a5884febb21', 'url': 'https://osf.io/uzx64/download'}, 'outdoor': {'filename': 'roma_outdoor.pth', 'hash': '9a451dfb65745e777bf916db6ea84933', 'url': 'https://osf.io/cmzpa/download'}}}

class face_rhythm.alignment_multisession.ECC_cv2(mode_transform: str = 'euclidean', n_iter: int = 200, termination_eps: float = 1e-09, gaussFiltSize: float | int = 1, auto_fix_gaussFilt_step: int | None = 10, device: str = 'cpu', verbose: bool | int = False)

Bases: ImageRegistrationMethod

OpenCV Enhanced Correlation Coefficient (ECC) registration backend. Wraps face_rhythm.helpers.find_geometric_transformation(), which in turn wraps cv2.findTransformECC(). On failure, the call is retried with a larger Gaussian filter size.

Parameters:
  • mode_transform (str) –

    Warp family for ECC. Either

    • 'translation': Translation-only warp.

    • 'euclidean': Rotation + translation.

    • 'affine': Affine warp.

    • 'homography': 3x3 homography.

    (Default is 'euclidean')

  • n_iter (int) – Maximum ECC iterations. (Default is 200)

  • termination_eps (float) – ECC convergence tolerance. (Default is 1e-09)

  • gaussFiltSize (Union[float, int]) – Gaussian-filter kernel size used as a smoothing pre-pass before the ECC iteration. Cast to int via np.round. (Default is 1)

  • auto_fix_gaussFilt_step (Optional[int]) – If set, on ECC failure the kernel size is incremented by this value and ECC is retried recursively. None disables the retry. (Default is 10)

  • device (str) – Ignored; ECC always runs on CPU. (Default is 'cpu')

  • verbose (Union[bool, int]) – Verbosity flag or integer level. (Default is False)

mode_transform

Warp family selected for ECC.

Type:

str

n_iter

Maximum ECC iterations.

Type:

int

termination_eps

ECC convergence tolerance.

Type:

float

gaussFiltSize

Effective Gaussian-filter kernel size used by ECC.

Type:

int

auto_fix_gaussFilt_step

Increment applied to gaussFiltSize after each ECC failure.

Type:

Optional[int]

fit_rigid(im_template: numpy.ndarray | torch.Tensor, im_moving: numpy.ndarray | torch.Tensor, **kwargs) → numpy.ndarray

Estimates a 3x3 warp matrix via ECC, retrying with a larger Gaussian filter on failure.

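Parameters:
  • im_template (Union[np.ndarray, torch.Tensor]) – Template image. shape: (H, W).

  • im_moving (Union[np.ndarray, torch.Tensor]) – Moving image. shape: (H, W).

Returns: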

warp_matrix (np.ndarray):

Homogeneous warp matrix. shape: (3, 3). Affine warps are padded with [0, 0, 1].

Return type:

(np.ndarray)
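The retry behavior controlled by gaussFiltSize and auto_fix_gaussFilt_step can be sketched as follows. This is a hypothetical illustration: it uses a loop where the description above mentions recursion, the max_size cap is an addition for safety, and toy_fit stands in for the real ECC call, which would raise an OpenCV error rather than RuntimeError.

```python
def fit_with_retry(fit_fn, gauss_filt_size=1, step=10, max_size=101):
    """Hypothetical retry loop: grow the Gaussian kernel size after each failure."""
    size = int(round(gauss_filt_size))
    while True:
        try:
            return fit_fn(size), size
        except RuntimeError:
            # step=None disables the retry; max_size is an added safety cap.
            if step is None or size + step > max_size:
                raise
            size += step

def toy_fit(size):
    """Stand-in for an ECC call that only converges with enough smoothing."""
    if size < 21:
        raise RuntimeError("ECC did not converge")
    return "warp"

result, size_used = fit_with_retry(toy_fit, gauss_filt_size=1, step=10)
```

Starting from a kernel size of 1 with a step of 10, the attempts use sizes 1, 11, and 21, all of which stay odd as Gaussian kernel sizes generally must be.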

class face_rhythm.alignment_multisession.PhaseCorrelationRegistration(device: str = 'cpu', bandpass_freqs: List[float] | None = None, order: int = 5, verbose: bool = False)

Bases: ImageRegistrationMethod

Translation-only registration via phase_correlation() peak detection. Supports an optional bandpass on the phase-correlation mask for robustness against low- and high-frequency noise.

Parameters:
  • device (str) – Torch device used for the FFT. (Default is 'cpu')

  • bandpass_freqs (Optional[List[float]]) – [low, high] cutoffs for the bandpass filter. None skips the bandpass. (Default is None)

  • order (int) – Butterworth order for the bandpass filter. (Default is 5)

  • verbose (bool) – Verbosity flag. (Default is False)

bandpass_freqs

Cutoffs used to construct the bandpass mask, if any.

Type:

Optional[List[float]]

order

Butterworth order for the bandpass filter.

Type:

int

fit_rigid(im_template: numpy.ndarray | torch.Tensor, im_moving: numpy.ndarray | torch.Tensor, **kwargs) → numpy.ndarray

Estimates a translation-only 3x3 warp via phase-correlation peak detection.

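Parameters:
  • im_template (Union[np.ndarray, torch.Tensor]) – Template image. shape: (H, W).

  • im_moving (Union[np.ndarray, torch.Tensor]) – Moving image. shape: (H, W).

Returns: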

warp_matrix (np.ndarray):

Translation-only homogeneous warp matrix. shape: (3, 3), dtype: float32.

Return type:

(np.ndarray)

class face_rhythm.alignment_multisession.NullRegistration(device: str | None = None, verbose: bool = False)

Bases: ImageRegistrationMethod

Identity registration backend that returns an identity warp for every pair. Useful for debugging the Aligner.fit_geometric() pipeline, evaluating pre-registered images, and as a zero-cost method baseline.

Parameters:
  • device (Optional[str]) – Torch device string. None falls back to 'cpu'. (Default is None)

  • verbose (bool) – Verbosity flag. (Default is False)

fit_rigid(im_template: numpy.ndarray | torch.Tensor, im_moving: numpy.ndarray | torch.Tensor, **kwargs) → numpy.ndarray

Returns an identity 2x3 affine warp regardless of the input images.

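Parameters:
  • im_template (Union[np.ndarray, torch.Tensor]) – Template image. shape: (H, W).

  • im_moving (Union[np.ndarray, torch.Tensor]) – Moving image. shape: (H, W).

Returns: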

warp_matrix (np.ndarray):

Identity affine warp. shape: (2, 3), dtype: float32. Aligner.fit_geometric() pads this to (3, 3).

Return type:

(np.ndarray)
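Padding a (2, 3) affine warp to homogeneous (3, 3) form, as mentioned above, can be sketched in NumPy. The helper to_homogeneous is hypothetical and only illustrates the [0, 0, 1] row convention; it is not the code used by Aligner.fit_geometric().

```python
import numpy as np

def to_homogeneous(warp):
    """Pad a (2, 3) affine warp to (3, 3); pass (3, 3) warps through unchanged."""
    warp = np.asarray(warp, dtype=np.float32)
    if warp.shape == (2, 3):
        warp = np.vstack([warp, np.array([[0, 0, 1]], dtype=np.float32)])
    if warp.shape != (3, 3):
        raise ValueError(f"Unexpected warp shape: {warp.shape}")
    return warp

identity_2x3 = np.eye(2, 3, dtype=np.float32)  # shape of an identity affine warp
translate = np.array([[1, 0, 4], [0, 1, -2]], dtype=np.float32)
# In homogeneous form, composing two warps is a plain matrix product.
composed = to_homogeneous(translate) @ to_homogeneous(identity_2x3)
```

Composing any warp with the padded identity leaves it unchanged, which is what makes the identity backend usable as a baseline.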

class face_rhythm.alignment_multisession.Aligner(use_match_search: bool = True, all_to_all: bool = False, radius_in: float = 4, radius_out: float = 20, order: int = 5, z_threshold: float = 4.0, um_per_pixel: float = 1.0, device: str = 'cpu', verbose: bool | int = True)

Bases: _AlignerModuleStub

Registers a list of FOV images to a template using a chosen backend. The public API mirrors ROICaT’s tracking.alignment.Aligner so that existing notebooks can swap the import path without further changes.

Workflow:
  1. aligner = Aligner(...).

  2. aligner.fit_geometric(template=..., ims_moving=[...], method='RoMa' | 'ECC_cv2' | 'PhaseCorrelation' | 'NullRegistration', ...).

  3. Use aligner.remappingIdx_geo (a list of (H, W, 2) float32 arrays) to warp points or images, or call aligner.transform_images(ims_moving, remappingIdx=aligner.remappingIdx_geo).

  4. Inspect alignment quality with aligner.plot_alignment_results_geometric().

Parameters:
  • use_match_search (bool) – If any image scores <= z_threshold against the template, run the Dijkstra match-search step to find a pairwise path through other images. (Default is True)

  • all_to_all (bool) – If True, always run the all-to-all match search even when direct registrations all pass z_threshold. Much slower (O(N^2)). (Default is False)

  • radius_in (float) – Inner radius for the ImageAlignmentChecker, scaled by um_per_pixel. (Default is 4)

  • radius_out (float) – Outer radius for the ImageAlignmentChecker, scaled by um_per_pixel. (Default is 20)

  • order (int) – Butterworth order for the in- and out-band filters used by ImageAlignmentChecker. (Default is 5)

  • z_threshold (float) – z-score cutoff; a pair scoring at or below it is considered mis-aligned. The multi-session notebook sets this to 50 so that the match-search is always triggered. (Default is 4.0)

  • um_per_pixel (float) – Pixel scale, which must match across all images. (Default is 1.0)

  • device (str) – Torch device string for the backends (e.g. 'cuda:0'). ECC_cv2 and PhaseCorrelationRegistration ignore this and run on CPU; RoMa on CPU is prohibitively slow. (Default is 'cpu')

  • verbose (Union[bool, int]) – Verbosity flag or integer level. (Default is True)

use_match_search

Whether the Dijkstra match-search runs on bad alignments.

Type:

bool

all_to_all

Whether the all-to-all match-search runs unconditionally.

Type:

bool

radius_in

Inner radius parameter for ImageAlignmentChecker.

Type:

float

radius_out

Outer radius parameter for ImageAlignmentChecker.

Type:

float

order

Butterworth order parameter for ImageAlignmentChecker.

Type:

int

z_threshold

z-score cutoff for the alignment check.

Type:

float

device

Torch device string passed to the backends.

Type:

str

um_per_pixel

Pixel scale used to scale the in/out radii.

Type:

float

remappingIdx_geo

Per-image remapping arrays produced by fit_geometric(), each with shape (H, W, 2) and dtype float32. None until fit_geometric() runs.

Type:

Optional[List[np.ndarray]]

warp_matrices

Composed warp matrices set by fit_geometric(). None until fit_geometric() runs.

Example

from face_rhythm.alignment_multisession import Aligner

# images: list of same-shape (H, W) np.ndarray FOV images to register.
aligner = Aligner(z_threshold=50, device='cuda:0')
aligner.fit_geometric(
    template=0,  # index into images
    ims_moving=images,
    method='RoMa',
)
warped = aligner.transform_images(
    ims_moving=images,
    remappingIdx=aligner.remappingIdx_geo,
)

fit_geometric(template: int | float | numpy.ndarray, ims_moving: List[numpy.ndarray], template_method: str = 'sequential', mask_borders: Tuple[int, int, int, int] = (0, 0, 0, 0), method: str = 'RoMa', kwargs_method: Dict[str, Dict[str, Any]] | None = None, constraint: str = 'affine', kwargs_RANSAC: Dict[str, Any] | None = None, verbose: bool | None = None) → List[numpy.ndarray]

Fits geometric warps from ims_moving to template and scores their alignment.

Calls the backend identified by method once per pair, composes warps across sequential templates (if any), then scores alignment via ImageAlignmentChecker. If any pair fails the z_threshold gate and use_match_search is True, a Dijkstra search through all intermediate images is run to reconstruct better paths.

Parameters:
  • template (Union[int, float, np.ndarray]) – Template image or index. Fractional indices in [0, 1] are mapped to int(N * f).

  • ims_moving (List[np.ndarray]) – Same-shape images to register. shape: (H, W) each.

  • template_method (str) –

    Template-resolution mode. Either

    • 'image': template is a concrete image (or pinned index resolved to one).

    • 'sequential': Each image is registered to its neighbor along a chain that ends at the template index.

    (Default is 'sequential')

  • mask_borders (Tuple[int, int, int, int]) – Pre-crop borders (top, bottom, left, right) removed from every image before registration. (Default is (0, 0, 0, 0))

  • method (str) – Backend key into _METHODS_LUT. One of 'RoMa', 'ECC_cv2', 'PhaseCorrelation', or 'NullRegistration'. (Default is 'RoMa')

  • kwargs_method (Optional[Dict[str, Dict[str, Any]]]) – Per-backend kwargs keyed by backend name, so the same dict can be passed for any method choice. If None, uses _DEFAULT_KWARGS_METHOD. (Default is None)

  • constraint (str) – Warp family passed through to ImageRegistrationMethod.fit_rigid(). (Default is 'affine')

  • kwargs_RANSAC (Optional[Dict[str, Any]]) – RANSAC kwargs for fit_rigid. If None, uses {'inl_thresh': 2.0, 'max_iter': 10, 'confidence': 0.99}. (Default is None)

  • verbose (Optional[bool]) – Overrides self._verbose when not None. (Default is None)

Returns:

remappingIdx_geo (List[np.ndarray]):

One remapping array per input image. shape: (H, W, 2) each, dtype: float32. Also stored on self.remappingIdx_geo.

Return type:

(List[np.ndarray])
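The template resolution described above can be sketched in plain Python. This is an illustrative sketch, not the package's implementation: resolve_template_index and sequential_pairs are hypothetical helper names, the clamp applied when the fraction equals 1.0 is an assumption, and the chain layout (each image steps one position toward the template) is one plausible reading of the 'sequential' mode.

```python
from typing import List, Tuple, Union


def resolve_template_index(template: Union[int, float], n_images: int) -> int:
    """Map an integer or fractional template spec to an image index."""
    if isinstance(template, float):
        if not 0.0 <= template <= 1.0:
            raise ValueError("fractional template must lie in [0, 1]")
        # int(N * f) as documented; the clamp for f == 1.0 is an assumption.
        return min(int(n_images * template), n_images - 1)
    return int(template)


def sequential_pairs(idx_template: int, n_images: int) -> List[Tuple[int, int]]:
    """Return (moving, neighbor) pairs that step each image toward the template."""
    pairs = []
    for i in range(n_images):
        if i < idx_template:
            pairs.append((i, i + 1))
        elif i > idx_template:
            pairs.append((i, i - 1))
    return pairs


idx = resolve_template_index(0.5, n_images=5)
pairs = sequential_pairs(idx, n_images=5)
print(idx, pairs)
```

Under these assumptions, the warp for an image far from the template would be obtained by composing the pairwise warps along its chain.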

transform_images(ims_moving: List[numpy.ndarray] | numpy.ndarray, remappingIdx: List[numpy.ndarray] | numpy.ndarray) List[numpy.ndarray] | numpy.ndarray[source]

Applies per-image remapping indices via face_rhythm.helpers.remap_images().

Parameters:
  • ims_moving (Union[List[np.ndarray], np.ndarray]) – Images to warp. May be a list of (H, W) or (H, W, C) arrays, or a single np.ndarray (returned as a bare array).

  • remappingIdx (Union[List[np.ndarray], np.ndarray]) – Matching remap arrays. shape: (H, W, 2) each. The cv2 backend is used with a per-image border_value = im_moving.mean() so that the cropped border matches the image statistics.

Returns:

ims_registered (Union[List[np.ndarray], np.ndarray]):

Registered images. Returned as a single np.ndarray when ims_moving was a bare ndarray, otherwise as a list.

Return type:

(Union[List[np.ndarray], np.ndarray])
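To illustrate what a remapping array with a mean-valued border does, here is a minimal pure-Python sketch using nested lists in place of arrays. It is not the cv2 backend: remap_nearest is a hypothetical helper, it uses nearest-neighbor sampling only, and it assumes the last axis of the remap array stores (x, y) source coordinates.

```python
from typing import List, Optional, Sequence, Tuple


def remap_nearest(
    im: Sequence[Sequence[float]],
    remap_idx: Sequence[Sequence[Tuple[float, float]]],
    border_value: Optional[float] = None,
) -> List[List[float]]:
    """Sample im at the (x, y) source coordinates stored in remap_idx."""
    h, w = len(im), len(im[0])
    if border_value is None:
        # Fill out-of-bounds samples with the image mean, as described above.
        border_value = sum(sum(row) for row in im) / (h * w)
    out = []
    for row_idx in remap_idx:
        out_row = []
        for x_src, y_src in row_idx:
            xi, yi = round(x_src), round(y_src)
            inside = 0 <= xi < w and 0 <= yi < h
            out_row.append(im[yi][xi] if inside else border_value)
        out.append(out_row)
    return out


im = [[0.0, 1.0], [2.0, 5.0]]
# Shift the image right by one pixel: output (y, x) samples source (x - 1, y).
remap = [[(x - 1.0, float(y)) for x in range(2)] for y in range(2)]
warped = remap_nearest(im, remap)
print(warped)
```

The left column falls outside the source image, so it is filled with the image mean rather than zeros, which keeps the border from biasing intensity statistics.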

plot_alignment_results_geometric(plot_direct: bool = True) Tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Figure | None][source]

Renders two-panel score + alignment heatmaps per registration stage.

Parameters:

plot_direct (bool) – If True and a direct all-to-all matrix was produced (i.e. the match-search ran), also render the “direct” stage. Otherwise only the “final” stage is drawn. (Default is True)

Returns:

tuple containing:
fig_final (matplotlib.figure.Figure):

Figure for the post-registration results.

fig_direct (Optional[matplotlib.figure.Figure]):

Figure for the direct (pre-match-search) results, or None if the match-search did not run.

Return type:

(Tuple[matplotlib.figure.Figure, Optional[matplotlib.figure.Figure]])

face_rhythm.data_importing module

class face_rhythm.data_importing.Dataset_videos(bufferedVideoReader: BufferedVideoReader = None, paths_videos: str | List[str] = None, contiguous: bool = False, frame_rate_clamp: float = None, verbose: bool | int = 1)[source]

Bases: FR_Module

Container for one or more videos used as input to the face-rhythm pipeline. RH 2022

Imports videos via decord (or wraps an existing BufferedVideoReader) and exposes lazy per-video readers along with aggregated metadata (frame counts, frame rate, frame shape, channel count). Acts as a sequence of video readers.

Parameters:
  • bufferedVideoReader (object) – Pre-built BufferedVideoReader whose readers and metadata are reused. Mutually exclusive with paths_videos; exactly one must be provided. (Default is None)

  • paths_videos (Union[str, List[str]]) – Path or list of paths to the video files to load. Used when bufferedVideoReader is None. (Default is None)

  • contiguous (bool) – If True, videos are treated as a single contiguous stream (the first frame of each subsequent video continues the frame index of the previous one). (Default is False)

  • frame_rate_clamp (float) – If None the frame rate stored in self.frame_rate is the median of the per-video metadata frame rates. If a float, that value is used verbatim. (Default is None)

  • verbose (Union[bool, int]) –

    Verbosity level.

    • 0: Silent.

    • 1: Warnings only.

    • 2: Warnings and informational progress messages.

    (Default is 1)
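The frame_rate_clamp and contiguous behaviors described above can be sketched as follows. Both functions are hypothetical illustrations rather than members of Dataset_videos, and the global-to-local index mapping is an assumption about how a contiguous stream would be addressed.

```python
from itertools import accumulate
from statistics import median
from typing import List, Optional, Tuple


def effective_frame_rate(
    frame_rates: List[float], frame_rate_clamp: Optional[float] = None
) -> float:
    """Use the clamp verbatim if given, else the median of per-video rates."""
    if frame_rate_clamp is not None:
        return float(frame_rate_clamp)
    return float(median(frame_rates))


def locate_frame(idx_global: int, num_frames: List[int]) -> Tuple[int, int]:
    """Map a frame index in the contiguous stream to (video, local frame)."""
    ends = list(accumulate(num_frames))
    if not 0 <= idx_global < ends[-1]:
        raise IndexError("frame index out of range")
    for i_video, end in enumerate(ends):
        if idx_global < end:
            start = end - num_frames[i_video]
            return i_video, idx_global - start
    raise AssertionError("unreachable")


print(effective_frame_rate([30.0, 29.97, 30.0]))
print(locate_frame(100, [100, 50]))
```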

videos

Per-video lazy reader objects (VideoReaderWrapper instances or readers borrowed from bufferedVideoReader).

Type:

List[object]

paths_videos

Absolute paths to the source video files.

Type:

List[str]

metadata

Per-video metadata with keys 'paths_videos', 'num_frames', 'frame_rate', 'frame_height_width', and 'num_channels'.

Type:

dict

num_frames_total

Total number of frames summed across all videos.

Type:

int

frame_rate

Effective frame rate used by the pipeline.

Type:

float

frame_height_width

Frame height and width shared by all videos.

Type:

List[int]

num_channels

Number of channels shared by all videos.

Type:

int

example_image

The first frame of the first video, materialized as a CPU numpy array.

Type:

np.ndarray

contiguous

Whether videos are treated as a single contiguous stream.

Type:

bool

config

Inputs needed to reconstruct this object, used by FR_Module.

Type:

dict

run_info

Derived run-level metadata, used by FR_Module.

Type:

dict

run_data

Heavyweight outputs (currently example_image), used by FR_Module.

Type:

dict

face_rhythm.decomposition module

Tensor Component Analysis (TCA) wrapper around tensorly.

Provides the TCA class for decomposing multi-way arrays (time x points x frequency x …) produced upstream by the spectral pipeline. Handles dict-of-arrays ingestion, axis concatenation, complex-to-real unfolding, normalization, and the actual decomposition via tensorly’s CP / NN-HALS / Randomized CP solvers on either numpy or pytorch backends.

class face_rhythm.decomposition.TCA(verbose: bool | int = 1)[source]

Bases: FR_Module

Performs Tensor Component Analysis (TCA) on multi-way arrays produced by the spectral pipeline using tensorly solvers. RH 2022

Parameters:

verbose (Union[bool, int]) –

Verbosity level. One of

  • 0: No messages.

  • 1: Warnings only.

  • 2: Info messages.

(Default is 1)

config

Configuration dictionary populated by __init__ and updated by subsequent method calls.

Type:

dict

run_info

Run-time information populated after fitting and rearranging.

Type:

dict

run_data

Run-time data (factors and dimension names) populated after fitting and rearranging.

Type:

dict

rearrange_data(data: dict, names_dims_array: list = ['xy', 'points', 'frequency', 'time'], names_dims_concat_array: list = [['xy', 'points']], concat_complexDim: bool = True, name_dim_concat_complexDim: str = 'time', name_dim_dictElements: str = 'trials', method_handling_dictElements: str = 'concatenate', name_dim_concat_dictElements: str = 'time', idx_windows: list = None, name_dim_array_window: str = 'time')[source]

Rearranges the input data dictionary into a single tensor (or set of tensors) suitable for TCA. Supports concatenating array dimensions, unfolding the complex dimension, combining or stacking dictionary elements, and windowing each array along a chosen dimension.

Parameters:
  • data (dict) – Dictionary mapping element name to a numpy.ndarray of consistent rank. Arrays may be complex valued.

  • names_dims_array (list) – Names of the dimensions of the data arrays, in axis order. (Default is ['xy', 'points', 'frequency', 'time'])

  • names_dims_concat_array (list) – List of 2-element lists [dim_a, dim_b] describing pairs of array dimensions to concatenate. dim_a is folded into dim_b, producing a single dimension named '(dim_a dim_b)' with length len(dim_a) * len(dim_b). Pairs are applied in the order given. (Default is [['xy', 'points']])

  • concat_complexDim (bool) – If True, real and imaginary parts are stacked and folded into name_dim_concat_complexDim. Requires complex valued input. (Default is True)

  • name_dim_concat_complexDim (str) – Name of the array dimension into which the complex dimension is folded. Typically 'time'. (Default is 'time')

  • name_dim_dictElements (str) – Semantic name for the dictionary elements (e.g. 'trials' or 'videos'). (Default is 'trials')

  • method_handling_dictElements (str) –

    How to combine dictionary elements. One of

    • 'concatenate': Concatenate elements along name_dim_concat_dictElements; output is a single array of the same rank as inputs.

    • 'stack': Stack elements along a new leading axis; output is a single array with one extra dimension.

    • 'separate': Keep each element as its own tensor; decompositions run independently.

    (Default is 'concatenate')

  • name_dim_concat_dictElements (str) – Array dimension along which to concatenate dictionary elements. Only used when method_handling_dictElements is 'concatenate'. (Default is 'time')

  • idx_windows (list) – Per-element (start, end) index pairs (inclusive) defining a window along name_dim_array_window. If None, the full array is used. (Default is None)

  • name_dim_array_window (str) – Array dimension along which idx_windows is applied. Only used when idx_windows is not None. (Default is 'time')
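The dimension bookkeeping implied by these parameters can be traced with a small shape-only sketch. The helper names are hypothetical, no array data is touched, and two points are assumptions: that folding the complex dimension doubles the length of the target dimension, and that the target dimension keeps its name afterwards.

```python
from typing import Dict, List, Sequence, Tuple


def fold_dims(
    names: Sequence[str], shape: Sequence[int], pairs: Sequence[Sequence[str]]
) -> Tuple[List[str], List[int]]:
    """Fold dim_a into dim_b for each pair, in order."""
    names, shape = list(names), list(shape)
    for dim_a, dim_b in pairs:
        i_a, i_b = names.index(dim_a), names.index(dim_b)
        shape[i_b] *= shape[i_a]
        names[i_b] = f"({dim_a} {dim_b})"
        del names[i_a], shape[i_a]
    return names, shape


def fold_complex(names: Sequence[str], shape: Sequence[int], name_dim: str) -> List[int]:
    """Stack real and imaginary parts along name_dim (length doubles)."""
    shape = list(shape)
    shape[list(names).index(name_dim)] *= 2
    return shape


def concat_elements(names: Sequence[str], shapes: Sequence[Sequence[int]], name_dim: str) -> List[int]:
    """Concatenate dictionary elements along name_dim."""
    axis = list(names).index(name_dim)
    out = list(shapes[0])
    out[axis] = sum(s[axis] for s in shapes)
    return out


names_in = ["xy", "points", "frequency", "time"]
shapes_in: Dict[str, Tuple[int, ...]] = {
    "trial_a": (2, 100, 30, 500),
    "trial_b": (2, 100, 30, 400),
}

shapes_folded = []
for shape in shapes_in.values():
    names_out, s = fold_dims(names_in, shape, [["xy", "points"]])
    shapes_folded.append(fold_complex(names_out, s, "time"))

shape_final = concat_elements(names_out, shapes_folded, "time")
print(names_out, shape_final)
```

With the default arguments, two complex-valued trials of 500 and 400 time bins would thus yield a single real tensor with 200 point-coordinates, 30 frequencies, and 1800 time entries.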

normalize_data(mean_subtract: bool = False, std_divide: bool = False, dim_name: str = 'time')[source]

Normalizes self.data along a named dimension by optional mean subtraction and/or standard-deviation scaling. Requires that self.rearrange_data has already populated self.data.

Parameters:
  • mean_subtract (bool) – If True, subtracts the mean along dim_name. (Default is False)

  • std_divide (bool) – If True, divides by the standard deviation along dim_name. (Default is False)

  • dim_name (str) – Name of the array dimension to normalize over. Must match a name in self.names_dims_array_preDecomp (i.e. the names that exist after rearrange_data). (Default is 'time')
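A minimal sketch of normalization along a named dimension, using a 2-D nested list as a stand-in for the tensor. normalize_2d is a hypothetical helper; the use of the population standard deviation and the order of operations (subtract, then divide) are assumptions, and a constant slice would raise ZeroDivisionError in this sketch.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def _normalize_vectors(
    vectors: Sequence[Sequence[float]], mean_subtract: bool, std_divide: bool
) -> List[List[float]]:
    out = []
    for vec in vectors:
        mu = fmean(vec) if mean_subtract else 0.0
        sd = pstdev(vec) if std_divide else 1.0
        out.append([(v - mu) / sd for v in vec])
    return out


def normalize_2d(
    data: Sequence[Sequence[float]],
    names_dims: Sequence[str],
    dim_name: str,
    mean_subtract: bool = False,
    std_divide: bool = False,
) -> List[List[float]]:
    """Normalize a 2-D array along the axis whose name is dim_name."""
    axis = list(names_dims).index(dim_name)
    if axis == 1:
        return _normalize_vectors(data, mean_subtract, std_divide)
    # Normalizing along axis 0: work on columns, then transpose back.
    columns = [list(col) for col in zip(*data)]
    normed = _normalize_vectors(columns, mean_subtract, std_divide)
    return [list(row) for row in zip(*normed)]


data = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
centered = normalize_2d(data, ["points", "time"], "time", mean_subtract=True)
zscored = normalize_2d(
    data, ["points", "time"], "time", mean_subtract=True, std_divide=True
)
print(centered)
print(zscored)
```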

fit(data: dict = None, method: str = 'CP_NN_HALS', params_method: dict = {'cvg_criterion': 'abs_rec_error', 'exact': False, 'fixed_modes': None, 'init': 'svd', 'n_iter_max': 100, 'nn_modes': 'all', 'rank': 6, 'sparsity_coefficients': None, 'svd': 'truncated_svd', 'tol': 1e-07, 'verbose': False}, backend: str = 'pytorch', DEVICE: str = 'cpu', verbose: bool | int = 1)[source]

Fits a TCA model to the rearranged data using a tensorly decomposition. Populates self.factors, self.factors_raw, and self.factor_weights.

Parameters:
  • data (dict) – Dictionary of numpy.ndarray data arrays of identical shape. If None, self.data (set by rearrange_data) is used. (Default is None)

  • method (str) –

    tensorly decomposition class to instantiate. One of

    • 'CP_NN_HALS': Non-negative CP decomposition via the HALS algorithm.

    • 'CP': Standard CP decomposition.

    • 'RandomizedCP': Randomized CP decomposition for large tensors.

    • 'ConstrainedCP': Constrained CP decomposition.

    (Default is 'CP_NN_HALS')

  • params_method (dict) – Keyword arguments forwarded to the tensorly decomposition class. See tensorly documentation for valid keys. (Default is the CP_NN_HALS parameter set defined in the signature)

  • backend (str) –

    tensorly backend. One of

    • 'pytorch': Recommended for most use cases.

    • 'numpy': NumPy backend.

    (Default is 'pytorch')

  • DEVICE (str) – Torch device string (e.g. 'cpu' or 'cuda') used when backend is 'pytorch'. (Default is 'cpu')

  • verbose (Union[bool, int]) – Verbosity level. 0 is silent, 1 warnings, 2 info. (Default is 1)

order_factors_by_EVR(data: dict = None, factors: dict = None, weights: dict = None, overwrite_factors: bool = True)[source]

Reorders TCA factors by descending explained variance ratio (EVR) on each data tensor.

Parameters:
  • data (dict) – Dictionary of numpy.ndarray data tensors. If None, self.data is used. (Default is None)

  • factors (dict) – Dictionary of factor sets keyed to match data. If None, self.factors is used. (Default is None)

  • weights (dict) – Dictionary of CP weights keyed to match data. If None, self.factor_weights is used. (Default is None)

  • overwrite_factors (bool) – If True, writes the reordered results back to self.factors, self.factor_weights, and self.evrs_ordered. (Default is True)

Returns:

tuple containing:
orders (dict):

Per-key sort indices used to reorder factors.

factors_ordered (dict):

Per-key dictionaries of factors reordered by EVR.

weights_ordered (dict):

Per-key CP weight vectors reordered by EVR.

evrs_ordered (dict):

Per-key explained variance ratios in descending order.

Return type:

(tuple)
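The reordering idea can be shown on a tiny two-way example. This is a sketch under stated assumptions, not the method's actual computation: the EVR of a component is taken here as one minus the ratio of residual to total sum of squares when that single rank-1 component reconstructs the data, and component_evr and order_by_evr are hypothetical names.

```python
from typing import List, Sequence, Tuple


def component_evr(
    X: Sequence[Sequence[float]], a: Sequence[float], b: Sequence[float], weight: float
) -> float:
    """EVR of one rank-1 component weight * outer(a, b) (assumed definition)."""
    ss_total = sum(v * v for row in X for v in row)
    ss_resid = sum(
        (X[i][j] - weight * a[i] * b[j]) ** 2
        for i in range(len(a))
        for j in range(len(b))
    )
    return 1.0 - ss_resid / ss_total


def order_by_evr(
    X: Sequence[Sequence[float]],
    factors_a: Sequence[Sequence[float]],
    factors_b: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[List[int], List[float]]:
    """Return component indices sorted by descending EVR, plus the sorted EVRs."""
    evrs = [
        component_evr(X, a, b, w) for a, b, w in zip(factors_a, factors_b, weights)
    ]
    order = sorted(range(len(evrs)), key=lambda r: evrs[r], reverse=True)
    return order, [evrs[r] for r in order]


# X is the sum of a weak component (index 0) and a strong one (index 1).
X = [[0.0, 3.0, 3.0], [1.0, 0.0, 0.0]]
factors_a = [[0.0, 1.0], [1.0, 0.0]]
factors_b = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
weights = [1.0, 3.0]

order, evrs_ordered = order_by_evr(X, factors_a, factors_b, weights)
weights_ordered = [weights[r] for r in order]
print(order, evrs_ordered, weights_ordered)
```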

rearrange_factors(factors: dict = None, undo_concat_dictElements: bool = True, undo_concat_complexDim: bool = True)[source]

Reverses the dimension folding applied in rearrange_data so the fitted factors can be interpreted in the original data space. Populates self.factors_rearranged, self.names_dims_array_postDecomp, and self.name_dim_dictElements_postDecomp.

Parameters:
  • factors (dict) – Dictionary of factors to rearrange. If None, self.factors is used. (Default is None)

  • undo_concat_dictElements (bool) – If True, splits the concatenated dictionary-elements dimension back into per-element factors. Requires that method_handling_dictElements was 'concatenate'. (Default is True)

  • undo_concat_complexDim (bool) – If True, recombines real and imaginary halves of the folded complex dimension into complex-valued arrays. Requires that concat_complexDim was True. (Default is True)
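Both undo steps amount to slicing a factor vector. The sketch below uses hypothetical helpers and assumes a particular layout: dictionary elements are laid end to end along the concatenated dimension, and the folded complex dimension holds all real parts followed by all imaginary parts. The actual layout used by the class may differ.

```python
from typing import Dict, List, Sequence


def split_by_lengths(factor: Sequence[float], lengths: Dict[str, int]) -> Dict[str, List[float]]:
    """Split a concatenated factor back into per-element pieces."""
    if sum(lengths.values()) != len(factor):
        raise ValueError("lengths do not sum to the factor length")
    out, start = {}, 0
    for name, n in lengths.items():
        out[name] = list(factor[start:start + n])
        start += n
    return out


def unfold_complex(factor: Sequence[float]) -> List[complex]:
    """Recombine [real..., imag...] halves into complex values (assumed layout)."""
    if len(factor) % 2:
        raise ValueError("folded complex factor must have even length")
    half = len(factor) // 2
    return [complex(re, im) for re, im in zip(factor[:half], factor[half:])]


per_element = split_by_lengths([0.0, 1.0, 2.0, 3.0, 4.0], {"a": 2, "b": 3})
as_complex = unfold_complex([1.0, 2.0, 3.0, 4.0])
print(per_element, as_complex)
```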

plot_factors(factors: dict = None, figure_saver: Figure_Saver = None, show_figures: bool = True)[source]

Plots each leaf factor as a normalized line plot, one figure per factor. Optionally writes figures to disk via a util.Figure_Saver.

Parameters:
  • factors (dict) – Nested dictionary of factors to plot. If None, self.factors_rearranged is used if available, otherwise self.factors. (Default is None)

  • figure_saver (util.Figure_Saver) – Saver used to persist figures. If None, figures are not saved. (Default is None)

  • show_figures (bool) – If True, enables interactive mode so figures are shown. (Default is True)

face_rhythm.h5_handling module

HDF5 utilities: hierarchical traversal, group I/O, and bulk-close helpers.

Convenience wrappers around h5py for the face-rhythm project. Nothing here is CUDA- or video-specific; the module is safe to import anywhere.

face_rhythm.h5_handling.close_all_h5()[source]

Closes every open h5py.File object found in the Python workspace.

Iterates over all live objects via gc and calls close on any h5py.File instance. Falls back to tables.file._open_files.close_all if the primary loop raises. Adapted from https://stackoverflow.com/questions/29863342/close-an-open-h5py-data-file.
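The gc-based sweep can be illustrated without h5py. In this sketch FakeHandle is a hypothetical stand-in for h5py.File, and close_all_instances shows only the primary loop; the tables fallback mentioned above is not reproduced.

```python
import gc


class FakeHandle:
    """Hypothetical stand-in for a closable file handle."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def close_all_instances(cls: type) -> int:
    """Close every live instance of cls that the garbage collector tracks."""
    n_closed = 0
    for obj in gc.get_objects():
        try:
            if isinstance(obj, cls):
                obj.close()
                n_closed += 1
        except Exception:
            # Some tracked objects misbehave under isinstance or close; skip them.
            pass
    return n_closed


handles = [FakeHandle() for _ in range(3)]
n_closed = close_all_instances(FakeHandle)
print(n_closed, [h.closed for h in handles])
```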

face_rhythm.h5_handling.show_group_items(hObj)[source]

Prints the items at the top hierarchical level of an HDF5 object or dict. RH 2021

See show_item_tree() for a full recursive listing.

Parameters:

hObj (object) – Hierarchical object: an h5py.File, h5py.Group, or a Python dict.

Example

with h5py.File(path, 'r') as f:
    h5_handling.show_group_items(f)

face_rhythm.h5_handling.show_item_tree(hObj=None, path=None, depth=None, show_metadata=True, print_metadata=False, indent_level=0)[source]

Recursively prints the items and groups in an HDF5 object or dict. RH 2021

Parameters:
  • hObj (object) – Hierarchical object: an h5py.File, h5py.Group, or a Python dict. Ignored when path is provided. (Default is None)

  • path (Optional[object]) – Path-like to an HDF5 file to open in read mode. If not None, the file is opened and traversed in place of hObj. (Default is None)

  • depth (Optional[int]) – Maximum number of hierarchical levels to descend. None means unlimited. (Default is None)

  • show_metadata (bool) – If True, list per-node metadata attributes alongside items. (Default is True)

  • print_metadata (bool) – If True, also print the value of each metadata attribute; otherwise only its shape and dtype are shown. (Default is False)

  • indent_level (int) – Internal recursion bookkeeping for indentation; users should leave this at the default. (Default is 0)

Example

with h5py.File(path, 'r') as f:
    h5_handling.show_item_tree(f)

face_rhythm.h5_handling.make_h5_tree(dict_obj, h5_obj, group_string='', use_compression=False, track_order=True)[source]

Recursively writes a Python dict into an HDF5 group/dataset tree. RH 2021

Intended to be called by write_dict_to_h5(); using it directly is not recommended.

Parameters:
  • dict_obj (dict) – Source dictionary whose hierarchy and leaf values become groups and datasets, respectively.

  • h5_obj (h5py.File) – Open HDF5 file (or group) into which the tree is written.

  • group_string (str) – Path of the current HDF5 group within h5_obj during recursion. An empty string is treated as the root '/'. (Default is '')

  • use_compression (bool) – If True, write each dataset with gzip level 9 compression. (Default is False)

  • track_order (bool) – If True, set h5py.get_config() to preserve insertion order of items. (Default is True)
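The recursion can be previewed without touching a file. The sketch below walks a nested dict and reports which group paths and dataset paths a writer of this kind would create; plan_h5_tree is a hypothetical helper, and it performs no h5py calls.

```python
from typing import Any, Dict, List, Tuple


def plan_h5_tree(dict_obj: Dict[str, Any], group_string: str = "") -> Tuple[List[str], Dict[str, Any]]:
    """Return (group paths, {dataset path: value}) for a nested dict."""
    groups: List[str] = []
    datasets: Dict[str, Any] = {}
    # An empty group_string is treated as the root '/'.
    root = group_string if group_string else "/"
    for key, value in dict_obj.items():
        path = f"{root.rstrip('/')}/{key}"
        if isinstance(value, dict):
            groups.append(path)
            sub_groups, sub_datasets = plan_h5_tree(value, path)
            groups.extend(sub_groups)
            datasets.update(sub_datasets)
        else:
            datasets[path] = value
    return groups, datasets


groups, datasets = plan_h5_tree({"a": {"b": 1, "c": {"d": 2}}, "e": 3})
print(groups)
print(datasets)
```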

face_rhythm.h5_handling.write_dict_to_h5(path_save, input_dict, use_compression=False, track_order=True, write_mode='w-', show_item_tree_pref=True)[source]

Writes a Python dict to an HDF5 file, mirroring its hierarchy and data. RH 2021

Wraps make_h5_tree() and optionally prints the resulting tree.

Parameters:
  • path_save (object) – Full path of the file to write. str or pathlib.Path.

  • input_dict (dict) – Dictionary whose leaves are HDF5-writable values (typically numpy.ndarray or strings).

  • use_compression (bool) – If True, write each dataset with gzip compression. (Default is False)

  • track_order (bool) – If True, preserve dict insertion order in the HDF5 file. (Default is True)

  • write_mode (str) –

    File-open mode forwarded to h5py.File. Either

    • 'w': Overwrite any existing file.

    • 'w-': Refuse to overwrite an existing file.

    (Default is 'w-')

  • show_item_tree_pref (bool) – If True, print the resulting HDF5 hierarchy after writing. (Default is True)

face_rhythm.h5_handling.simple_load(filepath, return_dict=True, verbose=False)[source]

Loads an HDF5 file and returns it as a nested dict or an open file. RH 2023

Parameters:
  • filepath (object) – Full path of the file to read. str or pathlib.Path.

  • return_dict (bool) – If True, return a nested dict whose keys are group names and whose leaves are the dataset arrays. If False, return the open h5py.File object instead. (Default is True)

  • verbose (bool) – If True, print the file’s hierarchy via show_item_tree() before returning. (Default is False)

Returns:

data (object):

Either a nested dict of arrays (when return_dict is True) or an open h5py.File handle.

Return type:

(object)

face_rhythm.h5_handling.h5Obj_to_dict(hObj)[source]

Converts an h5py group or file into a nested Python dict. RH 2023

Parameters:

hObj (object) – An h5py.File or h5py.Group to traverse.

Returns:

h5_dict (dict):

Nested dictionary mirroring the HDF5 hierarchy. Datasets are materialized via [()].

Return type:

(dict)

face_rhythm.h5_handling.simple_save(dict_to_save, path=None, use_compression=False, track_order=True, write_mode='w-', verbose=False)[source]

Saves a Python dict to an HDF5 file or appends it to an existing one. RH 2021

Parameters:
  • dict_to_save (dict) – Dictionary to save to the HDF5 file.

  • path (object) – Full path of the file to write. str or pathlib.Path. (Default is None)

  • use_compression (bool) – If True, write each dataset with gzip compression. (Default is False)

  • track_order (bool) – If True, preserve dict insertion order in the HDF5 file. (Default is True)

  • write_mode (str) –

    File-open mode forwarded to h5py.File. Either

    • 'w': Overwrite any existing file.

    • 'w-': Refuse to overwrite an existing file.

    • 'a': Append a new dataset to an existing file.

    (Default is 'w-')

  • verbose (bool) – If True, print the resulting HDF5 hierarchy after writing. (Default is False)

face_rhythm.h5_handling.merge_helper(d, group)[source]

Recursively merges a dictionary into an open h5py.Group.

Sub-dictionaries map to subgroups; non-dict values are written as datasets, replacing any existing dataset with the same name.

Parameters:
  • d (dict) – Dictionary containing the data to merge.

  • group (object) – Target h5py.Group (or h5py.File) to merge into.
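The merge rule (sub-dictionaries become subgroups, other values replace same-named datasets) can be sketched with a plain nested dict standing in for the h5py.Group. merge_into is a hypothetical helper and does not reproduce any HDF5-specific handling.

```python
from typing import Any, Dict


def merge_into(d: Dict[str, Any], group: Dict[str, Any]) -> None:
    """Recursively merge d into group in place."""
    for key, value in d.items():
        if isinstance(value, dict):
            sub = group.get(key)
            if not isinstance(sub, dict):
                # No subgroup yet (or a leaf in its place): create one.
                sub = {}
                group[key] = sub
            merge_into(value, sub)
        else:
            # Leaf values replace any existing entry with the same name.
            group[key] = value


target = {"a": {"x": 1, "y": 2}, "b": 0}
merge_into({"a": {"y": 20, "z": 30}, "c": 5}, target)
print(target)
```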

face_rhythm.h5_handling.merge_dict_into_h5_file(d, filepath=None, h5Obj=None)[source]

Merges a dictionary into an existing HDF5 file or open file object.

Wraps merge_helper(), which recursively walks the dict and merges each level into the matching HDF5 group. Exactly one of filepath or h5Obj must be supplied.

Parameters:
  • d (dict) – Dictionary containing the data to merge.

  • filepath (Optional[str]) – Path to an HDF5 file to open in append mode. Do not specify when h5Obj is provided. (Default is None)

  • h5Obj (object) – Open h5py.File object to merge into. Do not specify when filepath is provided. (Default is None)

face_rhythm.helpers module

General-purpose helpers: video I/O wrappers, path tools, image warping, downloads.

Collected utilities used across the face-rhythm package. Notable groups:

  • Video readers (VideoReaderWrapper, BufferedVideoReader) around decord / torchcodec with pre-fetch threads.

  • Path and file helpers (find_paths, prepare_filepath_for_saving, download + hash verification, zip extraction).

  • Image registration helpers (find_geometric_transformation, remap-index and flow-field conversions) used by face_rhythm.rois.

  • Parameter dictionary utilities (fill_missing_keys_with_defaults, flatten_dict) and a handful of numerical / plotting / device utilities.

Some routines are adapted from Rich Hakim’s basic_neural_processing_modules.

face_rhythm.helpers.prepare_cv2_imshow()[source]

Pre-initializes cv2.imshow to avoid kernel crashes. RH 2022

Calling cv2.imshow after av or decord have been imported can crash the Python kernel. Showing a small dummy frame here primes the OpenCV display loop so subsequent cv2.imshow calls work safely.

face_rhythm.helpers.find_paths(dir_outer: str | List[str], reMatch: str = 'filename', reMatch_in_path: str | None = None, find_files: bool = True, find_folders: bool = False, depth: int = 0, natsorted: bool = True, alg_ns: str | None = None, verbose: bool = False) List[str][source]

Searches for files and/or folders recursively in a directory using a regex match. RH 2022-2023

Parameters:
  • dir_outer (Union[str, List[str]]) – Path(s) to the directory(ies) to search. If a list of directories, then all directories will be searched.

  • reMatch (str) – Regular expression to match. Each file or folder name encountered is tested with re.search(reMatch, filename); if the result is not None, the path is included in the output.

  • reMatch_in_path (Optional[str]) –

    Additional regular expression to match anywhere in the upper path. Useful for finding files/folders in specific subdirectories. If None, then no additional matching is done.

    (Default is None)

  • find_files (bool) – Whether to find files. (Default is True)

  • find_folders (bool) – Whether to find folders. (Default is False)

  • depth (int) –

    Maximum folder depth to search. (Default is 0).

    • depth=0 means only search the outer directory.

    • depth=2 means search the outer directory and two levels of subdirectories below it

  • natsorted (bool) – Whether to sort the output using natural sorting with the natsort package. (Default is True)

  • alg_ns (str) – Algorithm to use for natural sorting. See natsort.ns or https://natsort.readthedocs.io/en/4.0.4/ns_class.html/ for options. If None, PATH is used. Other common options are INT, FLOAT, and VERSION. (Default is None)

  • verbose (bool) – Whether to print the paths found. (Default is False)

Returns:

paths (List[str]):

Paths to matched files and/or folders in the directory.

Return type:

(List[str])
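The interaction between reMatch and depth can be sketched with the standard library. find_paths_sketch is a hypothetical, reduced version: it takes a single directory, omits reMatch_in_path, and uses plain alphabetical ordering in place of natsort.

```python
import os
import re
import tempfile
from pathlib import Path
from typing import List


def find_paths_sketch(
    dir_outer: str,
    re_match: str,
    find_files: bool = True,
    find_folders: bool = False,
    depth: int = 0,
) -> List[str]:
    """Collect paths whose names match re_match, down to depth levels."""
    found: List[str] = []

    def _walk(directory: str, level: int) -> None:
        with os.scandir(directory) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            is_dir = entry.is_dir()
            wanted = (find_folders and is_dir) or (find_files and entry.is_file())
            if wanted and re.search(re_match, entry.name) is not None:
                found.append(entry.path)
            if is_dir and level < depth:
                _walk(entry.path, level + 1)

    _walk(dir_outer, 0)
    return found


with tempfile.TemporaryDirectory() as tmp:
    deep = Path(tmp) / "sub" / "deep"
    deep.mkdir(parents=True)
    for p in [Path(tmp) / "a.avi", Path(tmp) / "notes.txt",
              Path(tmp) / "sub" / "b.avi", deep / "c.avi"]:
        p.touch()
    shallow = [Path(p).name for p in find_paths_sketch(tmp, r"\.avi$", depth=0)]
    one_level = [Path(p).name for p in find_paths_sketch(tmp, r"\.avi$", depth=1)]
    folders = [Path(p).name for p in find_paths_sketch(
        tmp, r"^sub$", find_files=False, find_folders=True)]

print(shallow, one_level, folders)
```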

face_rhythm.helpers.prepare_path(path: str, mkdir: bool = False, exist_ok: bool = True) str[source]

Validates a directory or file path for saving or loading. RH 2023

Resolution rules:

  • If the path exists and exist_ok is True, it is accepted.

  • If the path exists and exist_ok is False, an error is raised.

  • If the path does not exist and refers to a file: the parent directory is created when mkdir is True, otherwise an error is raised when the parent does not exist.

  • If the path does not exist and refers to a directory: the directory is created when mkdir is True, otherwise an error is raised.

Parameters:
  • path (str) – Path to be checked.

  • mkdir (bool) – If True, creates the parent directory (or directory) if it does not exist. (Default is False)

  • exist_ok (bool) – If True, allows the path to already exist. (Default is True)

Returns:

path (str):

Resolved absolute path.

Return type:

(str)
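The resolution rules above translate into a short decision procedure. prepare_path_sketch is a hypothetical rendering of those rules; in particular, deciding that a non-existent path "refers to a file" when it has a suffix is an assumption of this sketch.

```python
import tempfile
from pathlib import Path


def prepare_path_sketch(path: str, mkdir: bool = False, exist_ok: bool = True) -> str:
    """Validate a path following the documented resolution rules."""
    p = Path(path).resolve()
    if p.exists():
        if not exist_ok:
            raise FileExistsError(f"{p} already exists")
        return str(p)
    # Assumption: a suffix marks the path as a file, otherwise a directory.
    looks_like_file = p.suffix != ""
    target_dir = p.parent if looks_like_file else p
    if not target_dir.exists():
        if not mkdir:
            raise FileNotFoundError(f"{target_dir} does not exist")
        target_dir.mkdir(parents=True, exist_ok=True)
    return str(p)


with tempfile.TemporaryDirectory() as tmp:
    new_file = Path(tmp) / "results" / "run.h5"
    try:
        prepare_path_sketch(str(new_file), mkdir=False)
        raised_without_mkdir = False
    except FileNotFoundError:
        raised_without_mkdir = True
    resolved = prepare_path_sketch(str(new_file), mkdir=True)
    parent_created = new_file.parent.is_dir()
    file_created = new_file.exists()

print(raised_without_mkdir, parent_created, file_created)
```

Note that only the parent directory is created; the file itself is left for the caller to write.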

face_rhythm.helpers.prepare_filepath_for_saving(filepath: str, mkdir: bool = False, allow_overwrite: bool = True) str[source]

Prepares a file path for saving a file. Ensures the file path is valid and has the necessary permissions.

Parameters:
  • filepath (str) – The file path to be prepared for saving.

  • mkdir (bool) – If set to True, creates parent directory if it does not exist. (Default is False)

  • allow_overwrite (bool) – If set to True, allows overwriting of existing file. (Default is True)

Returns:

path (str):

The prepared file path for saving.

Return type:

(str)

face_rhythm.helpers.prepare_filepath_for_loading(filepath: str, must_exist: bool = True) str[source]

Prepares a file path for loading a file. Ensures the file path is valid and has the necessary permissions.

Parameters:
  • filepath (str) – The file path to be prepared for loading.

  • must_exist (bool) – If set to True, the file at the specified path must exist. (Default is True)

Returns:

path (str):

The prepared file path for loading.

Return type:

(str)

face_rhythm.helpers.prepare_directory_for_saving(directory: str, mkdir: bool = False, exist_ok: bool = True) str[source]

Prepares a directory path for saving a file. This function is rarely used.

Parameters:
  • directory (str) – The directory path to be prepared for saving.

  • mkdir (bool) – If set to True, creates parent directory if it does not exist. (Default is False)

  • exist_ok (bool) – If set to True, allows overwriting of existing directory. (Default is True)

Returns:

path (str):

The prepared directory path for saving.

Return type:

(str)

face_rhythm.helpers.prepare_directory_for_loading(directory: str, must_exist: bool = True) str[source]

Prepares a directory path for loading a file. This function is rarely used.

Parameters:
  • directory (str) – The directory path to be prepared for loading.

  • must_exist (bool) – If set to True, the directory at the specified path must exist. (Default is True)

Returns:

path (str):

The prepared directory path for loading.

Return type:

(str)

face_rhythm.helpers.pickle_save(obj: Any, filepath: str, mode: str = 'wb', zipCompress: bool = False, mkdir: bool = False, allow_overwrite: bool = True, **kwargs_zipfile: Dict[str, Any]) None[source]

Saves an object to a pickle file using pickle.dump. Allows for zipping of the file.

RH 2022

Parameters:
  • obj (Any) – The object to save.

  • filepath (str) – The path to save the object to.

  • mode (str) –

    The mode to open the file in. Options are:

    • 'wb': Write binary.

    • 'ab': Append binary.

    • 'xb': Exclusive write binary. Raises FileExistsError if the file already exists.

    (Default is 'wb')

  • zipCompress (bool) – If True, writes the pickle into a zip archive using the compression method given in kwargs_zipfile (zipfile.ZIP_DEFLATED by default), similar to numpy.savez_compressed. Useful for saving redundant and/or sparse array objects. (Default is False)

  • mkdir (bool) – If True, creates parent directory if it does not exist. (Default is False)

  • allow_overwrite (bool) – If True, allows overwriting of existing file. (Default is True)

  • kwargs_zipfile (Dict[str, Any]) –

    Keyword arguments that will be passed into zipfile.ZipFile. compression=zipfile.ZIP_DEFLATED by default. See https://docs.python.org/3/library/zipfile.html#zipfile-objects. Other options for 'compression' are (input can be either int or object):

    • 0: zipfile.ZIP_STORED (no compression)

    • 8: zipfile.ZIP_DEFLATED (usual zip compression)

    • 12: zipfile.ZIP_BZIP2 (bzip2 compression) (usually not as good as ZIP_DEFLATED)

    • 14: zipfile.ZIP_LZMA (lzma compression) (usually better than ZIP_DEFLATED but slower)
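A zipped pickle round trip can be sketched with the standard library alone. The function names and the archive member name "data" are assumptions made for this illustration and may not match how pickle_save lays out its archive.

```python
import os
import pickle
import tempfile
import zipfile
from typing import Any


def pickle_save_zip(obj: Any, filepath: str, **kwargs_zipfile: Any) -> None:
    """Pickle obj into a single-member zip archive."""
    kwargs = {"compression": zipfile.ZIP_DEFLATED, **kwargs_zipfile}
    with zipfile.ZipFile(filepath, mode="w", **kwargs) as zf:
        # "data" is a hypothetical member name chosen for this sketch.
        zf.writestr("data", pickle.dumps(obj))


def pickle_load_zip(filepath: str) -> Any:
    """Load the object written by pickle_save_zip."""
    with zipfile.ZipFile(filepath, mode="r") as zf:
        return pickle.loads(zf.read("data"))


payload = {"mask": [0] * 100_000, "name": "session_01"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.pkl.zip")
    pickle_save_zip(payload, path)
    size_zipped = os.path.getsize(path)
    restored = pickle_load_zip(path)

size_raw = len(pickle.dumps(payload))
print(size_raw, size_zipped, restored == payload)
```

Highly redundant payloads such as the zero-filled list here are where the compressed form pays off most.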

face_rhythm.helpers.pickle_load(filepath: str, zipCompressed: bool = False, mode: str = 'rb') Any[source]

Loads an object from a pickle file. RH 2022

Parameters:
  • filepath (str) – Path to the pickle file.

  • zipCompressed (bool) – If True, the file is assumed to be a .zip file. The function will first unzip the file, then load the object from the unzipped file. (Default is False)

  • mode (str) – The mode to open the file in. (Default is 'rb')

Returns:

obj (Any):

The object loaded from the pickle file.

Return type:

(Any)

face_rhythm.helpers.json_save(obj: Any, filepath: str, indent: int = 4, mode: str = 'w', mkdir: bool = False, allow_overwrite: bool = True) None[source]

Saves an object to a json file using json.dump. RH 2022

Parameters:
  • obj (Any) – The object to save.

  • filepath (str) – The path to save the object to.

  • indent (int) – Number of spaces for indentation in the output json file. (Default is 4)

  • mode (str) –

    The mode to open the file in. json.dump writes text, so a text mode is required. Options are:

    • 'w': Write text.

    • 'a': Append text.

    • 'x': Exclusive write. Raises FileExistsError if the file already exists.

    (Default is 'w')

  • mkdir (bool) – If True, creates parent directory if it does not exist. (Default is False)

  • allow_overwrite (bool) – If True, allows overwriting of existing file. (Default is True)

face_rhythm.helpers.json_load(filepath: str, mode: str = 'r') Any[source]

Loads an object from a json file. RH 2022

Parameters:
  • filepath (str) – Path to the json file.

  • mode (str) – The mode to open the file in. (Default is 'r')

Returns:

obj (Any):

The object loaded from the json file.

Return type:

(Any)

face_rhythm.helpers.yaml_save(obj: object, filepath: str, indent: int = 4, mode: str = 'w', mkdir: bool = False, allow_overwrite: bool = True) None[source]

Saves an object to a YAML file using the yaml.dump method. RH 2022

Parameters:
  • obj (object) – The object to be saved.

  • filepath (str) – Path to save the object to.

  • indent (int) – The number of spaces for indentation in the saved YAML file. (Default is 4)

  • mode (str) –

    Mode to open the file in.

    • 'w': write (default)

    • 'wb': write binary

    • 'ab': append binary

    • 'xb': exclusive write binary. Raises FileExistsError if file already exists.

    (Default is 'w')

  • mkdir (bool) – If True, creates the parent directory if it does not exist. (Default is False)

  • allow_overwrite (bool) – If True, allows overwriting of existing files. (Default is True)

face_rhythm.helpers.yaml_load(filepath: str, mode: str = 'r', loader: object = <class 'yaml.loader.FullLoader'>) object[source]

Loads a YAML file. RH 2022

Parameters:
  • filepath (str) – Path to the YAML file to load.

  • mode (str) – Mode to open the file in. (Default is 'r')

  • loader (object) –

    The YAML loader to use.

    • yaml.FullLoader: Loads the full YAML language. Avoids arbitrary code execution. (Default for PyYAML 5.1+)

    • yaml.SafeLoader: Loads a subset of the YAML language, safely. This is recommended for loading untrusted input.

    • yaml.UnsafeLoader: The original Loader code that could be easily exploitable by untrusted data input.

    • yaml.BaseLoader: Only loads the most basic YAML. All scalars are loaded as strings.

    (Default is yaml.FullLoader)

Returns:

loaded_obj (object):

The object loaded from the YAML file.

Return type:

(object)

face_rhythm.helpers.download_file(url: str | None, path_save: str, check_local_first: bool = True, check_hash: bool = False, hash_type: str = 'MD5', hash_hex: str | None = None, mkdir: bool = False, allow_overwrite: bool = True, write_mode: str = 'wb', verbose: bool = True, chunk_size: int = 1024) None[source]

Downloads a file from a URL to a local path using requests. Checks if file already exists locally and verifies the hash of the downloaded file against a provided hash if required. RH 2023

Parameters:
  • url (Optional[str]) – URL of the file to download. If None, no download is attempted.

  • path_save (str) – Path to save the file to.

  • check_local_first (bool) – Whether to check if the file already exists locally. If True and the file exists locally, the download is skipped. If True and check_hash is also True, the hash of the local file is checked. If the hash matches, the download is skipped. If the hash does not match, the file is downloaded. (Default is True)

  • check_hash (bool) – Whether to check the hash of the local or downloaded file against hash_hex. (Default is False)

  • hash_type (str) – Type of hash to use. Options are: 'MD5', 'SHA1', 'SHA256', 'SHA512'. (Default is 'MD5')

  • hash_hex (Optional[str]) – Hash to compare to, in hexadecimal format (e.g., 'a1b2c3d4e5f6…'). Can be generated using hash_file() or the hexdigest() method of a hashlib hash object. If check_hash is True, hash_hex must be provided. (Default is None)

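  • mkdir (bool) – If True, creates the parent directory of path_save if it does not exist. (Default is False)

  • allow_overwrite (bool) – If True, allows overwriting of an existing file at path_save. (Default is True)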

  • write_mode (str) – Write mode for saving the file. Options include: 'wb' (write binary), 'ab' (append binary), 'xb' (write binary, fail if file exists). (Default is 'wb')

  • verbose (bool) – If True, prints status messages. (Default is True)

  • chunk_size (int) – Size of chunks in which to download the file. (Default is 1024)

face_rhythm.helpers.hash_file(path: str, type_hash: str = 'MD5', buffer_size: int = 65536) str[source]

Computes the hash of a file using the specified hash type and buffer size. RH 2022

Parameters:
  • path (str) – Path to the file to be hashed.

  • type_hash (str) –

    Type of hash to use. One of

    • 'MD5': MD5 hash algorithm.

    • 'SHA1': SHA1 hash algorithm.

    • 'SHA256': SHA256 hash algorithm.

    • 'SHA512': SHA512 hash algorithm.

    (Default is 'MD5')

  • buffer_size (int) – Buffer size (in bytes) for reading the file. 65536 corresponds to 64KB. (Default is 65536)

Returns:

hash_val (str):

The computed hash of the file.

Return type:

(str)
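The buffered read described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the name hash_file_sketch and the _HASHERS lookup table are assumptions introduced here.

```python
import hashlib
import os
import tempfile

# Hypothetical mapping from the documented type_hash strings to hashlib constructors.
_HASHERS = {
    'MD5': hashlib.md5,
    'SHA1': hashlib.sha1,
    'SHA256': hashlib.sha256,
    'SHA512': hashlib.sha512,
}

def hash_file_sketch(path, type_hash='MD5', buffer_size=65536):
    """Hash a file in chunks of buffer_size bytes so large files are never fully in memory."""
    hasher = _HASHERS[type_hash]()
    with open(path, 'rb') as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(buffer_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()

# Usage: hash a small temporary file, then clean it up.
payload = b'face_rhythm' * 10000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path_tmp = tmp.name
digest = hash_file_sketch(path_tmp, type_hash='SHA256', buffer_size=1024)
os.remove(path_tmp)
```

The resulting hex string is the kind of value that hash_hex in download_file expects.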

face_rhythm.helpers.get_dir_contents(directory: str) Tuple[List[str], List[str]][source]

Retrieves the names of the folders and files in a directory (does not include subdirectories). RH 2021

Parameters:

directory (str) – The path to the directory.

Returns:

tuple containing:
folders (List[str]):

A list of folder names.

files (List[str]):

A list of file names.

Return type:

(tuple)

face_rhythm.helpers.compare_file_hashes(hash_dict_true: Dict[str, Tuple[str, str]], dir_files_test: str | None = None, paths_files_test: List[str] | None = None, verbose: bool = True) Tuple[bool, Dict[str, bool], Dict[str, str]][source]

Compares hashes of files in a directory or list of paths to provided hashes. RH 2022

Parameters:
  • hash_dict_true (Dict[str, Tuple[str, str]]) – Dictionary of hashes to compare. Each entry should be in the format: {'key': ('filename', 'hash')}.

  • dir_files_test (Optional[str]) – Path to the directory containing the files whose hashes are compared. Ignored if paths_files_test is not None. (Default is None)

  • paths_files_test (Optional[List[str]]) – List of paths to the files whose hashes are compared. If None, dir_files_test is used instead. (Default is None)

  • verbose (bool) – If True, failed comparisons are printed out. (Default is True)

Returns:

tuple containing:
total_result (bool):

True if all hashes match, False otherwise.

individual_results (Dict[str, bool]):

Dictionary indicating whether each hash matched.

paths_matching (Dict[str, str]):

Dictionary of paths that matched. Each entry is in the format: {'key': 'path'}.

Return type:

(tuple)

face_rhythm.helpers.extract_zip(path_zip: str, path_extract: str | None = None, verbose: bool = True) List[str][source]

Extracts a zip file. RH 2022

Parameters:
  • path_zip (str) – Path to the zip file.

  • path_extract (Optional[str]) – Path (directory) to extract the zip file to. If None, extracts to the same directory as the zip file. (Default is None)

  • verbose (bool) – Whether to print progress. (Default is True)

Returns:

paths_extracted (List[str]):

List of paths to the extracted files.

Return type:

(List[str])

face_rhythm.helpers.make_batches(iterable, batch_size=None, num_batches=None, min_batch_size=0, return_idx=False, length=None)[source]

Generates batches of data from an iterable. RH 2021

Parameters:
  • iterable (Iterable) – Iterable to be batched.

  • batch_size (Optional[int]) – Size of each batch. If None, batch_size is computed from num_batches. (Default is None)

  • num_batches (Optional[int]) – Number of batches to make. Used only when batch_size is None. (Default is None)

  • min_batch_size (int) – Minimum size of each batch. Batches smaller than this are skipped. (Default is 0)

  • return_idx (bool) – If True, yields (batch, [start, end]) tuples instead of just the batch. (Default is False)

  • length (Optional[int]) – Length of the iterable. If None, uses len(iterable). Useful when the iterable does not implement __len__. (Default is None)

Returns:

output (Generator):

Yields successive batches from iterable. If return_idx is True, yields (batch, [start, end]) tuples.

Return type:

(Generator)
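A minimal sketch of the batching behavior described above, assuming the input supports slicing and len(). The name make_batches_sketch is hypothetical, and deriving batch_size as the ceiling of length / num_batches is an assumption about how the library rounds.

```python
import math

def make_batches_sketch(iterable, batch_size=None, num_batches=None,
                        min_batch_size=0, return_idx=False, length=None):
    """Yield consecutive slices of a sliceable sequence."""
    if length is None:
        length = len(iterable)
    if batch_size is None:
        # Assumption: batch size is the ceiling of length / num_batches.
        batch_size = math.ceil(length / num_batches)
    for start in range(0, length, batch_size):
        end = min(start + batch_size, length)
        if (end - start) < min_batch_size:
            continue  # skip batches smaller than the requested minimum
        batch = iterable[start:end]
        yield (batch, [start, end]) if return_idx else batch

data = list(range(10))
batches = list(make_batches_sketch(data, batch_size=4))
batches_idx = list(make_batches_sketch(data, num_batches=2, return_idx=True))
batches_min = list(make_batches_sketch(data, batch_size=4, min_batch_size=3))
```

With min_batch_size=3, the trailing two-element batch is dropped rather than padded.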

face_rhythm.helpers.cp_to_dense(cp, weights=None)[source]

Reconstructs a dense tensor from a CP-format list of factor matrices. RH 2022

Parameters:
  • cp (List[np.ndarray]) – List of length n_modes of 2D factor matrices, each with shape (len_dim, rank). This is the format Tensorly uses for its 'cp' representation. Elements may be NumPy arrays or torch.Tensor (matching dtype).

  • weights (Optional[np.ndarray]) – Per-rank weights of length rank. If None, uses a vector of ones. (Default is None)

Returns:

dense (np.ndarray):

Reconstructed dense tensor. shape: (len_dim_0, len_dim_1, …).

Return type:

(np.ndarray)

class face_rhythm.helpers.Lazy_repeat_item(item, pseudo_length=None)[source]

Bases: object

Lazy iterator-like container that always returns the same item. RH 2021

Parameters:
  • item (Any) – Item to repeat on every access.

  • pseudo_length (Optional[int]) – Reported length of the container. If None, the container has no enforced length and __getitem__ always returns item. (Default is None)

item

The repeated item.

Type:

Any

pseudo_length

Stored pseudo length.

Type:

Optional[int]
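A container of this kind can be sketched as follows. The class name LazyRepeatItemSketch is hypothetical, and the IndexError on out-of-range access and the TypeError from len() without a pseudo_length are assumptions, not documented behavior.

```python
class LazyRepeatItemSketch:
    """Returns the same item for every index without storing copies."""

    def __init__(self, item, pseudo_length=None):
        self.item = item
        self.pseudo_length = pseudo_length

    def __getitem__(self, i):
        # Assumption: once a pseudo length is set, out-of-range access raises IndexError.
        if self.pseudo_length is not None and not (0 <= i < self.pseudo_length):
            raise IndexError('index out of range')
        return self.item

    def __len__(self):
        # Assumption: len() is only meaningful when a pseudo length was given.
        if self.pseudo_length is None:
            raise TypeError('no pseudo_length set')
        return self.pseudo_length

# Pair each frame name with the same (shared) mask object.
rep = LazyRepeatItemSketch('mask', pseudo_length=3)
pairs = list(zip(['f0', 'f1', 'f2'], rep))
unbounded = LazyRepeatItemSketch(7)
```

Because only one reference to item is held, the container can stand in for a long list of identical arguments at negligible memory cost.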

face_rhythm.helpers.deep_update_dict(dictionary, key, new_val=None, new_key=None, in_place=False)[source]

Updates a value or renames a key inside a nested dictionary. RH 2022

Parameters:
  • dictionary (Dict) – Dictionary to update.

  • key (List[str]) – Hierarchical path of string keys leading to the entry to update. Each element corresponds to a nesting level.

  • new_val (Optional[Any]) – New value to assign. If None, new_key must be provided and only the key is renamed. (Default is None)

  • new_key (Optional[str]) – If provided, key[-1] is removed and replaced with new_key (mapping to new_val if given, otherwise to the existing value). (Default is None)

  • in_place (bool) – If True, updates dictionary in place and returns None. If False, returns a deep-copied updated dictionary. (Default is False)

Returns:

output (Optional[Dict]):

Updated dictionary when in_place is False; otherwise None.

Return type:

(Optional[Dict])

Example

deep_update_dict(params, ['dataloader_kwargs', 'prefetch_factor'], val)

face_rhythm.helpers.flatten_dict(d: MutableMapping, parent_key: str = '', sep: str = '.') MutableMapping[source]

Flattens a nested dictionary into a single dictionary. RH 2022

All keys are coerced to strings and joined by sep. Adapted from https://stackoverflow.com/a/6027615.

Parameters:
  • d (MutableMapping) – Dictionary to flatten.

  • parent_key (str) – Key prefix prepended to flattened keys. Used internally for recursion. (Default is '')

  • sep (str) – Separator between key components. Used internally for recursion. (Default is '.')

Returns:

flattened (Dict):

Flat dictionary with paths joined by sep.

Return type:

(Dict)
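The recursion can be sketched as below. The name flatten_dict_sketch and the sample dictionary are illustrative; the approach follows the general pattern of the Stack Overflow answer cited above, not necessarily the library's exact code.

```python
from collections.abc import MutableMapping

def flatten_dict_sketch(d, parent_key='', sep='.'):
    """Flatten nested mappings into one dict whose keys are sep-joined paths."""
    items = []
    for k, v in d.items():
        # Keys are coerced to strings and prefixed with the path so far.
        new_key = f'{parent_key}{sep}{k}' if parent_key else str(k)
        if isinstance(v, MutableMapping):
            items.extend(flatten_dict_sketch(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

nested = {
    'vqt': {'F_min': 0.5, 'F_max': 60},
    'clahe': {'grid': {'size': 20}},
    'verbose': True,
}
flat = flatten_dict_sketch(nested)
```

Note that in this sketch an empty nested dictionary contributes no entries to the output.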

face_rhythm.helpers.find_subDict_key(d: dict, s: str, max_depth: int = 9999999)[source]

Recursively searches a nested dictionary for keys matching a regex.

Parameters:
  • d (dict) – Dictionary to search.

  • s (str) – Regex pattern that keys are matched against.

  • max_depth (int) – Maximum depth to descend. 1 searches only the top level, 2 searches the first and second levels, etc. (Default is 9999999)

Returns:

k_all (List[Tuple[List[str], Any]]):

List of 2-tuples (path, value) where path is the list of string keys leading to the matched entry and value is the matched sub-dictionary value.

Return type:

(List[Tuple[List[str], Any]])
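A minimal sketch of the search, including the max_depth convention described above. The name find_subdict_key_sketch and the _path helper argument are hypothetical, and the use of re.search (match anywhere in the key) is an assumption about the matching semantics.

```python
import re

def find_subdict_key_sketch(d, s, max_depth=9999999, _path=()):
    """Return (path, value) pairs for every key matching the regex s."""
    found = []
    if max_depth < 1:
        return found
    for k, v in d.items():
        path = list(_path) + [str(k)]
        # Assumption: a key matches if the pattern is found anywhere in it.
        if re.search(s, str(k)):
            found.append((path, v))
        if isinstance(v, dict):
            found.extend(find_subdict_key_sketch(v, s, max_depth - 1, tuple(path)))
    return found

params = {'loader': {'batch_size': 8, 'opts': {'batch_norm': True}}, 'batch': 1}
hits_all = find_subdict_key_sketch(params, r'^batch')
hits_top = find_subdict_key_sketch(params, r'^batch', max_depth=1)
```

Each returned path has the same list-of-keys form that deep_update_dict takes as its key argument.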

face_rhythm.helpers.fill_in_dict(d: Dict, defaults: Dict, verbose: bool = True, hierarchy: List[str] = ['dict'])[source]

Fills in a dictionary in place with values from defaults for missing keys, recursing into nested dictionaries. RH 2023

Parameters:
  • d (Dict) – Dictionary to fill in (modified in place).

  • defaults (Dict) – Dictionary of default values.

  • verbose (bool) – If True, prints a message each time a default value is inserted. (Default is True)

  • hierarchy (List[str]) – Path of keys leading to d. Used internally for recursion. (Default is ['dict'])

face_rhythm.helpers.check_keys_subset(d, default_dict, error_on_missing_keys=True, hierarchy=['defaults'])[source]

Verifies recursively that every key in d also appears in default_dict. RH 2023

Parameters:
  • d (Dict) – Dictionary to check.

  • default_dict (Dict) – Dictionary containing the allowed keys.

  • error_on_missing_keys (bool) – If True, raises AssertionError when a key in d is not in default_dict. If False, emits a warning instead. (Default is True)

  • hierarchy (List[str]) – Path of keys leading to d. Used internally for recursion. (Default is ['defaults'])

face_rhythm.helpers.prepare_params(params, defaults, error_on_missing_keys=True, verbose=True)[source]

Validates params against defaults and fills in missing keys.

Performs the following:

  • Checks that all keys in params are also in defaults.

  • Fills in any missing keys in params with values from defaults.

  • Returns a deepcopy of the filled-in params.

Parameters:
  • params (Dict) – Dictionary of parameters.

  • defaults (Dict) – Dictionary of defaults.

  • error_on_missing_keys (bool) – If True, raises an error when a key in params is not in defaults. If False, emits a warning instead. (Default is True)

  • verbose (bool) – If True, prints messages while filling in defaults. (Default is True)

Returns:

params_out (Dict):

Validated and default-filled deepcopy of params.

Return type:

(Dict)
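The three steps above can be sketched as follows. All function names ending in _sketch are hypothetical. One deliberate difference is an assumption of this sketch: it deep-copies params first so the caller's dictionary is never modified, whereas the library may fill params before copying.

```python
import copy
import warnings

def check_keys_subset_sketch(d, default_dict, error_on_missing_keys=True, hierarchy=('defaults',)):
    """Recursively verify that every key in d also exists in default_dict."""
    for key, val in d.items():
        if key not in default_dict:
            msg = f"Key '{key}' not found in {list(hierarchy)}"
            if error_on_missing_keys:
                raise AssertionError(msg)
            warnings.warn(msg)
        elif isinstance(val, dict) and isinstance(default_dict[key], dict):
            check_keys_subset_sketch(val, default_dict[key], error_on_missing_keys,
                                     tuple(hierarchy) + (key,))

def fill_in_dict_sketch(d, defaults):
    """Insert missing keys from defaults into d in place, recursing into nested dicts."""
    for key, val in defaults.items():
        if key not in d:
            d[key] = copy.deepcopy(val)
        elif isinstance(val, dict) and isinstance(d[key], dict):
            fill_in_dict_sketch(d[key], val)

def prepare_params_sketch(params, defaults, error_on_missing_keys=True):
    params_out = copy.deepcopy(params)  # assumption: leave the caller's dict untouched
    check_keys_subset_sketch(params_out, defaults, error_on_missing_keys)
    fill_in_dict_sketch(params_out, defaults)
    return params_out

defaults = {'vqt': {'F_min': 0.5, 'F_max': 60}, 'verbose': True}
params = {'vqt': {'F_max': 30}}
out = prepare_params_sketch(params, defaults)
```

Checking keys before filling means a misspelled key such as 'vqtt' is reported instead of being silently ignored in favor of the default.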

class face_rhythm.helpers.VideoReaderWrapper(*args: Any, **kwargs: Any)[source]

Bases: VideoReader

Subclass of decord.VideoReader that works around a memory leak.

Calls self.seek(0) after initialization and after every __getitem__ so that decord releases buffered frames. Adapted from https://github.com/dmlc/decord/issues/208#issuecomment-1157632702.

path

Path to the video file (the first positional argument).

Type:

str

class face_rhythm.helpers.TorchCodecVideoReader(path_video: str, device: str = 'cpu', num_ffmpeg_threads: int = 0)[source]

Bases: object

Video reader backed by torchcodec.decoders.VideoDecoder with a workaround for torchcodec issue #905.

Provides the same __getitem__ / __len__ / get_avg_fps interface as VideoReaderWrapper (decord) so it can be used as a drop-in replacement inside BufferedVideoReader.

Frames are returned as torch.Tensor with shape (H, W, C) and dtype uint8 (NHWC layout), matching the output of decord’s torch bridge.

Issue #905 workaround: torchcodec’s sequential access path skips cursor reset; after reading n - has_b_frames frames on a single decoder, FFmpeg’s H.264 drain emits has_b_frames AVFrames with pts = INT64_MIN, which the internal PTS filter rejects, causing EndOfFileException before the last frames are decoded. The fix is to serve only frames [0, n - SAFETY) from the primary decoder (which never reaches the drain), and route the trailing SAFETY frames through a fresh decoder that takes the non-sequential seek branch (avformat_seek_file + flush + forward-decode from keyframe) where pkt_dts remains valid. SAFETY = max(has_b_frames, 2). torchcodec’s VideoStreamMetadata does not currently expose has_b_frames, so SAFETY always defaults to 2, which matches ffprobe-reported has_b_frames=2 for H.264 AVIs and is a safe overestimate for has_b_frames=0 files.

The tail decoder is lazily created on first tail access and cached for the lifetime of this reader (one decoder recreation per video pass, versus one per chunk with earlier workarounds).
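The frame routing rule described above can be sketched in plain Python. The function name choose_decoder and its return labels are hypothetical; only the rule itself (frames in [0, n - SAFETY) from the primary decoder, the rest from the tail decoder, with SAFETY = max(has_b_frames, 2)) comes from the text.

```python
def choose_decoder(idx_frame, n_frames, has_b_frames=None):
    """Return which decoder should serve idx_frame: 'primary' or 'tail'."""
    # The text notes has_b_frames is not exposed by torchcodec metadata,
    # so a missing value is treated as 0 here and SAFETY falls back to 2.
    safety = max(has_b_frames or 0, 2)
    if not 0 <= idx_frame < n_frames:
        raise IndexError(f'frame {idx_frame} out of range for {n_frames} frames')
    # Frames in [0, n_frames - safety) never reach the drain and stay on the primary decoder.
    return 'primary' if idx_frame < n_frames - safety else 'tail'

routes = [choose_decoder(i, n_frames=100) for i in (0, 97, 98, 99)]
routes_b3 = [choose_decoder(i, n_frames=100, has_b_frames=3) for i in (96, 97)]
```

Overestimating SAFETY only sends a few extra frames through the tail decoder, which is why a fixed value of 2 is safe for files without B-frames.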

Thread safety is guaranteed by an internal lock, which is required because BufferedVideoReader loads slots from background threads.

Parameters:
  • path_video (str) – Path to the video file.

  • device (str) – Decode device. 'cpu' for software decode, 'cuda' or 'cuda:0' for NVDEC hardware decode. NVDEC requires torchcodec built with CUDA support and an FFmpeg built with --enable-cuda.

  • num_ffmpeg_threads (int) – Number of FFmpeg internal threads for decoding. 0 lets FFmpeg choose automatically (recommended).

get_avg_fps() float[source]

Return the average frame rate of the video.

class face_rhythm.helpers.BufferedVideoReader(video_readers: list = None, paths_videos: list = None, buffer_size: int = 1000, prefetch: int = 2, posthold: int = 1, method_getitem: str = 'continuous', starting_seek_position: int = 0, backend: str = 'torchcodec', device: str = 'cpu', decord_backend: str = 'torch', decord_ctx=None, verbose: int = 1)[source]

Bases: object

Reads frames from one or more videos with a chunked memory buffer and optional background prefetching. RH 2022

Sequential batches of frames can be read quickly because buffers are filled by background threads. In many cases, batches can be consumed without waiting for the next chunk to finish loading.

Optimal use:

  1. Create a BufferedVideoReader object.

  2. EITHER set method_getitem='continuous' and iterate over the object (fastest path), OR request batches of frames sequentially (going backwards is slow because buffers move forward).

  3. Each batch should fit inside a single buffer slot. Slices that span multiple buffer slots require concatenation and are slow. With a buffer size of 1000 frames, [0:1000], [1000:2000], ... is fast, while [0:1700], [1700:3200], [0:990], [990:1010] are slow (too big, overlapping, backwards, or crossing slot boundaries).
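The slot-alignment advice in step 3 can be sketched with two small helpers. Both function names are hypothetical, and the sketch assumes a single video so that slot boundaries fall on multiples of buffer_size in the continuous index.

```python
def slot_aligned_slices(num_frames, buffer_size=1000):
    """Slices that each cover exactly one buffer slot (the last may be shorter)."""
    return [
        slice(start, min(start + buffer_size, num_frames))
        for start in range(0, num_frames, buffer_size)
    ]

def stays_in_one_slot(s, buffer_size=1000):
    """True if slice s (explicit start and stop, step 1) lies inside a single slot."""
    return s.start // buffer_size == (s.stop - 1) // buffer_size

slices = slot_aligned_slices(2500, buffer_size=1000)
checks = {
    '0:1000': stays_in_one_slot(slice(0, 1000)),
    '0:1700': stays_in_one_slot(slice(0, 1700)),
    '990:1010': stays_in_one_slot(slice(990, 1010)),
}

# Hypothetical usage with an existing reader object:
# for s in slot_aligned_slices(reader.num_frames_total, buffer_size=1000):
#     frames = reader[s]
```

Requesting the slices in ascending order also keeps the reads aligned with the direction in which the buffers advance.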

Parameters:
  • video_readers (Optional[list]) – List of video reader objects (decord.VideoReader or TorchCodecVideoReader). A single reader is also accepted. If None, paths_videos must be provided. (Default is None)

  • paths_videos (Optional[list]) – List of paths to videos. A single str is also accepted. If None, video_readers must be provided. If both are supplied, video_readers wins. (Default is None)

  • buffer_size (int) – Number of frames per buffer slot. Avoid indexing more than buffer_size frames at a time or across slot boundaries (e.g. across idx % buffer_size == 0); these require concatenating buffers and are slow. (Default is 1000)

  • prefetch (int) – Number of buffers to prefetch ahead. 0 disables prefetching. A single buffer slot only contains frames from one video, so buffer_size <= video length is recommended. (Default is 2)

  • posthold (int) – Number of buffers to keep loaded behind the current position. 0 disables posthold. Useful when iterating backwards. (Default is 1)

  • method_getitem (str) –

    Indexing mode for __getitem__. One of

    • 'continuous': index across all videos as a single concatenated sequence; reader[idx_frames_slice].

    • 'by_video': index requires a (idx_video, idx_frames) tuple; reader[(idx_video, slice)].

    (Default is 'continuous')

  • starting_seek_position (int) – Starting frame index for the iterator. Used only when method_getitem == 'continuous' and iterating. (Default is 0)

  • backend (str) –

    Video decoding backend. One of

    • 'torchcodec': uses torchcodec.decoders.VideoDecoder. Frame-accurate seeking, actively maintained, supports CPU and GPU (NVDEC) decode. Includes a workaround for torchcodec issue #905 (sequential-access drain bug near EOF in H.264 AVIs): frames in [0, n - SAFETY) come from a persistent decoder; the trailing SAFETY = max(has_b_frames, 2) frames go through a fresh decoder cached as the tail decoder.

    • 'decord': uses decord.VideoReader. Well-tested fallback and the only backend available on Windows. Provided by the decord2 PyPI package on Linux/macOS (with vendored FFmpeg 8 wheels for py3.10-3.14) and by eva_decord on Windows. Both are installed by face-rhythm’s default dependencies.

    Only used when paths_videos is provided. (Default is 'torchcodec')

  • device (str) – Device for video decoding when using torchcodec. 'cpu' decodes on CPU. 'cuda' or 'cuda:0' decodes on GPU using NVDEC; frames are returned as CUDA tensors. GPU decode requires an NVIDIA GPU, torchcodec installed with CUDA support, and FFmpeg built with --enable-cuda. (Default is 'cpu')

  • decord_backend (str) – Backend used by decord when loading frames ('torch', 'numpy', 'mxnet', …). Only used when backend='decord'. (Default is 'torch')

  • decord_ctx (object) – Context used by decord when loading frames (e.g. decord.cpu(), decord.gpu()). Only used when backend='decord'. (Default is None)

  • verbose (int) – Verbosity level. 0 silences output, 1 prints warnings, 2 prints warnings and info. (Default is 1)

num_frames_total

Total number of frames across all videos.

Type:

int

num_videos

Number of videos being read.

Type:

int

metadata

Per-video metadata (path, length, fps, frame size, channels).

Type:

pandas.DataFrame

frame_rate

Frame rate of each video.

Type:

List[float]

frame_height_width

(H, W) of each video.

Type:

List[Tuple[int, int]]

num_channels

Number of channels of each video.

Type:

List[int]

slots

Buffer slots holding chunks of decoded frames.

Type:

List[List[Optional[torch.Tensor]]]

boundaries

Inclusive (start, end) frame index for each slot.

Type:

List[List[Tuple[int, int]]]

lookup

Lookup table mapping continuous frame index to (video, slot).

Type:

pandas.DataFrame

delete_all_slots()[source]

Frees every currently loaded slot by delegating to _delete_slots.

wait_for_loading()[source]

Blocks until every background slot-loading thread has finished.

get_frames_from_single_video_index(idx: tuple)[source]

Returns frames from a single video by (video, frame) index.

If idx is an int or slice, it is interpreted as a video index, and a new BufferedVideoReader is constructed over the selected videos.

Parameters:

idx (Union[int, slice, Tuple[int, Union[int, slice]]]) – Either (idx_video, idx_frames) to read frames from one video, or an int / slice to spawn a reader over a subset of videos.

Returns:

frames (Union[torch.Tensor, BufferedVideoReader]):

Decoded frames with shape (num_frames, H, W, C) when idx is a tuple, or a new reader when idx selects videos.

Return type:

(Union[torch.Tensor, BufferedVideoReader])

get_frames_from_continuous_index(idx)[source]

Returns frames addressed by a continuous (concatenated) frame index.

The videos are treated as one long sequence of frames; idx is the index of the frames within this sequence.

Parameters:

idx (Union[int, slice]) – Frame index. If an int, a single frame is returned. If a slice, the corresponding batch of frames is returned.

Returns:

frames (torch.Tensor):

Stacked frames. shape: (num_frames, height, width, num_channels).

Return type:

(torch.Tensor)
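To illustrate what a continuous index means, the sketch below maps it to a (video, frame) pair using cumulative frame counts. The library keeps a pandas lookup table for this purpose; the bisect-based function here is a stand-in with a hypothetical name, not the library's method.

```python
from bisect import bisect_right
from itertools import accumulate

def continuous_to_video_index(idx, num_frames_per_video):
    """Map an index into the concatenated sequence to (idx_video, idx_frame)."""
    ends = list(accumulate(num_frames_per_video))  # cumulative frame counts
    if not 0 <= idx < ends[-1]:
        raise IndexError(f'frame {idx} out of range for {ends[-1]} total frames')
    idx_video = bisect_right(ends, idx)  # first video whose end exceeds idx
    start = ends[idx_video - 1] if idx_video > 0 else 0
    return idx_video, idx - start

lengths = [100, 50, 200]  # frames per video (illustrative values)
mapped = [continuous_to_video_index(i, lengths) for i in (0, 99, 100, 149, 150, 349)]
```

The resulting pair has the same form as the tuple accepted when method_getitem is 'by_video'.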

set_iterator_frame_idx(idx)[source]

Sets the starting frame for the iterator.

Parameters:

idx (int) – Frame index from which the iterator should start. Must be in 'continuous' format, i.e. the index of the frame within the concatenated sequence of all videos.

face_rhythm.helpers.save_gif(array, path, frameRate=5.0, loop=0, backend='PIL', kwargs_backend={})[source]

Saves an array of images as an animated GIF. RH 2023

Parameters:
  • array (Union[np.ndarray, list]) – 3D (grayscale) or 4D (color) array of images. If dtype is floating, values are interpreted in [0, 1]; if integer, in [0, 255].

  • path (str) – Output path for the GIF.

  • frameRate (float) – Frame rate of the GIF in frames per second. (Default is 5.0)

  • loop (int) – Number of loops. 0 loops forever, 1 plays once, 2 plays twice, etc. (Default is 0)

  • backend (str) –

    GIF writer backend. One of

    • 'imageio'

    • 'PIL'

    (Default is 'PIL')

  • kwargs_backend (dict) – Extra keyword arguments forwarded to the chosen backend. (Default is {})

face_rhythm.helpers.grayscale_to_rgb(array)[source]

Converts a grayscale image or movie to RGB by repeating the channel. RH 2023

Parameters:

array (Union[np.ndarray, torch.Tensor, list]) – 2D image or 3D movie of grayscale frames. Lists of arrays or tensors are stacked first.

Returns:

rgb (Union[np.ndarray, torch.Tensor]):

Same backend as the input with an extra trailing channel dimension of size 3.

Return type:

(Union[np.ndarray, torch.Tensor])

class face_rhythm.helpers.Toeplitz_convolution2d(x_shape, k, mode='same', dtype=None)[source]

Bases: object

Convolves a 2D array with a 2D kernel via Toeplitz matrix multiplication. RH 2022

Allows sparse x inputs (k must remain dense). Ideal when x is very sparse (density < 0.01), x is small (shape < (1000, 1000)), k is small (shape < (100, 100)), and the batch size is large (e.g. 1000+). Generally faster than scipy.signal.convolve2d when convolving many arrays with the same kernel. Memory footprint stays low because the Toeplitz matrix is held as a sparse matrix.

See https://stackoverflow.com/a/51865516 and https://github.com/alisaaalehi/convolution_as_multiplication for an illustration. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.convolution_matrix.html for the 1D version, and https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.matmul_toeplitz.html for potential speedups.

Parameters:
  • x_shape (Tuple[int, int]) – Shape of the 2D array to be convolved.

  • k (np.ndarray) – 2D kernel to convolve with.

  • mode (str) –

    Convolution mode. One of

    • 'full'

    • 'same'

    • 'valid'

    See scipy.signal.convolve2d for details. (Default is 'same')

  • dtype (Optional[np.dtype]) – Data type for the Toeplitz matrix. Ideally matches the dtype of the input array. If None, the dtype of k is used. (Default is None)

k

Flipped copy of the kernel used internally.

Type:

np.ndarray

mode

Convolution mode set in __init__.

Type:

str

x_shape

Stored input shape.

Type:

Tuple[int, int]

dtype

Data type of the Toeplitz matrix.

Type:

np.dtype

so

Output array size before cropping.

Type:

Tuple[int, int]

dt

The double-block Toeplitz matrix in sparse CSR form.

Type:

scipy.sparse.csr_matrix

Example

conv = Toeplitz_convolution2d(x_shape=x.shape, k=kernel, mode='same')
y = conv(x)

face_rhythm.helpers.cosine_kernel_2D(center=(5, 5), image_size=(11, 11), width=5)[source]

Generates a 2D radial cosine kernel. RH 2021

Parameters:
  • center (Tuple[int, int]) – (x, y) peak position, zero-indexed. Set the second element to 0 to obtain a 1D kernel. (Default is (5, 5))

  • image_size (Tuple[int, int]) – (width, height) of the output kernel. Set the second element to 0 for a 1D kernel. (Default is (11, 11))

  • width (float) – Full width of one cycle of the cosine. (Default is 5)

Returns:

k_cos (np.ndarray):

Cosine kernel. shape: (image_size[0], image_size[1]).

Return type:

(np.ndarray)

face_rhythm.helpers.bounded_logspace(start, stop, num)[source]

Logarithmically spaced values between start and stop (inclusive). RH 2022

Parameters:
  • start (float) – First value in the output array.

  • stop (float) – Last value in the output array.

  • num (int) – Number of values in the output array.

Returns:

output (np.ndarray):

Logarithmically spaced values bounded by start and stop. shape: (num,).

Return type:

(np.ndarray)
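A pure-Python sketch of logarithmic spacing with exact endpoints. The library returns a NumPy array; this version returns a list to stay dependency-free, and pinning the endpoints to start and stop is an assumption about what "bounded" means. It requires num >= 1 and positive start and stop.

```python
import math

def bounded_logspace_sketch(start, stop, num):
    """Return num values spaced evenly in log space from start to stop (inclusive)."""
    if num == 1:
        return [float(start)]
    log_start, log_stop = math.log(start), math.log(stop)
    step = (log_stop - log_start) / (num - 1)
    vals = [math.exp(log_start + i * step) for i in range(num)]
    # Pin the endpoints so floating-point drift cannot push them past the bounds.
    vals[0], vals[-1] = float(start), float(stop)
    return vals

freqs = bounded_logspace_sketch(1, 100, 3)
bins = bounded_logspace_sketch(0.5, 60, 50)  # same range as F_min / F_max in params_vqt
```

Consecutive values share a constant ratio, which is the usual spacing for center frequencies in a constant-Q or variable-Q filter bank.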

face_rhythm.helpers.gaussian(x=None, mu=0, sig=1, plot_pref=False)[source]

Evaluates a normalized 1D Gaussian function on a grid. RH 2021

Parameters:
  • x (Optional[np.ndarray]) – 1D array of x positions. If None, a default range covering five sigma on each side is used. (Default is None)

  • mu (float) – Mean of the Gaussian. (Default is 0)

  • sig (float) – Standard deviation of the Gaussian. (Default is 1)

  • plot_pref (bool) – If True, plots the Gaussian using matplotlib. (Default is False)

Returns:

gaus (np.ndarray):

Gaussian evaluated at each value of x.

Return type:

(np.ndarray)

face_rhythm.helpers.torch_hilbert(x, N=None, dim=0)[source]

Computes the analytic signal of x via a Hilbert transform. RH 2022

Mirrors scipy.signal.hilbert but operates on torch.Tensor inputs.

Parameters:
  • x (torch.Tensor) – Real-valued signal of arbitrary rank.

  • N (Optional[int]) – Number of Fourier components. If None, uses x.shape[dim]. (Default is None)

  • dim (int) – Dimension along which to transform. (Default is 0)

Returns:

xa (torch.Tensor):

Complex analytic signal with the same shape as x.

Return type:

(torch.Tensor)

face_rhythm.helpers.make_VQT_filters(Fs_sample=1000, Q_lowF=3, Q_highF=20, F_min=10, F_max=400, n_freq_bins=55, win_size=501, symmetry='center', taper_asymmetric=True, plot_pref=False)[source]

Builds a bank of complex sinusoid filters for the VQT algorithm. RH 2022

Setting Q_lowF == Q_highF produces a Constant-Q Transform (CQT) filter set. Differing values vary the Q factor logarithmically across the frequency range.

Parameters:
  • Fs_sample (float) – Sampling frequency of the signal. (Default is 1000)

  • Q_lowF (float) – Q factor for the lowest frequency. (Default is 3)

  • Q_highF (float) – Q factor for the highest frequency. (Default is 20)

  • F_min (float) – Lowest frequency. (Default is 10)

  • F_max (float) – Highest frequency (inclusive). (Default is 400)

  • n_freq_bins (int) – Number of frequency bins. (Default is 55)

  • win_size (int) – Window size in samples. Must be odd. (Default is 501)

  • symmetry (str) –

    Window symmetry. One of

    • 'center': symmetric / two-sided window.

    • 'left': one-sided window, only the left half is nonzero.

    • 'right': one-sided window, only the right half is nonzero.

    (Default is 'center')

  • taper_asymmetric (bool) – If True and symmetry != 'center', the center sample of the window is multiplied by 0.5 to taper the discontinuity. (Default is True)

  • plot_pref (bool) – If True, plots the filters and windows. (Default is False)

Returns:

tuple containing:
filts_complex (torch.Tensor):

Complex sinusoid filters. shape: (n_freq_bins, win_size).

freqs (np.ndarray):

Filter center frequencies. shape: (n_freq_bins,).

wins (torch.Tensor):

Gaussian window for each filter. shape: (n_freq_bins, win_size).

Return type:

(tuple)

class face_rhythm.helpers.VQT(Fs_sample=1000, Q_lowF=3, Q_highF=20, F_min=10, F_max=400, n_freq_bins=55, win_size=501, symmetry='center', taper_asymmetric=True, downsample_factor=4, padding='valid', DEVICE_compute='cpu', DEVICE_return='cpu', batch_size=1000, return_complex=False, filters=None, plot_pref=False, progressBar=True)[source]

Bases: object

Variable Q Transform implemented with PyTorch. RH 2022

Differs from librosa / nnAudio: this implementation does not iterate lowpass filtering. Instead it convolves a fixed set of complex filters, optionally returns the envelope via Hilbert transform, and downsamples. Gradients propagate through the transform, and computation can run on GPU. Q is the quality factor, roughly the number of cycles inside four sigma (95%) of a Gaussian window.

Parameters:
  • Fs_sample (float) – Sampling frequency of the signal. (Default is 1000)

  • Q_lowF (float) – Q factor for the lowest frequency. (Default is 3)

  • Q_highF (float) – Q factor for the highest frequency. (Default is 20)

  • F_min (float) – Lowest frequency. (Default is 10)

  • F_max (float) – Highest frequency. (Default is 400)

  • n_freq_bins (int) – Number of frequency bins. (Default is 55)

  • win_size (int) – Window size in samples. Must be odd. (Default is 501)

  • symmetry (str) –

    Window symmetry passed through to make_VQT_filters. One of

    • 'center'

    • 'left'

    • 'right'

    (Default is 'center')

  • taper_asymmetric (bool) – If True and symmetry != 'center', the center sample of the window is multiplied by 0.5. (Default is True)

  • downsample_factor (int) – Time-downsampling factor. The input is zero-padded to be a multiple of this value. (Default is 4)

  • padding (str) – Convolution padding. 'same' pads to keep output length equal to input length; 'valid' does not pad. (Default is 'valid')

  • DEVICE_compute (str) – Device used for computation. (Default is 'cpu')

  • DEVICE_return (str) – Device on which results are returned. (Default is 'cpu')

  • batch_size (int) – Number of signals processed per batch. Reduce when out of memory. (Default is 1000)

  • return_complex (bool) – If True, returns the complex-valued transform; otherwise returns its absolute value (envelope). downsample_factor must be 1 when True. (Default is False)

  • filters (Optional[torch.Tensor]) – Pre-built complex sinusoid filters. shape: (n_freq_bins, win_size). If None, make_VQT_filters is called. (Default is None)

  • plot_pref (bool) – If True, plots the filters. (Default is False)

  • progressBar (bool) – If True, displays a tqdm progress bar during __call__. (Default is True)

filters

Complex sinusoid filters used for convolution.

Type:

torch.Tensor

freqs

Filter center frequencies (only when filters were generated internally).

Type:

np.ndarray

wins

Gaussian windows for each filter (only when filters were generated internally).

Type:

torch.Tensor

using_custom_filters

True if filters were supplied by the caller.

Type:

bool

face_rhythm.helpers.generate_multiphasic_sinewave(n_samples: int = 10000, n_periods: float = 1.0, n_waves: int = 3, return_x: bool = False, return_phases: bool = False)[source]

Generates n_waves cosine waves with evenly spaced phase offsets. RH 2024

Parameters:
  • n_samples (int) – Number of samples per wave. (Default is 10000)

  • n_periods (float) – Number of full periods spanned by n_samples. (Default is 1.0)

  • n_waves (int) – Number of phase-shifted cosine waves to return. (Default is 3)

  • return_x (bool) – If True, also returns the x positions. (Default is False)

  • return_phases (bool) – If True, also returns the per-wave phase arrays. (Default is False)

Returns:

output (Union[np.ndarray, tuple]):

Combination depending on return_x / return_phases:

  • waves (np.ndarray): generated cosine waves.

  • x (np.ndarray): x positions, if return_x.

  • phases (np.ndarray): per-wave phases, if return_phases.

Return type:

(Union[np.ndarray, tuple])
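A dependency-free sketch of evenly phase-shifted cosines. The function name is hypothetical, and three details are assumptions rather than documented behavior: the offsets are 2πk / n_waves, the offset is subtracted from the phase, and the x grid excludes the final endpoint.

```python
import math

def multiphasic_cosines(n_samples=10000, n_periods=1.0, n_waves=3):
    """Return n_waves lists of n_samples cosine values with evenly spaced phase offsets."""
    # Assumption: x spans [0, n_periods) without repeating the endpoint.
    x = [n_periods * i / n_samples for i in range(n_samples)]
    # Assumption: offsets divide the full circle evenly between the waves.
    offsets = [2 * math.pi * k / n_waves for k in range(n_waves)]
    return [[math.cos(2 * math.pi * xi - off) for xi in x] for off in offsets]

waves = multiphasic_cosines(n_samples=8, n_periods=1.0, n_waves=3)
```

With two or more evenly spaced phases, the waves sum to zero at every sample, which gives a quick sanity check on the offsets.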

face_rhythm.helpers.set_device(use_GPU: bool = True, device_num: int = 0, verbose: bool = True) str[source]

Sets the device for PyTorch. If a GPU is available and use_GPU is True, it will be set as the device. Otherwise, the CPU will be set as the device. RH 2022

Parameters:
  • use_GPU (bool) –

    Determines if the GPU should be utilized:

    • True: the function will attempt to use the GPU if one is available; otherwise it falls back to the CPU.

    • False: the function will use the CPU.

    (Default is True)

  • device_num (int) – Specifies the index of the GPU to use. (Default is 0)

  • verbose (bool) – If True, prints the device information. (Default is True)

Returns:

device (str):

A string specifying the device, either 'cpu' or 'cuda:<device_num>'.

Return type:

(str)

face_rhythm.helpers.tensorly_cp_to_device(cp, device='cpu')[source]

Moves the factors and weights of a tensorly CP object to device. RH 2024

Parameters:
  • cp (object) – Tensorly CP tensor (tensorly.cp_tensor.CP).

  • device (str) – Target device for cp.factors and cp.weights. (Default is 'cpu')

Returns:

cp (object):

Same CP object with all factors and weights on device.

Return type:

(object)

face_rhythm.helpers.simple_cmap(colors=[[1, 0, 0], [1, 0.6, 0], [0.9, 0.9, 0], [0.6, 1, 0], [0, 1, 0], [0, 1, 0.6], [0, 0.8, 0.8], [0, 0.6, 1], [0, 0, 1], [0.6, 0, 1], [0.8, 0, 0.8], [1, 0, 0.6]], under=[0, 0, 0], over=[0.5, 0.5, 0.5], bad=[0.9, 0.9, 0.9], name='none')[source]

Builds a LinearSegmentedColormap from a sequence of RGB values.

Adapted from https://gist.github.com/ahwillia/3e022cdd1fe82627cbf1f2e9e2ad80a7e.

Parameters:
  • colors (list) – Sequence of RGB triples (or matplotlib color strings) defining the colormap stops.

  • under (list) – RGB color used for values below the colormap range. (Default is [0, 0, 0])

  • over (list) – RGB color used for values above the colormap range. (Default is [0.5, 0.5, 0.5])

  • bad (list) – RGB color used for masked / NaN values. (Default is [0.9, 0.9, 0.9])

  • name (str) – Colormap name. (Default is 'none')

Returns:

cmap (matplotlib.colors.LinearSegmentedColormap):

Resulting linear-segmented colormap.

Return type:

(matplotlib.colors.LinearSegmentedColormap)

Example

cmap = simple_cmap([(1, 1, 1), (1, 0, 0)])  # white to red
cmap = simple_cmap(['w', 'r'])               # white to red
cmap = simple_cmap(['r', 'b', 'r'])          # red to blue to red

class face_rhythm.helpers.Cmap_conjunctive(cmaps, dtype_out=int, normalize=False, normalization_range=[0, 255], name='cmap_conjunctive')[source]

Bases: object

Combines multiple colormaps by multiplying their per-channel outputs. RH 2022

Parameters:
  • cmaps (list) – List of matplotlib.colors.LinearSegmentedColormap objects to combine.

  • dtype_out (np.dtype) – Data type of the returned color array. (Default is int)

  • normalize (bool) – If True, normalizes each input column to [0, 1] before applying the colormaps. (Default is False)

  • normalization_range (list) – [lo, hi] to which the output is rescaled. (Default is [0, 255])

  • name (str) – Name of the resulting colormap. (Default is 'cmap_conjunctive')

cmaps

Stored input colormaps.

Type:

list

n_cmaps

Number of input colormaps.

Type:

int

fn_conj_cmap

Function that maps an input array of shape (n_samples, n_cmaps) to the elementwise product of each colormap’s output.

Type:

Callable

class face_rhythm.helpers.Colorwheel(rotation: float = 0.0, saturation: float = 1.0, center: int = 0, radius: int = 255, dtype: numpy.dtype = numpy.uint8, bit_depth: int = 16, exponent: float = 10, normalize: bool = True, colors: List[List | Tuple] = [[1, 0, 0], [1, 0.5, 0], [1, 1, 0], [0.5, 1, 0], [0, 1, 0], [0, 1, 0.5], [0, 1, 1], [0, 0.5, 1], [0, 0, 1], [0.5, 0, 1], [1, 0, 1], [1, 0, 0.5]])[source]

Bases: object

2D colorwheel colormap (angle + magnitude) for cyclic data. RH 2024

Useful for visualizing complex/polar values, optical flow, and other cyclic data.

Parameters:
  • rotation (float) – Rotation of the colorwheel in radians. (Default is 0.0)

  • saturation (float) – Color saturation in [0, 1]. (Default is 1.0)

  • center (int) – Color value at the center of the wheel. (Default is 0)

  • radius (int) – Maximum color value at the rim of the wheel. (Default is 255)

  • dtype (np.dtype) – Output dtype of the color array. (Default is np.uint8)

  • bit_depth (int) – Number of samples used to discretize the wheel: 2 ** bit_depth. (Default is 16)

  • exponent (float) – Exponent applied to each base color wave to sharpen transitions. (Default is 10)

  • normalize (bool) – If True, normalizes the per-angle color sum to 1 so that color intensity is uniform around the wheel. (Default is True)

  • colors (List[Union[List, Tuple]]) – Sequence of base RGB triples used to build the rainbow. (Default is a 12-color rainbow)

fn_interp

Interpolator that maps an angle (radians) to per-base-color weights along the wheel.

Type:

Callable

colors

Array of base colors. shape: (n_colors, 3).

Type:

np.ndarray

plot_colorwheel(n_samples: int = 100000)[source]

Renders the colorwheel as a 2D image and displays it. RH 2024

Parameters:

n_samples (int) – Approximate number of samples used to build the wheel; the actual grid is ceil(sqrt(n_samples)) per side. (Default is 100000)

face_rhythm.helpers.clahe(im, grid_size=50, clipLimit=0, normalize=True)[source]

Applies Contrast Limited Adaptive Histogram Equalization to an image. RH 2022

Parameters:
  • im (np.ndarray) – Input image.

  • grid_size (int) – Tile grid size passed to cv2.createCLAHE. (Default is 50)

  • clipLimit (int) – Contrast clip limit passed to cv2.createCLAHE. (Default is 0)

  • normalize (bool) – If True, normalizes the input to span the full 16-bit range before applying CLAHE. (Default is True)

Returns:

im_c (np.ndarray):

CLAHE-enhanced image. dtype: uint16.

Return type:

(np.ndarray)

face_rhythm.helpers.add_text_to_images(images, text, position=(10, 10), font_size=1, color=(255, 255, 255), line_width=1, font=None, show=False, frameRate=30)[source]

Overlays multi-line text onto each frame using cv2.putText. RH 2022

Parameters:
  • images (np.ndarray) – Frames of video or images. shape: (n_frames, H, W, C).

  • text (List[List[str]]) – Text per frame. Outer list has one element per frame; each inner list holds the lines of text drawn on that frame.

  • position (Tuple[int, int]) – (x, y) position of the top-left corner of the text. (Default is (10, 10))

  • font_size (int) – Font scale passed to cv2.putText. (Default is 1)

  • color (Tuple[int, int, int]) – (R, G, B) text color. (Default is (255, 255, 255))

  • line_width (int) – Line thickness passed to cv2.putText. (Default is 1)

  • font (Optional[int]) – OpenCV font constant. If None, uses cv2.FONT_HERSHEY_SIMPLEX. (Default is None)

  • show (bool) – If True, displays each annotated frame using cv2.imshow. (Default is False)

  • frameRate (float) – Display frame rate when show is True. (Default is 30)

Returns:

images_with_text (np.ndarray):

Frames of video or images with text overlays applied.

Return type:

(np.ndarray)

face_rhythm.helpers.mask_image_border(im: numpy.ndarray, border_outer: int | Tuple[int, int, int, int] | None = None, border_inner: int | None = None, mask_value: float = 0) → numpy.ndarray[source]

Masks an image within specified outer and inner borders. RH 2022

Parameters:
  • im (np.ndarray) – Input image of shape: (height, width) or (height, width, channels).

  • border_outer (Union[int, tuple[int, int, int, int], None]) – Number of pixels along the border to mask. If None, the border is not masked. If an int is provided, all borders are equally masked. If a tuple of ints is provided, borders are masked in the order: (top, bottom, left, right). (Default is None)

  • border_inner (int, Optional) – Number of pixels in the center to mask. Will be a square with side length equal to this value. (Default is None)

  • mask_value (float) – Value to replace the masked pixels with. (Default is 0)

Returns:

im_out (np.ndarray):

Masked output image.

Return type:

(np.ndarray)
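A minimal NumPy sketch of border masking with the (top, bottom, left, right) order documented above. The function name mask_border_sketch and the exact centering of the inner square are assumptions, not the library's code.

```python
import numpy as np

def mask_border_sketch(im, border_outer=None, border_inner=None, mask_value=0):
    """Hypothetical sketch: mask the outer border and/or a centered inner square."""
    out = im.copy()
    h, w = out.shape[:2]
    if border_outer is not None:
        if isinstance(border_outer, int):
            border_outer = (border_outer,) * 4
        top, bottom, left, right = border_outer
        out[:top] = mask_value
        out[h - bottom:] = mask_value
        out[:, :left] = mask_value
        out[:, w - right:] = mask_value
    if border_inner is not None:
        # Assumption: the square is anchored around the integer image center
        r0 = h // 2 - border_inner // 2
        c0 = w // 2 - border_inner // 2
        out[r0:r0 + border_inner, c0:c0 + border_inner] = mask_value
    return out

im = np.ones((10, 10))
im_outer = mask_border_sketch(im, border_outer=2)
im_inner = mask_border_sketch(im, border_inner=4)
```

Slicing with h - bottom rather than -bottom keeps a border width of 0 from masking the whole image.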

face_rhythm.helpers.find_geometric_transformation(im_template: numpy.ndarray, im_moving: numpy.ndarray, warp_mode: str = 'euclidean', n_iter: int = 5000, termination_eps: float = 1e-10, mask: numpy.ndarray | None = None, gaussFiltSize: int = 1) → numpy.ndarray[source]

Estimates the geometric transformation between two images via ECC. RH 2022

Wraps cv2.findTransformECC.

Parameters:
  • im_template (np.ndarray) – Template image. dtype: uint8 or float32.

  • im_moving (np.ndarray) – Moving image. dtype: uint8 or float32.

  • warp_mode (str) –

    Motion model. One of

    • 'translation': 2x3 warpMatrix; only translation is estimated.

    • 'euclidean': 2x3 warpMatrix; rigid (rotation + translation).

    • 'affine': 2x3 warpMatrix; six parameters.

    • 'homography': 3x3 warpMatrix; eight parameters.

    (Default is 'euclidean')

  • n_iter (int) – Maximum number of iterations. (Default is 5000)

  • termination_eps (float) – Threshold on the per-iteration increment of the correlation coefficient. (Default is 1e-10)

  • mask (Optional[np.ndarray]) – Binary mask. Pixels where mask is zero are ignored. If None, no mask is used. (Default is None)

  • gaussFiltSize (int) – Gaussian filter size. 0 disables filtering. (Default is 1)

Returns:

warp_matrix (np.ndarray):

Estimated warp matrix. Apply with cv2.warpAffine or cv2.warpPerspective.

Return type:

(np.ndarray)

face_rhythm.helpers.apply_warp_transform(im_in: numpy.ndarray, warp_matrix: numpy.ndarray, interpolation_method: int = cv2.INTER_LINEAR, borderMode: int = cv2.BORDER_CONSTANT, borderValue: int = 0) → numpy.ndarray[source]

Applies a warp transform to an image. RH 2022

Wraps cv2.warpAffine (for 2x3 matrices) and cv2.warpPerspective (for 3x3 matrices).

Parameters:
  • im_in (np.ndarray) – Input image with any dimensions.

  • warp_matrix (np.ndarray) – Warp matrix. Shape should be (2, 3) for affine transformations, and (3, 3) for homography. See cv2.findTransformECC for more info.

  • interpolation_method (int) – Interpolation method. See cv2.warpAffine for more info. (Default is cv2.INTER_LINEAR)

  • borderMode (int) – Border mode. Determines how to handle pixels from outside the image boundaries. See cv2.warpAffine for more info. (Default is cv2.BORDER_CONSTANT)

  • borderValue (int) – Value to use for border pixels if borderMode is set to cv2.BORDER_CONSTANT. (Default is 0)

Returns:

im_out (np.ndarray):

Transformed output image with the same dimensions as the input image.

Return type:

(np.ndarray)

face_rhythm.helpers.warp_matrix_to_remappingIdx(warp_matrix: numpy.ndarray | torch.Tensor, x: int, y: int) → numpy.ndarray | torch.Tensor[source]

Converts a warp matrix (2x3 or 3x3) into a 2D remapping index field. RH 2023

Parameters:
  • warp_matrix (Union[np.ndarray, torch.Tensor]) – Warp matrix of shape (2, 3) for affine transformations, or (3, 3) for homography.

  • x (int) – Width of the output remapping field.

  • y (int) – Height of the output remapping field.

Returns:

remapIdx (Union[np.ndarray, torch.Tensor]):

Remapping indices. shape: (y, x, 2). The last axis stores the pixel coordinate (x, y) to sample from.

Return type:

(Union[np.ndarray, torch.Tensor])
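One way to picture the conversion is to apply the matrix to every pixel coordinate in homogeneous form. The sketch below is NumPy-only and assumes the matrix is applied directly to the output grid coordinates; warp_matrix_to_remap_sketch is a hypothetical name.

```python
import numpy as np

def warp_matrix_to_remap_sketch(warp_matrix, x, y):
    """Hypothetical sketch: apply a 2x3 or 3x3 matrix to each (x, y) pixel coordinate."""
    warp_matrix = np.asarray(warp_matrix, dtype=float)
    assert warp_matrix.shape in [(2, 3), (3, 3)]
    if warp_matrix.shape == (2, 3):
        # Promote the affine matrix to homogeneous 3x3 form
        warp_matrix = np.vstack([warp_matrix, [0.0, 0.0, 1.0]])
    xx, yy = np.meshgrid(np.arange(x), np.arange(y), indexing="xy")  # each (y, x)
    coords = np.stack([xx, yy, np.ones_like(xx)], axis=-1).astype(float)  # (y, x, 3)
    warped = coords @ warp_matrix.T
    # Perspective divide; a no-op for affine matrices
    return warped[..., :2] / warped[..., 2:3]

remap_identity = warp_matrix_to_remap_sketch(np.eye(3), x=5, y=4)
remap_shift = warp_matrix_to_remap_sketch([[1, 0, 3], [0, 1, -2]], x=5, y=4)
```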

face_rhythm.helpers.remap_images(images: numpy.ndarray | torch.Tensor, remappingIdx: numpy.ndarray | torch.Tensor, backend: str = 'torch', interpolation_method: str = 'linear', border_mode: str = 'constant', border_value: float = 0, device: str = 'cpu') → numpy.ndarray | torch.Tensor[source]

Applies remapping indices to a set of images. Remapping indices, similar to flow fields, describe the index of the pixel to sample from rather than the displacement of each pixel. RH 2023

Parameters:
  • images (Union[np.ndarray, torch.Tensor]) – The images to be warped. Shapes can be (N, C, H, W), (C, H, W), or (H, W).

  • remappingIdx (Union[np.ndarray, torch.Tensor]) – The remapping indices, describing the index of the pixel to sample from. Shape is (H, W, 2).

  • backend (str) – The backend to use. Can be either 'torch' or 'cv2'. (Default is 'torch')

  • interpolation_method (str) – The interpolation method to use. Options are 'linear', 'nearest', 'cubic', and 'lanczos'. Refer to cv2.remap or torch.nn.functional.grid_sample for more details. (Default is 'linear')

  • border_mode (str) – The border mode to use. Options include 'constant', 'reflect', 'replicate', and 'wrap'. Refer to cv2.remap for more details. (Default is 'constant')

  • border_value (float) – The border value to use. Refer to cv2.remap for more details. (Default is 0)

  • device (str) – The device to use for computations. Commonly either 'cpu' or a CUDA device such as 'cuda:0'. (Default is 'cpu')

Returns:

warped_images (Union[np.ndarray, torch.Tensor]):

The warped images. The shape will be the same as the input images, which can be (N, C, H, W), (C, H, W), or (H, W).

Return type:

(Union[np.ndarray, torch.Tensor])

face_rhythm.helpers.invert_remappingIdx(remappingIdx: numpy.ndarray, method: str = 'linear', fill_value: float | None = numpy.nan) → numpy.ndarray[source]

Inverts a remapping index field.

Assumes that the remapping index field is invertible (bijective, i.e. one-to-one) and non-occluding. Define ‘remap_AB’ as a remapping index field that warps image A onto image B; then ‘remap_BA’ is the remapping index field that warps image B onto image A. This function computes ‘remap_BA’ given ‘remap_AB’.

RH 2023

Parameters:
  • remappingIdx (np.ndarray) – An array of shape (H, W, 2) representing the remap field.

  • method (str) –

    Interpolation method to use. See scipy.interpolate.griddata. Options are:

    • 'linear'

    • 'nearest'

    • 'cubic'

    (Default is 'linear')

  • fill_value (Optional[float]) – Value used to fill points outside the convex hull. (Default is np.nan)

Returns:

An array of shape (H, W, 2) representing the inverse remap field.

Return type:

(np.ndarray)

face_rhythm.helpers.invert_warp_matrix(warp_matrix: numpy.ndarray) → numpy.ndarray[source]

Inverts a provided warp matrix for the transformation A->B to compute the warp matrix for B->A. RH 2023

Parameters:

warp_matrix (np.ndarray) – A 2x3 or 3x3 array representing the warp matrix. Shape: (2, 3) or (3, 3).

Returns:

inverted_warp_matrix (np.ndarray):

The inverted warp matrix. Shape: same as input.

Return type:

(np.ndarray)
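A 2x3 affine matrix is not square, so a common approach is to promote it to 3x3, invert, and drop the last row again. The sketch below shows that approach under this assumption; invert_warp_sketch is a hypothetical name.

```python
import numpy as np

def invert_warp_sketch(warp_matrix):
    """Hypothetical sketch: invert a (2, 3) or (3, 3) warp matrix."""
    warp_matrix = np.asarray(warp_matrix, dtype=float)
    assert warp_matrix.shape in [(2, 3), (3, 3)]
    if warp_matrix.shape == (2, 3):
        # Append the homogeneous row, invert, then return to 2x3
        full = np.vstack([warp_matrix, [0.0, 0.0, 1.0]])
        return np.linalg.inv(full)[:2, :]
    return np.linalg.inv(warp_matrix)

warp_AB = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, -3.0]])
warp_BA = invert_warp_sketch(warp_AB)
```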

face_rhythm.helpers.compose_remappingIdx(remap_AB: numpy.ndarray, remap_BC: numpy.ndarray, method: str = 'linear', fill_value: float | None = numpy.nan, bounds_error: bool = False) → numpy.ndarray[source]

Composes two remapping index fields using scipy.interpolate.interpn.

This function computes ‘remap_AC’ from ‘remap_AB’ and ‘remap_BC’, where ‘remap_AB’ is a remapping index field that warps image A onto image B, and ‘remap_BC’ is a remapping index field that warps image B onto image C.

RH 2023

Parameters:
  • remap_AB (np.ndarray) – An array of shape (H, W, 2) representing the remap field from image A to image B.

  • remap_BC (np.ndarray) – An array of shape (H, W, 2) representing the remap field from image B to image C.

  • method (str) –

    Interpolation method to use. Either

    • 'linear': Use linear interpolation (default).

    • 'nearest': Use nearest interpolation.

    • 'cubic': Use cubic interpolation.

  • fill_value (Optional[float]) – The value used for points outside the interpolation domain. (Default is np.nan)

  • bounds_error (bool) – If True, a ValueError is raised when interpolated values are requested outside of the domain of the input data. (Default is False)

Returns:

remap_AC (np.ndarray):

An array of shape (H, W, 2) representing the remap field from image A to image C.

Return type:

(np.ndarray)

face_rhythm.helpers.compose_transform_matrices(matrix_AB: numpy.ndarray, matrix_BC: numpy.ndarray) → numpy.ndarray[source]

Composes two transformation matrices to create a transformation from one image to another. RH 2023

This function is used to combine two transformation matrices, ‘matrix_AB’ and ‘matrix_BC’. ‘matrix_AB’ represents a transformation that warps an image A onto an image B. ‘matrix_BC’ represents a transformation that warps image B onto image C. The result is ‘matrix_AC’, a transformation matrix that would warp image A directly onto image C.

Parameters:
  • matrix_AB (np.ndarray) – A transformation matrix from image A to image B. The array can have the shape (2, 3) or (3, 3).

  • matrix_BC (np.ndarray) – A transformation matrix from image B to image C. The array can have the shape (2, 3) or (3, 3).

Returns:

matrix_AC (np.ndarray):

A composed transformation matrix from image A to image C. The array has the shape (2, 3) or (3, 3).

Return type:

(np.ndarray)

Raises:

AssertionError – If the input matrices do not have the shape (2, 3) or (3, 3).
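Composition can be sketched as a matrix product in homogeneous coordinates. The multiplication order below (matrix_BC @ matrix_AB, for matrices acting on column vectors) and the handling of mixed shapes are assumptions; compose_sketch is a hypothetical name, not the library function.

```python
import numpy as np

def _as_3x3(m):
    m = np.asarray(m, dtype=float)
    assert m.shape in [(2, 3), (3, 3)], "matrix must be (2, 3) or (3, 3)"
    return np.vstack([m, [0.0, 0.0, 1.0]]) if m.shape == (2, 3) else m

def compose_sketch(matrix_AB, matrix_BC):
    """Hypothetical sketch: apply matrix_AB first, then matrix_BC."""
    both_affine = np.shape(matrix_AB) == (2, 3) and np.shape(matrix_BC) == (2, 3)
    matrix_AC = _as_3x3(matrix_BC) @ _as_3x3(matrix_AB)
    # Assumption: return 2x3 only when both inputs were 2x3
    return matrix_AC[:2] if both_affine else matrix_AC

scale = [[2, 0, 0], [0, 2, 0]]       # A -> B: scale by 2
shift = [[1, 0, 1], [0, 1, 0]]       # B -> C: shift x by 1
matrix_AC = compose_sketch(scale, shift)
```

Scaling and then shifting differs from shifting and then scaling, so the order of the product matters.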

Example

# Define the transformation matrices
matrix_AB = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
matrix_BC = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

# Compose the transformation matrices
matrix_AC = compose_transform_matrices(matrix_AB, matrix_BC)

face_rhythm.helpers.flowField_to_remappingIdx(ff: numpy.ndarray | object) → numpy.ndarray | object[source]

Converts a flow field into a remapping index by adding the pixel grid. RH 2023

WARNING: Strictly speaking, a flow field (displacement) and a remapping index (interpolation mapping) are different concepts; this helper simply adds each pixel's own grid coordinate to its displacement, which is correct under the standard convention.

Parameters:

ff (Union[np.ndarray, torch.Tensor]) – Flow field describing the displacement of each pixel. shape: (H, W, 2). Last dimension is (x, y).

Returns:

ri (Union[np.ndarray, torch.Tensor]):

Remapping index of source pixel coordinates. shape: (H, W, 2).

Return type:

(Union[np.ndarray, torch.Tensor])

face_rhythm.helpers.remappingIdx_to_flowField(ri: numpy.ndarray | object) → numpy.ndarray | object[source]

Converts a remapping index into a flow field by subtracting the pixel grid. RH 2023

WARNING: Strictly speaking, a remapping index (interpolation mapping) and a flow field (displacement) are different concepts; this helper simply subtracts each pixel's own grid coordinate from its remapping index.

Parameters:

ri (Union[np.ndarray, torch.Tensor]) – Remapping index. shape: (H, W, 2). Last dimension is (x, y).

Returns:

ff (Union[np.ndarray, torch.Tensor]):

Flow field. shape: (H, W, 2).

Return type:

(Union[np.ndarray, torch.Tensor])
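The relationship between flowField_to_remappingIdx and remappingIdx_to_flowField can be illustrated with a NumPy-only round trip. The helper names below are hypothetical, and the grid uses (x, y) ordering on the last axis as documented above.

```python
import numpy as np

def pixel_grid(h, w):
    """Grid of pixel coordinates. shape: (h, w, 2), last axis is (x, y)."""
    xx, yy = np.meshgrid(np.arange(w), np.arange(h), indexing="xy")
    return np.stack([xx, yy], axis=-1).astype(float)

def flow_to_remap_sketch(ff):
    # Remapping index = displacement + own pixel coordinate
    return ff + pixel_grid(*ff.shape[:2])

def remap_to_flow_sketch(ri):
    # Flow field = remapping index - own pixel coordinate
    return ri - pixel_grid(*ri.shape[:2])

ff = np.zeros((3, 4, 2))
ff[1, 2] = [0.5, -1.0]
ri = flow_to_remap_sketch(ff)
ff_back = remap_to_flow_sketch(ri)
```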

face_rhythm.helpers.cv2RemappingIdx_to_pytorchFlowField(ri: numpy.ndarray | torch.Tensor) → numpy.ndarray | torch.Tensor[source]

Converts remapping indices from the OpenCV format to the PyTorch format. In the OpenCV format, the indices are pixel coordinates relative to the top-left pixel of the image. In the PyTorch format (as consumed by torch.nn.functional.grid_sample), the coordinates are expressed relative to the center of the image and normalized to [-1, 1]. RH 2023

Parameters:

ri (Union[np.ndarray, torch.Tensor]) – Remapping indices. Each pixel describes the index of the pixel in the original image that should be mapped to the new pixel. Shape: (H, W, 2). The last dimension is (x, y).

Returns:

normgrid (Union[np.ndarray, torch.Tensor]):

“Flow field”, in the PyTorch format. Technically not a flow field, since it doesn’t describe displacement. Rather, it is a remapping index relative to the center of the image. Shape: (H, W, 2). The last dimension is (x, y).

Return type:

(Union[np.ndarray, torch.Tensor])
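torch.nn.functional.grid_sample expects sampling coordinates normalized to [-1, 1]. The sketch below shows one such normalization in NumPy, assuming the align_corners=True convention (pixel 0 maps to -1 and the last pixel maps to +1); whether the library uses this exact convention is an assumption, and cv2_remap_to_normgrid_sketch is a hypothetical name.

```python
import numpy as np

def cv2_remap_to_normgrid_sketch(ri):
    """Hypothetical sketch: pixel-coordinate remap field -> grid normalized to [-1, 1]."""
    ri = np.asarray(ri, dtype=float)
    h, w = ri.shape[:2]
    size = np.array([w, h], dtype=float)  # last axis of ri is (x, y)
    # Assumes h > 1 and w > 1, otherwise this divides by zero
    return (ri / (size - 1) - 0.5) * 2

h, w = 3, 5
xx, yy = np.meshgrid(np.arange(w), np.arange(h), indexing="xy")
identity_ri = np.stack([xx, yy], axis=-1)
normgrid = cv2_remap_to_normgrid_sketch(identity_ri)
```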

face_rhythm.helpers.remap_points(points: numpy.ndarray, remappingIdx: numpy.ndarray, interpolation: str = 'linear', fill_value: float = None) → numpy.ndarray[source]

Remaps a set of 2D points through an index map produced for image warping.

Parameters:
  • points (np.ndarray) – Array of points to be remapped. shape: (n_points, 2), dtype: floating. Each row is an (x, y) coordinate within the image.

  • remappingIdx (np.ndarray) – Index map describing the warp. shape: (height, width, 2), dtype: floating.

  • interpolation (str) –

    Interpolation method passed to scipy.interpolate.RegularGridInterpolator. One of

    • 'linear'

    • 'nearest'

    • 'slinear'

    • 'cubic'

    • 'quintic'

    • 'pchip'

    (Default is 'linear')

  • fill_value (Optional[float]) – Value used to fill points outside the convex hull. If None, values outside the convex hull are extrapolated. (Default is None)

Returns:

points_remap (np.ndarray):

Remapped points. shape: (n_points, 2).

Return type:

(np.ndarray)

class face_rhythm.helpers.NVIDIA_Device_Checker(device_index=None, verbose=1)[source]

Bases: _Device_Checker_Base

Resource utilization checker for an NVIDIA GPU.

Requires the nvidia-ml-py3 package.

Parameters:
  • device_index (Optional[int]) – Index of the GPU to monitor. If None and only one device is present, that device is used; otherwise an error is raised. (Default is None)

  • verbose (int) – Verbosity level passed to the base class. (Default is 1)

info_static

Static device info (name, index, total memory, power limit).

Type:

Dict[str, Any]

handle

nvidia_smi device handle for the monitored GPU.

Type:

object

get_device_handles()[source]

Returns one nvmlDeviceGetHandleByIndex handle per GPU detected by NVML.

check_utilization()[source]

Returns a snapshot of the current GPU utilization metrics.

Returns:

info_changing (Dict[str, Any]):

Includes time, memory_free, memory_used, memory_used_percentage, power_used, power_used_percentage, processor_used_percentage, temperature, and fan_speed.

Return type:

(Dict[str, Any])

class face_rhythm.helpers.CPU_Device_Checker(verbose=1)[source]

Bases: _Device_Checker_Base

Resource utilization checker for the host CPU and disk.

Parameters:

verbose (int) – Verbosity level passed to the base class. (Default is 1)

info_static

Static info (CPU count, frequency, total RAM, total disk).

Type:

Dict[str, Any]

check_utilization()[source]

Returns a snapshot of CPU, memory, network, and disk utilization.

Returns:

info_changing (Dict[str, Any]):

Per-snapshot metrics including memory, network I/O, disk free/used, disk read/write throughput, and overall + per-core CPU usage percentages.

Return type:

(Dict[str, Any])

class face_rhythm.helpers.Equivalence_checker(kwargs_allclose: dict | None = {'equal_nan': True, 'rtol': 1e-07}, assert_mode=False, verbose=False)[source]

Bases: object

Class for checking whether all items in two complex data structures are equivalent or allclose (almost equal). Can check nested lists, dicts, and other data structures. Can also optionally assert (raise errors) if any items are not equivalent. RH 2023

_kwargs_allclose

Keyword arguments for the numpy.allclose function.

Type:

Optional[dict]

_assert_mode

Whether to raise an assertion error if items are not close.

Type:

bool

Parameters:
  • kwargs_allclose (Optional[dict]) – Keyword arguments for the numpy.allclose function. (Default is {'rtol': 1e-7, 'equal_nan': True})

  • assert_mode (bool) – Whether to raise an assertion error if items are not close. (Default is False)

  • verbose (bool) –

    How much information to print out:

    • False / 0: No information printed out.

    • True / 1: Mismatched items only.

    • 2: All items printed out.
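The kind of recursive comparison this class performs can be sketched in a few lines. This is a simplified illustration with a hypothetical function name; it returns a single bool rather than per-item results, and it omits assert_mode and verbose.

```python
import numpy as np

def all_close_nested(a, b, kwargs_allclose=None):
    """Hypothetical sketch: recursively compare nested dicts/lists with np.allclose."""
    kwargs = {"rtol": 1e-7, "equal_nan": True} if kwargs_allclose is None else kwargs_allclose
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            all_close_nested(a[k], b[k], kwargs) for k in a
        )
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(
            all_close_nested(x, y, kwargs) for x, y in zip(a, b)
        )
    numeric = (np.ndarray, int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        a_arr, b_arr = np.asarray(a), np.asarray(b)
        return a_arr.shape == b_arr.shape and bool(np.allclose(a_arr, b_arr, **kwargs))
    # Fallback for strings and other objects: same type and equal
    return type(a) == type(b) and a == b

a = {"x": [1.0, np.array([1.0, np.nan])], "y": "s"}
b = {"x": [1.0, np.array([1.0 + 1e-12, np.nan])], "y": "s"}
c = {"x": [1.0, np.array([1.0, np.nan])], "y": "t"}
```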

face_rhythm.helpers.order_cp_factors_by_EVR(tensor_dense: numpy.ndarray | torch.Tensor, cp_factors: list | object, cp_weights: numpy.ndarray | torch.Tensor | None = None, orthogonalizable_EVR: bool = True) → Tuple[numpy.ndarray, numpy.ndarray][source]

Sorts CP factors by descending explained variance ratio. RH 2024

Parameters:
  • tensor_dense (Union[np.ndarray, torch.Tensor]) – Dense tensor to be reconstructed.

  • cp_factors (Union[list, object]) – CP factors. Either a list of 2D factor matrices of shape (n_samples, rank) or a tensorly.CPTensor object.

  • cp_weights (Optional[Union[np.ndarray, torch.Tensor]]) – Per-rank weights. shape: (rank,). (Default is None)

  • orthogonalizable_EVR (bool) – If True, optimizes each factor’s scaling to maximize EVR by OLS-orthogonalizing the dense tensor against each factor. (Default is True)

Returns:

tuple containing:

order (np.ndarray):

Indices that sort the factors by descending EVR.

evrs (np.ndarray):

Sorted explained variance ratios.

Return type:

(tuple)
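A simplified NumPy sketch of ranking components by explained variance ratio. It scores each rank-1 component on its own and ignores the orthogonalizable_EVR rescaling step, so it only approximates the documented behavior; order_by_evr_sketch is a hypothetical name.

```python
from functools import reduce
import numpy as np

def order_by_evr_sketch(tensor_dense, factors, weights=None):
    """Hypothetical sketch: sort CP components by per-component EVR (descending)."""
    rank = factors[0].shape[1]
    weights = np.ones(rank) if weights is None else np.asarray(weights, dtype=float)
    evrs = np.empty(rank)
    for r in range(rank):
        # Rank-1 reconstruction: weighted outer product of the r-th factor columns
        comp = weights[r] * reduce(np.multiply.outer, [f[:, r] for f in factors])
        evrs[r] = 1 - np.var(tensor_dense - comp) / np.var(tensor_dense)
    order = np.argsort(evrs)[::-1]
    return order, evrs[order]

factors = [np.eye(3)[:, :2], np.eye(4)[:, :2], np.eye(5)[:, :2]]
weights = np.array([1.0, 3.0])
dense = sum(
    weights[r] * reduce(np.multiply.outer, [f[:, r] for f in factors])
    for r in range(2)
)
order, evrs = order_by_evr_sketch(dense, factors, weights)
```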

face_rhythm.helpers.cp_reconstruction_EVR(tensor_dense, tensor_CP)[source]

Explained variance ratio of a CP-reconstructed tensor. RH 2023

Parameters:
  • tensor_dense (Union[np.ndarray, torch.Tensor]) – Dense reference tensor. shape: (n_samples, n_features).

  • tensor_CP (Union[list, object]) – CP tensor. Either a list of 2D factor matrices of shape (n_samples, rank) or a tensorly.CPTensor object.

Returns:

ev (Union[float, torch.Tensor]):

Explained variance ratio 1 - var(tensor_dense - tensor_rec) / var(tensor_dense).

Return type:

(Union[float, torch.Tensor])
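The formula above can be checked with a small NumPy example. The sketch handles only the 3-mode case with a list of factor matrices and is not the library's implementation; cp_evr_sketch is a hypothetical name.

```python
import numpy as np

def cp_evr_sketch(tensor_dense, factors, weights=None):
    """Hypothetical sketch: EVR of a 3-mode CP reconstruction."""
    rank = factors[0].shape[1]
    weights = np.ones(rank) if weights is None else np.asarray(weights, dtype=float)
    # Sum over rank of weighted outer products of the factor columns
    tensor_rec = np.einsum("r,ir,jr,kr->ijk", weights, *factors)
    return 1 - np.var(tensor_dense - tensor_rec) / np.var(tensor_dense)

rng = np.random.default_rng(0)
factors = [rng.normal(size=(n, 2)) for n in (4, 5, 6)]
dense = np.einsum("ir,jr,kr->ijk", *factors)
evr_exact = cp_evr_sketch(dense, factors)
evr_noisy = cp_evr_sketch(dense + 0.1 * rng.normal(size=dense.shape), factors)
```

An exact reconstruction gives an EVR of 1, and any residual lowers it.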

face_rhythm.helpers.rolling_mean(tensor: torch.Tensor, dim: int) → torch.Tensor[source]

Computes the running mean along dim using Welford’s update. RH 2025

Parameters:
  • tensor (torch.Tensor) – Input tensor.

  • dim (int) – Dimension along which the running mean is accumulated.

Returns:

mean (torch.Tensor):

Final mean across dim (last accumulated value).

Return type:

(torch.Tensor)
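Welford's update for the mean adds each new slice's deviation from the current mean, divided by the number of slices seen so far. The sketch below uses NumPy instead of torch to stay dependency-light; running_mean_sketch is a hypothetical name.

```python
import numpy as np

def running_mean_sketch(arr, axis):
    """Hypothetical sketch: incremental (Welford-style) mean along one axis."""
    arr = np.moveaxis(np.asarray(arr, dtype=float), axis, 0)
    mean = np.zeros(arr.shape[1:])
    for k, x in enumerate(arr, start=1):
        mean += (x - mean) / k  # mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
    return mean

data = np.random.default_rng(0).normal(size=(7, 3, 4))
mean_axis1 = running_mean_sketch(data, axis=1)
```

The update avoids accumulating a large running sum, which helps numerical stability on long sequences.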

face_rhythm.helpers.play_video_cv2(array=None, path_video=None, frameRate=30, path_save=None, show=True, fourcc_code='MJPG', text=None, kwargs_text={})[source]

Plays or saves a video using OpenCV. RH 2021/2024

Parameters:
  • array (Optional[np.ndarray]) – 3D (frames, H, W) or 4D (frames, H, W, channels) uint8 array. Values are clipped to [0, 255]. If None, path_video must be supplied and decord is used to read it. (Default is None)

  • path_video (Optional[Union[str, pathlib.Path]]) – Path to a video file. Used only when array is None. (Default is None)

  • frameRate (float) – Playback / output frame rate in Hz. (Default is 30)

  • path_save (Optional[Union[str, pathlib.Path]]) – Destination path for the saved video. None disables saving. (Default is None)

  • show (bool) – If True, displays the video in a cv2 window. (Default is True)

  • fourcc_code (str) – FourCC codec string passed to cv2.VideoWriter_fourcc. (Default is 'MJPG')

  • text (Optional[Union[str, List[str]]]) – Text overlay. If a list, element i is drawn on frame i. (Default is None)

  • kwargs_text (dict) – Keyword arguments forwarded to cv2.putText. (Default is {})

face_rhythm.helpers.make_tiled_video_array(videos: List[numpy.ndarray], shape: Tuple[int, int] | None = None, verbose: bool = True)[source]

Tiles a list of videos into a single grid video array. RH 2021/2024

Videos are placed top-to-bottom and then left-to-right.

Parameters:
  • videos (List[np.ndarray]) – List of video arrays with shape (frames, H, W, channels) or (frames, H, W). All videos must share the same dtype.

  • shape (Optional[Tuple[int, int]]) – Grid layout (n_rows, n_cols). If None, uses the smallest square grid that fits all videos. (Default is None)

  • verbose (bool) – If True, prints progress messages. (Default is True)

Returns:

video_array (np.ndarray):

Tiled video. shape: (max_frames, total_H, total_W, channels).

Return type:

(np.ndarray)
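A minimal tiling sketch that follows the documented fill order (top-to-bottom, then left-to-right) and the default square grid. It assumes every video has shape (frames, H, W, channels) with identical H, W, and channels, which is stricter than the documented function; tile_videos_sketch is a hypothetical name.

```python
import math
import numpy as np

def tile_videos_sketch(videos, shape=None):
    """Hypothetical sketch: tile equally sized videos into one grid video."""
    n = len(videos)
    if shape is None:
        side = math.ceil(math.sqrt(n))  # smallest square grid that fits
        shape = (side, side)
    n_rows, n_cols = shape
    assert n <= n_rows * n_cols, "grid too small for the number of videos"
    h, w, c = videos[0].shape[1:]
    max_frames = max(v.shape[0] for v in videos)
    out = np.zeros((max_frames, n_rows * h, n_cols * w, c), dtype=videos[0].dtype)
    for i, v in enumerate(videos):
        col, row = divmod(i, n_rows)  # fill a column top-to-bottom, then move right
        out[: v.shape[0], row * h:(row + 1) * h, col * w:(col + 1) * w] = v
    return out

videos = [np.full((n_frames, 2, 3, 1), i + 1, dtype=np.uint8)
          for i, n_frames in enumerate([2, 3, 1])]
tiled = tile_videos_sketch(videos)
```

Shorter videos leave zeros in their tile once they run out of frames.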

face_rhythm.pipelines module

Example end-to-end pipelines for running the face_rhythm package.

Each pipeline accepts a params dictionary that must contain all fields required by the steps it executes.

face_rhythm.pipelines.pipeline_basic(params)[source]

Runs the basic face_rhythm pipeline, mirroring notebooks/interactive_pipeline_basic.ipynb. RH 2023

The ROIs must be defined ahead of time and saved as an ROIs.h5 file referenced by params['ROIs']['initialize']['path_file']. Steps executed (gated by params['steps']):

  • 'load_videos': Load video data via BufferedVideoReader.

  • 'ROIs': Load ROIs and seed point positions.

  • 'point_tracking': Track points across frames.

  • 'VQT': Compute variable-Q spectrograms.

  • 'TCA': Perform tensor component analysis.

Each step also persists its outputs to the project’s analysis_files directory.

Parameters:

params (dict) – Dictionary of parameters controlling every pipeline step. See scripts/params_pipeline_basic.json for a complete example. Top-level keys consumed include 'project', 'paths_videos', 'figure_saver', 'steps', 'BufferedVideoReader', 'Dataset_videos', 'ROIs', 'PointTracker', 'VQT_Analyzer', and 'TCA'.

Returns:

results (dict):

Dictionary with keys:

  • 'path_config' (str): Path to the saved config file.

  • 'path_run_info' (str): Path to the saved run_info file.

  • 'directory_project' (str): Path to the project directory.

  • 'SEED' (int): Random seed used for the run.

  • 'params' (dict): The parameter dictionary that was used.

Return type:

(dict)
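Because the pipeline reads many top-level keys from params, a quick pre-flight check can catch a missing section early. The helper below is hypothetical and not part of face_rhythm; the key list is copied from the parameter description above and may be incomplete, since some keys may only be needed by certain steps.

```python
# Top-level keys named in the pipeline_basic parameter description
EXPECTED_TOP_LEVEL_KEYS = [
    'project', 'paths_videos', 'figure_saver', 'steps',
    'BufferedVideoReader', 'Dataset_videos', 'ROIs',
    'PointTracker', 'VQT_Analyzer', 'TCA',
]

def missing_top_level_keys(params):
    """Hypothetical helper: list the documented top-level keys absent from params."""
    return [k for k in EXPECTED_TOP_LEVEL_KEYS if k not in params]

params_partial = {'project': {}, 'steps': ['load_videos', 'ROIs']}
missing = missing_top_level_keys(params_partial)
```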

face_rhythm.point_tracking module

Point tracking via Lucas-Kanade optical flow, with mesh-relaxation and outlier handling.

PointTracker advects a set of (x, y) seed points through a BufferedVideoReader using either CPU or CUDA OpenCV LK optical flow. Mesh distances to k-nearest neighbors are regularized toward their initial values, and frames with any point displaced beyond a threshold halt and replay the surrounding region to suppress outlier streaks.

class face_rhythm.point_tracking.PointTracker(buffered_video_reader: BufferedVideoReader, point_positions: numpy.ndarray, rois_masks: ROIs = None, contiguous: bool = False, params_optical_flow: dict = {'kwargs_method': {'criteria': (3, 10, 0.03), 'maxLevel': 2, 'winSize': (15, 15)}, 'mesh_rigidity': 0.005, 'method': 'lucas_kanade', 'relaxation': 0.5}, params_clahe: dict = {'clipLimit': 40.0, 'tileGridSize': (150, 150)}, params_outlier_handling: dict = {'framesHalted_after': 30, 'framesHalted_before': 30, 'threshold_displacement': 25}, frames_freeze: numpy.ndarray | None = None, relaxation_during_freeze_frames: bool = True, idx_start: int | list | numpy.ndarray = 0, visualize_video: bool = False, params_visualization: dict = {'alpha': 1.0, 'point_sizes': 1}, verbose: bool | int = 1)[source]

Bases: FR_Module

Tracks a set of seed points across video frames with Lucas-Kanade optical flow, mesh-rigidity regularization, and outlier handling.

Wraps OpenCV LK (CPU or CUDA when available) and applies a k-nearest-neighbor mesh constraint plus a relaxation force toward the original point positions. Frames where any point is displaced beyond a threshold trigger a rewind-and-replay of the surrounding window so violating points are frozen. Optional frames_freeze masks proactively zero the optical flow delta on chosen frames.

Parameters:
  • buffered_video_reader (BufferedVideoReader) – BufferedVideoReader object containing the videos to track. Created by fr.helpers.BufferedVideoReader.

  • point_positions (np.ndarray) – Initial seed points to track. Each row is one point; columns are (x, y). Typically produced by fr.rois.ROIs via the ROIs.point_positions attribute. shape: (n_points, 2), dtype: float.

  • rois_masks (Union[np.ndarray, List[np.ndarray], ROIs]) – ROI mask(s) used to zero-out non-ROI pixels before tracking. A single 2D bool array (shape: (H, W)) or a list of such arrays. When a list is provided, the masks are intersected into a single combined mask. (Default is None)

  • contiguous (bool) – If True, all videos are treated as one continuous stream (the first frame of each video continues from the previous video). If False, point tracking restarts for each video. (Default is False)

  • params_optical_flow (dict) –

    Parameters for optical flow. Missing keys fall back to defaults. Supported keys:

    • 'method': Optical flow method. Only 'lucas_kanade' is supported.

    • 'mesh_rigidity': Strength of the mesh-distance restoring force. Depends on point spacing.

    • 'mesh_n_neighbors': Number of nearest neighbors used for the mesh constraint.

    • 'relaxation': Per-frame fraction by which points relax back toward their original positions.

    • 'kwargs_method': Extra kwargs forwarded to cv2.calcOpticalFlowPyrLK (winSize, maxLevel, criteria).

    See the OpenCV LK optical flow docs for parameter meanings. (Default is the dict shown in the signature)

  • params_clahe (dict) – Keyword arguments forwarded to cv2.createCLAHE (or cv2.cuda.createCLAHE). If None, CLAHE is not applied. (Default is {'clipLimit': 40.0, 'tileGridSize': (150, 150)})

  • params_outlier_handling (dict) –

    Parameters for outlier (violation) handling. A violation is a frame in which a point exceeds the displacement threshold from its original position; on a violation the affected point has its velocity frozen for a window around the event. Supported keys:

    • 'threshold_displacement': Maximum allowed displacement from the original position, in pixels.

    • 'framesHalted_before': Number of frames to halt before a violation.

    • 'framesHalted_after': Number of frames to halt after a violation.

    (Default is the dict shown in the signature)

  • frames_freeze (Optional[np.ndarray]) – 1D bool array marking frames whose optical flow delta should be proactively zeroed. True = freeze the OF delta for that frame. Length should equal the total number of frames across all videos in contiguous mode, or per-video length in non-contiguous mode. (Default is None)

  • relaxation_during_freeze_frames (bool) – Controls behavior on proactively frozen frames. If True, the OF delta is zeroed but mesh rigidity and relaxation forces still apply, so the mesh can maintain shape and relax toward home positions. If False, points are fully frozen and their positions are copied from the previous frame with no forces applied. (Default is True)

  • idx_start (Union[int, list, np.ndarray]) – Index of the first frame to track. If an int, it is used for all videos (or for the contiguous index when contiguous=True). If a list/array and contiguous=False, each entry is the start index for the corresponding video. (Default is 0)

  • visualize_video (bool) – If True, displays the tracked frames via cv2.imshow. Set to False on headless systems. (Default is False)

  • params_visualization (dict) – Parameters forwarded to fr.visualization.FrameVisualizer. Do not include 'points_colors' since it is reserved for outlier coloring. (Default is {'alpha': 1.0, 'point_sizes': 1})

  • verbose (Union[bool, int]) –

    Verbosity level.

    • 0: silent.

    • 1: warnings only.

    • 2: all info.

    (Default is 1)
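
Example (illustrative sketch, not the library's implementation): the outlier-handling parameters above can be read as a halting window around each violation. The helper name frames_to_halt is hypothetical, plain Python lists stand in for arrays, and the strict "greater than" comparison is an assumption about how "exceeds" is evaluated.

```python
from typing import List


def frames_to_halt(
    displacement: List[float],
    threshold_displacement: float,
    frames_halted_before: int,
    frames_halted_after: int,
) -> List[bool]:
    """Return a per-frame flag that is True wherever one point would be halted."""
    n_frames = len(displacement)
    halted = [False] * n_frames
    for i_frame, d in enumerate(displacement):
        if d > threshold_displacement:  # assumption: a violation is a strict exceedance
            start = max(0, i_frame - frames_halted_before)
            stop = min(n_frames, i_frame + frames_halted_after + 1)
            for j in range(start, stop):
                halted[j] = True
    return halted


# Displacement (in pixels) of one point from its original position over 8 frames.
displacement = [0.0, 1.0, 2.0, 30.0, 3.0, 1.0, 0.5, 0.2]
halted = frames_to_halt(
    displacement,
    threshold_displacement=25.0,
    frames_halted_before=1,
    frames_halted_after=2,
)
print(halted)
```

Here the single violation on frame 3 halts frames 2 through 5: one frame before the event and two frames after it.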

point_positions

Initial seed point positions. shape: (n_points, 2).

Type:

np.ndarray

num_points

Number of points being tracked.

Type:

int

mask

Combined ROI mask. shape: (H, W), dtype: bool.

Type:

torch.Tensor

neighbors

Indices of the k-nearest neighbors of each point. shape: (n_points, mesh_n_neighbors), dtype: int64.

Type:

torch.Tensor

d_0

Initial mean neighbor-distance vectors per point. dtype: float32.

Type:

torch.Tensor

idx_start

Resolved per-video (or contiguous) starting frame indices.

Type:

Union[int, List[int]]

videos

List of per-video iterables built from buffered_video_reader.

Type:

list

params_optical_flow

Optical flow parameters with defaults filled in.

Type:

dict

params_outlier_handling

Outlier-handling parameters with defaults filled in.

Type:

dict

params_visualization

Visualization parameters with defaults filled in.

Type:

dict

points_tracked

Tracked point arrays. Populated by track_points; first stored as a list of arrays per video, then re-keyed as a dict {str(video_idx): np.ndarray}.

Type:

Union[list, dict]

violations

Per-video sparse COO matrices of violation flags (populated by track_points).

Type:

list

violations_sparseCOO

Per-video violation flags packed as {'row', 'col', 'data', 'shape'} dicts (populated by track_points).

Type:

dict

violation_fraction

Per-video fraction of frames that contain at least one violation.

Type:

List[float]

config

Snapshot of the configuration used to construct the tracker.

Type:

dict

run_info

Run summary populated by track_points.

Type:

dict

run_data

Run data dictionary used by FR_Module save/load.

Type:

dict

cleanup()[source]

Deletes all instance attributes and runs garbage collection to free the large arrays held by the tracker.

track_points()[source]

Runs the full point tracking workflow across all videos.

Tracks the seed points through every video using the configured optical flow, mesh, outlier handling, and freeze parameters. Populates self.points_tracked, self.violations, self.violations_sparseCOO, self.violation_fraction, and the run_info/run_data dictionaries used by FR_Module.
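
Example (illustrative sketch): how a dense violation matrix could relate to the {'row', 'col', 'data', 'shape'} dict and to violation_fraction. The helper names are hypothetical, and the layout (rows are frames, columns are points) is an assumption that is not confirmed by the documentation above.

```python
from typing import Dict, List


def to_sparse_coo(dense: List[List[bool]]) -> Dict[str, object]:
    """Pack a dense boolean matrix into a COO-style dict."""
    rows, cols, data = [], [], []
    for r, row in enumerate(dense):
        for c, flag in enumerate(row):
            if flag:
                rows.append(r)
                cols.append(c)
                data.append(True)
    n_cols = len(dense[0]) if dense else 0
    return {'row': rows, 'col': cols, 'data': data, 'shape': (len(dense), n_cols)}


def violation_fraction(coo: Dict[str, object]) -> float:
    """Fraction of frames (rows, by assumption) with at least one violation."""
    n_frames = coo['shape'][0]
    return len(set(coo['row'])) / n_frames if n_frames else 0.0


# 4 frames x 3 points; frames 1 and 2 contain violations.
dense = [
    [False, False, False],
    [False, True, False],
    [True, True, False],
    [False, False, False],
]
coo = to_sparse_coo(dense)
frac = violation_fraction(coo)
print(coo, frac)
```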

face_rhythm.project module

Project bootstrap: creates the on-disk project layout and config files.

prepare_project builds <dir>/config.yaml, <dir>/run_info.json and the analysis_files / visualizations subfolders used by the rest of the pipeline.

face_rhythm.project.prepare_project(directory_project='./', overwrite_config=False, update_project_paths=False, mkdir=True, initialize_visualization=True, verbose=1)[source]

Prepares the project folder and creates config.yaml and run_info.json (if they do not already exist or an overwrite is requested).

Parameters:
  • directory_project (str) – Path to the project directory. If './' is passed, the current working directory is used. (Default is './')

  • overwrite_config (bool) – Whether to overwrite the entire config.yaml file with a brand-new config. If False, update_project_paths can still be set to True. (Default is False)

  • update_project_paths (bool) –

    If True, updates the following entries within the existing config.yaml to reflect the current directory_project:

    • paths > project: directory_project/

    • paths > config: directory_project/config.yaml

    • paths > run_info: directory_project/run_info.json

    If overwrite_config is True, this argument is ignored. (Default is False)

  • mkdir (bool) – Whether to create the project directory if it does not exist. (Default is True)

  • initialize_visualization (bool) – Whether to initialize cv2.imshow visualization. On a headless server this should be set to False. (Default is True)

  • verbose (int) –

    Verbosity level. One of

    • 0: No output.

    • 1: Warnings.

    • 2: Info.

    (Default is 1)

Returns:

tuple containing:
path_config (str):

Path to the config.yaml file.

path_run_info (str):

Path to the run_info.json file.

directory_project (str):

Path to the project directory.

Return type:

(tuple)
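
Example (illustrative sketch): the effect of update_project_paths on the paths section of an existing config, modeled on a plain dictionary. The helper names are hypothetical, the 'videos' entry is an invented placeholder, and the real function operates on config.yaml rather than on an in-memory dict.

```python
import tempfile
from pathlib import Path


def project_paths(directory_project: str) -> dict:
    """Build the three path entries listed for update_project_paths."""
    root = Path(directory_project).resolve()
    return {
        'project': str(root),
        'config': str(root / 'config.yaml'),
        'run_info': str(root / 'run_info.json'),
    }


def update_paths(config: dict, directory_project: str) -> dict:
    """Return a copy of config whose project paths point at directory_project."""
    updated = dict(config)
    updated['paths'] = {**config.get('paths', {}), **project_paths(directory_project)}
    return updated


with tempfile.TemporaryDirectory() as tmp:
    # 'videos' is a hypothetical extra entry, included to show it is left untouched.
    old_config = {'paths': {'project': '/old/location', 'videos': '/data/videos'}}
    new_config = update_paths(old_config, tmp)
    expected_config_path = str(Path(tmp).resolve() / 'config.yaml')
    print(new_config['paths'])
```

Only the three project-related entries change; other entries in the config are carried over as they were.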

face_rhythm.rois module

ROI selection, point-grid generation, and image warping / registration.

Provides the ROIs class (choose face regions via GUI, file, or explicit dict), the ImageAlignmentChecker and registration helpers used by the alignment module, and the interactive _Select_ROI Plotly/ipywidgets GUI.

class face_rhythm.rois.ROIs(select_mode='gui', exampleImage=None, path_file=None, coords_rois=None, point_positions=None, mask_images=None, verbose=1)[source]

Bases: FR_Module

Container for one or more face ROIs and the tracking points sampled within them. Supports three construction modes: interactive GUI, loading a saved ROIs.h5 file, or building from explicit polygon coordinates. RH 2022

Parameters:
  • select_mode (str) –

    How to populate the ROIs. One of

    • 'gui': Launch the interactive Plotly/ipywidgets selector. exampleImage must be provided.

    • 'file': Load mask_images, roi_points, and exampleImage from a previously saved ROIs.h5 file. path_file must be provided.

    • 'custom': Build masks from explicit polygon coordinates. coords_rois and exampleImage must be provided.

    (Default is 'gui')

  • exampleImage (np.ndarray) – Image to display in the GUI or to define the canvas size for 'custom' mode. Only used when select_mode is 'gui' or 'custom'. (Default is None)

  • path_file (str) – Path to a saved ROIs.h5 file. Only used when select_mode is 'file'. (Default is None)

  • coords_rois (dict) – Dictionary mapping ROI names (e.g. 'ROI_0', 'ROI_1') to polygon vertices, given as either an np.ndarray of shape (N, 2) or a list of [x, y] pairs. Only used when select_mode is 'custom'. (Default is None)

  • point_positions (np.ndarray) – Optional pre-computed array of tracking point positions, shape (n_points, 2). (Default is None)

  • mask_images (dict) – Dictionary mapping mask names to 2D boolean np.ndarray masks with the same height and width as the videos. (Default is None)

  • verbose (int) –

    Verbosity level. One of

    • 0: No output.

    • 1: Warnings only.

    • 2: All output.

    (Default is 1)

exampleImage

The reference image associated with the ROIs.

Type:

np.ndarray

roi_points

Polygon vertices for each ROI keyed by name.

Type:

dict

mask_images

Boolean masks for each ROI keyed by name.

Type:

dict

point_positions

Tracking point positions, shape (n_points, 2).

Type:

np.ndarray

num_points

Total number of tracking points.

Type:

int

img_hw

Height and width of exampleImage.

Type:

tuple

make_points(rois, point_spacing=10)[source]

Generates a regular grid of tracking points inside the intersection of the supplied ROI masks and stores them on self.point_positions.

Parameters:
  • rois (Union[List[np.ndarray], np.ndarray]) – Either a list of 2D boolean masks, a single 2D boolean mask, or a 3D boolean array stacked along axis 0. All masks must share the same shape.

  • point_spacing (int) – Spacing between adjacent grid points, in pixels. (Default is 10)
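
Example (illustrative sketch): a regular grid restricted to the intersection of several masks. The helper names are hypothetical, nested lists stand in for boolean arrays, and starting the grid at pixel (0, 0) is an assumption; the library may place its grid differently.

```python
from typing import List, Tuple

Mask = List[List[bool]]


def grid_points_in_masks(masks: List[Mask], point_spacing: int = 10) -> List[Tuple[int, int]]:
    """Return (x, y) grid points that fall inside every mask."""
    height, width = len(masks[0]), len(masks[0][0])
    points = []
    for y in range(0, height, point_spacing):
        for x in range(0, width, point_spacing):
            if all(mask[y][x] for mask in masks):
                points.append((x, y))
    return points


def rect_mask(height: int, width: int, x0: int, x1: int, y0: int, y1: int) -> Mask:
    """Boolean mask that is True inside the half-open rectangle [x0, x1) x [y0, y1)."""
    return [[(x0 <= x < x1) and (y0 <= y < y1) for x in range(width)] for y in range(height)]


mask_a = rect_mask(40, 40, 0, 30, 0, 30)
mask_b = rect_mask(40, 40, 10, 40, 10, 40)
points = grid_points_in_masks([mask_a, mask_b], point_spacing=10)
print(points)
```

Only grid nodes inside the overlap of the two rectangles survive, which mirrors the "intersection of the supplied ROI masks" behavior described above.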

set_point_positions(point_positions)[source]

Manually overrides self.point_positions with an explicit array.

Parameters:

point_positions (np.ndarray) – Tracking point coordinates as (x, y) pairs. shape: (n_points, 2).

plot_rois(image=None, **kwargs_imshow)[source]

Plots ROI polygon outlines (and tracking points if available) on top of an image.

Parameters:
  • image (np.ndarray) – Background image to draw the ROIs on. If None, falls back to self.exampleImage; if that is also missing, a blank image is used. (Default is None)

  • **kwargs_imshow – Additional keyword arguments forwarded to matplotlib.pyplot.imshow.

Returns:

tuple containing:
fig (matplotlib.figure.Figure):

The Matplotlib figure containing the plot.

ax (matplotlib.axes.Axes):

The Matplotlib axes containing the plot.

Return type:

(tuple)

fliplr()[source]

Flips the example image, masks, ROI polygon points, and tracking points horizontally in place.

class face_rhythm.rois.ROI_Alinger(method='createOptFlow_DeepFlow', kwargs_method=None, verbose=1)[source]

Bases: object

Registers a template image to a set of new images using OpenCV optical flow, then warps the template’s ROI polygons and tracking points onto each new image. RH 2022

Parameters:
  • method (str) –

    Optical-flow method to use for non-rigid registration. One of

    • 'calcOpticalFlowFarneback'

    • 'createOptFlow_DeepFlow'

    (Default is 'createOptFlow_DeepFlow')

  • kwargs_method (dict) – Keyword arguments forwarded to the chosen optical-flow method. If None, hard-coded defaults are used. (Default is None)

  • verbose (int) –

    Verbosity level. One of

    • 0: No updates.

    • 1: Warnings only.

    • 2: All updates.

    (Default is 1)

align_and_make_ROIs(ROIs_object_template, images_new, image_template=None, template_method='image', shifts=None, normalize=True)[source]

Performs non-rigid registration of a template image onto each new image and warps the template’s tracking points, ROI polygons, masks, and the new images themselves into the template’s frame. RH 2022

Results are stored on the instance as self.flows, self.pointPositions_new, self.roiPoints_new, self.maskImages_new, self.ROIs_objects_new, and self.images_warped.

Parameters:
  • ROIs_object_template (ROIs) – A single ROIs object built from the template image. Its ROIs and tracking points are warped onto each new image.

  • images_new (List[np.ndarray]) – Images to align the template to. Each image must have shape (H, W, n_channels) and dtype uint8.

  • image_template (np.ndarray) – Template image to warp onto the new images. shape: (H, W, n_channels), dtype: uint8. If None, ROIs_object_template.exampleImage is used. (Default is None)

  • template_method (str) –

    Strategy for choosing the template per registration. One of

    • 'image': image_template is treated as a single image.

    • 'sequential': image_template is treated as the integer index of the image to use as the zero-offset reference.

    (Default is 'image')

  • shifts (np.ndarray) – Per-image (dx, dy) shifts to add to each computed flow field, e.g. from a phase-correlation pre-registration step. If None, zero shifts are applied. (Default is None)

  • normalize (bool) – If True, normalize images to [0, 255] (using each image’s own min and max) before registration. (Default is True)
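
Example (illustrative sketch): per-image min-max normalization to [0, 255], as described for normalize=True. The helper name is hypothetical, nested lists stand in for image arrays, and the handling of a constant image (mapped to zeros) is an assumption.

```python
from typing import List


def normalize_to_uint8_range(image: List[List[float]]) -> List[List[int]]:
    """Rescale one image to [0, 255] using its own minimum and maximum."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumption: a constant image carries no contrast, so map it to zeros.
        return [[0 for _ in row] for row in image]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in image]


image = [[10, 20], [30, 60]]
normalized = normalize_to_uint8_range(image)
print(normalized)
```

Because each image uses its own extrema, two images with different exposure end up on a comparable intensity scale before registration.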

class face_rhythm.rois.Image_Aligner(verbose=True)[source]

Bases: FR_Module

A class for registering images and points to a template image. Currently relies on available OpenCV methods for geometric (rigid) and non-rigid registration. RH 2023

Parameters:

verbose (bool) – Whether to print progress updates. (Default is True)

classmethod augment_images(ims: List[numpy.ndarray], use_CLAHE: bool = True, CLAHE_grid_size: int = 1, CLAHE_clipLimit: int = 1, CLAHE_normalize: bool = True) None[source]

Augments the FOV images by mixing the FOV with the ROI images and optionally applying CLAHE. RH 2023

Parameters:
  • ims (List[np.ndarray]) – A list of FOV images.

  • use_CLAHE (bool) – Whether to apply CLAHE to the images. (Default is True)

  • CLAHE_grid_size (int) – The grid size for CLAHE. See alignment.clahe for more details. (Default is 1)

  • CLAHE_clipLimit (int) – The clip limit for CLAHE. See alignment.clahe for more details. (Default is 1)

  • CLAHE_normalize (bool) – Whether to normalize the CLAHE output. See alignment.clahe for more details. (Default is True)

Returns:

FOV_images_augmented (List[np.ndarray]):

The augmented FOV images.

Return type:

(List[np.ndarray])

fit_geometric(template: int | numpy.ndarray, ims_moving: List[numpy.ndarray], template_method: str = 'sequential', mode_transform: str = 'affine', gaussFiltSize: int = 11, mask_borders: Tuple[int, int, int, int] = (0, 0, 0, 0), n_iter: int = 1000, termination_eps: float = 1e-09, auto_fix_gaussFilt_step: bool | int = 10) numpy.ndarray[source]

Performs geometric registration of ims_moving to a template, using cv2.findTransformECC. RH 2023

Parameters:
  • template (Union[int, np.ndarray]) – Interpretation depends on template_method. If template_method == 'image', template is a 2D np.ndarray image, an integer index of the image to use as the template, or a float between 0 and 1 giving the fractional index of that image. If template_method == 'sequential', template is the integer index or fractional index of the image to use as the template.

  • ims_moving (List[np.ndarray]) – List of images to be aligned.

  • template_method (str) –

    Method to use for template selection.

    • ’image’: use the image specified by ‘template’.

    • ’sequential’: register each image to the previous or next image

    (Default is ‘sequential’)

  • mode_transform (str) – Mode of geometric transformation. Can be ‘translation’, ‘euclidean’, ‘affine’, or ‘homography’. See cv2.findTransformECC for more details. (Default is ‘affine’)

  • gaussFiltSize (int) – Size of the Gaussian filter used to smooth the images during registration (the gaussFiltSize argument of cv2.findTransformECC). (Default is 11)

  • mask_borders (Tuple[int, int, int, int]) – Border mask for the image. Format is (top, bottom, left, right). (Default is (0, 0, 0, 0))

  • n_iter (int) – Number of iterations for cv2.findTransformECC. (Default is 1000)

  • termination_eps (float) – Convergence threshold (epsilon) of the termination criteria for cv2.findTransformECC. (Default is 1e-9)

  • auto_fix_gaussFilt_step (Union[bool, int]) – Automatically fixes convergence issues by increasing the gaussFiltSize. If False, no automatic fixing is performed. If True, the gaussFiltSize is increased by 2 until convergence. If int, the gaussFiltSize is increased by this amount until convergence. (Default is 10)

Returns:

remapIdx_geo (np.ndarray):

An array of shape (N, H, W, 2) representing the remap field for N images.

Return type:

(np.ndarray)
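
Example (illustrative sketch): the retry logic described for auto_fix_gaussFilt_step. The function register_with_auto_fix, the callable try_register, and the cap max_gaussFiltSize are all hypothetical; try_register stands in for one registration attempt and returns None where a real attempt would fail to converge.

```python
from typing import Callable, Optional, Tuple, Union


def register_with_auto_fix(
    try_register: Callable[[int], Optional[str]],
    gaussFiltSize: int = 11,
    auto_fix_gaussFilt_step: Union[bool, int] = 10,
    max_gaussFiltSize: int = 101,  # hypothetical safety cap, not a library parameter
) -> Tuple[str, int]:
    """Retry a registration attempt with progressively larger Gaussian filters."""
    # bool is a subclass of int, so the True/False cases are checked first.
    if auto_fix_gaussFilt_step is False:
        step = 0
    elif auto_fix_gaussFilt_step is True:
        step = 2
    else:
        step = int(auto_fix_gaussFilt_step)

    size = gaussFiltSize
    while size <= max_gaussFiltSize:
        result = try_register(size)
        if result is not None:
            return result, size
        if step == 0:
            break  # automatic fixing is disabled
        size += step
    raise RuntimeError("Registration did not converge.")


def fake_register(size: int) -> Optional[str]:
    # Hypothetical stand-in: pretend convergence needs a filter of at least 30 px.
    return f"warp(gaussFiltSize={size})" if size >= 30 else None


warp, size_used = register_with_auto_fix(
    fake_register, gaussFiltSize=11, auto_fix_gaussFilt_step=10
)
print(warp, size_used)
```

Starting from an odd size and stepping by an even amount keeps the filter size odd, which Gaussian kernels generally require.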

fit_nonrigid(template: int | numpy.ndarray, ims_moving: List[numpy.ndarray], remappingIdx_init: numpy.ndarray | None = None, template_method: str = 'sequential', mode_transform: str = 'createOptFlow_DeepFlow', kwargs_mode_transform: dict | None = None) numpy.ndarray[source]

Performs non-rigid registration of ims_moving to a template using dense optical flow. The OpenCV method is selected by mode_transform ('createOptFlow_DeepFlow' or 'calcOpticalFlowFarneback'). RH 2023

Parameters:
  • template (Union[int, np.ndarray]) –

    • If template_method == 'image': template is either an image, an integer index, or a float fractional index of the image to use as the template.

    • If template_method == 'sequential': template is the integer index of the image to use as the template.

  • ims_moving (List[np.ndarray]) – A list of images to be aligned.

  • remappingIdx_init (Optional[np.ndarray]) – An array of shape (N, H, W, 2) representing any initial remap field to apply to the images in ims_moving. The output of this method will be added/composed with remappingIdx_init. (Default is None)

  • template_method (str) –

    The method to use for template selection. Either

    • 'image': use the image specified by ‘template’.

    • 'sequential': register each image to the previous or next image (will be next for images before the template and previous for images after the template)

    (Default is ‘sequential’)

  • mode_transform (str) – The type of transformation to use for registration. Either ‘createOptFlow_DeepFlow’ or ‘calcOpticalFlowFarneback’. (Default is ‘createOptFlow_DeepFlow’)

  • kwargs_mode_transform (Optional[dict]) – Keyword arguments for the transform chosen. See cv2 docs for chosen transform. (Default is None)

Returns:

remapIdx_nonrigid (np.ndarray):

An array of shape (N, H, W, 2) representing the remap field for N images.

Return type:

(np.ndarray)
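
Example (illustrative sketch): which image each frame is registered to under template_method == 'sequential', following the rule above (next image for frames before the template, previous image for frames after it). Both helper names are hypothetical, and chaining pairwise registrations back to the template is an assumption about how the per-pair results would be combined.

```python
from typing import List


def sequential_reference_indices(n_images: int, idx_template: int) -> List[int]:
    """For each image, the index of the neighbor it is registered to."""
    refs = []
    for i in range(n_images):
        if i < idx_template:
            refs.append(i + 1)  # before the template: register to the next image
        elif i > idx_template:
            refs.append(i - 1)  # after the template: register to the previous image
        else:
            refs.append(i)      # the template maps to itself
    return refs


def chain_to_template(i: int, idx_template: int) -> List[int]:
    """Sequence of images visited when stepping from image i to the template."""
    step = 1 if i < idx_template else -1
    return list(range(i, idx_template, step)) + [idx_template]


refs = sequential_reference_indices(n_images=5, idx_template=2)
chain_first = chain_to_template(0, 2)
chain_last = chain_to_template(4, 2)
print(refs, chain_first, chain_last)
```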

transform_images_geometric(ims_moving: numpy.ndarray, remappingIdx: numpy.ndarray | None = None) numpy.ndarray[source]

Transforms images based on geometric registration warps.

Parameters:
  • ims_moving (np.ndarray) – The images to be transformed. (N, H, W)

  • remappingIdx (Optional[np.ndarray]) – An array specifying how to remap the images. If None, the remapping index from geometric registration is used. (Default is None)

Returns:

ims_registered_geo (np.ndarray):

The images after applying the geometric registration warps. (N, H, W)

Return type:

(np.ndarray)

transform_images_nonrigid(ims_moving: numpy.ndarray, remappingIdx: numpy.ndarray | None = None) numpy.ndarray[source]

Transforms images based on non-rigid registration warps.

Parameters:
  • ims_moving (np.ndarray) – The images to be transformed. (N, H, W)

  • remappingIdx (Optional[np.ndarray]) – An array specifying how to remap the images. If None, the remapping index from non-rigid registration is used. (Default is None)

Returns:

ims_registered_nonrigid (np.ndarray):

The images after applying the non-rigid registration warps. (N, H, W)

Return type:

(np.ndarray)

transform_images(ims_moving: List[numpy.ndarray], remappingIdx: List[numpy.ndarray]) List[numpy.ndarray][source]

Transforms images using the specified remapping index.

Parameters:
  • ims_moving (List[np.ndarray]) – The images to be transformed. List of arrays with shape: (H, W) or (H, W, C)

  • remappingIdx (List[np.ndarray]) – The remapping index to apply to the images.

Returns:

ims_registered (List[np.ndarray]):

The transformed images. List of N arrays, each with shape (H, W) or (H, W, C).

Return type:

(List[np.ndarray])

transform_points(points: numpy.ndarray, remappingIdx: numpy.ndarray)[source]

Warps points through the supplied remapping index field. RH 2022

Parameters:
  • points (np.ndarray) – Points to warp as (x, y) pairs. shape: (n_points, 2), dtype: float.

  • remappingIdx (np.ndarray) – Remapping index field that maps output (x, y) coordinates to source (x, y) coordinates. shape: (H, W, 2), dtype: float. Last dim is (x, y).

Returns:

points_remap (np.ndarray):

Warped points clipped to image bounds. shape: (n_points, 2).

Return type:

(np.ndarray)

get_flowFields(remappingIdx: numpy.ndarray | None = None) List[numpy.ndarray][source]

Returns the flow fields based on the remapping indices.

Parameters:

remappingIdx (Optional[np.ndarray]) – The indices for remapping the flow fields. If None, geometric or nonrigid registration must be performed first. (Default is None)

Returns:

flow_fields (List[np.ndarray]):

The transformed flow fields.

Return type:

(List[np.ndarray])
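
Example (illustrative sketch): one common relation between a remapping index field and a flow field, where the flow is the remap minus the identity pixel grid. The helper name is hypothetical, nested lists of (x, y) tuples stand in for (H, W, 2) arrays, and the sign convention is an assumption that may differ from the library's.

```python
from typing import List, Tuple

Field = List[List[Tuple[float, float]]]


def remap_to_flow(remapping_idx: Field) -> Field:
    """Subtract each pixel's own (x, y) coordinate from its remap target."""
    return [
        [(src_x - x, src_y - y) for x, (src_x, src_y) in enumerate(row)]
        for y, row in enumerate(remapping_idx)
    ]


# A 2 x 3 remap field that samples every pixel from 1.5 px to its right.
height, width = 2, 3
remap = [[(x + 1.5, float(y)) for x in range(width)] for y in range(height)]
flow = remap_to_flow(remap)
print(flow[0][0])
```

An identity remap therefore corresponds to a flow field of zeros, and a uniform shift corresponds to a constant flow vector.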

face_rhythm.rois.clahe(im: numpy.ndarray, grid_size: int = 50, clipLimit: int = 0, normalize: bool = True) numpy.ndarray[source]

Perform Contrast Limited Adaptive Histogram Equalization (CLAHE) on an image.

Parameters:
  • im (np.ndarray) – Input image.

  • grid_size (int) – Size of the grid. See cv2.createCLAHE for more info. (Default is 50)

  • clipLimit (int) – Clip limit. See cv2.createCLAHE for more info. (Default is 0)

  • normalize (bool) – Whether to normalize the output image. (Default is True)

Returns:

im_out (np.ndarray):

Output image after applying CLAHE.

Return type:

(np.ndarray)

face_rhythm.spectral_analysis module

Spectral analysis of point-tracked motion via Variable-Q Transform (VQT).

VQT_Analyzer converts point trajectories (from face_rhythm.point_tracking) into per-point spectrograms with a Variable-Q transform, applies 1/f and per-timepoint normalization, and writes the resulting complex or magnitude tensors out for downstream face_rhythm.decomposition.

class face_rhythm.spectral_analysis.VQT_Analyzer(params_VQT: dict = {'F_max': 40, 'F_min': 1, 'Fs_sample': 90, 'Q_highF': 20, 'Q_lowF': 3, 'downsample_factor': 8, 'fast_length': True, 'fft_conv': True, 'filters': None, 'n_freq_bins': 55, 'padding': 'valid', 'plot_pref': False, 'symmetry': 'center', 'take_abs': True, 'taper_asymmetric': True, 'window_type': 'hann'}, batch_size: int = 10, device='cpu', normalization_factor: float = 0.99, spectrogram_exponent: float = 1.0, one_over_f_exponent: float = 1.0, verbose: int = 1)[source]

Bases: FR_Module

Computes normalized Variable-Q Transform (VQT) spectrograms for point displacement traces. RH 2022

Parameters:
  • params_VQT (dict) – Keyword arguments forwarded to vqt.VQT (the Variable Q-Transform implementation in vqt). Notable keys include Fs_sample (sampling rate in Hz), Q_lowF and Q_highF (Q-factors at the low and high frequency bounds), F_min and F_max (frequency range in Hz), n_freq_bins, window_type, downsample_factor, and take_abs. (Default is the dict shown in the signature)

  • batch_size (int) – Number of points processed per VQT batch. (Default is 10)

  • device (str) – Torch device on which the VQT model is run (e.g. 'cpu' or 'cuda'). (Default is 'cpu')

  • normalization_factor (float) – Strength of the per-timepoint power normalization, in the range [0, 1]. 0 disables normalization; 1 forces every time point to have equal total power. (Default is 0.99)

  • spectrogram_exponent (float) – Exponent applied to the spectrogram magnitudes prior to normalization. (Default is 1.0)

  • one_over_f_exponent (float) – Exponent for the 1/f correction; the spectrogram is multiplied by freqs ** one_over_f_exponent. 0 disables the correction. (Default is 1.0)

  • verbose (int) – Verbosity level. 0 is silent, higher values print and show progress bars. (Default is 1)
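
Example (illustrative sketch): one normalization formula that is consistent with the parameter descriptions above (0 leaves the spectrogram unchanged, 1 gives every time point equal total power). The helper name, the exact blending formula, and the order of the exponent and 1/f steps are assumptions; nested lists indexed as spec[freq][time] stand in for arrays.

```python
from typing import List, Sequence


def normalize_spectrogram(
    spec: Sequence[Sequence[float]],
    freqs: Sequence[float],
    normalization_factor: float = 0.99,
    spectrogram_exponent: float = 1.0,
    one_over_f_exponent: float = 1.0,
) -> List[List[float]]:
    """Apply exponent, 1/f correction, and per-timepoint normalization."""
    n_freqs, n_times = len(spec), len(spec[0])
    # Exponent on the magnitudes, then multiply by freqs ** one_over_f_exponent.
    scaled = [
        [
            (abs(spec[f][t]) ** spectrogram_exponent) * (freqs[f] ** one_over_f_exponent)
            for t in range(n_times)
        ]
        for f in range(n_freqs)
    ]
    # Total over frequency at each time point.
    totals = [sum(scaled[f][t] for f in range(n_freqs)) for t in range(n_times)]
    # Assumed blend: factor 0 divides by 1, factor 1 divides by the total.
    denominators = [
        normalization_factor * total + (1.0 - normalization_factor) for total in totals
    ]
    return [
        [scaled[f][t] / denominators[t] if denominators[t] > 0 else 0.0 for t in range(n_times)]
        for f in range(n_freqs)
    ]


spec = [[1.0, 2.0], [3.0, 4.0]]  # 2 frequency bins x 2 time points
freqs = [1.0, 2.0]
unnormalized = normalize_spectrogram(spec, freqs, normalization_factor=0.0)
fully_normalized = normalize_spectrogram(spec, freqs, normalization_factor=1.0)
column_sums = [sum(row[t] for row in fully_normalized) for t in range(2)]
print(unnormalized, column_sums)
```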

spectrograms

Dict mapping each input key to its normalized spectrogram array. Populated by transform_all().

Type:

dict

x_axis

Dict mapping each input key to the time axis (in samples) of its spectrogram. Populated by transform_all().

Type:

dict

freqs

Dict mapping each input key to the frequency bin centers (in Hz). Populated by transform_all().

Type:

dict

point_positions

Reshaped reference positions used to subtract offsets from traces.

Type:

torch.Tensor

vqt_model

The underlying VQT filter-bank model.

Type:

vqt.VQT

config

Constructor arguments echoed for FR_Module serialization.

Type:

dict

run_data

Output payload (filters, frequencies, spectrograms, axes) used by FR_Module for export.

Type:

dict

cleanup()[source]

Deletes every attribute on the instance and triggers garbage collection to free large tensors held by the analyzer.

transform(points_tracked: numpy.ndarray, point_positions: numpy.ndarray)[source]

Transforms a single batch of tracked points into a normalized VQT spectrogram.

Parameters:
  • points_tracked (np.ndarray) – Tracked point coordinates. shape: (n_frames, n_points, 2).

  • point_positions (np.ndarray) – Reference positions of the tracked points used to compute displacements. shape: (n_points, 2).

Returns:

tuple containing:
spectrograms (np.ndarray):

Normalized spectrograms for the x and y displacement components. shape: (2, n_points, n_freq_bins, n_frames_ds), where n_frames_ds is the downsampled frame count.

x_axis (np.ndarray):

Time axis of the spectrogram, in samples at the original Fs_sample rate. shape: (n_frames_ds,).

freqs (np.ndarray):

Frequency bin centers in Hz. shape: (n_freq_bins,).

Return type:

(tuple)

transform_all(points_tracked: dict, point_positions: numpy.ndarray)[source]

Generates spectrograms for every entry in a dict of tracked-point arrays and stores the results on the instance.

Parameters:
  • points_tracked (dict) – Mapping from a name to a tracked-points array of shape (n_frames, n_points, 2).

  • point_positions (np.ndarray) – Reference positions of the tracked points used to compute displacements. shape: (n_points, 2).

Set Attributes:
spectrograms (dict):

Dict mapping each input key to its normalized spectrogram.

x_axis (dict):

Dict mapping each input key to the spectrogram time axis.

freqs (dict):

Dict mapping each input key to the frequency bin centers in Hz.

run_data (dict):

Updated with spectrograms, x_axis, and point_positions for FR_Module export.

demo_transform(points_tracked: dict, point_positions: numpy.ndarray, idx_point: list = [0], name_points: str = '0', plot: bool = True)[source]

Runs a single-point demo transform for visual sanity checking and prints the projected memory footprint of the full spectrogram set.

Parameters:
  • points_tracked (dict) – Mapping from a name to a tracked-points array of shape (n_frames, n_points, 2).

  • point_positions (np.ndarray) – Reference positions of the tracked points. shape: (n_points, 2).

  • idx_point (list) – Indices of points within the selected entry to transform and plot. (Default is [0])

  • name_points (str) – Key into points_tracked selecting which array to use. (Default is '0')

  • plot (bool) – If True, displays a matplotlib figure with the x and y spectrograms. (Default is True)

Returns:

tuple containing:
spec (np.ndarray):

Demo spectrogram. shape: (2, n_freq_bins, n_frames_ds).

x_axis (np.ndarray):

Time axis of the spectrogram, in samples at the original Fs_sample rate. shape: (n_frames_ds,).

freqs (np.ndarray):

Frequency bin centers in Hz. shape: (n_freq_bins,).

Return type:

(tuple)
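
Example (illustrative sketch): a back-of-the-envelope estimate of the memory footprint of a full spectrogram set with shape (2, n_points, n_freq_bins, n_frames_ds). The helper name is hypothetical, and both n_frames_ds = n_frames // downsample_factor and 4 bytes per value are assumptions; the figure printed by demo_transform may be computed differently.

```python
def projected_spectrogram_bytes(
    n_frames: int,
    n_points: int,
    n_freq_bins: int,
    downsample_factor: int,
    bytes_per_value: int = 4,  # assumption: float32 magnitudes
) -> int:
    """Estimate the size in bytes of a (2, n_points, n_freq_bins, n_frames_ds) array."""
    n_frames_ds = n_frames // downsample_factor  # assumption: simple integer downsampling
    return 2 * n_points * n_freq_bins * n_frames_ds * bytes_per_value


# Ten minutes of video at 90 frames per second, 1000 tracked points.
n_bytes = projected_spectrogram_bytes(
    n_frames=90 * 60 * 10,
    n_points=1000,
    n_freq_bins=55,
    downsample_factor=8,
)
print(f"{n_bytes / 1e9:.2f} GB")
```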

face_rhythm.util module

Miscellaneous project utilities: FR_Module base class, config I/O, system info, batch launcher.

Contains the shared FR_Module base class (save config / run_info / run_data for every pipeline stage), YAML helpers, matplotlib -> numpy array helpers, a system-info collector used to snapshot the environment in run_info.json, and a SLURM batch-run wrapper.

face_rhythm.util.get_default_parameters(path_defaults=None, directory_project=None, directory_videos=None, filename_videos_strMatch=None, path_ROIs=None)[source]

Returns a dictionary of default parameters for running face-rhythm pipelines. RH 2023

Parameters:
  • path_defaults (Optional[str]) – Path to a JSON file containing a parameters dictionary. If provided, parameters are loaded from this file. If None, the built-in defaults are used. (Default is None)

  • directory_project (Optional[str]) – Directory to use as the project directory. Passed through to fr.project.prepare_project. (Default is None)

  • directory_videos (Optional[str]) – Directory containing the videos. Passed through to fr.helpers.find_paths to discover video paths. (Default is None)

  • filename_videos_strMatch (Optional[str]) – Regex that video filenames must match. Passed through to fr.helpers.find_paths to filter discovered videos. (Default is None)

  • path_ROIs (Optional[str]) – Path to the file containing the ROIs. Used by fr.rois.ROIs when running in 'file' mode instead of 'gui' mode. (Default is None)

Returns:

params (dict):

Dictionary containing the default (or loaded) parameters for each pipeline stage.

Return type:

(dict)

class face_rhythm.util.FR_Module[source]

Bases: object

Superclass for all face-rhythm module classes. Provides shared helpers for saving run_data, run_info, and config files. RH 2022

run_info

Per-run metadata populated by the subclass. Saved by save_run_info().

Type:

Optional[dict]

run_data

Per-run output data populated by the subclass. Saved by save_run_data().

Type:

Optional[dict]

module_name

Name of the concrete subclass; used as the top-level key in the config and run_info files.

Type:

str

save_config(path_config=None, overwrite=True, verbose=1)[source]

Appends self.config to the config.yaml file. RH 2022

self.config is created by the subclass and should contain all parameters used to run the module.

Parameters:
  • path_config (str) – Path to the config.yaml file. (Default is None)

  • overwrite (bool) – If True, overwrites the existing field for this module inside config.yaml. (Default is True)

  • verbose (int) –

    Verbosity level. Either

    • 0: Silent.

    • 1: Print warnings.

    • 2: Print all info.

    (Default is 1)

save_run_info(path_run_info=None, path_config=None, overwrite=True, verbose=1)[source]

Appends self.run_info to the run_info.json file.

Exactly one of path_run_info or path_config must be supplied.

Parameters:
  • path_run_info (Optional[str]) – Path to the run_info.json file. If None, path_config must be provided and must contain config['paths']['run_info']. If the file does not exist, it will be created. (Default is None)

  • path_config (Optional[str]) – Path to the config.yaml file. If None, path_run_info must be provided. (Default is None)

  • overwrite (bool) – If True, overwrites the existing field for this module inside run_info.json. (Default is True)

  • verbose (int) –

    Verbosity level. Either

    • 0: Silent.

    • 1: Print warnings.

    • 2: Print all info.

    (Default is 1)
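
Example (illustrative sketch): the append-and-overwrite behavior of a per-module entry in run_info.json, written with the standard json module. The helper name and the entry contents are hypothetical, and silently skipping an existing entry when overwrite is False is an assumption; the library may warn or raise instead.

```python
import json
import tempfile
from pathlib import Path


def append_module_entry(path_json, module_name: str, entry: dict, overwrite: bool = True) -> dict:
    """Store entry under module_name in a JSON file, creating the file if needed."""
    path = Path(path_json)
    contents = json.loads(path.read_text()) if path.exists() else {}
    if module_name in contents and not overwrite:
        return contents  # assumption: keep the existing entry untouched
    contents[module_name] = entry
    path.write_text(json.dumps(contents, indent=2))
    return contents


with tempfile.TemporaryDirectory() as tmp:
    path_run_info = Path(tmp) / 'run_info.json'
    # Entry contents below are invented placeholders.
    append_module_entry(path_run_info, 'ROIs', {'num_points': 100})
    append_module_entry(path_run_info, 'VQT_Analyzer', {'n_freq_bins': 55})
    append_module_entry(path_run_info, 'ROIs', {'num_points': 999}, overwrite=False)
    saved = json.loads(path_run_info.read_text())
    print(saved)
```

Each module writes under its own top-level key, so later pipeline stages add to the file without disturbing earlier entries.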

save_run_data(path_run_data=None, path_config=None, overwrite=True, use_compression=False, track_order=True, verbose=1)[source]

Saves self.run_data to an .h5 file under the project’s analysis_files directory. RH 2022

self.run_data is created by the subclass and should contain all the data generated by the module. Exactly one of path_run_data or path_config must be supplied. The project directory should already exist (use face_rhythm.project.prepare_project).

Parameters:
  • path_run_data (Optional[str]) – Path to the output .h5 file. If None, path_config must be provided and must contain config['paths']['project']. Resolved path will be <project>/analysis_files/<module_name>.h5. If the file does not exist, it will be created. (Default is None)

  • path_config (Optional[str]) – Path to the config.yaml file. If None, path_run_data must be provided. (Default is None)

  • overwrite (bool) – If True, overwrites the existing .h5 file. (Default is True)

  • use_compression (bool) – If True, uses compression when writing the .h5 file. (Default is False)

  • track_order (bool) – If True, preserves insertion order of keys inside the .h5 file. (Default is True)

  • verbose (int) –

    Verbosity level. Either

    • 0: Silent.

    • 1: Print warnings.

    • 2: Print all info.

    (Default is 1)
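
Example (illustrative sketch): how the output path could be resolved from the "exactly one of" rule above. The helper name is hypothetical, and directory_project stands for the value of config['paths']['project'] that would be read from path_config.

```python
from pathlib import Path
from typing import Optional


def resolve_path_run_data(
    module_name: str,
    path_run_data: Optional[str] = None,
    directory_project: Optional[str] = None,
) -> Path:
    """Return the .h5 path, requiring exactly one of the two location arguments."""
    if (path_run_data is None) == (directory_project is None):
        raise ValueError("Supply exactly one of path_run_data or directory_project.")
    if path_run_data is not None:
        return Path(path_run_data)
    return Path(directory_project) / 'analysis_files' / f'{module_name}.h5'


resolved = resolve_path_run_data('VQT_Analyzer', directory_project='/proj')
explicit = resolve_path_run_data('VQT_Analyzer', path_run_data='/tmp/custom.h5')
print(resolved, explicit)
```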

face_rhythm.util.load_yaml_safe(path, verbose=0)[source]

Loads a YAML file, falling back to yaml.Loader if FullLoader fails.

Parameters:
  • path (str) – Path to the .yaml file.

  • verbose (int) – Verbosity level. Higher values print more info. (Default is 0)

Returns:

data (dict):

Parsed YAML file as a dictionary.

Return type:

(dict)

face_rhythm.util.load_config_file(path, verbose=0)[source]

Loads a config.yaml file as a dictionary.

Parameters:
  • path (str) – Path to the config.yaml file.

  • verbose (int) – Verbosity level. Higher values print more info. (Default is 0)

Returns:

config (dict):

Parsed config.yaml file as a dictionary.

Return type:

(dict)

face_rhythm.util.load_run_info_file(path, verbose=0)[source]

Loads a run_info.json file as a dictionary.

Parameters:
  • path (str) – Path to the run_info.json file.

  • verbose (int) – Verbosity level. Higher values print more info. (Default is 0)

Returns:

run_info (dict):

Parsed run_info.json file as a dictionary.

Return type:

(dict)

class face_rhythm.util.Saver_Viz_Base(path_config: str = None, dir_save: str = None, formats_save: list = ['png'], kwargs_method: dict = {}, overwrite: bool = False, verbose: int = 1)[source]

Bases: object

Superclass for saving visualizations (e.g. Figure_Saver, Image_Saver).

Parameters:
  • path_config (Optional[str]) – Path to the config.yaml file. Optional if dir_save is specified. (Default is None)

  • dir_save (Optional[str]) – Directory to save visualizations into. Optional if path_config is specified. (Default is None)

  • formats_save (List[str]) – File formats to save visualizations as. Valid values depend on the saving method used by the subclass. (Default is ['png'])

  • kwargs_method (Dict[str, Any]) – Keyword arguments forwarded to the underlying save method. (Default is {})

  • overwrite (bool) – If True, overwrites existing files. (Default is False)

  • verbose (int) –

    Verbosity level. One of

    • 0: Silent.

    • 1: Print warnings.

    • 2: Print warnings and info.

    (Default is 1)

path_config

Stored path to the config.yaml file.

Type:

Optional[str]

dir_save

Resolved directory used for saving outputs.

Type:

str

formats_save

Stored list of file formats.

Type:

List[str]

kwargs_method

Stored keyword arguments forwarded to the save method.

Type:

Dict[str, Any]

overwrite

Stored overwrite flag.

Type:

bool

verbose

Stored verbosity level.

Type:

int

class face_rhythm.util.Figure_Saver(path_config: str = None, dir_save: str = None, formats_save: list = ['png'], kwargs_savefig: dict = {'bbox_inches': 'tight', 'dpi': 300, 'pad_inches': 0.1, 'transparent': True}, overwrite: bool = False, verbose: int = 1)[source]

Bases: Saver_Viz_Base

Saves matplotlib figures to disk in one or more file formats. RH 2022

Parameters:
  • path_config (Optional[str]) – Path to the config.yaml file. If None, dir_save must be specified. (Default is None)

  • dir_save (Optional[str]) – Directory to save the figure into. Used when path_config is None. (Default is None)

  • formats_save (List[str]) – File formats to save the figure as. Common choices are 'png', 'svg', 'eps', and 'pdf'. (Default is ['png'])

  • kwargs_savefig (Dict[str, Any]) – Keyword arguments forwarded to matplotlib.figure.Figure.savefig. (Default is {'bbox_inches': 'tight', 'pad_inches': 0.1, 'transparent': True, 'dpi': 300})

  • overwrite (bool) – If True, overwrites existing files. (Default is False)

  • verbose (int) –

    Verbosity level. One of

    • 0: Silent.

    • 1: Print warnings.

    • 2: Print warnings and info.

    (Default is 1)

kwargs_savefig

Stored savefig keyword arguments.

Type:

Dict[str, Any]

save_figure(fig, name_save: str = None, dir_save: str = None, formats_save: str = None, kwargs_savefig: dict = None)[source]

Saves a single matplotlib figure to one or more file formats.

Parameters:
  • fig (matplotlib.figure.Figure) – Figure to save.

  • name_save (Optional[str]) – Name of the file to save the figure as (without extension). If None, the figure’s label is used. (Default is None)

  • dir_save (Optional[str]) – Directory to save the figure into. If None, the directory stored on the instance is used. (Default is None)

  • formats_save (Optional[Union[str, List[str]]]) – File format(s) to save the figure as. If None, the formats stored on the instance are used. (Default is None)

  • kwargs_savefig (Optional[Dict[str, Any]]) – Keyword arguments forwarded to matplotlib.figure.Figure.savefig. If None, the stored kwargs are used. (Default is None)
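The way formats_save (a string or a list) and overwrite interact can be sketched as follows. The helper plan_save_paths is hypothetical, and skipping existing files when overwrite is False is an assumption; the real method may warn or raise instead.

```python
from pathlib import Path
from typing import List, Union


def plan_save_paths(
    name_save: str,
    dir_save: Union[str, Path],
    formats_save: Union[str, List[str]],
    overwrite: bool = False,
) -> List[Path]:
    """Hypothetical helper: list the files a multi-format save would write."""
    # Accept a single format string as shorthand for a one-element list.
    formats = [formats_save] if isinstance(formats_save, str) else list(formats_save)
    paths = []
    for fmt in formats:
        path = Path(dir_save) / f"{name_save}.{fmt}"
        if path.exists() and not overwrite:
            # Assumed behavior: existing files are skipped, not replaced.
            continue
        paths.append(path)
    return paths


paths = plan_save_paths("my_figure", "figures_out", ["png", "svg"], overwrite=True)
```

Each returned path could then be handed to matplotlib's Figure.savefig together with the stored keyword arguments.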

class face_rhythm.util.Image_Saver(path_config: str = None, dir_save: str = None, formats_save: list = ['png'], kwargs_PIL_save: dict = {}, overwrite: bool = False, verbose: int = 1)[source]

Bases: Saver_Viz_Base

Saves images and animated GIFs to disk using PIL. RH 2022

Parameters:
  • path_config (Optional[str]) – Path to the config.yaml file. If None, dir_save must be specified. (Default is None)

  • dir_save (Optional[str]) – Directory to save the image into. Used when path_config is None. (Default is None)

  • formats_save (List[str]) – File formats to save the image as. Common choices are 'png', 'jpg', and 'tif'. (Default is ['png'])

  • kwargs_PIL_save (Dict[str, Any]) – Keyword arguments forwarded to PIL.Image.Image.save. (Default is {})

  • overwrite (bool) – If True, overwrites existing files. (Default is False)

  • verbose (int) –

    Verbosity level. One of

    • 0: Silent.

    • 1: Print warnings.

    • 2: Print warnings and info.

    (Default is 1)

kwargs_PIL_save

Stored PIL.Image.save keyword arguments.

Type:

Dict[str, Any]

save_image(array_image, name_save: str = None, dir_save: str = None, formats_save: str = None, kwargs_PIL_save: dict = None)[source]

Saves a single image array as one or more files using PIL.

Parameters:
  • array_image (np.ndarray) – Image to save. shape: (H, W) or (H, W, C) with C in {1, 3}. If dtype is float, values must lie in [0, 1] and will be scaled by 255 and cast to uint8. If dtype is int, values must lie in [0, 255] and will be cast to uint8.

  • name_save (Optional[str]) – Name of the file to save the image as (without extension). If None, 'image' is used. (Default is None)

  • dir_save (Optional[str]) – Directory to save the image into. If None, the directory stored on the instance is used. (Default is None)

  • formats_save (Optional[Union[str, List[str]]]) – File format(s) to save the image as. If None, the formats stored on the instance are used. (Default is None)

  • kwargs_PIL_save (Optional[Dict[str, Any]]) – Keyword arguments forwarded to PIL.Image.Image.save. If None, the stored kwargs are used. (Default is None)
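The dtype rules stated for array_image can be illustrated with a short NumPy sketch. The helper to_uint8 is hypothetical; whether the real method validates ranges or rounds instead of truncating is not documented here, so both choices below are assumptions.

```python
import numpy as np


def to_uint8(array_image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convert an image to uint8 following the documented rules."""
    if np.issubdtype(array_image.dtype, np.floating):
        if array_image.min() < 0.0 or array_image.max() > 1.0:
            raise ValueError("Float images must lie in [0, 1].")
        # Scale to [0, 255]; astype truncates toward zero (an assumption, not rounding).
        return (array_image * 255).astype(np.uint8)
    if np.issubdtype(array_image.dtype, np.integer):
        if array_image.min() < 0 or array_image.max() > 255:
            raise ValueError("Integer images must lie in [0, 255].")
        return array_image.astype(np.uint8)
    raise TypeError(f"Unsupported dtype: {array_image.dtype}")


float_image = np.array([[0.0, 0.5, 1.0]])
converted = to_uint8(float_image)
```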

save_gif(array_images, name_save: str = None, dir_save: str = None, frame_rate: float = 5.0, loop: int = True, optimize: bool = True, kwargs_PIL_save: dict = None)[source]

Saves a sequence of images as an animated GIF using PIL.

Parameters:
  • array_images (List[np.ndarray]) – List of frames to save. Each frame has shape (H, W) or (H, W, C) with C in {1, 3}.

  • name_save (Optional[str]) – Name of the file to save the GIF as (without extension). If None, 'image' is used. (Default is None)

  • dir_save (Optional[str]) – Directory to save the GIF into. If None, the directory stored on the instance is used. (Default is None)

  • frame_rate (float) – Playback frame rate in frames per second. (Default is 5.0)

  • loop (Union[int, bool]) – Number of times the GIF should loop. True loops forever. (Default is True)

  • optimize (bool) – If True, applies PIL’s GIF size optimization. (Default is True)

  • kwargs_PIL_save (Optional[Dict[str, Any]]) – Keyword arguments forwarded to PIL.Image.Image.save. If None, the stored kwargs are used. (Default is None)
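Pillow's GIF writer takes a per-frame duration in milliseconds and a loop count in which 0 means "repeat forever". A minimal sketch of how frame_rate and loop might be translated into those options follows; the helper gif_timing is hypothetical, and mapping True to 0 is an assumption about how this method treats the flag.

```python
from typing import Union


def gif_timing(frame_rate: float, loop: Union[int, bool]) -> dict:
    """Hypothetical helper: translate frame_rate and loop into Pillow GIF save options."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive.")
    # Pillow expects the display time of each frame in milliseconds.
    duration_ms = 1000.0 / frame_rate
    # In Pillow, loop=0 repeats forever; treating True as 0 is an assumption.
    loop_count = 0 if loop is True else int(loop)
    return {"duration": duration_ms, "loop": loop_count}


options = gif_timing(frame_rate=5.0, loop=True)
```

With a list of PIL images named frames, these options could be combined with save_all=True and append_images=frames[1:] in a call to frames[0].save.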

face_rhythm.util.system_info(verbose: bool = False) → Dict[source]

Collects information about the OS, CPU, RAM, GPU, and key Python packages, and optionally prints it. RH 2022

Parameters:

verbose (bool) – If True, prints each section to stdout as it is collected. (Default is False)

Returns:

versions (Dict):

Dictionary containing the system snapshot. Keys include 'datetime', 'face_rhythm', 'operating_system', 'cpu_info', 'user', 'ram', 'gpu_info', 'conda_env', 'python', 'gcc', 'torch', 'cuda', 'cudnn', 'torch_devices', and 'pkgs'.

Return type:

(Dict)
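To give a feel for the kind of snapshot returned, the sketch below collects a small, standard-library-only subset of the listed keys. The function basic_system_snapshot is hypothetical and does not reproduce the hardware, GPU, or package information that system_info reports.

```python
import datetime
import platform


def basic_system_snapshot(verbose: bool = False) -> dict:
    """Hypothetical, stdlib-only subset of a system snapshot."""
    snapshot = {
        "datetime": datetime.datetime.now().isoformat(),
        "operating_system": platform.platform(),
        "python": platform.python_version(),
    }
    if verbose:
        # Print each entry as it would appear in a log.
        for key, value in snapshot.items():
            print(f"{key}: {value}")
    return snapshot


snapshot = basic_system_snapshot(verbose=True)
```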

face_rhythm.util.batch_run(paths_scripts, params_list, sbatch_config_list, max_n_jobs=2, dir_save=None, name_save='jobNum_', verbose=True)[source]

Submits a batch of SLURM jobs that each run a Python script with a parameter file. Adapted from BNPM. RH 2021

A typical workflow is to sweep one script over a list of parameter dictionaries: each entry in params_list is written to its own job directory as params.json, the corresponding SBATCH script is materialized, and sbatch is invoked. Variants with multiple scripts or multiple SBATCH configs are also supported: each of paths_scripts, params_list, and sbatch_config_list may have length 1 (broadcast to every job) or length n_jobs.

Parameters:
  • paths_scripts (List[str]) – Paths to the Python scripts to run. Length must be 1 or n_jobs. Each script should accept the kwargs --path_params and --directory_save injected by this function.

  • params_list (List[Dict[str, Any]]) – Parameter dictionaries, one per job. Length must be 1 or n_jobs. Each dictionary is written as params.json inside its job directory and its path is passed to the script.

  • sbatch_config_list (List[str]) – SBATCH script bodies, one per job. Length must be 1 or n_jobs. Each string must contain the literal python "$@" on its final command line; this is replaced with the resolved python <script> --path_params <...> --directory_save <...> invocation before being written to disk.

  • max_n_jobs (Optional[int]) – Safety cap on the number of jobs that may be submitted. If the inferred n_jobs exceeds this value, a ValueError is raised. Set to None to disable the cap. (Default is 2)

  • dir_save (Union[str, pathlib.Path]) – Outer directory under which each job’s subdirectory is created; it is created if it does not exist. Required in practice: None is only a placeholder, so a path must be supplied. (Default is None)

  • name_save (Union[str, List[str]]) – Base name for each job’s subdirectory; the job index is always appended. If a string, it is reused for every job; if a list, it must have n_jobs items. (Default is 'jobNum_')

  • verbose (bool) – If True, prints a status line per submitted job. (Default is True)
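The broadcasting rule, the max_n_jobs cap, and the python "$@" substitution described above can be sketched as follows. Both helpers (broadcast_jobs and fill_sbatch) are hypothetical illustrations that write no files and call no SLURM commands; the real function may order or validate things differently.

```python
from typing import Any, Dict, List, Optional, Tuple


def broadcast_jobs(
    paths_scripts: List[str],
    params_list: List[Dict[str, Any]],
    sbatch_config_list: List[str],
    max_n_jobs: Optional[int] = 2,
) -> List[Tuple[str, Dict[str, Any], str]]:
    """Hypothetical helper: pair up scripts, params, and SBATCH configs per job."""
    lengths = [len(paths_scripts), len(params_list), len(sbatch_config_list)]
    n_jobs = max(lengths)
    if any(n not in (1, n_jobs) for n in lengths):
        raise ValueError(f"Each list must have length 1 or n_jobs={n_jobs}; got {lengths}")
    if max_n_jobs is not None and n_jobs > max_n_jobs:
        raise ValueError(f"n_jobs={n_jobs} exceeds max_n_jobs={max_n_jobs}")

    def expand(items):
        # Length-1 lists are reused for every job.
        return list(items) * n_jobs if len(items) == 1 else list(items)

    return list(zip(expand(paths_scripts), expand(params_list), expand(sbatch_config_list)))


def fill_sbatch(config: str, path_script: str, path_params: str, dir_job: str) -> str:
    """Hypothetical helper: replace the python "$@" placeholder with a concrete command."""
    command = f"python {path_script} --path_params {path_params} --directory_save {dir_job}"
    return config.replace('python "$@"', command)


# One script and one SBATCH body swept over two parameter sets (all values made up).
jobs = broadcast_jobs(["run.py"], [{"lr": 0.1}, {"lr": 0.01}], ['#!/bin/bash\npython "$@"'])
script_text = fill_sbatch(jobs[0][2], jobs[0][0], "job0/params.json", "job0")
```

Keeping the cap as an explicit argument means a mistyped parameter sweep fails before any job is submitted.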

face_rhythm.visualization module

Frame and video visualization: overlay points/text on images and write videos.

FrameVisualizer wraps OpenCV’s video writer and draw primitives; helper functions play back buffered readers with overlaid trajectories and produce interactive image stacks for Jupyter contexts.

class face_rhythm.visualization.FrameVisualizer(display=False, handle_cv2Imshow='FaceRhythmPointVisualizer', path_save=None, frame_height_width=(480, 640), frame_rate=None, fourcc='MJPG', error_checking=True, verbose: int = 1, point_sizes=None, points_colors=None, alpha=None, text=None, text_positions=None, text_color=None, text_size=None, text_thickness=None)[source]

Bases: object

Wraps OpenCV draw primitives and an optional cv2.VideoWriter to overlay points and text on single frames, optionally displaying them via cv2.imshow and/or writing them to a video file. RH 2022

Parameters:
  • display (bool) – If True, display each frame using cv2.imshow. (Default is False)

  • handle_cv2Imshow (str) – Window name passed to cv2.imshow. Used to close the window later. (Default is 'FaceRhythmPointVisualizer')

  • path_save (Optional[str]) – If not None, frames are written to this video file path. Use an .avi extension (e.g. 'directory/filename.avi'). (Default is None)

  • frame_height_width (Tuple[int, int]) – Height and width of the displayed and/or saved video. (Default is (480, 640))

  • frame_rate (Optional[int]) – Frame rate of the displayed and/or saved video. If None, playback runs at top speed and saved videos default to 60 fps. (Default is None)

  • fourcc (str) – Four-character codec passed to cv2.VideoWriter. (Default is 'MJPG')

  • error_checking (bool) – If True, perform input validation in visualize_image_with_points. (Default is True)

  • verbose (int) –

    Verbosity level.

    • 0: No messages.

    • 1: Warnings.

    • 2: Info.

    (Default is 1)

  • point_sizes (Optional[Union[int, List[int]]]) – Optional override applied in every call to visualize_image_with_points. Passed to cv2.circle. If an int, all points use this radius; if a list, each element is the radius for one batch of points. (Default is None)

  • points_colors (Optional[Union[Tuple[int, int, int], List]]) – Optional override applied in every call to visualize_image_with_points. Passed to cv2.circle. If a tuple of 3 ints in [0, 255], all points use this color; if a list, each element is a color or a per-point color array of shape (N, 3) for one batch. (Default is None)

  • alpha (Optional[float]) – Optional override applied in every call to visualize_image_with_points. Transparency of the overlaid points; values other than 1 are slow. (Default is None)

  • text (Optional[Union[str, List[str]]]) – Optional override applied in every call to visualize_image_with_points. If None, no text is drawn; if a string, the same string is drawn at every position; if a list, each element is drawn at the corresponding row of text_positions. (Default is None)

  • text_positions (Optional[np.ndarray]) – Optional override applied in every call to visualize_image_with_points. Must be specified if text is not None. shape: (n_text, 2), order (x, y). (Default is None)

  • text_color (Optional[Union[str, List[str]]]) – Optional override applied in every call to visualize_image_with_points. Passed to cv2.putText. If a string, the same color is used for all text; if a list, each element is the color for one text item. (Default is None)

  • text_size (Optional[Union[int, List[int]]]) – Optional override applied in every call to visualize_image_with_points. Passed to cv2.putText. If an int, the same scale is used for all text; if a list, each element is the scale for one text item. (Default is None)

  • text_thickness (Optional[Union[int, List[int]]]) – Optional override applied in every call to visualize_image_with_points. Passed to cv2.putText. If an int, the same thickness is used for all text; if a list, each element is the thickness for one text item. (Default is None)

display

Whether cv2.imshow is called on each visualized frame.

Type:

bool

error_checking

Whether input validation runs in visualize_image_with_points.

Type:

bool

handle_cv2Imshow

Window name used by cv2.imshow.

Type:

str

path_save

Resolved absolute path to the output video file, or None.

Type:

Optional[str]

frame_height_width

Height and width of frames written to the video file.

Type:

Tuple[int, int]

frame_rate

Frame rate used for both display timing and the video writer.

Type:

Optional[int]

fourcc

Four-character codec used by cv2.VideoWriter.

Type:

str

video_writer

Underlying cv2.VideoWriter instance, or None if path_save is not set.

Type:

Optional[object]

visualize_image_with_points(image, points=None, point_sizes=None, points_colors=(0, 255, 255), alpha=None, text=None, text_positions=None, text_color='white', text_size=1, text_thickness=1)[source]

Draws points and text onto a single image and optionally displays and/or writes the result. Input validation is intentionally minimal for performance, so the caller must follow the documented formats.

Parameters:
  • image (np.ndarray) – Image to draw on. shape: (H, W, 3), dtype: uint8. The last dimension is channels.

  • points (Optional[Union[np.ndarray, List[np.ndarray]]]) – Points to overlay. If a single np.ndarray of shape (n_points, 2) and integer dtype, it is treated as one batch and clamped to the image bounds. If a list, each element is one batch of shape (n_points, 2) and dtype int; column order is (x, y). (Default is None)

  • point_sizes (Optional[Union[int, List[int]]]) – Radius passed to cv2.circle. If an int, every point uses this size; if a list, each element is the size for one batch of points. (Default is None)

  • points_colors (Union[Tuple[int, int, int], List]) – Color passed to cv2.circle. If a tuple of 3 ints in [0, 255], every point uses this color; if a list, each element is either a 3-tuple for one batch or an np.ndarray of shape (n_points, 3) with per-point colors in [0, 255]. (Default is (0, 255, 255))

  • alpha (Optional[float]) – Transparency of the overlaid points; values other than 1 are slow. (Default is None)

  • text (Optional[Union[str, List[str]]]) – Text passed to cv2.putText. If None, no text is drawn; if a string, the same string is drawn at every row of text_positions; if a list, each element is drawn at the matching row. (Default is None)

  • text_positions (Optional[np.ndarray]) – Positions for each text item. Required if text is not None. shape: (n_text, 2), order (x, y). (Default is None)

  • text_color (Union[str, List[str]]) – Color passed to cv2.putText. If a string, all text uses this color; if a list, each element is the color for one text item. (Default is 'white')

  • text_size (Union[int, List[int]]) – Font scale passed to cv2.putText. If an int, all text uses this scale; if a list, each element is the scale for one text item. (Default is 1)

  • text_thickness (Union[int, List[int]]) – Stroke thickness passed to cv2.putText. If an int, all text uses this thickness; if a list, each element is the thickness for one text item. (Default is 1)

Returns:

image_out (np.ndarray):

Copy of image with points and text drawn on top. shape: (H, W, 3), dtype: uint8.

Return type:

(np.ndarray)
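The clamping of integer (x, y) points to the image bounds mentioned for the points argument can be illustrated with NumPy. The helper clamp_points is hypothetical and only shows the idea; it is not taken from the class.

```python
import numpy as np


def clamp_points(points: np.ndarray, image_shape: tuple) -> np.ndarray:
    """Hypothetical helper: clip (x, y) integer points so they lie inside an image."""
    height, width = image_shape[:2]
    clamped = points.copy()
    clamped[:, 0] = np.clip(points[:, 0], 0, width - 1)   # x indexes columns
    clamped[:, 1] = np.clip(points[:, 1], 0, height - 1)  # y indexes rows
    return clamped


image = np.zeros((480, 640, 3), dtype=np.uint8)
points = np.array([[-5, 10], [700, 500], [320, 240]], dtype=np.int64)
points_in_bounds = clamp_points(points, image.shape)
```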

close()[source]

Closes the OpenCV display window and releases the video writer if either is active.

face_rhythm.visualization.play_video_with_points(bufferedVideoReader, frameVisualizer=None, points=None, idx_frames=None)[source]

Plays a video with optional point overlays and optionally writes it to disk via the supplied FrameVisualizer. RH 2022

Parameters:
  • bufferedVideoReader (BufferedVideoReader) – Source of frames, created with fr.helpers.BufferedVideoReader.

  • frameVisualizer (FrameVisualizer) – Visualizer that draws and optionally saves each frame, created with fr.visualization.FrameVisualizer. Required in practice despite the default. (Default is None)

  • points (Optional[np.ndarray]) – Points to overlay on the video. shape: (num_frames, num_points, 2). (Default is None)

  • idx_frames (Optional[np.ndarray]) – Indices of frames to play. If None, all frames in the reader are played. (Default is None)

face_rhythm.visualization.display_toggle_image_stack(images, image_size=None, clim=None, interpolation='nearest')[source]

Renders an HTML image slider in a Jupyter notebook to scrub through a stack of images. RH 2023

Parameters:
  • images (List[Union[np.ndarray, torch.Tensor]]) – Images to display, each a 2D or 3D np.ndarray or torch.Tensor in a layout that PIL.Image.fromarray can interpret.

  • image_size (Optional[Union[Tuple[int, int], float]]) –

    Output size per image.

    • Tuple[int, int]: explicit (width, height) applied to every image.

    • float: scale factor applied to each image’s native shape.

    • None: images are displayed at their native size.

    (Default is None)

  • clim (Optional[Tuple[float, float]]) – (min, max) intensity bounds used to scale pixel values to [0, 255]. If None, the per-image min and max are used. (Default is None)

  • interpolation (str) –

    Resampling method used when resizing. One of

    • 'nearest'

    • 'box'

    • 'bilinear'

    • 'hamming'

    • 'bicubic'

    • 'lanczos'

    Mapped to the matching PIL.Image.Resampling.* constant. (Default is 'nearest')
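The clim scaling described above, where values are mapped from (min, max) bounds onto [0, 255], can be sketched with NumPy. The helper scale_to_uint8 is hypothetical, and its handling of flat images and out-of-range values (clipping) is an assumption.

```python
import numpy as np


def scale_to_uint8(image: np.ndarray, clim=None) -> np.ndarray:
    """Hypothetical helper: map pixel values from (min, max) bounds to [0, 255]."""
    image = np.asarray(image, dtype=np.float64)
    # Without clim, fall back to this image's own min and max.
    lo, hi = (image.min(), image.max()) if clim is None else clim
    if hi <= lo:
        # Assumed handling of flat images: return zeros to avoid dividing by zero.
        return np.zeros(image.shape, dtype=np.uint8)
    scaled = (image - lo) / (hi - lo) * 255.0
    # Values outside the bounds are clipped before the cast.
    return np.clip(scaled, 0, 255).astype(np.uint8)


auto_scaled = scale_to_uint8(np.array([[0.0, 5.0, 10.0]]))
fixed_scaled = scale_to_uint8(np.array([[0.0, 5.0, 10.0]]), clim=(0.0, 5.0))
```

Passing a shared clim keeps brightness comparable across the stack, whereas per-image scaling stretches each image independently.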

face_rhythm.visualization.complex_colormap(mags: numpy.ndarray, angles: numpy.ndarray, normalize_mags: bool = True, color_sin: Tuple[int, int, int] = (255, 0, 0), color_cos: Tuple[int, int, int] = (0, 0, 255)) → numpy.ndarray[source]

Generates an RGB colormap for complex values, where hue tracks the angle and brightness tracks the magnitude.

Parameters:
  • mags (np.ndarray) – Magnitudes of the complex values. Must broadcast against angles.

  • angles (np.ndarray) – Angles in radians. Must share shape with mags.

  • normalize_mags (bool) – If True, apply min-max normalization to mags before scaling brightness. (Default is True)

  • color_sin (Tuple[int, int, int]) – RGB color contributed in proportion to sin(angles). (Default is (255, 0, 0))

  • color_cos (Tuple[int, int, int]) – RGB color contributed in proportion to cos(angles). (Default is (0, 0, 255))

Returns:

rgb (np.ndarray):

RGB values per element. shape: (mags.size, 3).

Return type:

(np.ndarray)
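One way such a colormap could be built is sketched below. The blending formula is an assumption, not the package's implementation: sin and cos of the angle are rescaled to [0, 1] and used as weights on color_sin and color_cos, and the result is multiplied by the (optionally min-max normalized) magnitude. The function name complex_colormap_sketch is hypothetical.

```python
import numpy as np


def complex_colormap_sketch(mags, angles, normalize_mags=True,
                            color_sin=(255, 0, 0), color_cos=(0, 0, 255)):
    """Hypothetical reimplementation; the blending formula is an assumption."""
    mags = np.asarray(mags, dtype=np.float64).ravel()
    angles = np.asarray(angles, dtype=np.float64).ravel()
    if normalize_mags:
        span = mags.max() - mags.min()
        # Min-max normalize; constant inputs are mapped to full brightness.
        mags = (mags - mags.min()) / span if span > 0 else np.ones_like(mags)
    # Map sin and cos from [-1, 1] to [0, 1] so they can act as color weights.
    w_sin = (np.sin(angles) + 1.0) / 2.0
    w_cos = (np.cos(angles) + 1.0) / 2.0
    hue = w_sin[:, None] * np.asarray(color_sin) + w_cos[:, None] * np.asarray(color_cos)
    # Brightness follows the magnitude; clip to the valid RGB range.
    return np.clip(hue * mags[:, None], 0, 255)


rgb = complex_colormap_sketch(mags=[0.0, 1.0], angles=[0.0, np.pi / 2])
```

The output has one RGB row per input element, matching the documented shape (mags.size, 3).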