Package 'NOVA' reference manual

Title:	Neural Output Visualization and Analysis
Description:	A toolkit for analyzing and visualizing Multi-Electrode Array (MEA) neural recordings, from raw CSV discovery through Principal Component Analysis (PCA), heatmaps, per-metric plots, and trajectory visualization. A lightweight trajectory-summary layer describes how experimental conditions move through state space relative to baseline (distance travelled, path directness, and timing), with robust timepoint ordering and a plain-language interpretation helper. Provides publication-ready visualizations with flexible customization options for neuroscience research applications.
Authors:	Alex Tudoras [aut, cre]
Maintainer:	Alex Tudoras <[email protected]>
License:	GPL (>= 3)
Version:	0.5.0
Built:	2026-07-15 20:45:51 UTC
Source:	https://github.com/atudoras/nova

Analyze and Visualize PCA Variable Importance

Description

This function performs comprehensive analysis of variable importance in Principal Component Analysis, generating multiple visualization types including loading biplots, importance rankings, PC comparisons, and heatmaps. It extracts variable contributions to specified principal components and creates publication-ready plots with detailed statistical summaries.

Usage

analyze_pca_variable_importance_general(
  pca_result = NULL,
  output_dir = tempdir(),
  experiment_name = "PCA_Analysis",
  pc_x = "PC1",
  pc_y = "PC2",
  color_scheme = "default",
  top_n = 15,
  min_loading_threshold = 0.1,
  save_plots = TRUE,
  show_labels = TRUE,
  verbose = TRUE
)

Arguments

pca_result

A PCA result object. Can be either a prcomp object directly, or a list containing a PCA object in fields named 'pca_result', 'pca', 'result', or 'prcomp'.
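As a rough illustration of the two accepted input shapes, the sketch below resolves a prcomp object from either a bare object or a wrapping list. The helper name resolve_prcomp_sketch is hypothetical and is not part of NOVA; only the field names come from the description above, and the order in which they are checked is an assumption.

```r
# Hypothetical helper (not a NOVA function): find a prcomp object either
# directly or inside a list under one of the documented field names.
resolve_prcomp_sketch <- function(x) {
  if (inherits(x, "prcomp")) return(x)
  if (is.list(x)) {
    for (field in c("pca_result", "pca", "result", "prcomp")) {
      if (inherits(x[[field]], "prcomp")) return(x[[field]])
    }
  }
  stop("no prcomp object found")
}

fit <- prcomp(USArrests, scale. = TRUE)          # built-in data as a stand-in
wrapped <- list(pca = fit, note = "anything else")
identical(resolve_prcomp_sketch(fit), resolve_prcomp_sketch(wrapped))
```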

output_dir

Character string specifying the directory for saving plots and results (default: tempdir()).

experiment_name

Character string used as a prefix for output files and plot titles (default: "PCA_Analysis").

pc_x

Character string specifying the principal component for x-axis analysis (default: "PC1").

pc_y

Character string specifying the principal component for y-axis analysis (default: "PC2").

color_scheme

Character string specifying the color palette. Options: "default", "viridis", "colorbrewer" (default: "default").

top_n

Numeric value specifying the number of top variables to focus on in detailed analyses (default: 15).

min_loading_threshold

Numeric value specifying the minimum loading threshold for importance filtering (default: 0.1).

save_plots

Logical indicating whether to save plots and results to disk (default: TRUE).

show_labels

Logical indicating whether to show variable labels on the biplot (default: TRUE).

verbose

Logical indicating whether to print detailed progress messages (default: TRUE).

Details

The function calculates multiple importance metrics for each variable:

PC loadings: Direct loading values for specified principal components
Combined importance: Euclidean distance combining both PC loadings
Contribution percentages: Percent contribution to each PC's total variance
Ranking: Variables ranked by combined importance score
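The metrics listed above can be approximated by hand from any prcomp object. This is a minimal sketch in base R, not NOVA's implementation: mtcars stands in for MEA data, the column names of the importance table are illustrative, and the contribution formula (squared loading as a share of the component's squared loadings) is an assumed, conventional definition.

```r
# Base R only; mtcars stands in for MEA data (an assumption for illustration).
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)
pc_x <- "PC1"
pc_y <- "PC2"
loadings <- pca$rotation[, c(pc_x, pc_y)]

importance <- data.frame(
  variable  = rownames(loadings),
  loading_x = loadings[, pc_x],
  loading_y = loadings[, pc_y],
  row.names = NULL
)
# Combined importance: Euclidean norm of the two loadings.
importance$combined <- sqrt(importance$loading_x^2 + importance$loading_y^2)
# Contribution (%): squared loading as a share of the PC's squared loadings
# (assumed definition, not confirmed against the package source).
importance$contrib_x_pct <- 100 * importance$loading_x^2 / sum(importance$loading_x^2)
importance$contrib_y_pct <- 100 * importance$loading_y^2 / sum(importance$loading_y^2)
# Rank by combined importance, largest first.
importance <- importance[order(-importance$combined), ]
importance$rank <- seq_len(nrow(importance))
head(importance, 5)
```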

Four visualization types are generated:

Loading Biplot: Scatter plot showing variable loadings on both PCs with size indicating importance
Importance Bar Chart: Ranked bar chart of top variables by combined importance
PC Comparison: Side-by-side comparison of absolute loadings for both PCs
Loading Heatmap: Color-coded matrix showing loading values and directions

The function automatically:

Validates input PCA objects from various sources
Calculates variance explained by each principal component
Creates publication-ready plots with consistent theming
Exports detailed CSV files with variable rankings and analysis summaries
Provides comprehensive statistical summaries

Color schemes provide different aesthetic options:

default: Blue/red palette suitable for most publications
viridis: Colorblind-friendly viridis color scale
colorbrewer: ColorBrewer palettes optimized for scientific visualization

To view the top variables, call head(results$selected_variables) on the returned list (here named results).

Value

A list containing:

plots: Named list of ggplot objects: 'biplot', 'importance_bar', 'pc_comparison', 'heatmap'
variable_importance: Data frame with comprehensive variable importance metrics for all variables
selected_variables: Data frame containing the top N most important variables with detailed statistics
analysis_summary: List with key analysis metrics and variance explained information
config_used: List documenting all parameters used in the analysis

Output Files

When save_plots = TRUE, the function writes the following files to output_dir. The default, tempdir(), keeps the function CRAN-compliant; pass an explicit directory to keep the files:

PNG files for each visualization type
CSV file with complete variable importance rankings
CSV file with selected top variables and detailed metrics
CSV file with analysis summary and metadata

Create Enhanced Heatmaps for Multi-Electrode Array (MEA) Data Analysis

Description

This function generates comprehensive heatmap visualizations for MEA data analysis, including individual grouping variable heatmaps, combined interaction heatmaps, and variable correlation matrices. It provides flexible scaling, clustering, and customization options with automatic quality filtering and missing data handling.

Usage

create_mea_heatmaps_enhanced(
  data = NULL,
  processing_result = NULL,
  config = NULL,
  value_column = "Normalized_Value",
  variable_column = "Variable",
  grouping_columns = c("Treatment", "Genotype"),
  sample_id_columns = c("Experiment", "Well"),
  timepoint_column = "Timepoint",
  scale_method = "z_score",
  aggregation_method = "mean",
  missing_value_handling = "remove",
  cluster_method = "euclidean",
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  create_individual_heatmaps = TRUE,
  create_combined_heatmap = TRUE,
  create_variable_correlation = TRUE,
  output_dir = NULL,
  save_plots = FALSE,
  plot_format = "png",
  plot_width = 10,
  plot_height = 8,
  dpi = 300,
  fontsize = 10,
  angle_col = 45,
  show_rownames = TRUE,
  show_colnames = TRUE,
  return_data = TRUE,
  verbose = TRUE,
  quality_threshold = 0.8,
  min_observations = 3,
  use_raw = FALSE,
  filter_timepoints = NULL,
  filter_treatments = NULL,
  filter_genotypes = NULL,
  split_by = NULL
)

Arguments

data

A data frame containing MEA measurement data. If NULL, processing_result must be provided.

processing_result

A list object from MEA data processing containing normalized_data or raw_data components. Takes precedence over the data parameter if provided.

config

Configuration list from MEA processing. If NULL and processing_result is provided, the function attempts to use the config stored in processing_result$config_used.

value_column

Character string specifying the column containing measurement values (default: "Normalized_Value").

variable_column

Character string specifying the column containing variable names (default: "Variable").

grouping_columns

Character vector of column names to use for grouping (default: c("Treatment", "Genotype")). The function auto-detects which of these columns are available.

sample_id_columns

Character vector of columns jointly identifying one sample (default: c("Experiment", "Well")). Used to key the rows of the split_by = "combination" heatmap. Well IDs repeat across plates, so dropping "Experiment" pools the same well from different experiments into a single row. The main per-group heatmaps deliberately pool replicate wells and are unaffected.

timepoint_column

Character string specifying the timepoint column (default: "Timepoint").

scale_method

Character string specifying scaling method. Options: "z_score" (default), "min_max", "robust", "none".

aggregation_method

Character string specifying how to aggregate multiple measurements. Options: "mean" (default), "median", "sum".

missing_value_handling

Character string specifying how to handle missing values. Options: "remove" (default), "impute_mean", "impute_zero".

cluster_method

Character string specifying clustering distance method. Options: "euclidean" (default), "correlation", "manhattan".
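For orientation, the three distance options could be computed in base R roughly as follows. This is a sketch under assumptions: the helper name row_distance_sketch is hypothetical, "correlation" is taken to mean 1 minus the Pearson correlation between rows, and hclust() is used with its default linkage. How NOVA (or pheatmap) implements these options is not confirmed here.

```r
set.seed(7)
# Toy matrix: 8 variables (rows) by 5 groups (columns); names are illustrative.
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("var", 1:8), paste0("grp", 1:5)))

# Hypothetical helper: row-wise distances for each documented option.
row_distance_sketch <- function(m, method = c("euclidean", "correlation", "manhattan")) {
  method <- match.arg(method)
  switch(method,
    euclidean   = dist(m, method = "euclidean"),
    manhattan   = dist(m, method = "manhattan"),
    # Assumed definition: 1 - Pearson correlation between rows.
    correlation = as.dist(1 - cor(t(m)))
  )
}

tree <- hclust(row_distance_sketch(m, "correlation"))
rownames(m)[tree$order]   # row order a clustered heatmap would display
```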

cluster_rows

Logical indicating whether to cluster rows (default: TRUE).

cluster_cols

Logical indicating whether to cluster columns (default: TRUE).

create_individual_heatmaps

Logical indicating whether to create separate heatmaps for each grouping variable (default: TRUE).

create_combined_heatmap

Logical indicating whether to create interaction heatmap when multiple grouping variables are present (default: TRUE).

create_variable_correlation

Logical indicating whether to create variable correlation heatmap (default: TRUE).

output_dir

Character string specifying output directory (default: NULL, no files saved).

save_plots

Logical indicating whether to save plots to disk (default: FALSE).

plot_format

Character string specifying file format for saved plots (default: "png").

plot_width

Numeric value specifying plot width in inches (default: 10).

plot_height

Numeric value specifying plot height in inches (default: 8).

dpi

Numeric value specifying resolution for saved plots (default: 300).

fontsize

Numeric value specifying font size for heatmap labels (default: 10).

angle_col

Numeric value specifying angle for column labels in degrees (default: 45).

show_rownames

Logical indicating whether to show row names (default: TRUE).

show_colnames

Logical indicating whether to show column names (default: TRUE).

return_data

Logical indicating whether to return processed data matrices (default: TRUE).

verbose

Logical indicating whether to print progress messages (default: TRUE).

quality_threshold

Numeric value between 0 and 1 specifying minimum data completeness per variable (default: 0.8).

min_observations

Numeric value specifying minimum observations required per group (default: 3).

use_raw

Logical. If TRUE, plot raw electrode values instead of normalized values. Default FALSE.

filter_timepoints

Character vector of timepoint names to include. NULL (default) includes all timepoints.

filter_treatments

Character vector of treatment names to include. NULL (default) includes all treatments.

filter_genotypes

Character vector of genotype names to include. NULL (default) includes all genotypes.

split_by

Character string controlling plot splitting. Use "combination" to render a single heatmap of all wells annotated by both Treatment and Genotype strips. Pass any column name (e.g. "Treatment" or "Genotype") to produce one heatmap per level of that column. NULL (default) produces a single combined heatmap.

Details

The function performs several key operations:

Quality filtering: Removes variables with insufficient data completeness
Missing value handling: Multiple strategies for dealing with NA values
Data aggregation: Combines multiple measurements per group using specified method
Scaling: Applies normalization methods appropriate for heatmap visualization
Clustering: Hierarchical clustering of rows and/or columns using specified distance metrics
Visualization: Creates publication-ready heatmaps with proper color schemes and annotations
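The first steps of this pipeline (quality filtering followed by aggregation) can be sketched in base R as below. The data are made up, the variable names Firing_Rate and Burst_Count are illustrative, and completeness is assumed to mean the fraction of non-missing values per variable; the package may define it differently.

```r
# Toy long-format data; values and variable names are invented for illustration.
long <- data.frame(
  Variable  = rep(c("Firing_Rate", "Burst_Count"), each = 6),
  Treatment = rep(rep(c("Vehicle", "Drug"), each = 3), 2),
  Normalized_Value = c(1.0, 1.2, 0.8, 2.1, 1.9, 2.0, 5, NA, NA, 7, NA, 6)
)

# Quality filtering: keep variables whose non-missing fraction meets the threshold.
quality_threshold <- 0.8
completeness <- tapply(!is.na(long$Normalized_Value), long$Variable, mean)
keep <- names(completeness)[completeness >= quality_threshold]
filtered <- long[long$Variable %in% keep, ]

# Aggregation: one mean value per (variable, group) cell of the heatmap.
agg <- aggregate(Normalized_Value ~ Variable + Treatment, data = filtered, FUN = mean)
agg
```

Here Burst_Count is only half complete, so it is dropped before aggregation.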

For scaling methods:

z_score: Centers data around mean with unit variance (best for comparing relative changes)
min_max: Scales to 0-1 range (best for absolute comparisons)
robust: Uses median and MAD for outlier-resistant scaling
none: No scaling applied
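The scaling options correspond to standard transformations, sketched here for a single numeric vector. The helper name scale_sketch is hypothetical, and whether NOVA scales per variable, per group, or over the whole matrix is not stated above, so treat this only as an illustration of the formulas. Note that base R's mad() multiplies by 1.4826 by default.

```r
# Hypothetical helper illustrating the four documented scaling options.
scale_sketch <- function(x, method = c("z_score", "min_max", "robust", "none")) {
  method <- match.arg(method)
  switch(method,
    z_score = (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE),
    min_max = (x - min(x, na.rm = TRUE)) /
              (max(x, na.rm = TRUE) - min(x, na.rm = TRUE)),
    # Median/MAD scaling; mad() uses the default consistency constant 1.4826.
    robust  = (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE),
    none    = x
  )
}

x <- c(2, 4, 6, 8, 100)           # one outlier
round(scale_sketch(x, "z_score"), 2)
round(scale_sketch(x, "robust"), 2)
range(scale_sketch(x, "min_max"))
```

Comparing the z_score and robust results on the same vector shows how strongly the single outlier influences the mean-based scaling.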

The function automatically adjusts plot dimensions based on data size and uses optimized color palettes appropriate for the scaling method chosen (diverging palettes for z_score/robust, sequential palettes for min_max).

Value

A list containing:

individual_heatmaps: Named list of heatmap objects for each grouping variable
combined_heatmap: Heatmap object for grouping variable interactions (if applicable)
variable_correlation: List with correlation heatmap and correlation matrix
metadata: List containing processing information and parameters used

Each heatmap object contains: heatmap (pheatmap object), scaled_data (processed matrix), raw_data (aggregated input data), annotation (row annotations), annotation_colors (color schemes), and scaling_info (scaling parameters).

Discover MEA Data Structure

Description

This function scans a directory containing MEA (Multi-Electrode Array) experiment folders and analyzes the structure of CSV files to identify experiments, timepoints, measured variables, treatments, and genotypes. It provides a comprehensive overview of the data organization without loading all files into memory.

Usage

discover_mea_structure(
  main_dir,
  experiment_pattern = "MEA\\d+",
  file_pattern = "\\.csv$",
  verbose = TRUE
)

Arguments

main_dir

Character. Path to the main directory containing experiment folders

experiment_pattern

Character. Regex pattern to identify experiment directories (default: "MEA\\d+")

file_pattern

Character. Regex pattern to identify data files (default: "\\.csv$")
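To see what the two default patterns select, the sketch below builds a throwaway directory tree and applies them with base R. The folder and file names are invented, and this only demonstrates the pattern matching, not how discover_mea_structure() itself walks the directory.

```r
# Throwaway directory tree; all names are invented for illustration.
main_dir <- tempfile("mea_demo_")
for (d in c("MEA001", "MEA002", "notes")) {
  dir.create(file.path(main_dir, d), recursive = TRUE)
}
file.create(file.path(main_dir, "MEA001", c("baseline.csv", "1h.csv", "readme.txt")))

# Experiment folders: names matching experiment_pattern.
dirs <- list.dirs(main_dir, full.names = TRUE, recursive = FALSE)
experiments <- dirs[grepl("MEA\\d+", basename(dirs))]
basename(experiments)

# Data files: names matching file_pattern inside one experiment folder.
files <- list.files(experiments[basename(experiments) == "MEA001"], pattern = "\\.csv$")
files
```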

verbose

Logical. Whether to print progress messages (default: TRUE)

Details

The function expects MEA CSV files in the standard format:

Row 121: Well identifiers (A1, A2, B1, etc.)
Row 122: Treatment conditions
Row 123: Genotype information
Row 124: Exclusion flags
Rows 125-168: Variable names and measurements
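The row layout described above can be read with base R by skipping to fixed offsets. This is a sketch on a synthetic file: the assumption that the first column holds a row label and the remaining columns hold one well each is mine, as are all the values written to the file.

```r
# Build a synthetic 168-line file following the documented row positions.
# Assumption: column 1 is a label, columns 2+ are wells.
filler <- sprintf("meta_%d,,,", seq_len(120))
lines <- c(
  filler,
  "Well,A1,A2,B1",                 # row 121
  "Treatment,Vehicle,DrugX,DrugX", # row 122
  "Genotype,WT,WT,KO",             # row 123
  "Exclude,No,No,Yes",             # row 124
  sprintf("Variable_%d,%d,%d,%d", 1:44, 1:44, 2:45, 3:46)  # rows 125-168
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Rows 121-124: per-well metadata.
meta <- read.csv(path, header = FALSE, skip = 120, nrows = 4,
                 stringsAsFactors = FALSE)
# Rows 125-168: one measured variable per row.
values <- read.csv(path, header = FALSE, skip = 124, nrows = 44,
                   stringsAsFactors = FALSE)

unlist(meta[1, -1], use.names = FALSE)   # well identifiers
head(values$V1)                          # variable names
```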

Note: running discovery requires an existing data directory on disk.

Value

A list containing:

experiments: List of experiment info (directories, files, timepoints, metadata)
all_timepoints: Vector of all unique timepoints found across experiments
all_variables: Vector of all unique measured variables
potential_baselines: Timepoints that might serve as baseline conditions
experiment_count: Total number of experiments found
discovery_timestamp: When the analysis was performed

Plain-language interpretation of a trajectory summary

Description

Turns a nova_trajectory_summary() result into a short, cautious narrative – describing what happened (how far each condition moved, how directly, and when it peaked) without over-claiming dynamical mechanism.

Usage

nova_describe(x, ...)

Arguments

x

A nova_trajectory_summary object.

...

Unused.

Value

A character vector of sentences (printed, returned invisibly).

Examples

df <- data.frame(
  PC1 = c(0, 2, 3, 3, 0, 0.1, 0, 0.1), PC2 = c(0, 0, 0, 0, 0, 0, 0, 0),
  Treatment = rep(c("Mover", "Still"), each = 4),
  Timepoint = rep(c("baseline", "30min", "1h", "2h"), 2))
nova_describe(nova_trajectory_summary(df, group_var = "Treatment", verbose = FALSE))

Extract ordered state-space trajectories from a PCA (or embedding) result

Description

Converts a NOVA PCA result (or any data frame carrying embedding coordinates, a timepoint column, and a grouping column) into a tidy table of ordered trajectories suitable for every nova_dynamics analysis. Each trajectory is one path through state space: replicate observations are averaged within each (group [, unit], timepoint), then ordered by parsed real time with baseline first.

Usage

nova_extract_trajectories(
  pca_results,
  dims = c("PC1", "PC2"),
  group_var = NULL,
  unit_var = NULL,
  timepoint_var = "Timepoint",
  timepoint_order = NULL
)

Arguments

pca_results

Either the list returned by pca_analysis_enhanced() (uses its $plot_data) or a data frame with the same columns.

dims

Character vector of embedding coordinate columns (default c("PC1","PC2")). Length >= 2.

group_var

Grouping column defining distinct trajectories (e.g. "Treatment"). Auto-detected if NULL.

unit_var

Optional replicate-unit column(s) (e.g. "Well", or c("Experiment", "Well")). If NULL, one mean trajectory per group is returned; if supplied, one trajectory per (group, unit). Pass every column needed to identify a unit: well IDs repeat across plates, so "Well" alone merges the same well from different experiments into one replicate.

timepoint_var

Timepoint column (auto-detected among common names).

timepoint_order

Optional explicit ordering; otherwise computed by nova_order_timepoints().

Value

A tibble (class nova_trajectories) with columns traj_id, group, [unit], time_label, time_rank, time_numeric, the requested dims, and n_obs; carrying attributes dims, group_var, unit_var, timepoint_order, and variance_explained (when available).

Examples

df <- data.frame(
  PC1 = rnorm(12), PC2 = rnorm(12),
  Treatment = rep(c("A", "B"), each = 6),
  Timepoint = rep(c("baseline", "30min", "1h"), 4)
)
tr <- nova_extract_trajectories(df, group_var = "Treatment")
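The averaging and ordering step described for this function can be approximated without NOVA, as in the base R sketch below. It is not the package's implementation: the explicit timepoint order is supplied by hand, and the time_rank column merely mirrors the name listed under Value.

```r
set.seed(1)
df <- data.frame(
  PC1 = rnorm(12), PC2 = rnorm(12),
  Treatment = rep(c("A", "B"), each = 6),
  Timepoint = rep(c("baseline", "30min", "1h"), 4)
)

# Average replicate observations within each (group, timepoint).
means <- aggregate(cbind(PC1, PC2) ~ Treatment + Timepoint, data = df, FUN = mean)

# Order each group's path by an explicit, hand-supplied timepoint order.
timepoint_order <- c("baseline", "30min", "1h")
means$time_rank <- match(means$Timepoint, timepoint_order)
means <- means[order(means$Treatment, means$time_rank), ]
means
```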

Order timepoint labels into a biologically correct sequence

Description

Produces an ordering in which baseline-like labels always come first, the remaining labels are sorted by parsed real time (minutes), and any genuinely unparseable labels are appended alphabetically. This is the canonical replacement for hardcoded timepoint lists: it correctly orders compound labels such as "1h15" / "1h30" / "1h45" that naive alphabetical sorting mis-ranks.

Usage

nova_order_timepoints(timepoints, baseline_first = TRUE)

Arguments

timepoints

Character/factor vector of timepoint labels (duplicates allowed).

baseline_first

Logical; force baseline-like labels to the front (default TRUE).

Value

Character vector of unique labels in dynamical order.

Examples

nova_order_timepoints(c("1h30", "baseline", "15min", "1h", "2h", "0min", "1h15"))
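The three-level ordering rule (baseline first, then parsed time, then unparseable labels alphabetically) maps directly onto a multi-key order() call. In this sketch the minutes are supplied by hand rather than parsed, only the literal label "baseline" is treated as baseline-like, and "washout" is an invented example of an unparseable label.

```r
labels  <- c("1h30", "baseline", "15min", "1h", "2h", "1h15", "washout")
# Hand-supplied minutes; NA marks baseline and unparseable labels.
minutes <- c(90, NA, 15, 60, 120, 75, NA)

is_baseline <- labels == "baseline"   # simplification of "baseline-like"

# Key 1: baseline first. Key 2: minutes, NA last. Key 3: alphabetical tie-break.
ord <- order(!is_baseline, minutes, labels, na.last = TRUE)
labels[ord]
```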

NOVA qualitative / sequential colour palette

Description

Returns colours consistent with the existing NOVA trajectory palette (a Paired-style qualitative ramp) or a viridis sequential ramp, so dynamics figures match the rest of the package.

Usage

nova_palette(n, type = c("qual", "seq"))

Arguments

n

Number of colours required.

type

"qual" (categorical groups) or "seq" (continuous).

Value

Character vector of n hex colours.

Examples

nova_palette(4)
nova_palette(7, type = "seq")

NOVA dynamics ggplot2 theme

Description

A polished, publication-oriented theme that extends the package's existing theme_minimal aesthetic with a light panel border, bold titles, muted gridlines, and a left-aligned caption slot for method annotations.

Usage

nova_theme(base_size = 12, base_family = "")

Arguments

base_size

Base font size (default 12).

base_family

Base font family (default "").

Value

A ggplot2 theme object.

Examples

library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + nova_theme()

Parse timepoint labels to numeric minutes

Description

Vectorised, dependency-free parser that converts heterogeneous MEA timepoint labels to a common numeric scale (minutes). Recognises baseline-like labels (returned as NA), compound shorthands ("1h30", "1h30min"), minutes/hours/seconds/days/weeks, days-in-vitro ("DIV7"), and bare numerics (assumed minutes).

Usage

nova_time_to_minutes(x)

Arguments

x

Character (or factor) vector of timepoint labels.

Value

Numeric vector of minutes; NA for baseline-like or unparseable labels.

Examples

nova_time_to_minutes(c("baseline", "0min", "1h", "1h30", "90min", "DIV7"))
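For readers curious how such a parser might be put together, here is a deliberately small base R sketch. The function parse_minutes_sketch is hypothetical and covers only compound, hour, minute, and bare-numeric labels; seconds, days, weeks, and DIV labels are left out, so it should not be taken as a description of the package's actual rules.

```r
# Hypothetical, reduced parser: returns minutes, NA when a label is not recognised.
parse_minutes_sketch <- function(x) {
  x <- tolower(trimws(as.character(x)))
  out <- rep(NA_real_, length(x))

  # Compound shorthand such as "1h30" or "1h30min".
  compound <- grepl("^[0-9]+h[0-9]+(min)?$", x)
  out[compound] <-
    60 * as.numeric(sub("^([0-9]+)h.*$", "\\1", x[compound])) +
    as.numeric(sub("^[0-9]+h([0-9]+).*$", "\\1", x[compound]))

  # Plain hours ("2h") and plain minutes ("90min").
  hours <- grepl("^[0-9]+(\\.[0-9]+)?h$", x)
  out[hours] <- 60 * as.numeric(sub("h$", "", x[hours]))
  mins <- grepl("^[0-9]+(\\.[0-9]+)?min$", x)
  out[mins] <- as.numeric(sub("min$", "", x[mins]))

  # Bare numerics are assumed to be minutes.
  bare <- grepl("^[0-9]+(\\.[0-9]+)?$", x)
  out[bare] <- as.numeric(x[bare])
  out
}

parse_minutes_sketch(c("baseline", "0min", "1h", "1h30", "90min", "45"))
```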

Summarise how conditions move through state space

Description

A compact, descriptive summary of each group's trajectory through an embedding (PCA / UMAP / latent) relative to its baseline. It reports how far each condition moved, whether it moved directly or wandered, and when its displacement peaked – and draws the two figures that this kind of data actually supports: distance-from-baseline over time (with replicate error bands) and a state-space trajectory map. It deliberately does *not* compute velocities, regimes, or transition models, which require far richer time series.

Usage

nova_trajectory_summary(
  x,
  dims = c("PC1", "PC2"),
  group_var = NULL,
  unit_var = NULL,
  timepoint_var = "Timepoint",
  timepoint_order = NULL,
  verbose = TRUE
)

Arguments

x

A pca_analysis_enhanced() result, a data frame with embedding + timepoint + grouping columns, or a nova_trajectories object.

dims

Embedding columns (default c("PC1","PC2")); the first two are plotted, all are used for distances.

group_var

Grouping column (auto-detected if NULL).

unit_var

Replicate column(s) for the error bands. Auto-detected as c("Experiment", "Well") when both are present, since well IDs repeat across plates; set NULL to disable the bands.

timepoint_var, timepoint_order

Timepoint column / explicit order (otherwise nova_order_timepoints(), baseline first).

verbose

Logical.

Value

An object of class nova_trajectory_summary with: metrics (per group: net displacement, path length, directness = net/path, peak timepoint, peak/final displacement), displacement (per group x timepoint mean +/- SEM), trajectories (group-mean paths), and plots (displacement, map).

Examples

df <- data.frame(
  PC1 = c(0, 1, 2, 3, 0, 1, 0, 1),
  PC2 = c(0, 0, 0, 0, 0, 1, 0, 1),
  Treatment = rep(c("Direct", "Wander"), each = 4),
  Well = rep(c("W1", "W1", "W2", "W2"), 2),
  Timepoint = rep(c("baseline", "30min", "1h", "2h"), 2)
)
s <- nova_trajectory_summary(df, group_var = "Treatment", verbose = FALSE)
s$metrics
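The metrics named under Value can also be worked out by hand for a single ordered path, which may help when checking results. This sketch assumes net displacement means the straight-line distance from baseline to the final timepoint and that peak displacement is the largest distance from baseline; the path coordinates are invented.

```r
# One group-mean path, already ordered with baseline first (invented values).
path <- cbind(PC1 = c(0, 1, 1, 2), PC2 = c(0, 0, 1, 1))

# Path length: sum of step lengths between consecutive timepoints.
steps <- sqrt(rowSums(diff(path)^2))
path_length <- sum(steps)

# Distance from baseline at every timepoint.
from_baseline <- sqrt(rowSums(sweep(path, 2, path[1, ])^2))

net_displacement <- from_baseline[nrow(path)]   # assumed: baseline to final point
directness <- net_displacement / path_length    # 1 = perfectly straight path
peak_index <- which.max(from_baseline)          # timepoint of peak displacement

c(path_length = path_length, net = net_displacement, directness = directness)
```

A directness close to 1 indicates a nearly straight path; values well below 1 indicate that the condition wandered before reaching its final position.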

Which columns identify one replicate well

Description

Plates reuse well IDs ("A1" exists on every plate), so a well is only identified once its Experiment is known. Counting or grouping on Well alone silently merges the same well ID from different plates into one replicate, which understates replication and inflates precision.

Usage

nova_unit_cols(data, warn = TRUE)

Arguments

data

A data frame carrying MEA data.

warn

Logical. Warn when Well is present but Experiment is not, because the identity then silently narrows to a column that does not uniquely identify a well (default: TRUE). Set to FALSE for genuinely single-experiment data.

Details

Ask this rather than re-deriving it. Every version of this bug in NOVA's history came from a function deciding for itself what identified a well, and each one decided differently.

Value

The identity columns present in data, most significant first; character(0) if none are present.

Examples

d <- data.frame(Experiment = c("MEA1", "MEA2"), Well = c("A1", "A1"))
nova_unit_cols(d)
# Well alone would count these two distinct wells as one:
length(unique(d$Well))
length(unique(nova_unit_id(d)))

Collapse well identity into a single label

Description

Use for display, row names and counting. Where a grouping has to be exact, group on the columns themselves (see nova_unit_cols()) rather than on this string: two identity columns pasted together can in principle collide if the values contain the separator.

Usage

nova_unit_id(data, cols = NULL)

Arguments

data

A data frame carrying MEA data.

cols

Identity columns; defaults to the result of nova_unit_cols().

Value

Character vector of one label per row, or NULL if data carries no identity columns.

Examples

d <- data.frame(Experiment = c("MEA1", "MEA2"), Well = c("A1", "A1"))
nova_unit_id(d)
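The separator collision mentioned in the Description is easy to reproduce with plain paste(). The underscore separator and the contrived identifiers below are assumptions chosen to trigger the collision; the separator nova_unit_id() actually uses is not stated here.

```r
# Two distinct wells whose pasted labels coincide (contrived values).
d <- data.frame(Experiment = c("MEA_1", "MEA"), Well = c("A1", "1_A1"))

label <- paste(d$Experiment, d$Well, sep = "_")   # assumed separator
label
length(unique(label))                       # pasted labels: one apparent unit
nrow(unique(d[c("Experiment", "Well")]))    # grouping on columns: two units
```

Grouping on the columns themselves keeps the two wells apart, which is why the column-based route is preferable wherever exact grouping matters.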

Enhanced PCA Analysis for MEA Data

Description

This function performs Principal Component Analysis (PCA) on MEA data with extensive flexibility for data input sources, parameter configuration, and output options. It handles missing values, applies variance filtering, creates visualization plots, and provides comprehensive results suitable for downstream analysis.

Usage

pca_analysis_enhanced(
  normalized_data = NULL,
  data_path = NULL,
  config = NULL,
  processing_result = NULL,
  min_var = NULL,
  impute = NULL,
  scale_data = NULL,
  n_components = NULL,
  variance_cutoff = NULL,
  grouping_variables = NULL,
  sample_id_components = NULL,
  value_column = "Normalized_Value",
  variable_column = "Variable",
  timepoint_column = "Timepoint",
  output_path = NULL,
  verbose = TRUE
)

Arguments

normalized_data

Data.frame. Pre-loaded MEA data in long format (default: NULL)

data_path

Character. Path to Excel file containing MEA data (default: NULL)

config

List. Configuration object with analysis parameters (default: NULL)

processing_result

List. Output from process_mea_flexible function (default: NULL)

min_var

Numeric. Minimum variance threshold for variable inclusion (default: 0.01)

impute

Logical. Whether to impute missing values (default: TRUE)

scale_data

Logical. Whether to scale variables before PCA (default: TRUE)

n_components

Integer. Number of principal components to extract (default: 2)

variance_cutoff

Numeric. Cumulative variance percentage threshold (default: 70)

grouping_variables

Character vector. Variables for sample grouping (default: c("Treatment", "Genotype"))

sample_id_components

Character vector. Variables to create unique sample IDs (default: c("Experiment", "Well", "Timepoint", "Treatment", "Genotype")). These must jointly identify one observation: a column that varies within a Sample but is missing here causes distinct observations to be silently averaged together. Components present in the data are also carried through to plot_data, keeping replicate structure available downstream.

value_column

Character. Name of column containing values for PCA (default: "Normalized_Value")

variable_column

Character. Name of column containing variable names (default: "Variable")

timepoint_column

Character. Name of column containing timepoint information (default: "Timepoint")

output_path

Character. Optional path to save elbow plot (default: NULL, no file saved)

verbose

Logical. Whether to print detailed progress messages (default: TRUE)

Details

The function provides three flexible data input methods:

1. processing_result: Direct output from the process_mea_flexible function
2. data_path: Path to an Excel file with a normalized_data sheet
3. normalized_data: Pre-loaded data frame in long format

Data processing includes:

Automatic detection of available columns
Flexible sample ID creation from specified components
Missing value imputation (mean, median, or zero)
Variance-based variable filtering
Automatic scaling option
Creation of elbow plot for component selection
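The numerical core of these steps can be sketched with base R alone. This is an illustration on random data, not the function's code path: mean imputation is chosen arbitrarily from the listed options, the order of filtering and scaling is assumed, and the thresholds reuse the documented defaults (min_var 0.01, variance_cutoff 70).

```r
set.seed(42)
# Toy wide matrix: 12 samples by 5 variables, plus one constant column.
wide <- matrix(rnorm(60), nrow = 12,
               dimnames = list(paste0("S", 1:12), paste0("V", 1:5)))
wide[3, 2] <- NA
wide <- cbind(wide, Flat = 1)

# Mean imputation, column by column.
for (j in seq_len(ncol(wide))) {
  miss <- is.na(wide[, j])
  wide[miss, j] <- mean(wide[, j], na.rm = TRUE)
}

# Variance filter: drop variables below the minimum variance.
keep <- apply(wide, 2, var) >= 0.01
pca <- prcomp(wide[, keep], center = TRUE, scale. = TRUE)

# Variance explained (%) and components needed to reach the cutoff.
var_pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)
cum_pct <- cumsum(var_pct)
n_needed <- which(cum_pct >= 70)[1]
round(cum_pct, 1)
n_needed
```

Plotting var_pct against the component index gives the kind of elbow plot the function returns.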

The function handles common MEA data challenges:

Missing timepoint or treatment information
Inconsistent column naming
Mixed data types and missing values
Variable numbers of experiments and conditions

Method 1: Use the output of the MEA processing function
mea_result <- process_mea_flexible("/path/to/data", baseline_timepoint = "baseline")
pca_analysis_enhanced(processing_result = mea_result)

Method 2: Load from a saved Excel file
pca_analysis_enhanced(data_path = "/path/to/processed_data.xlsx")

Method 3: Use pre-loaded data, optionally with custom parameters
pca_analysis_enhanced(normalized_data = my_data)

Value

A list containing:

pca_result: Complete prcomp() object with PCA results
plot_data: Data frame ready for plotting with PC scores and metadata
variance_explained: Vector of variance explained by each component
cumulative_variance: Vector of cumulative variance explained
elbow_plot: ggplot2 object showing variance explained by components
elbow_data: Data frame underlying the elbow plot
components_needed: Number of components needed for various variance thresholds
count_summary: Summary of sample counts by groups (if applicable)
data_info: Information about data processing steps
config_used: Configuration parameters actually used
processing_source: Source of input data ("processing_result", "excel_file", or "direct_data")

Enhanced PCA Plotting for Neural and Omics Data

Description

Creates publication-ready PCA plots with scientific color palettes, flexible aesthetic mapping, and multiple visualization options. Designed specifically for neural activity and omics datasets with support for complex experimental designs including treatments, genotypes, and timepoints.

Usage

pca_plots_enhanced(
  pca_output = NULL,
  plot_data = NULL,
  pca_result = NULL,
  output_dir = NULL,
  processing_result = NULL,
  experiment_name = NULL,
  grouping_variables = NULL,
  color_variable = "Treatment",
  shape_variable = "Genotype",
  secondary_shape_variable = "Timepoint",
  pannels_var = NULL,
  components = c(1, 2),
  gray_color_value = NULL,
  save_plots = FALSE,
  plot_width = 12,
  plot_height = 10,
  dpi = 300,
  verbose = TRUE
)

Arguments

pca_output

List. Complete PCA output object from pca_analysis_enhanced() (optional)

plot_data

Data.frame. Data containing PC coordinates and metadata variables

pca_result

List. PCA result object (e.g., from prcomp() or princomp())

output_dir

Character. Directory path for saving plots (default: NULL, no files saved)

processing_result

List. Result object from process_mea_flexible() (optional)

experiment_name

Character. Name for the experiment (used in titles and filenames)

grouping_variables

Character vector. Available metadata variables for plotting (default: c("Treatment", "Genotype", "Timepoint"))

color_variable

Character. Variable name for color aesthetic (default: "Treatment")

shape_variable

Character. Variable name for shape aesthetic (default: "Genotype")

secondary_shape_variable

Character. Alternative shape variable (default: "Timepoint")

pannels_var

Character. Variable for panel faceting (default: NULL)

components

Numeric vector. PC components to plot (default: c(1, 2))

gray_color_value

Character. Specific value of color_variable to display in gray (default: NULL)

save_plots

Logical. Whether to save plots to files (default: FALSE)

plot_width

Numeric. Plot width in inches (default: 12)

plot_height

Numeric. Plot height in inches (default: 10)

dpi

Numeric. Plot resolution (default: 300)

verbose

Logical. Whether to print progress messages (default: TRUE)

Details

The function creates up to 5 different plot variants. Files are only saved when save_plots = TRUE AND output_dir is explicitly provided.
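
The saving rule is a conjunction of two conditions. As a hedged illustration only, the helper below (a hypothetical name, not part of NOVA) expresses the same gate in base R:

```r
# Hypothetical helper, not part of NOVA: TRUE only when both conditions hold.
should_write_files <- function(save_plots, output_dir) {
  isTRUE(save_plots) && !is.null(output_dir)
}

should_write_files(TRUE, NULL)        # FALSE: no directory given
should_write_files(FALSE, tempdir())  # FALSE: saving not requested
should_write_files(TRUE, tempdir())   # TRUE
```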

Value

A list containing:

plots: Named list of ggplot objects for each plot type
plot_data: Data.frame with plotting data and metadata
variance_explained: Numeric vector of variance explained by each component
components_plotted: Numeric vector of components used in plots
color_palette: Named character vector of colors used
shape_palette: Named numeric vector of shapes used
plotting_config: List of configuration parameters used
saved_files: Character vector of saved file paths (if save_plots = TRUE)

Plot a Single MEA Metric Across Conditions

Description

Creates a bar (mean + error), box, violin, or line plot for one measured variable from processed MEA data.

Usage

plot_mea_metric(
  data,
  metric,
  x_var = "Timepoint",
  group_by = "Treatment",
  facet_by = NULL,
  filter_timepoints = NULL,
  filter_treatments = NULL,
  filter_genotypes = NULL,
  value_column = NULL,
  error_type = c("sem", "sd", "ci95"),
  plot_type = c("bar", "box", "violin", "line"),
  colors = NULL,
  show_points = TRUE,
  point_alpha = 0.6,
  title = NULL
)

Arguments

data

Data frame - long-format MEA data (must contain 'Variable' column).

metric

Character. Exact name of the variable to plot.

x_var

Character. Column to use as the x-axis (default "Timepoint").

group_by

Character. Column to use for fill/colour grouping (default "Treatment").

facet_by

Character or NULL. Column name for faceting. NULL = no facets.

filter_timepoints

Character vector or NULL. Subset to these timepoints.

filter_treatments

Character vector or NULL. Subset to these treatments.

filter_genotypes

Character vector or NULL. Subset to these genotypes.

value_column

Character. Which column holds the numeric values. Defaults to "Normalized_Value" if present, else "Value".

error_type

Character. "sem" (default), "sd", or "ci95".

plot_type

Character. "bar" (default), "box", "violin", or "line".

colors

Named character vector of colours, or NULL for ggplot2 defaults.

show_points

Logical. Overlay individual data points (default TRUE).

point_alpha

Numeric. Transparency of data points (default 0.6).

title

Character or NULL. Plot title. NULL = metric name.

Value

A ggplot object.

Examples

## Not run: 
plot_mea_metric(processed$all_data, "Mean Firing Rate (Hz)")
plot_mea_metric(processed$all_data, "Burst Rate (Hz)",
                plot_type = "violin", facet_by = "Genotype")

## End(Not run)
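
To make the three error_type options concrete, here is a minimal base-R sketch of the quantities they usually denote. The helper error_width is hypothetical, and the t-based definition of "ci95" is an assumption: the package may instead use 1.96 times the SEM.

```r
# Spread of a numeric vector under each error_type option.
error_width <- function(x, error_type = c("sem", "sd", "ci95")) {
  error_type <- match.arg(error_type)
  x <- x[!is.na(x)]
  n <- length(x)
  s <- sd(x)
  switch(error_type,
    sd   = s,
    sem  = s / sqrt(n),
    # Assumption: t-based 95% interval half-width.
    ci95 = qt(0.975, df = n - 1) * s / sqrt(n)
  )
}

rates <- c(2.1, 2.4, 1.9, 2.8, 2.3)  # made-up firing rates (Hz)
sapply(c("sem", "sd", "ci95"), function(e) error_width(rates, e))
```

With few wells per group the t-based interval is noticeably wider than the SEM, which is worth keeping in mind when comparing bar plots drawn with different error types.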

Plot PCA Trajectories for Time Series Data

Description

This function creates comprehensive visualizations of PCA trajectories over time, showing both individual and group-averaged trajectories with optional smoothing.

Usage

plot_pca_trajectories_general(
  pca_results,
  pc_x = "PC1",
  pc_y = "PC2",
  trajectory_grouping = NULL,
  timepoint_var = "Timepoint",
  timepoint_order = NULL,
  individual_var = "Experiment",
  point_size = 3,
  alpha = 0.7,
  line_size = 2,
  smooth_lines = FALSE,
  color_palette = NULL,
  color_by = "group",
  save_plots = FALSE,
  output_dir = NULL,
  plot_prefix = "PCA_trajectories",
  width = 12,
  height = 8,
  dpi = 150,
  return_list = TRUE,
  verbose = TRUE
)

Arguments

pca_results

A data frame or list containing PCA results

pc_x

Character string specifying the principal component for x-axis (default: "PC1")

pc_y

Character string specifying the principal component for y-axis (default: "PC2")

trajectory_grouping

Character vector of column names for grouping trajectories

timepoint_var

Character string specifying the timepoint column (default: "Timepoint")

timepoint_order

Character vector specifying the order of timepoints

individual_var

Character string for individual trajectory identification (default: "Experiment")

point_size

Numeric value controlling point size (default: 3)

alpha

Numeric value controlling transparency (default: 0.7)

line_size

Numeric value controlling line thickness (default: 2)

smooth_lines

Logical indicating whether to apply smoothing (default: FALSE)

color_palette

Character vector of colors for groups

color_by

Character string controlling colour mapping. Use "group" (default) to colour by the full trajectory_grouping combination, or "Treatment" to colour by Treatment only with Genotype labels shown at each trajectory's end point via ggrepel.

save_plots

Logical indicating whether to save plots (default: FALSE)

output_dir

Character string specifying output directory (default: NULL)

plot_prefix

Character string prefix for filenames (default: "PCA_trajectories")

width

Numeric plot width in inches (default: 12)

height

Numeric plot height in inches (default: 8)

dpi

Numeric plot resolution (default: 150)

return_list

Logical indicating whether to return results as list (default: TRUE)

verbose

Logical indicating whether to print messages (default: TRUE)

Value

A list containing plots, trajectories, and metadata
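
A group-averaged trajectory is the mean PC position of each group at each timepoint, joined in timepoint order. The sketch below shows that computation in base R on made-up scores. It is not the function's internal code, and the column names simply follow the defaults listed above.

```r
# Toy PC scores: 2 experiments x 2 treatments x 3 timepoints (made up).
scores <- expand.grid(
  Experiment = c("MEA001", "MEA002"),
  Treatment  = c("Vehicle", "Drug"),
  Timepoint  = c("baseline", "1h", "24h"),
  stringsAsFactors = FALSE
)
set.seed(42)
scores$PC1 <- rnorm(nrow(scores))
scores$PC2 <- rnorm(nrow(scores))

# Explicit ordering, playing the role of timepoint_order.
timepoint_order <- c("baseline", "1h", "24h")
scores$Timepoint <- factor(scores$Timepoint, levels = timepoint_order)

# Group-averaged trajectory: mean PC position per group and timepoint.
avg <- aggregate(cbind(PC1, PC2) ~ Treatment + Timepoint,
                 data = scores, FUN = mean)
avg <- avg[order(avg$Treatment, avg$Timepoint), ]
avg
```

Storing the timepoints as a factor with explicit levels keeps "24h" after "1h"; plain alphabetical sorting would not.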

Process MEA Data Flexibly

Description

This function processes Multi-Electrode Array (MEA) data files by reading CSV files, extracting measurements and metadata, applying filters, and optionally normalizing to baseline conditions. It automatically excludes standard deviation variables and handles exclusion flags to produce clean, analysis-ready datasets.

Usage

process_mea_flexible(
  main_dir,
  selected_experiments = NULL,
  selected_timepoints = NULL,
  grouping_variables = c("Treatment", "Genotype"),
  baseline_timepoint = NULL,
  unique_id_vars = c("Well", "Variable"),
  exclude_std_variables = TRUE,
  experiment_pattern = "MEA\\d+",
  timepoint_fusions = NULL,
  verbose = TRUE,
  output_path = NULL
)

Arguments

main_dir

Character. Path to the main directory containing experiment folders

selected_experiments

Character vector. Experiment names to process (default: NULL = all)

selected_timepoints

Character vector. Timepoints to include (default: NULL = all)

grouping_variables

Character vector. Metadata columns to include (default: c("Treatment", "Genotype"))

baseline_timepoint

Character. Timepoint to use for normalization (default: NULL = no normalization)

unique_id_vars

Character vector. Variables that uniquely identify observations for normalization. grouping_variables and Experiment are added automatically, so the default is correct for multi-plate datasets; supply extra columns only if a single plate still contains more than one baseline per well/variable.

exclude_std_variables

Logical. Whether to automatically exclude standard deviation variables (default: TRUE)

experiment_pattern

Character. Regex pattern for experiment directories (default: "MEA\\d+")

timepoint_fusions

Timepoint fusions to generate

verbose

Logical. Whether to print progress messages (default: TRUE)

output_path

Character. Optional path for the output file (default: NULL = no file is written)

Details

The function automatically detects and excludes variables containing "Std", "std", or "STD" in their names (e.g., "Number of Spikes - Std") while keeping average/mean variables (e.g., "Number of Spikes - Avg"). Wells marked with "Ex" or "ex" in row 124 are excluded.
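
The standard-deviation filter amounts to a case-sensitive pattern match on the variable names. This is a minimal sketch assuming a plain substring match on the three spellings named above; the fourth variable name is invented for illustration.

```r
variables <- c(
  "Number of Spikes - Avg",
  "Number of Spikes - Std",
  "Mean Firing Rate (Hz)",
  "Burst Duration - STD (s)"  # hypothetical name, for illustration
)

# Match the three spellings named above, case-sensitively.
is_std <- grepl("Std|std|STD", variables)
variables[!is_std]
```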

By default, no files are written. To save output, provide an explicit output_path parameter. Normalization creates fold-change values relative to baseline timepoint.

Baseline matching is keyed on unique_id_vars plus grouping_variables plus Experiment, so each well is normalised to the baseline of its own plate. The key must identify exactly one baseline row; if it does not, the function warns rather than silently duplicating rows. Fold-changes are ratios, so they are asymmetric (halving = 0.5, doubling = 2.0) and undefined against a zero baseline, which yields NA.
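
The keyed baseline match can be sketched in base R as follows. This illustrates the rule described above and is not the package's implementation: the data are made up, Baseline_Value is an illustrative column name, and the key is written out by hand.

```r
long <- data.frame(
  Experiment = rep(c("MEA001", "MEA002"), each = 4),
  Well       = rep(c("A1", "A1", "A2", "A2"), times = 2),
  Variable   = "Mean Firing Rate (Hz)",
  Treatment  = rep(c("Vehicle", "Vehicle", "Drug", "Drug"), times = 2),
  Timepoint  = rep(c("baseline", "1h"), times = 4),
  Value      = c(2, 4, 4, 2, 0, 3, 5, 5),
  stringsAsFactors = FALSE
)

# unique_id_vars plus grouping_variables plus Experiment.
key <- c("Experiment", "Well", "Variable", "Treatment")

baseline <- long[long$Timepoint == "baseline", c(key, "Value")]
names(baseline)[names(baseline) == "Value"] <- "Baseline_Value"

# The key must pick out exactly one baseline row per well and variable.
if (anyDuplicated(baseline[key]) > 0) {
  warning("Baseline key is not unique; add columns to the key.")
}

norm <- merge(long, baseline, by = key, all.x = TRUE)
# A zero baseline has no defined ratio, so report NA instead of Inf or NaN.
norm$Normalized_Value <- ifelse(norm$Baseline_Value == 0, NA_real_,
                                norm$Value / norm$Baseline_Value)
norm[order(norm$Experiment, norm$Well, norm$Timepoint),
     c("Experiment", "Well", "Timepoint", "Value", "Normalized_Value")]
```

Here well A1 on MEA001 doubles (2.0), well A2 halves (0.5), and well A1 on MEA002 has a zero baseline, so both of its rows are NA. Dropping Experiment from the key would match each well to two baselines and duplicate rows.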

Value

A list containing:

raw_data: Processed data in long format
normalized_data: Baseline-normalized data (if baseline_timepoint specified)
processing_params: List of parameters used for processing
output_path: Path to saved Excel file (only if output_path was provided)
experiment_name: Combined experiment identifier

Process an already-tidy MEA metrics table

Description

Maps a published or hand-assembled metrics table onto NOVA's processed schema, so that data which never came from an Axion CSV export can still be fed to pca_analysis_enhanced, nova_trajectory_summary, create_mea_heatmaps_enhanced and plot_mea_metric.

Usage

process_mea_table(
  data,
  experiment,
  well,
  timepoint,
  treatment,
  genotype = NULL,
  metrics = NULL,
  variable_column = NULL,
  value_column = NULL,
  exclude_metrics = NULL,
  metric_labels = NULL,
  timepoint_prefix = NULL,
  normalize = c("none", "baseline", "control"),
  baseline_timepoint = NULL,
  control = NULL,
  control_within = NULL,
  verbose = TRUE
)

Arguments

data

A data frame.

experiment

Character vector of column(s) identifying the experiment / plate / culture. Pasted together when more than one. See the "Experiment identity" section; this is the argument most often got wrong.

well

Column naming the well.

timepoint

Column naming the timepoint. Values are parsed by nova_time_to_minutes, which understands "DIV7", "1h30", "90min" and bare numbers, so "DIV" columns work directly.

treatment

Column naming the condition/group.

genotype

Optional column naming the genotype.

metrics

Character vector of metric columns (wide input). Leave NULL for long input and supply variable_column/value_column instead.

variable_column, value_column

Metric-name and value columns (long input).

exclude_metrics

Metric columns to drop, e.g. ones the publisher states were not used.

metric_labels

Optional named character vector renaming metrics for display, e.g. c(meanfiringrate = "Mean Firing Rate (Hz)").

timepoint_prefix

Optional prefix for bare numeric timepoints, e.g. "DIV" turns 5 into "DIV5".

normalize

One of "none", "baseline", "control".

baseline_timepoint

Timepoint label used when normalize = "baseline". Defaults to the first in nova_order_timepoints order.

control

Logical vector, one per row of data, marking control wells. Required when normalize = "control". Given explicitly rather than guessed: e.g. control = df$dose == 0.

control_within

Optional extra columns a control must match before it can serve as a reference, e.g. "Compound" to use each compound's own vehicle wells rather than every control on the plate. Fewer, better-matched controls versus more, more stable ones — a real trade-off, so it is yours to make.

verbose

Logical.

Details

Accepts wide input (one row per well x timepoint, metrics in columns) or long input (one row per well x timepoint x metric). Optionally normalises, either to each well's own baseline timepoint or to control wells measured alongside it.
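
The two input shapes carry the same information. As a rough base-R sketch of how a wide table maps onto the long one (not the function's internal code), with Variable and Value taken as the long-format column names used elsewhere in this manual:

```r
wide <- data.frame(
  plate = "P1",
  well = c("A1", "A2"),
  div = 5,
  firing_rate = c(1, 2),
  n_bursts = c(2, 5),
  stringsAsFactors = FALSE
)
metrics <- c("firing_rate", "n_bursts")
id_cols <- setdiff(names(wide), metrics)

# Stack one block of rows per metric, repeating the identifying columns.
long <- do.call(rbind, lapply(metrics, function(m) {
  data.frame(wide[id_cols], Variable = m, Value = wide[[m]],
             stringsAsFactors = FALSE)
}))
long
```

Two wells with two metrics each become four long rows, one per well x timepoint x metric.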

Value

A list shaped like process_mea_flexible's: raw_data, normalized_data (NULL when normalize = "none"), processing_params, experiment_name. When normalize = "control", normalized_data also carries Control_Value and n_control_wells, so the divisor can be inspected rather than taken on trust.

Experiment identity

experiment may name several columns, and often must. Well IDs repeat on every plate, so a well is only identified once its experiment is known — and the experiment itself is not always one column. In the EPA DNT dataset, plate serial numbers are reused across culture dates, so identity is c("date", "Plate.SN"): keying on the serial alone merges two different cultures into one well. Pass every column needed; they are pasted into a single Experiment.
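
The collision is easy to reproduce. In this small sketch the rows are made up and the "_" separator is an assumption, since the text only says the columns are pasted together:

```r
# Made-up rows: plate serial "SN1" is reused on two culture dates.
df <- data.frame(
  date     = c("2020-01-06", "2020-02-03"),
  Plate.SN = c("SN1", "SN1"),
  well     = c("A1", "A1"),
  stringsAsFactors = FALSE
)

# Serial alone: both cultures collapse into one well identity.
key_serial <- paste(df$Plate.SN, df$well, sep = "_")

# Date plus serial: each culture keeps its own identity.
df$Experiment <- do.call(paste, c(df[c("date", "Plate.SN")], sep = "_"))
key_full <- paste(df$Experiment, df$well, sep = "_")

length(unique(key_serial))  # 1
length(unique(key_full))    # 2
```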

Normalisation

"none": Leave values as they are. Normalized_Value is not created.
"baseline": Each well against its own value at baseline_timepoint, keyed on experiment + well + metric + grouping. A fold-change over time.
"control": Each well against the mean of the control wells in the same experiment at the same timepoint — the toxicology convention, and the only option when the earliest timepoint is not a usable reference (e.g. a developmental assay where every well is silent at the first timepoint, so the ratio is undefined).

Both are ratios: asymmetric (halving is 0.5, doubling is 2.0) and undefined against a zero divisor, which yields NA rather than Inf.
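
The "control" rule can be sketched in base R on made-up data. This is an illustration of the rule stated above, not the package's code. Control_Value and Normalized_Value follow the column names mentioned in this manual; everything else is hypothetical.

```r
df <- data.frame(
  plate = rep(c("P1", "P2"), each = 3),
  well  = rep(c("A1", "A2", "A3"), times = 2),
  div   = 7,
  drug  = rep(c("ctrl", "ctrl", "cpd"), times = 2),
  firing_rate = c(1, 3, 4, 0, 0, 5),
  stringsAsFactors = FALSE
)
is_control <- df$drug == "ctrl"

# Mean of the control wells in each experiment at each timepoint.
ctrl <- aggregate(firing_rate ~ plate + div, data = df[is_control, ], FUN = mean)
names(ctrl)[names(ctrl) == "firing_rate"] <- "Control_Value"

out <- merge(df, ctrl, by = c("plate", "div"), all.x = TRUE)
# A zero divisor has no defined ratio, so report NA rather than Inf.
out$Normalized_Value <- ifelse(out$Control_Value == 0, NA_real_,
                               out$firing_rate / out$Control_Value)
out
```

On plate P1 the control mean is 2, so the compound well reads 2.0. On plate P2 the controls are silent, the divisor is zero, and every well on that plate is NA.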

Examples

# A wide published table: one row per well x timepoint, metrics in columns.
# Note plate P2 reads twice P1 throughout -- a plate effect, which is exactly
# what normalisation has to respect rather than smear across plates.
df <- expand.grid(well = c("A1", "A2"), plate = c("P1", "P2"), div = c(5, 7),
                  stringsAsFactors = FALSE)
df$drug        <- ifelse(df$well == "A1", "ctrl", "cpd")
df$firing_rate <- c(1, 2, 2, 4, 1.5, 3, 3, 6)
df$n_bursts    <- c(2, 5, 4, 10, 3, 6, 6, 12)

# Each well against its own DIV5 reading.
res <- process_mea_table(
  df, experiment = "plate", well = "well", timepoint = "div",
  treatment = "drug", metrics = c("firing_rate", "n_bursts"),
  timepoint_prefix = "DIV", normalize = "baseline", verbose = FALSE
)
head(res$normalized_data)

# Or against the control wells on the same plate at the same timepoint.
res2 <- process_mea_table(
  df, experiment = "plate", well = "well", timepoint = "div",
  treatment = "drug", metrics = c("firing_rate", "n_bursts"),
  timepoint_prefix = "DIV", normalize = "control",
  control = df$drug == "ctrl", verbose = FALSE
)
head(res2$normalized_data)

# Plate serials reused across culture dates? Identity is both columns.
# process_mea_table(df, experiment = c("date", "plate"), ...)
