Spatial Transcriptomics at Scale: How to Overcome the Top 5 Data Hurdles

Alper Kucukural, PhD
CTO, ViaScientific

How Via Foundry enables scalable, reproducible spatial transcriptomics research

For years, researchers have worked to uncover how genes are expressed in different tissues, but older techniques always came with trade-offs that hid part of the story. Bulk RNA sequencing, for instance, blends signals from countless cells, erasing the unique patterns that emerge from distinct neighborhoods within a tissue. Single-cell RNA sequencing does better at isolating individual cells, yet it still tears them from their natural environment, losing details about the spatial interactions that drive healthy development or disease progression. These constraints left scientists with big questions: How do cells truly collaborate or clash within tissues? Where do critical changes first arise, and how do they ripple outward to affect nearby cells?

Spatial transcriptomics addresses these questions by adding a new dimension to gene expression analysis: location. In this approach, transcripts are mapped to their positions within the tissue, at a resolution that ranges from multi-cell spots to subcellular depending on the platform, creating a detailed portrait of how cells organize, communicate, and evolve. This richer view builds on the foundations set by bulk and single-cell sequencing and overcomes many of their limitations. Researchers can now pinpoint which cells are active in which tissue regions, detect early shifts that signal disease, and design more targeted therapies that take into account the true complexity of living tissues.

Yet despite its transformative potential (or perhaps precisely because of it), spatial transcriptomics introduces substantial computational, analytical, and organizational challenges. As the complexity and resolution of data skyrocket, researchers grapple with hurdles such as:

  1. Managing Massive Datasets
  2. Ensuring Reproducibility
  3. Correcting for Spatial Effects
  4. Integrating Multi-Modal Data
  5. Bridging Skills Gaps

Tackling these obstacles is essential to fully unlock the technology's promise. This article delves deeply into these critical issues and highlights why enterprise-scale platforms like Via Foundry are vital to making spatial transcriptomics both accessible and effective for researchers.

Challenge 1: Managing Massive Datasets

Spatial transcriptomics datasets can be 10 to 100 times larger than those from single-cell RNA sequencing (scRNA-seq), due to their integration of high-resolution imaging, spatial barcoding, and sequencing data. A single experiment can generate hundreds of gigabytes to multiple terabytes of data (depending on resolution and tissue size), comparable to processing entire human genomes dozens or even hundreds of times over.

This data intensity creates major computational challenges:

  • Higher-resolution spatial experiments demand considerable memory and processing power, often exceeding 128 GB of RAM and 32 CPU cores per sample, with processing times extending over several hours.
  • Standard desktop or laptop computers lack the necessary compute resources, making local analysis impractical for most researchers.

To overcome these limitations, researchers typically rely on institutional high-performance computing (HPC) clusters, cloud-based infrastructure, or local servers, each with trade-offs in scalability, accessibility, and computational efficiency.
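To make the memory pressure concrete, the following back-of-the-envelope sketch compares dense and sparse (CSR) storage for a spot-by-gene count matrix. The bin count, gene count, and fraction of nonzero entries are illustrative assumptions, not measurements from any particular platform, and the estimate covers the count matrix only (images and raw reads add to it).

```python
def dense_bytes(n_spots: int, n_genes: int, bytes_per_value: int = 4) -> int:
    """Memory needed when every matrix entry is stored explicitly."""
    return n_spots * n_genes * bytes_per_value


def csr_bytes(n_spots: int, n_genes: int, density: float,
              bytes_per_value: int = 4, bytes_per_index: int = 4) -> int:
    """Memory for a CSR sparse matrix: values + column indices + row pointers."""
    nnz = round(n_spots * n_genes * density)  # number of nonzero entries
    return nnz * (bytes_per_value + bytes_per_index) + (n_spots + 1) * bytes_per_index


def to_gib(n_bytes: int) -> float:
    """Convert bytes to gibibytes."""
    return n_bytes / 2**30


# Hypothetical high-resolution sample (assumed values):
# 2 million spatial bins, 20,000 genes, 1% of entries nonzero.
N_SPOTS, N_GENES, DENSITY = 2_000_000, 20_000, 0.01

dense_gib = to_gib(dense_bytes(N_SPOTS, N_GENES))
sparse_gib = to_gib(csr_bytes(N_SPOTS, N_GENES, DENSITY))
print(f"dense: {dense_gib:.1f} GiB, sparse (CSR): {sparse_gib:.1f} GiB")
```

Under these assumptions the dense matrix alone needs roughly 149 GiB, while the sparse form needs about 3 GiB. This is one reason memory requirements depend heavily on whether every processing step preserves sparsity.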

Challenge 2: Ensuring Reproducibility

Reproducibility remains a significant challenge in spatial transcriptomics due to the diversity of platforms and computational workflows. The field encompasses multiple technologies, such as 10x Genomics Visium, Slide-seq, MERFISH, GeoMx, and Stereo-seq, each with distinct spatial resolutions, gene detection sensitivities, and technical biases.

Key factors contributing to reproducibility challenges include:

  • Platform variability: Differences in spatial resolution, gene detection sensitivity, and technical biases create inconsistencies between datasets.
  • Lack of standardized workflows/rapidly changing standards: Unlike scRNA-seq, spatial transcriptomics does not yet have universally accepted computational pipelines. Rapid evolution of analytical methods makes it challenging to reliably replicate analyses or compare findings over time.
  • Custom-built pipelines: Many researchers develop in-house workflows, often with minimal documentation, making results difficult to replicate.
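One lightweight way to make an in-house workflow easier to replicate is to write a small manifest for every run. The sketch below records the pipeline version, parameters, and input checksums, then fingerprints them. The pipeline name, version tag, parameter names, and file name are hypothetical placeholders, and this is a generic illustration rather than how any specific platform tracks provenance.

```python
import hashlib
import json
import platform
import sys


def build_manifest(pipeline: str, version: str, params: dict, input_checksums: dict) -> dict:
    """Collect what is needed to rerun an analysis and fingerprint it."""
    record = {
        "pipeline": pipeline,
        "pipeline_version": version,
        "params": params,
        "inputs": input_checksums,
    }
    # Canonical JSON (sorted keys) so identical settings always hash identically.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Environment details are recorded for reference but kept out of the fingerprint.
    record["environment"] = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    return record


manifest = build_manifest(
    pipeline="spatial-demo",   # hypothetical pipeline name
    version="0.1.0",           # hypothetical version tag
    params={"min_counts": 500, "n_top_genes": 2000},  # hypothetical parameters
    input_checksums={"sample_A.h5": "sha256:<checksum of the input file>"},
)
print(json.dumps(manifest, indent=2))
```

Two runs with the same fingerprint used the same pipeline version, parameters, and inputs, so a difference in their results points to the environment or to nondeterminism rather than to configuration drift.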

Challenge 3: Correcting for Spatial Effects

Batch effects (technical variations between experiments) are a well-documented issue in bioinformatics, but spatial transcriptomics introduces additional complexity. Standard batch correction methods operate at the gene level, whereas spatial data also requires adjustments that account for the physical positioning of cells and tissue architecture.

Key challenges include:

  • Limitations of standard batch correction: Traditional methods of batch correction are designed for bulk and single-cell RNA sequencing and do not account for spatial dependencies.
  • Environmental and technical variability: Small changes in staining protocols, imaging conditions, or sequencing depth can obscure biological signals.
  • Challenges in cross-experiment comparisons: If spatial effects are not properly addressed, integrating and comparing datasets becomes unreliable.
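The spatial dependence that gene-level methods overlook can be quantified. As a minimal sketch, the code below computes Moran's I, a standard measure of spatial autocorrelation, using binary neighbor weights on a toy grid. The grid, the radius, and the two expression patterns are assumptions chosen for illustration. In practice a statistic like this could be compared before and after a correction step to check whether spatial structure was preserved.

```python
import math


def morans_i(coords, values, radius=1.0):
    """Moran's I with binary weights: w_ij = 1 if spots i and j lie within `radius`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values are constant; Moran's I is undefined")
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if w_sum == 0:
        raise ValueError("no neighboring spots within radius")
    return (n / w_sum) * (num / denom)


# Toy 6x6 grid of spots with unit spacing (illustrative only).
grid = [(x, y) for y in range(6) for x in range(6)]

gradient = [float(x) for x, y in grid]            # expression rises smoothly left to right
checkerboard = [float((x + y) % 2) for x, y in grid]  # neighbors always differ

i_gradient = morans_i(grid, gradient, radius=1.0)
i_checker = morans_i(grid, checkerboard, radius=1.0)
print(f"gradient: {i_gradient:.2f}, checkerboard: {i_checker:.2f}")
```

Values near +1 indicate that neighboring spots have similar expression, values near -1 indicate that neighbors differ systematically, and values near 0 suggest no spatial pattern. The double loop is quadratic in the number of spots, so real datasets would need a spatial index or a precomputed neighbor graph.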

Challenge 4: Integrating Multi-Modal Data

Integrating spatial and single-cell transcriptomics enables deeper biological insights by combining the spatially resolved gene expression data from spatial transcriptomics with the higher-resolution cellular expression profiles from single-cell RNA sequencing. However, this integration remains a significant computational and methodological challenge: the two data types originate from distinct experimental approaches, each requiring specialized preprocessing and analytical pipelines.

Key challenges include:

  • Differences in data resolution: Spot-based spatial platforms often capture gene expression at a lower resolution than single-cell RNA sequencing, with each spot potentially containing several cells.
  • Algorithmic adaptation: Many integration methods require modification to account for dataset-specific biases.
  • Computational constraints: Integrating high-dimensional spatial and single-cell datasets is resource-intensive.
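As a minimal sketch of what integration involves, the code below transfers cell-type labels from a single-cell reference to spatial spots by Pearson correlation against reference centroids (mean expression per cell type). The gene order, cell-type names, and all expression values are made up for illustration. Dedicated integration and deconvolution methods go well beyond this, in particular because a spot may contain a mixture of cell types.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


def transfer_labels(spot_profiles, reference_centroids):
    """For each spot, return (best-matching cell type, correlation score)."""
    results = []
    for profile in spot_profiles:
        scores = {name: pearson(profile, centroid)
                  for name, centroid in reference_centroids.items()}
        best = max(scores, key=scores.get)
        results.append((best, scores[best]))
    return results


# Hypothetical reference centroids over the same four genes, in the same order.
reference = {
    "T cell": [9.0, 1.0, 0.0, 2.0],
    "Hepatocyte": [0.0, 8.0, 7.0, 1.0],
}
# Hypothetical spot profiles from a spatial dataset.
spots = [
    [18.0, 3.0, 1.0, 5.0],
    [1.0, 15.0, 12.0, 2.0],
]

labels = transfer_labels(spots, reference)
for (name, score) in labels:
    print(f"{name}: r = {score:.2f}")
```

Correlation is used here because it is insensitive to overall depth differences between the two assays, but it assigns one label per spot and therefore cannot express mixed compositions.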

Challenge 5: Bridging Skills Gaps

Spatial transcriptomics extends beyond traditional bioinformatics, requiring expertise in image processing alongside computational biology. Unlike standard RNA sequencing data, spatial datasets often involve complex imaging modalities that demand specialized analysis techniques and software.

This involves:

  • Image segmentation complexities: Defining cell boundaries within tissue remains an open problem with no universal solution.
  • Divergent analysis pipelines: Some spatial methods, such as FISH-based approaches like MERFISH, rely more on imaging than sequencing, requiring fundamentally different computational approaches.
  • Storage and computational demands: Large-scale imaging datasets introduce additional challenges in data storage, retrieval, and processing.
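To illustrate why segmentation matters for imaging-based data, here is a deliberately naive baseline that assigns each detected transcript to the nearest nucleus centroid within a distance cutoff. The coordinates, cell identifiers, and the cutoff are hypothetical, and real pipelines typically rely on image-based segmentation rather than this kind of nearest-centroid rule.

```python
import math
from collections import Counter


def assign_transcripts(transcripts, nuclei, max_dist=10.0):
    """Assign each (x, y, gene) transcript to the nearest nucleus centroid.

    Returns {cell_id: Counter(gene -> count)}. Transcripts farther than
    `max_dist` from every centroid are collected under the key None.
    """
    counts = {}
    for x, y, gene in transcripts:
        best_id, best_d = None, max_dist
        for cell_id, centroid in nuclei.items():
            d = math.dist((x, y), centroid)
            if d <= best_d:
                best_id, best_d = cell_id, d
        counts.setdefault(best_id, Counter())[gene] += 1
    return counts


# Hypothetical nucleus centroids and transcript detections (arbitrary units).
nuclei = {"cell_1": (0.0, 0.0), "cell_2": (20.0, 0.0)}
transcripts = [
    (1.0, 1.0, "ACTB"),
    (2.0, -1.0, "ACTB"),
    (19.0, 0.5, "CD3E"),
    (50.0, 50.0, "GAPDH"),  # too far from any nucleus, stays unassigned
]

per_cell = assign_transcripts(transcripts, nuclei)
for cell_id, genes in per_cell.items():
    print(cell_id, dict(genes))
```

The rule ignores cell shape and size, so transcripts near a boundary between two cells can easily be misassigned, which is exactly the kind of error that makes segmentation an open problem. The brute-force search is also quadratic and would need a spatial index at realistic scale.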

Enterprise-Scale Solutions for Spatial Transcriptomics

Spatial transcriptomics offers unprecedented insight into gene expression and tissue architecture, but its widespread adoption is often hindered by the complexity and scale of the data it produces. These challenges exceed the limits of traditional bioinformatics tools and workflows. As datasets grow larger and analytical methods more sophisticated, the need for robust, scalable, and standardized infrastructure becomes increasingly clear. Supporting spatial transcriptomics at scale calls for a comprehensive solution.

The following outlines the most critical capabilities such a solution must provide to empower researchers and accelerate discovery.

Managing Massive Datasets with Scalable Infrastructure

Getting more value out of spatial transcriptomics means moving faster from raw data to biological insight without getting bogged down by compute bottlenecks. Via Foundry enables researchers to process terabyte-scale spatial datasets by orchestrating cloud and on-prem infrastructure, without the overhead of managing these systems. 

The result: accelerated discoveries, more reproducible results, and fewer delays. At Via, we’re committed to continuously evolving our platform to meet the growing demands of spatial research, so scientists can focus on breakthroughs, not infrastructure.

Ensuring Reproducibility with Standardized Yet Flexible Workflows

Via Foundry addresses reproducibility issues by:

  • Offering pre-configured workflows for leading technologies (10x Genomics Visium, MERFISH, Slide-seq, GeoMx).
  • Providing version-controlled pipelines for transparent, reproducible analyses.
  • Enabling comparative benchmarking, allowing researchers to assess different analytical methods side by side.

These features ensure that results remain consistent across experiments, while still allowing researchers to adapt workflows to their specific needs.

Correcting for Spatial and Batch Effects

As previously discussed, data variability across experiments and within tissue samples can obscure biological signals. Via Foundry enhances dataset comparability by:

  • Providing normalization techniques that adjust for library size differences and spatial variability.
  • Implementing correction methods that reduce both batch-related and spatial artifacts introduced by differences in experimental conditions.
  • Automating quality control, detecting inconsistencies in sequencing depth, imaging conditions, and tissue-specific effects to improve dataset harmonization.

By standardizing preprocessing, researchers can be confident that biological variation (not technical noise) drives results.
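For readers unfamiliar with library-size normalization, the sketch below shows one common generic form: scaling each spot to a fixed total and applying a log transform. It is not a description of Via Foundry's implementation, the counts are made up, and it addresses sequencing depth only, not spatial variability.

```python
import math


def normalize_log_cpm(counts, scale=1_000_000):
    """Scale each spot (row) to `scale` total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # An empty spot has no composition to rescale; keep it at zero.
            normalized.append([0.0] * len(row))
            continue
        normalized.append([math.log1p(c / total * scale) for c in row])
    return normalized


# Two hypothetical spots with the same composition but a 10x difference in depth.
raw = [
    [10, 30, 60],
    [100, 300, 600],
]

norm = normalize_log_cpm(raw)
for row in norm:
    print([round(v, 3) for v in row])
```

After normalization the two spots have matching profiles, which shows that the depth difference was removed while the relative composition was kept. The scale factor is a convention and can be chosen differently.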

Integrating Multi-Modal Data for a Unified View of Biology

Multi-modal integration is critical for understanding spatially resolved gene expression in the context of single-cell transcriptomics. Via Foundry simplifies this process by:

  • Aligning spatial and single-cell datasets, ensuring accurate transcriptomic representation.
  • Automating data harmonization, reducing preprocessing overhead and improving dataset compatibility.

With built-in support for multi-modal research, researchers can extract deeper biological insights across different data layers, all within a single computational environment.

A Developer- and Biologist-Friendly Research Environment

Spatial transcriptomics workflows demand expertise across both bioinformatics and imaging. Via Foundry bridges this skills gap with a research environment that is developer-friendly while remaining accessible to biologists, enabling:

  • A sandbox testing environment, where researchers can refine and compare pipelines before full-scale analysis.
  • Pre-optimized yet customizable workflows, balancing ease of use with advanced computational flexibility.
  • A Trusted Research Environment (TRE) for secure, collaborative research at scale.

With these capabilities, researchers can experiment, iterate, and scale their analyses without unnecessary technical constraints.

Conclusion

Broad adoption of spatial transcriptomics is essential, not only for refining the technology itself but also for unlocking scientific discoveries at a faster pace. When adoption lags, breakthroughs stall, delaying essential insights into disease dynamics, tissue organization, and new therapeutic strategies. As scalable computational resources, reproducible analytical workflows, and integrated multi-modal data frameworks become more broadly accessible, researchers are empowered to prioritize scientific inquiry and innovation over logistical hurdles and infrastructure complexities.

Platforms like Via Foundry accelerate this progress by eliminating technical barriers, empowering researchers to unlock novel insights, formulate groundbreaking hypotheses, and produce reproducible discoveries at unprecedented scale. By making spatial transcriptomics more accessible and scalable, Via Foundry equips researchers with the tools and freedom they need to drive the next wave of transformative biomedical innovation.

Let's Get Started

Foundry unlocks the power of multi-omics data so you can generate extraordinary scientific insights.