Chimeric Antigen Receptor T-cell (CAR-T) therapy represents a revolutionary approach to cancer treatment, in which a patient’s T-cells are genetically modified to recognize and attack cancer cells. The success of CAR-T therapy critically depends on identifying suitable target antigens: proteins expressed on cancer cells that can serve as targets for the engineered T-cells.
This vignette provides a comprehensive workflow for evaluating potential CAR-T targets, guiding non-bioinformaticians through essential computational analyses. We demonstrate the workflow using ERBB2 (also known as HER2/neu), a receptor tyrosine kinase overexpressed in approximately 25-30% of breast cancers (Slamon, D.J. et al.) and an established target for monoclonal antibody therapy with trastuzumab (Yoon, J. et al.).
This vignette is designed for clinicians and researchers without coding experience, utilizing web-based tools with graphical interfaces. While many tools necessarily operate through web servers, we recommend adopting computational approaches whenever possible for improved reproducibility, traceability, and reduced error rates. Collaborating with a bioinformatician can help implement more robust, automated workflows for comprehensive target assessment.
Understanding protein isoform diversity is crucial for CAR-T target assessment, as different isoforms may have altered membrane topology, expression patterns, or epitope accessibility. We begin by identifying all coding isoforms of the potential CAR-T target using Ensembl.
Step 1: Search for the gene
Step 2: Navigate to the gene page
Step 3: Access the transcript information
Step 4: Identify coding isoforms
Ensembl transcripts table showing protein-coding isoforms with transcript IDs, protein lengths, and UniProt matches.
Step 5: Examine individual isoforms
Step 6: Export transcript data
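For readers working with a bioinformatician, the same transcript table can be retrieved programmatically. The following is a minimal sketch, assuming the public Ensembl REST endpoint `lookup/symbol` with `expand=1` and assuming the response carries a `Transcript` list whose coding entries hold a `Translation` object with `id` and `length` fields. The function names are hypothetical, and the field names should be checked against a live response before relying on them.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def fetch_gene_record(symbol, species="homo_sapiens"):
    """Download a gene record (with its transcripts) from the Ensembl REST API."""
    url = f"{ENSEMBL_REST}/lookup/symbol/{species}/{symbol}?expand=1"
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def coding_isoforms(gene_record):
    """Return (transcript ID, protein ID, protein length) for protein-coding transcripts.

    Assumes each transcript has a 'biotype' field and, when coding, a 'Translation'
    object. Rows are sorted from longest to shortest protein.
    """
    rows = []
    for transcript in gene_record.get("Transcript", []):
        if transcript.get("biotype") != "protein_coding":
            continue
        translation = transcript.get("Translation") or {}
        rows.append((transcript["id"], translation.get("id"), translation.get("length")))
    return sorted(rows, key=lambda row: row[2] or 0, reverse=True)


def main():
    """Print the coding isoforms of ERBB2 (requires internet access)."""
    for row in coding_isoforms(fetch_gene_record("ERBB2")):
        print(*row, sep="\t")
```

Calling `main()` queries the live server. The filtering step is kept separate from the download so that it can also be applied to a record saved earlier, which helps when documenting exactly which Ensembl release an analysis used.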
Protein sequences are essential for downstream computational analyses. UniProt provides high-quality, manually curated protein sequences, but not every Ensembl-predicted isoform is represented there: UniProt applies rigorous curation standards requiring experimental evidence, whereas Ensembl includes all computationally predicted coding transcripts.
To retrieve FASTA sequences of a gene and its protein-coding isoforms from UniProt:
Step 1: Search for the protein
Step 2: Identify the canonical entry
Step 3: Download FASTA sequences
UniProt download interface showing options for retrieving FASTA sequences of canonical and isoform variants.
Step 4: Individual isoform access
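The download steps above can also be scripted. The sketch below assumes the UniProt REST search endpoint accepts `format=fasta` together with `includeIsoform=true`, and uses P04626 as the human ERBB2 accession (confirm the accession on the entry page for other genes). The helper names are hypothetical; the FASTA parser is generic and works on any FASTA text, including files downloaded by hand.

```python
import urllib.parse
import urllib.request

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def isoform_fasta_url(accession):
    """Build a search URL requesting the canonical sequence plus its isoforms as FASTA."""
    params = {
        "query": f"accession:{accession}",
        "format": "fasta",
        "includeIsoform": "true",  # assumption: parameter name as documented by UniProt
    }
    return f"{UNIPROT_SEARCH}?{urllib.parse.urlencode(params)}"


def parse_fasta(text):
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def download_isoforms(accession="P04626"):
    """Fetch and parse all isoform sequences for one entry (requires internet access)."""
    with urllib.request.urlopen(isoform_fasta_url(accession), timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

The dictionary returned by `download_isoforms()` can be written back to a single multi-FASTA file, which is the input format expected by the prediction servers used later in this workflow.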
Subcellular localization prediction is fundamental for CAR-T target validation. Targets must localize to the cell membrane to be accessible for CAR recognition. DeepLoc2.1 provides state-of-the-art localization and membrane association predictions.
DeepLoc2.1 uses transformer-based protein language models to predict where proteins localize within cells. The method analyzes amino acid sequences to identify sorting signals and structural features that determine cellular targeting. The deep learning model was trained on thousands of experimentally validated protein localizations and can predict multiple simultaneous localizations, reflecting the biological reality that some proteins function in multiple cellular compartments.
Step 1: Access the web server
Step 2: Input your protein sequence
Step 3: Submit the prediction
Step 4: Interpret the results
The results page contains several key sections for each analyzed isoform. For comprehensive interpretation of predicted localizations, confidence scores, and attention plots, refer to the DeepLoc2.1 user manual.
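When many isoforms are submitted, screening the downloadable results table by script is less error-prone than reading it by eye. The sketch below is illustrative only: it assumes the results are saved as a CSV file with an identifier column named `Protein_ID` and a column named `Localizations` in which multiple predicted compartments are separated by `|`. Both column names and the separator are assumptions and should be adjusted to match the file actually downloaded.

```python
import csv
import io


def membrane_candidates(csv_text, id_column="Protein_ID",
                        loc_column="Localizations", target="Cell membrane"):
    """Return the IDs of proteins whose predicted localizations include `target`.

    Column names and the '|' separator are assumptions about the results file.
    """
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels = [label.strip() for label in row[loc_column].split("|")]
        if target in labels:
            hits.append(row[id_column])
    return hits


# Small made-up table showing the expected shape of the input.
example = (
    "Protein_ID,Localizations\n"
    "isoform_1,Cell membrane\n"
    "isoform_2,Cytoplasm|Nucleus\n"
    "isoform_3,Cell membrane|Endoplasmic reticulum\n"
)
print(membrane_candidates(example))
```

Isoforms that do not appear in the returned list are not predicted to reach the cell membrane and deserve closer inspection before being considered as CAR targets.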
Expression analysis across healthy and cancer tissues is crucial for assessing target specificity and potential off-target effects. The UCSC Xena Browser provides access to uniformly processed TCGA (cancer) and GTEx (normal tissue) expression data.
Note: The Xena Browser interface can be challenging to navigate for new users. While the following workflow provides basic functionality, we strongly recommend utilizing computational approaches for more comprehensive and reliable expression analysis (refer to the coding vignette for advanced methods).
Step 1: Access the Xena Browser
Step 2: Select the TCGA TARGET GTEx cohort
This cohort contains uniformly processed samples from TCGA (cancer), TARGET (pediatric cancer), and GTEx (normal tissue).
Step 3: Add your gene of interest
Xena Browser interface showing gene selection and data type options.
Step 4: Add the phenotype categories
Step 5: Visualize and interpret the data
Box plot visualization comparing ERBB2 expression across cancer types and normal tissues.
Step 6: Download results
From here you can export the visualization as images or download the underlying data as TSV files for further analysis (top right of the page).
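Once the TSV file is downloaded, a simple per-group summary gives a first quantitative view of tumor versus normal expression. The following is a minimal sketch using only the Python standard library. The column names in the example (`study`, `ERBB2`) and the values are made up for illustration; the real column names depend on the variables selected in the browser and must be read from the header of the downloaded file.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_by_group(tsv_text, group_column, value_column):
    """Compute the median expression value for each category in `group_column`.

    Rows whose value is missing or non-numeric are skipped.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        try:
            groups[row[group_column]].append(float(row[value_column]))
        except (TypeError, ValueError):
            continue  # missing or non-numeric measurement
    return {group: median(values) for group, values in groups.items()}


# Made-up example with the same tab-separated layout as a downloaded file.
example_tsv = (
    "sample\tstudy\tERBB2\n"
    "A\tTCGA\t6.0\n"
    "B\tTCGA\t8.0\n"
    "C\tGTEX\t3.0\n"
    "D\tGTEX\t\n"
    "E\tGTEX\t5.0\n"
)
print(median_by_group(example_tsv, "study", "ERBB2"))
```

Medians are used here because expression distributions are often skewed. Expression values in this cohort are log-transformed, so differences between group medians correspond to fold changes rather than absolute differences.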
For isoform-specific expression analysis:
Transcript-level expression comparison showing isoform-specific patterns across cancer and normal tissues.
Membrane topology prediction is essential for understanding protein architecture and identifying accessible extracellular domains for CAR targeting. DeepTMHMM provides accurate predictions for both α-helical and β-barrel transmembrane proteins.
DeepTMHMM employs deep learning protein language models to predict transmembrane protein topology. The method analyzes amino acid sequences to identify hydrophobic transmembrane regions, signal peptides, and the orientation of protein domains relative to cellular membranes. The model integrates evolutionary information and physical properties of amino acids to achieve state-of-the-art accuracy in distinguishing between cytoplasmic, extracellular, and membrane-spanning regions.
Step 1: Access DeepTMHMM
Step 2: Prepare your protein sequences
Step 3: Submit the analysis
DeepTMHMM submission interface for protein topology prediction.
Step 4: Interpret the results
DeepTMHMM provides comprehensive topology predictions through several output sections. For detailed interpretation of topology classifications, coordinate systems, and confidence assessments, refer to the DeepTMHMM user manual.
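To turn a predicted topology into candidate extracellular regions, the per-residue label string can be collapsed into segments. The sketch below assumes the per-residue labels used in the DeepTMHMM topology output, where `S` marks signal peptide, `O` outside, `M` transmembrane helix, and `I` inside. The function names and the toy topology string are hypothetical, and coordinates are reported as 1-based inclusive positions.

```python
from itertools import groupby


def topology_segments(topology):
    """Collapse a per-residue label string into (label, start, end) segments.

    Positions are 1-based and inclusive, matching residue numbering.
    """
    segments = []
    position = 1
    for label, run in groupby(topology):
        length = len(list(run))
        segments.append((label, position, position + length - 1))
        position += length
    return segments


def extracellular_regions(topology, min_length=1):
    """Return (start, end) for every 'O' (outside) segment of at least `min_length` residues."""
    return [
        (start, end)
        for label, start, end in topology_segments(topology)
        if label == "O" and end - start + 1 >= min_length
    ]


# Toy topology: signal peptide, extracellular domain, one helix, cytoplasmic tail.
toy_topology = "SSSOOOOOMMMIII"
print(topology_segments(toy_topology))
print(extracellular_regions(toy_topology))
```

The `min_length` argument is a convenience for discarding very short extracellular loops, which are unlikely to offer an accessible epitope. Any threshold chosen is a judgment call rather than a property of the prediction.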
Three-dimensional structural analysis provides crucial insights into epitope accessibility, surface exposure, and potential binding sites. We examine both experimental structures (PDB) and computational predictions (AlphaFold3).
Step 1: Access the RCSB PDB
Step 2: Search for your protein