Chimeric Antigen Receptor T-cell (CAR-T) therapy represents a revolutionary approach to cancer treatment, where a patient’s T-cells are genetically modified to recognize and attack cancer cells. The success of CAR-T therapy critically depends on identifying suitable target antigens—proteins expressed on cancer cells that can serve as targets for the engineered T-cells.
This vignette provides a comprehensive workflow for evaluating potential CAR-T targets, guiding non-bioinformaticians through essential computational analyses. We demonstrate this workflow using ERBB2 (also known as HER2/neu) as an example, a receptor tyrosine kinase overexpressed in approximately 25-30% of breast cancers (Slamon, D.J. et al.) and an established target for monoclonal antibody therapy with trastuzumab (Yoon, J. et al.).
This vignette is designed for clinicians and researchers without coding experience, utilizing web-based tools with graphical interfaces. While many tools necessarily operate through web servers, we recommend adopting computational approaches whenever possible for improved reproducibility, traceability, and reduced error rates. Collaborating with a bioinformatician can help implement more robust, automated workflows for comprehensive target assessment.
Understanding protein isoform diversity is crucial for CAR-T target assessment, as different isoforms may have altered membrane topology, expression patterns, or epitope accessibility. We begin by identifying all coding isoforms of the potential CAR-T target using Ensembl.
Step 1: Search for the gene
Step 2: Navigate to the gene page
Step 3: Access the transcript information
Step 4: Identify coding isoforms
Ensembl transcripts table showing protein-coding isoforms with transcript IDs, protein lengths, and UniProt matches.
Step 5: Examine individual isoforms
Step 6: Export transcript data
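For readers working with a bioinformatician, the same isoform list can be retrieved programmatically. The sketch below assumes the public Ensembl REST endpoint `lookup/symbol` with `expand=1` and assumes the response field names `Transcript`, `biotype`, `Translation`, and `length`; verify these against the current Ensembl REST documentation before relying on them. The helper names `fetch_gene` and `coding_transcripts` are illustrative.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def fetch_gene(symbol, species="homo_sapiens"):
    """Fetch a gene record with its transcripts expanded (requires network access)."""
    url = f"{ENSEMBL_REST}/lookup/symbol/{species}/{symbol}?expand=1"
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def coding_transcripts(gene_record):
    """Return (transcript ID, protein length) for each protein-coding transcript.

    Field names are assumptions about the REST response layout; missing
    fields yield None rather than raising an error.
    """
    rows = []
    for transcript in gene_record.get("Transcript", []):
        if transcript.get("biotype") != "protein_coding":
            continue
        translation = transcript.get("Translation") or {}
        rows.append((transcript["id"], translation.get("length")))
    return rows


# Example usage (performs a live request, so it is left commented out):
# for transcript_id, protein_length in coding_transcripts(fetch_gene("ERBB2")):
#     print(transcript_id, protein_length)
```

Separating the download from the filtering keeps the filtering step testable on a saved JSON file, which also helps with reproducibility.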
Protein sequences are essential for downstream computational analyses. UniProt provides high-quality, manually curated protein sequences, but it does not necessarily contain every isoform annotated in Ensembl: UniProt's reviewed entries are curated against published evidence, whereas Ensembl includes all computationally predicted coding transcripts. As a result, some predicted isoforms may lack UniProt entries.
To retrieve FASTA sequences of a gene and its protein-coding isoforms from UniProt:
Step 1: Search for the protein
Step 2: Identify the canonical entry
Step 3: Download FASTA sequences
UniProt download interface showing options for retrieving FASTA sequences of canonical and isoform variants.
Step 4: Individual isoform access
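Once a FASTA file has been downloaded, the canonical and isoform sequences can be separated by accession for the later steps. This is a minimal sketch that assumes UniProt-style headers of the form `>sp|ACCESSION|ENTRY_NAME description`, with isoforms carrying a suffix such as `P04626-2`. The function name `parse_uniprot_fasta` is illustrative.

```python
def parse_uniprot_fasta(text):
    """Map each UniProt accession (e.g. 'P04626-2') to its full sequence."""
    sequences = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumed header layout: '>sp|ACCESSION|ENTRY_NAME description'
            fields = line[1:].split("|")
            if len(fields) >= 3:
                accession = fields[1]
            else:
                # Fall back to the first word for non-UniProt headers
                accession = line[1:].split()[0]
            sequences[accession] = []
        elif accession is not None:
            sequences[accession].append(line)
    return {acc: "".join(parts) for acc, parts in sequences.items()}


# Example usage with a downloaded file (the file name is hypothetical):
# with open("erbb2_isoforms.fasta") as handle:
#     isoforms = parse_uniprot_fasta(handle.read())
# print({acc: len(seq) for acc, seq in isoforms.items()})
```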
Subcellular localization prediction is fundamental for CAR-T target validation. Targets must localize to the cell membrane to be accessible for CAR recognition. DeepLoc2.1 provides state-of-the-art localization and membrane association predictions.
DeepLoc2.1 uses transformer-based protein language models to predict where proteins localize within cells. The method analyzes amino acid sequences to identify sorting signals and structural features that determine cellular targeting. The deep learning model was trained on thousands of experimentally validated protein localizations and can predict multiple simultaneous localizations, reflecting the biological reality that some proteins function in multiple cellular compartments.
Step 1: Access the web server
Step 2: Input your protein sequence
Step 3: Submit the prediction
Step 4: Interpret the results
The results page contains several key sections for each analyzed isoform. For comprehensive interpretation of predicted localizations, confidence scores, and attention plots, refer to the DeepLoc2.1 user manual.
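If the predictions are downloaded as a CSV file, the isoforms predicted at the cell membrane can be shortlisted with a few lines of code. The sketch below assumes a column named `Protein_ID`, a column named `Localizations`, and multiple localizations joined by `|`. These are assumptions about the export format, so check them against the header of your own file and pass different names if needed.

```python
import csv
import io


def membrane_isoforms(csv_text, id_column="Protein_ID",
                      loc_column="Localizations", target="Cell membrane"):
    """Return the IDs of entries whose predicted localizations include `target`.

    Column names and the '|' separator are assumptions about the export
    format and can be overridden through the keyword arguments.
    """
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels = [label.strip() for label in row[loc_column].split("|")]
        if target in labels:
            hits.append(row[id_column])
    return hits


# Example usage (the file name is hypothetical):
# with open("deeploc_results.csv") as handle:
#     print(membrane_isoforms(handle.read()))
```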
Expression analysis across healthy and cancer tissues is crucial for assessing target specificity and potential off-target effects. The UCSC Xena Browser provides access to uniformly processed TCGA (cancer) and GTEx (normal tissue) expression data.
Note: The Xena Browser interface can be challenging to navigate for new users. While the following workflow provides basic functionality, we strongly recommend utilizing computational approaches for more comprehensive and reliable expression analysis (refer to the coding vignette for advanced methods).
Step 1: Access the Xena Browser
Step 2: Select the TCGA TARGET GTEx cohort
This cohort contains uniformly processed samples from TCGA (cancer), TARGET (pediatric cancer), and GTEx (normal tissue).
Step 3: Add your gene of interest
Xena Browser interface showing gene selection and data type options.
Step 4: Add the phenotype categories
Step 5: Visualize and interpret the data
Box plot visualization comparing ERBB2 expression across cancer types and normal tissues.
Step 6: Download results
From here you can:
- Export the visualization as images
- Download the underlying data as TSV files for further analysis (top right of the page)
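The downloaded TSV can be summarized outside the browser, for example by comparing median expression between sample groups. This is a minimal sketch: the column names used in the usage comment (`group`, `expression`) are hypothetical, since the actual header depends on the columns added in the browser, and missing values are assumed to appear as empty cells, `NA`, or `nan`.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_by_group(tsv_text, group_column, value_column):
    """Return the median of `value_column` for each value of `group_column`."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        value = row[value_column].strip()
        if value in ("", "NA", "nan"):
            continue  # skip samples without a measurement
        groups[row[group_column]].append(float(value))
    return {group: median(values) for group, values in groups.items()}


# Example usage (file and column names are hypothetical):
# with open("xena_export.tsv") as handle:
#     print(median_by_group(handle.read(), "group", "expression"))
```

Medians are used here because expression distributions are often skewed; a formal comparison would still require an appropriate statistical test.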
For isoform-specific expression analysis:
Transcript-level expression comparison showing isoform-specific patterns across cancer and normal tissues.
Membrane topology prediction is essential for understanding protein architecture and identifying accessible extracellular domains for CAR targeting. DeepTMHMM provides accurate predictions for both α-helical and β-barrel transmembrane proteins.
DeepTMHMM employs deep learning protein language models to predict transmembrane protein topology. The method analyzes amino acid sequences to identify hydrophobic transmembrane regions, signal peptides, and the orientation of protein domains relative to cellular membranes. The model integrates evolutionary information and physical properties of amino acids to achieve state-of-the-art accuracy in distinguishing between cytoplasmic, extracellular, and membrane-spanning regions.
Step 1: Access DeepTMHMM
Step 2: Prepare your protein sequences
Step 3: Submit the analysis
DeepTMHMM submission interface for protein topology prediction.
Step 4: Interpret the results
DeepTMHMM provides comprehensive topology predictions through several output sections. For detailed interpretation of topology classifications, coordinate systems, and confidence assessments, refer to the DeepTMHMM user manual.
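A per-residue topology string can be converted into region coordinates to list candidate extracellular stretches. The sketch below assumes one label per residue, with `S` for signal peptide, `O` for outside, `M` for membrane-spanning, and `I` for inside; confirm the label alphabet against your own output. Coordinates are 1-based and inclusive, and the function names are illustrative.

```python
from itertools import groupby


def topology_segments(labels):
    """Collapse a per-residue label string into (label, start, end) segments.

    Coordinates are 1-based and inclusive.
    """
    segments = []
    position = 1
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, position, position + length - 1))
        position += length
    return segments


def outside_regions(labels, min_length=1):
    """Return (start, end) for regions labeled 'O' (assumed to mean outside)."""
    return [
        (start, end)
        for label, start, end in topology_segments(labels)
        if label == "O" and end - start + 1 >= min_length
    ]


# Toy label string for illustration only, not a real prediction:
# print(outside_regions("SSOOOMMII"))
```

The `min_length` parameter lets you ignore very short extracellular loops, which are less likely to present a usable epitope.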
Three-dimensional structural analysis provides crucial insights into epitope accessibility, surface exposure, and potential binding sites. We examine both experimental structures (PDB) and computational predictions (AlphaFold3).
Step 1: Access the RCSB PDB
Step 2: Search for your protein
PDB search results showing available experimental structures for ERBB2.
Note: PDB often lacks complete structures of potential CAR-T targets, particularly full-length membrane proteins. In such cases, AlphaFold3 (detailed below) provides comprehensive structural predictions. However, PDB is particularly valuable when it contains antibody-bound complexes, as these reveal clinically validated epitopes that could potentially be used in CAR-T therapy design.
Example PDB structure showing antibody-bound complex.
For isoforms or complete proteins lacking experimental structures, AlphaFold3 provides highly accurate computational predictions.
AlphaFold3 uses a deep learning architecture that combines an attention-based network with a diffusion module to predict protein structures. The method analyzes amino acid sequences and evolutionary relationships to predict how proteins fold into their three-dimensional shapes. AlphaFold3 can also model protein complexes with DNA, RNA, and small molecules, which makes it useful for studying molecular interactions.
Step 1: Check existing predictions
AlphaFold database ERBB2 existing prediction.
Step 2: Submit new predictions (if needed)
The non-canonical isoforms of the potential CAR-T target may not be in the database. To obtain their 3D structure, you need to predict it using AlphaFold3.
- Navigate to the AlphaFold Server https://alphafoldserver.com/
- On the AlphaFold Server, create an account or log in
- Input your protein sequence in FASTA format
- For CAR-T analysis, you can predict:
  - Individual protein isoforms
  - Protein-protein complexes, for example the protein of interest bound to an antibody that could be used as part of the CAR on the T-cell
- Click “Continue and preview job”
Step 3: Interpret AlphaFold3 results
For comprehensive understanding of confidence scores (pLDDT), predicted aligned error (PAE), and structural interpretation, refer to the AlphaFold documentation.
AlphaFold3 prediction results showing structure with confidence coloring and quality metrics.
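Per-residue pLDDT values can also be summarized numerically for a region of interest, such as a candidate epitope. The sketch below uses the confidence bands commonly shown for AlphaFold predictions (above 90, 70 to 90, 50 to 70, below 50). It assumes you have already extracted one pLDDT value per residue into a list, and the 70 cutoff in `fraction_confident` is an adjustable assumption rather than a fixed rule.

```python
def plddt_band(score):
    """Classify a pLDDT value into the commonly used confidence bands."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def fraction_confident(scores, start, end, threshold=70.0):
    """Fraction of residues in [start, end] (1-based, inclusive) at or above threshold."""
    region = scores[start - 1:end]
    if not region:
        raise ValueError("empty region: check start and end")
    return sum(score >= threshold for score in region) / len(region)


# Toy values for illustration only, not taken from a real prediction:
# scores = [95.0, 80.0, 60.0, 40.0]
# print([plddt_band(s) for s in scores], fraction_confident(scores, 1, 4))
```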
Multiple sequence alignment (MSA) is a critical step that allows comparison of protein sequences across isoforms to understand structural and functional conservation. This analysis is essential for determining whether potential epitopes are maintained across different protein variants.
Multiple sequence alignment algorithms identify regions of similarity and difference between related protein sequences by optimally aligning amino acids based on evolutionary relationships and structural constraints. MSA reveals conserved domains, variable regions, and helps predict which areas of the protein are functionally important and structurally preserved across isoforms.
At this stage, MSA becomes crucial because you need to determine:
Recommended Tools: Use online MSA tools such as Clustal Omega or MUSCLE to align your protein isoforms. Analyze the alignment in conjunction with your topology predictions to identify the most suitable target variants.
For computational MSA approaches and advanced analysis methods, refer to the coding vignette.
Multiple sequence alignment of ERBB2 protein isoforms using MUSCLE.
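Once the isoforms are aligned, conservation of a candidate epitope can be checked column by column. This is a minimal sketch that assumes the alignment is available as a dictionary of equal-length, gapped sequences (gaps written as `-`) and that the epitope occurs as a contiguous stretch in the chosen reference isoform. The function names and the toy sequences in the usage comment are illustrative.

```python
def epitope_columns(aligned_reference, epitope):
    """Return the alignment columns occupied by the epitope in the reference."""
    if not epitope:
        raise ValueError("epitope must not be empty")
    # Map each ungapped reference position to its alignment column
    columns = [i for i, residue in enumerate(aligned_reference) if residue != "-"]
    start = aligned_reference.replace("-", "").find(epitope)
    if start == -1:
        raise ValueError("epitope not found in reference sequence")
    return columns[start:start + len(epitope)]


def epitope_identity(alignment, reference_id, epitope):
    """Fraction of epitope positions identical to the reference, per sequence."""
    reference = alignment[reference_id]
    columns = epitope_columns(reference, epitope)
    return {
        name: sum(sequence[c] == reference[c] for c in columns) / len(columns)
        for name, sequence in alignment.items()
    }


# Toy alignment for illustration only:
# alignment = {"ref": "MK-TAYL", "iso2": "MKQTAYL", "iso3": "MK-T--L"}
# print(epitope_identity(alignment, "ref", "TAY"))
```

An identity of 1.0 means the epitope is fully preserved in that isoform; lower values flag isoforms where the epitope is altered or deleted.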
The goal of this workflow is to verify whether a specific epitope represents a suitable CAR-T target based on the comprehensive analyses performed in previous steps. To conduct this validation, you first need to obtain the epitope sequence, which can be found through literature searches, patent databases, or other published sources. However, there is no standardized workflow for epitope identification, as the available information varies greatly depending on the target protein and existing research.
Once you have identified a potential epitope sequence, use the results from this workflow to assess its suitability: confirm the epitope is located in accessible extracellular domains (topology analysis), verify it is conserved across therapeutically relevant isoforms (sequence alignment), ensure it is expressed in target cancers while minimally present in healthy tissues (expression analysis), and validate its structural accessibility (3D structure analysis).
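The four checks above can be recorded in a small structure so that the outcome for each epitope is explicit and traceable. This is only an illustrative sketch: the class and field names are hypothetical, and each flag is set by you from the corresponding analysis rather than computed automatically.

```python
from dataclasses import dataclass, fields


@dataclass
class EpitopeAssessment:
    """Manual record of the four suitability checks for one epitope."""
    extracellular: bool            # from the topology analysis
    conserved_in_isoforms: bool    # from the sequence alignment
    tumor_selective: bool          # from the expression analysis
    structurally_accessible: bool  # from the 3D structure analysis

    def unmet(self):
        """Return the names of the criteria that are not satisfied."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def suitable(self):
        """True only when every criterion is satisfied."""
        return not self.unmet()


# Example usage with made-up outcomes:
# assessment = EpitopeAssessment(True, False, True, True)
# print(assessment.suitable(), assessment.unmet())
```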