ASpedia Manual

  • Alternative Splicing Database Introduction

    ASpedia is a comprehensive functional annotation database for human alternative splicing (AS) events, built on the hg19 Ensembl and RefSeq gene models. ASpedia supports the five major classes of AS listed below.

    1. Skipped exons (SE)

    2. Alternative 5’ splice site (A5SS)

    3. Alternative 3’ splice site (A3SS)

    4. Mutually exclusive exons (MXE)

    5. Retained intron (RI)

    Given gene names or AS IDs, ASpedia provides:
    • Genomic annotation (DNA and protein): functional evidence collected around AS event regions, for example splice site mutations and protein domains.
    • Transcriptional regulation and its elements: AS expression values (percent spliced-in index, PSI) across multiple tissue samples, and RNA-binding proteins inferred from ENCODE project datasets.
    • Isoform-specific function: isoform-specific interactions and subcellular localization, collected from previous studies, PPI databases, and UniProt.

  • User Guide

    ASpedia takes a BED format input file or a list of gene symbols and queries the matching AS IDs. The annotation of each AS event is visualized in our browser, and the whole annotation result can be downloaded as a text file. The steps below describe the usage in detail.

    Figure 1 Input preparation and ASpedia usage workflow

    Step 1) Preparing the input file or gene list
                After differential AS analysis of RNA-Seq BAM files, convert the result file to the input format.

    Step 2) Getting the result summary
                The queried AS events are summarized on the first search result page.
                AS events are counted by condition: AS event type, DNA annotation, protein annotation, and so on.

    Step 3) AS annotation result and visualization
                Click an AS ID in the search result to view the details of that AS event.
                The result page consists of a basic AS profile and a browser offering multiple tracks, one for each AS-specific dataset.

  • Sequence-based Annotation (DNA, Protein)

    DNA/RNA and protein information was collected by sequence-based analysis around AS regions. The data features are listed below. Please refer to the dataset counts.

    Category | Data type | Description
    DNA/RNA | miRNA target site | miRNA binding sites in 3’ UTR regions predicted by TargetScan
    DNA/RNA | NMD | NMD sites collected from known stop codons and from novel variant stop codons inferred from dbSNP and COSMIC
    DNA/RNA | Conservation | Average conservation scores of exonic and intronic regions collected from phastCons40way (placental mammals, primates, and vertebrates)
    DNA/RNA | Variants | Variants (point mutations) at splice sites collected from dbSNP, COSMIC, and SPIDEX
    DNA/RNA | Repeat | Repeats collected from the UCSC Genome Browser: Interrupted Rpts, Microsatellite, RepeatMasker, Self Chain, and Simple Repeats
    Protein | Domain | Protein domains from Pfam
    Protein | Post-translational modification (PTM) | PTM sites collected from PhosphoSitePlus, covering 9 types (acetylation site, kinase, methylation, phosphorylation, and so on)

  • mRNA Regulation Annotation

    mRNA regulation information was analyzed from NGS datasets collected from EBI and the ENCODE project. Tissue-specific AS expression was calculated as the PSI (“Percent Spliced In”) value described in Katz et al. (2010). We estimated PSI values by running rMATS on paired-end RNA-Seq data with replicated samples. The RNA-binding protein (RBP) data come from the RIP-Seq and CLIP-Seq platforms.

    Data type | Status | Description
    Tissue-specific AS | 26 tissues and 241 samples | Summary of tissue-specific PSI values
    RBP | 135 proteins, including 38 splicing factors, and 434 samples | Summary of RBPs around alternative splicing regions, with peak detection p-values
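
    As an illustration of what a PSI value expresses, the sketch below computes a length-normalized PSI from inclusion and skipping read counts, in the style rMATS uses. The function name and parameters are hypothetical, and the exact counting and filtering applied in the ASpedia pipeline may differ.

```python
def psi(inclusion_reads, skipping_reads, inclusion_len, skipping_len):
    """Length-normalized percent spliced-in for one AS event in one sample.

    inclusion_len and skipping_len are the effective lengths of the
    inclusion and skipping isoforms (assumed to be supplied by the caller).
    """
    if inclusion_len <= 0 or skipping_len <= 0:
        raise ValueError("effective lengths must be positive")
    inc = inclusion_reads / inclusion_len   # read density of the inclusion form
    skp = skipping_reads / skipping_len     # read density of the skipping form
    total = inc + skp
    if total == 0:
        return None  # no supporting reads: PSI is undefined
    return inc / total
```

    A PSI close to 1 means that most transcripts include the alternative region, and a PSI close to 0 means that most transcripts skip it.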

  • Isoform-specific Function

    The direct functional impact of AS can be described at the isoform level. ASpedia mined isoform-specific interactions and subcellular localization from known databases and experimental results. Each isoform can be traced through the matched transcript IDs involved in the AS event.

    Data type | Description
    Protein interaction | Isoform-specific protein interactions collected from Cell 2016, Nature Commun. 2014, and iRefIndex version 40
    Subcellular localization | Isoform-specific subcellular localization results collected from UniProt

  • Input Format (BED file)

    ASpedia requires BED format input, and some columns are mandatory. All columns must be tab-delimited; please remove any header line.

    Column | Description
    chrom* | Chromosome name, starting with ‘chr’
    chromStart* | Alternative splicing start location
    chromEnd* | Alternative splicing end location
    AS ID* | AS ID defined by our rules
    score | PSI value (optional)
    strand* | Strand: + or -
    AS type* | Alternative splicing event type (A5SS, A3SS, SE, MXE, RI)
    Miscellaneous additional values | Additional columns (optional)
    An asterisk (*) marks a mandatory field.

    • Example:
      Here is an example input file for querying AS events.

      chr19    50361371        50361502      chr19:50361314:50361371:50361502:50361806:50361910                                      0.108      +      A5SS      38,50,36,53,89                  15,16,21
      chrX     153128823      153128932     chrX:153128999:153128932:153128823:153128349:153126973                            -0.655      -      A5SS       0,0,1,0,2                            44,62,42
      chr12   120636357      120636551     chr12:120636803:120636657:120636551:120636357:120635265:120635125        -0.066      -      SE           0,0,1,0,2                            44,62,42
      chr15    63446259        63447820      chr15:63447930:63447820:63446259:63446052                                                       0.044      -      RI            188,248,202,255,429        179,123,62
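
      Before uploading, it can help to check each line against the mandatory columns. The following is a minimal sketch, not part of ASpedia itself: the function name is hypothetical, and the rules that chromStart is smaller than chromEnd and that an empty or ‘.’ score means "no PSI value" are assumptions drawn from the example lines above.

```python
AS_TYPES = {"A5SS", "A3SS", "SE", "MXE", "RI"}

def parse_aspedia_bed_line(line):
    """Parse one tab-delimited ASpedia BED line and check the mandatory fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 columns, got {len(fields)}")
    chrom, start, end, as_id, score, strand, as_type = fields[:7]
    start, end = int(start), int(end)
    if start >= end:  # assumption: start < end, as in the examples
        raise ValueError("chromStart must be smaller than chromEnd")
    if not as_id.startswith(chrom + ":"):
        raise ValueError("AS ID must start with the chromosome name")
    if strand not in {"+", "-"}:
        raise ValueError(f"invalid strand: {strand!r}")
    if as_type not in AS_TYPES:
        raise ValueError(f"unknown AS type: {as_type!r}")
    return {
        "chrom": chrom,
        "chromStart": start,
        "chromEnd": end,
        "as_id": as_id,
        # assumption: an empty or '.' score means the optional PSI is absent
        "score": float(score) if score not in ("", ".") else None,
        "strand": strand,
        "as_type": as_type,
        "extra": fields[7:],  # optional miscellaneous columns
    }
```

      Calling the function on every line of the file and collecting the raised errors gives a quick report of the lines that need fixing.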

  • AS ID Definition and Renaming Rule

    • AS ID
      To query AS events, the input file must include AS IDs in the format we define. The AS ID is simple to create from a gene model or an AS analysis result. An AS ID consists of the chromosome name followed by the chromosomal positions of the exon boundaries, delimited by the character ‘:’.

      An example of AS ID creation for an SE event
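
      The AS ID format can be handled with a few lines of code. The sketch below uses hypothetical helper names. It assumes that the boundary positions are listed in ascending order on the + strand and in descending order on the - strand; this ordering is inferred from the example input lines on this page and is not a stated rule.

```python
def parse_as_id(as_id):
    """Split an AS ID into the chromosome name and a list of integer positions."""
    chrom, *positions = as_id.split(":")
    if not positions:
        raise ValueError("AS ID must contain at least one boundary position")
    return chrom, [int(p) for p in positions]

def build_as_id(chrom, boundaries, strand):
    """Join exon boundary positions into an AS ID.

    Assumption: ascending order for '+', descending order for '-'.
    """
    ordered = sorted(boundaries, reverse=(strand == "-"))
    return ":".join([chrom] + [str(p) for p in ordered])
```

      For instance, build_as_id("chr15", [63446052, 63446259, 63447820, 63447930], "-") reproduces the ID of the RI example in the input format section.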

    • Renaming rule
      Our AS ID identifies an AS event unambiguously, but it is not informative for users. Therefore, we rename the ASpedia AS ID according to the rules below.

      Rename values
      as_type: A3SS, A5SS, SE, MXE, or RI
      transcript_id: corresponding transcript ID
      exon_num: exon number of major transcript or single transcript involving AS event
      event_type: + or -. + means an exon inclusion event, and - means an exclusion event.

      Rename rule
      1. AS event involving the major transcript (transcript 1 or UniProt isoform)
      as_type: exon_num(event_type)

      2. AS event not involving the major isoform, but involving a single identifiable isoform.
      as_type: transcript_id, exon_num(event_type)

      3. AS event involving multiple isoforms, none of which is the major isoform.
      as_type: Involving multiple isoforms

      For example,
      • For ESRRG, ‘A3SS: Exon 7(+)’ means that the AS event is A3SS and exon 7 is included in the major transcript.
      • For ESRRG, ‘MXE: ENST00000493748,Exon 3(+)’ means that the AS type is MXE and exon 3 is included in transcript ENST00000493748.
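
      The three rename rules can be written as a small function. This is a sketch with hypothetical names, not the code used by ASpedia; the output strings follow the two ESRRG examples above and the ‘Involving multiple isoforms’ wording of rule 3.

```python
def rename_as_id(as_type, exon_num=None, event_type=None,
                 transcript_ids=(), involves_major=False):
    """Return the renamed (descriptive) AS ID for one AS event.

    transcript_ids lists the non-major transcripts involved in the event.
    """
    if involves_major:
        # Rule 1: the major transcript is involved
        return f"{as_type}: Exon {exon_num}({event_type})"
    if len(transcript_ids) == 1:
        # Rule 2: exactly one identifiable non-major isoform
        return f"{as_type}: {transcript_ids[0]},Exon {exon_num}({event_type})"
    # Rule 3: several isoforms, none of them the major one
    return f"{as_type}: Involving multiple isoforms"
```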

  • Result Download File Format

    Category | Column name | Description of values | Example
    Gene profile | gene_symbol | HGNC approved gene symbol. | TP73
    Gene profile | chr | Chromosome. | chr1
    Gene profile | as_id | Alternative splicing ID. | chr1:3645891:3646012:3647491:3647629:3648027:3648120
    Gene profile | as_description_id | Renamed alternative splicing ID. | SE: Involving multiple isoforms
    Gene profile | as_type | Alternative splicing type. | SE
    Gene profile | strand | Strand orientation of genomic coordinates. | +
    Gene profile | gene_name | HGNC approved name for the gene. | tumor protein p73
    Gene profile | locus_group | A group name for a set of related locus types as defined by the HGNC. | protein-coding gene
    Gene profile | location | Cytogenetic location of the gene. | 1p36.32
    Gene profile | gene_id | RefSeq or Ensembl gene ID. | ENSG00000078900
    Gene profile | transcript_id | RefSeq or Ensembl transcript ID. | ENST00000346387,ENST00000604479,
    Gene profile | exon_inclusion_transcript_id | IDs of the transcripts with exon inclusion for this alternative splicing event. | ENST00000346387,ENST00000604479
    Gene profile | exon_exclusion_transcript_id | IDs of the transcripts with exon exclusion for this alternative splicing event. | ENST00000378280,ENST00000604566
    Gene profile | GO_BP | Gene Ontology terms describing pathways and processes of the given gene symbol. | APOPTOSIS GO;POSITIVE REGULATION OF TRANSCRIPTION
    Gene profile | GO_CC | Gene Ontology terms describing localization of the given gene symbol. | MITOCHONDRION;NUCLEUS
    Gene profile | GO_MF | Gene Ontology terms describing molecular activity of the given gene symbol. | DNA BINDING;TRANSCRIPTION FACTOR ACTIVITY
    Conservation | conservation_score | Average conservation scores of exonic and intronic regions for this AS region. Scores are reported per database, with exon (E) and intron (I) values in parentheses. | hg19.100way.phastCons:(E)0.304/(I)0.041;
    Variant | dbSNP_variant | Variants (point mutations) at splice sites in dbSNP for this AS region. dbSNP IDs, positions of point mutations, refs, and alts are reported. | rs368114063,chr17:76212746,G>A;
    Variant | COSMIC_variant | Variants (point mutations) at splice sites in COSMIC for this AS region. Positions of point mutations, refs, and alts are reported. | chr6:3646012,CG>C;chr6:3646013,G>A
    Variant | SPIDEX_variant | Variants (point mutations) at splice sites in SPIDEX for this AS region. Positions of point mutations, refs, and alts are reported. | chr4:3645891,G>A/C/T;chr4:3645892,T>A/C/G
    miRNA | miRNA_binding_site | miRNA binding sites in 3’ UTR regions predicted by TargetScan for this AS region. miRNA binding regions and miRNA IDs are reported. | chr10:2038702-2038709,miR-125/351;
    Repeat | repeat | Repeat regions overlapping this AS region. Repeat database names and repeat regions are reported. Repeat class information is reported only for RepeatMasker. | RepeatMasker,SINE,chr17:304973-305104;Simple Repeats,chr3:305848-305880
    NMD | NMD | NMD sites in known stop codons for this AS region. Chromosomes and NMD sites are reported. | chr12:120636530;chr12:120636541
    NMD | COSMIC_NMD | NMD sites in novel variant stop codons inferred from COSMIC for this AS region. Chromosomes and NMD sites are reported. | chr10:103344469;chr10:103344504
    NMD | dbSNP_NMD | NMD sites in novel variant stop codons inferred from dbSNP for this AS region. Chromosomes and NMD sites are reported. | chr10:70644615;chr10:70645026
    Protein domain | protein_domain | Pfam protein domains for this AS region. Pfam domain ID, Pfam domain name, and genomic region are reported. The proteomic region is also reported when available. | PF07647,SAM domain (Sterile alpha motif),
    Post-translational modification (PTM) | protein_translational_modification | PTM sites collected from PhosphoSitePlus for this AS region. PTM types, proteomic regions, and genomic regions are reported. | Chain,p310-1400,chr2:49924743-49940115;Phosphorylation site,AA1056-1056,chr11:49932703-49932705
    RBP | RBP_splicing_factor | Summary of RBPs around alternative splicing regions, with peak detection p-values. Target proteins, p-values, peak genomic regions, and peak lengths are reported. | HNRNPU,6.533499884e-07,chr15:3649426-3649497,72;
    Tissue specific alternative splicing | tissue_as | Tissue-specific PSI values from rMATS for this AS region. Tissues and PSI values are reported. Only tissues with an absolute PSI value >= 0.05 are reported. | brain:-0.638;thyroid:0.362
    Protein interaction (PPI) | isoform_PPI_a | Transcript IDs in this isoform. | ENST00000378288;ENST00000378295
    Protein interaction (PPI) | isoform_PPI_b | Protein interaction partners of isoform_PPI_a. | ITCH/TP73/NEDD4/UBC;
    Subcellular localization | isoform_subcellular_localization_id | Transcript IDs in this isoform. | Q9HBH9-1;Q9HBH9-2
    Subcellular localization | isoform_subcellular_localization | Isoform-specific subcellular localizations. | Cytoplasm;Nucleus > PML body
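
    Several columns of the download file pack multiple entries into one value. The sketch below shows one way to unpack two of them, tissue_as and RBP_splicing_factor. The function names are hypothetical, and the layout (entries separated by ‘;’, sub-fields separated by ‘,’ or ‘:’) is assumed from the example values in the table above.

```python
def split_entries(value):
    """Split a multi-entry value on ';' and drop empty trailing entries."""
    return [entry for entry in value.split(";") if entry]

def parse_tissue_as(value):
    """Turn 'brain:-0.638;thyroid:0.362' into a {tissue: psi} dictionary."""
    result = {}
    for entry in split_entries(value):
        tissue, psi = entry.rsplit(":", 1)
        result[tissue] = float(psi)
    return result

def parse_rbp(value):
    """Turn RBP_splicing_factor entries into a list of dictionaries."""
    records = []
    for entry in split_entries(value):
        protein, p_value, region, length = entry.split(",")
        records.append({
            "protein": protein,
            "p_value": float(p_value),
            "region": region,          # e.g. 'chr15:3649426-3649497'
            "peak_length": int(length),
        })
    return records
```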

  • Data process workflow for differential AS analysis

    1. Differential alternative splicing analysis

    1) rMATS analysis method

    rMATS analyzes five types of alternative splicing events: skipped exon (SE), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), mutually exclusive exons (MXE), and retained intron (RI). Possible alternative splicing events are identified from the RNA-Seq data and the transcript annotation in GTF format. All output files are written to the output directory you specify. For alternative splicing annotation, you can use the MATS_output folder or the individual “AS_Event.MATS.JunctionCounts.txt” files. More information about the output files is available in the rMATS documentation.

    1-1) rMATS example

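    # RNASeq-MATS.py is the rMATS 3.x driver script; adjust the path to your installation
    $ python RNASeq-MATS.py -b1 SF3B1_mut.bam -b2 SF3B1_wt.bam -gtf hg19_gen_annot.gtf \
      -o SF3B1wt_vs_mut -t paired -len 50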

    Note that the output directory structure is:

    ├----- ASEvents/
    ├----- MATS_output/
    |      ├------ A3SS.MATS.JunctionCountOnly.txt
    |      ├------ A3SS.MATS.ReadsOnTargetAndJunctionCounts.txt
    |      └------ ...
    ├----- summary.txt
    ├----- SAMPLE_1/
    ├----- SAMPLE_2/
    ├----- commands.txt
    └----- log.RNASeq-MATS

    Next, run the converter tool, DASEResultConvertor.jar (go on to 2. Running converter tool to prepare ASpedia input file).

    2. Running converter tool to prepare ASpedia input file

    To obtain alternative splicing event annotation, upload the list of alternative splicing events in BED format. For convenience, we provide a tool that converts rMATS and MISO analysis result files to BED format. ASpedia does not require the ‘chr’ prefix in chromosome names: you can use either ‘chr1’ or ‘1’. Our converter tool, however, writes chromosome names with the ‘chr’ prefix.

    1) First, download the jar file (DASEResultConvertor.jar) from the UTIL page

    2) Input requirements

    2-1) rMATS

    - Output directory (e.g. MATS_output folder) or
    - Individual AS event files (e.g. A3SS.MATS.ReadsOnTargetAndJunctionCounts.txt)

    2-2) MISO

    - The summary output file with the Bayes factors for each AS event (e.g. *.miso_bf)

    3) Examples

    To convert your result file, use the command line that matches the tool you used:

    3-1) rMATS example

    $ java -jar DASEResultConvertor.jar -i SF3B1wt_vs_mut/MATS_output/ \
          -o SF3B1wt_vs_mut.aspedia.bed \
          -p rMATS
    # or
    $ java -jar DASEResultConvertor.jar -i A3SS.MATS.ReadsOnTargetAndJunctionCounts.txt \
          -o A3SS_SF3B1wt_vs_mut.aspedia.bed \
          -p rMATS -a A3SS

    While your job is running, you can check the job status and your data type as shown below:

    CONVERTING STATUS: starting file conversion using SF3B1wt_vs_mut/MATS_output//A5SS.MATS.ReadsOnTargetAndJunctionCounts.txt
    CONVERTING STATUS: starting file conversion using SF3B1wt_vs_mut/MATS_output//A3SS.MATS.ReadsOnTargetAndJunctionCounts.txt
    CONVERTING STATUS: starting file conversion using SF3B1wt_vs_mut/MATS_output//MXE.MATS.ReadsOnTargetAndJunctionCounts.txt
    CONVERTING STATUS: starting file conversion using SF3B1wt_vs_mut/MATS_output//SE.MATS.ReadsOnTargetAndJunctionCounts.txt
    CONVERTING STATUS: starting file conversion using SF3B1wt_vs_mut/MATS_output//RI.MATS.ReadsOnTargetAndJunctionCounts.txt
    ***FINAL REPORT***
    Finally total 219151 ASEs were converted from your input.
    The output file includes 19132 A5SS, 38721 A3SS, 37998MXE, 116164 SE, and 7136 RI events.

    3-2) MISO example

    $ java -jar DASEResultConvertor.jar -i SF3B1wt_vs_mut/SF3B1wt_vs_mut.A3SS \
    -o A3SS_SF3B1wt_vs_mut.aspedia.bed \
    -p MISO -a A3SS
    # or
    $ java -jar DASEResultConvertor.jar \
    -i SF3B1wt_vs_mut/SF3B1wt_vs_mut.A3SS/bayes-factors/SF3B1wt_vs_mut.A3SS.miso_bf \
    -o A3SS_SF3B1wt_vs_mut.aspedia.bed \
    -p MISO -a A3SS

    While your job is running, you can check the job status and your data type as shown below:

    CONVERTING STATUS: starting file conversion using
    ***FINAL REPORT***
    Finally total 38701 ASEs were converted from your input.
    The output file includes 38701 A3SS events.

    4) Converted ASpedia BED format

    The specification of the ASpedia BED format is described in the HELP page.
    If you follow the format and the AS ID rule, you can also query the results of other differential AS analysis tools in ASpedia.
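
    A quick way to sanity-check a converted file is to count the events per AS type and compare the numbers with the FINAL REPORT printed by the converter. The sketch below is hypothetical helper code, not part of DASEResultConvertor.jar; it assumes the AS type is the seventh tab-delimited column, as in the BED specification.

```python
from collections import Counter

def count_as_types(bed_lines):
    """Count AS events per AS type in an iterable of ASpedia BED lines."""
    counts = Counter()
    for number, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 7:
            raise ValueError(f"line {number}: expected at least 7 columns")
        counts[fields[6]] += 1  # seventh column holds the AS type
    return counts
```

    To use it on a converted file, open the file (for example SF3B1wt_vs_mut.aspedia.bed) in text mode and pass the file object to count_as_types.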

    3. Searching alternative splicing event annotation

    To search alternative splicing event annotation, upload the BED file of alternative splicing events obtained from the converter tool.
    Once you have uploaded the AS list, you can get the results on the website or by e-mail.

    3-1) Getting results from website

    To get results on the website, select ‘in screen’ from the drop-down list and click the Search button (in Gene symbol search mode) or the Submit button. You can then check the status and the annotated AS list for your input
    (if you uploaded your AS list as an attached file, check your search status and click the View summary button as below).

    3-2) Getting results via e-mail

    You can also receive your results by e-mail. To do so, select ‘e-mail’ from the drop-down list and enter your e-mail address in the Email box.

    The notification page will then be displayed as shown below, and within a few minutes you will receive an e-mail once the result report annotating the AS list you uploaded is available for download.