ToolBus/PathPort Web-Services

The PathPort project at VBI has implemented a number of web-services that provide the core data-access and analysis capabilities of the system. These tools supply the information displayed by the visualization plugins within ToolBus. Most of these web-services simply wrap existing programs (e.g., Primer3, MUMmer 3.0) and databases (e.g., VBI's own bacterial phylogenetic and annotated DNA databases). Many of these resources are open source and have no usage restrictions, while others are restricted.

Web-services can have restricted access because of the licensing terms of the underlying algorithm or software. Some programs, such as GENSCAN, have been acquired by VBI under the terms of an academic license and therefore cannot be made available outside of VBI.

IMPORTANT NOTE:
For those of you working from behind VERY restrictive firewalls,
please note that all VBI services are published on ports 6565 and 7575.

Data Services

VBIdas EMBLdas GenomeTool GUS Minet (Pronet, Pathway)
Piml (Pathinfo) Pimllink Est2Genome Phylogenetic Phylip

Sequence Alignment, Similarity Search and Comparison

ClusterBlast TimelogicBlast LocalBlast MSblast Ssaha
Ssahasnp Smith-Waterman ClustalW Water Stretcher
Blat Hmmer Rfam/Infernal COGnitor Blocks
MUMmer Sean

Gene Prediction

Glimmer GlimmerM GlimmerHMM Tigrscan Unveil
SNAP rRNAscan tRNAscan Tfscan RepeatMasker
Genscan Genemark Grailexp Orpheus Fgenesh
RankGene TICO

Statistical Analysis

Rsom Agnes Kmean Rpca Rsvm
Rknn Rlda Diana Cluster Hclust
T/F-Test Rfda Anova SAM GeneIdMap

Others

Dnacuts Probedesign Yoda GoAnalysis SignalP
Lipop InterProScan MSTag MSFit ProteinPredict
TagIdent Psipred Memsat rbsFinder Contig

Freely Available Public Services

  • http://pathport.vbi.vt.edu/services/wsdls/beta/vbidas.wsdl
    http://208.187.245.3:8080/axis/services/vbidas?wsdl
    http://64.92.170.71:6565/axis/services/vbidas?wsdl
    Provides sequence and annotation information from VBI's version of NCBI's RefSeq and/or GenBank databases for viral, bacterial, archaeal, Plasmodium, mouse, rat, mosquito, and fruit fly genomes. The MySQL relational database schema was designed to follow the Distributed Annotation System (DAS) standard.
  • http://pathport.vbi.vt.edu/services/wsdls/embldas.wsdl
    http://69.93.57.58:8080/axis/services/embldas?wsdl
    http://66.240.238.41:8080/axis/services/embldas?wsdl
    http://208.187.245.3:8080/axis/services/embldas?wsdl
    http://64.92.170.71:6565/axis/services/embldas?wsdl
    http://206.225.85.121:8080/axis/services/embldas?wsdl
    Uses the European Bioinformatics Institute (EBI) XEMBL web-service to retrieve organism DNA and annotations from the EMBL database. XEMBL returns results in BSML format, which embldas then converts into DAS/1. EBI's service can be somewhat slow, which slows this service's responses; a request should generally take no more than several minutes. Please also note that this service's availability depends on the availability of EBI's service, over which we have no control.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/genometool.wsdl
    http://64.92.170.71:6565/axis/services/genometool?wsdl
    Built on top of the VBIdas search tool. Users can navigate chromosome data and the corresponding contig assemblies, and then view sequence and annotation information from VBI's version of NCBI's RefSeq and/or GenBank databases for viral, bacterial, archaeal, Plasmodium, mouse, rat, mosquito, fruit fly, and human genomes.
  • http://pathport.vbi.vt.edu/services/wsdls/guspathport.wsdl
    The GUS web-service provides genome sequences and feature annotations. The database was developed on the Genomics Unified Schema (GUS) platform. GUS is built on a strongly typed relational schema and includes the GUS Application Framework to assist in developing data-acquisition and analysis tools. The GUS schema integrates the genome, transcriptome, and proteome of one or more organisms, gene regulation and networks, ontologies, gene expression, and inter-organism comparisons.
    http://pathport.vbi.vt.edu/services/wsdls/guspathport.wsdl
    Guspathport contacts an Oracle database that uses RefSeq as the genome data source.
    http://pathport.vbi.vt.edu/services/wsdls/guspatric.wsdl
    Guspatric contacts the database that contains annotation for organisms in the Patric project, one of the eight Bioinformatics Resource Centers (BRC) funded by the National Institutes of Health through the National Institute of Allergy and Infectious Diseases (NIAID).
  • http://pathport.vbi.vt.edu/services/wsdls/beta/pathway.wsdl
    A database of protein interaction pathways involved in the infection process for pathogens on both the CDC's and NIAID's A, B, and C pathogen lists. Currently, pathways are present for only a few pathogens, but VBI's researchers are continuing to gather and curate data for this fully referenced and growing database. If you have specific questions, concerns, or suggestions regarding this information, please email us at pronet@vbi.vt.edu.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/piml.wsdl
    A database of background information on pathogens on both the CDC's and NIAID's A, B, and C pathogen lists (CDC list, NIAID list). It includes taxonomic, epidemiological, laboratory-work, and host information. This fully referenced data has been gathered and curated by VBI researchers. If you have specific questions, concerns, or suggestions regarding this information, please email us at pathinfo@vbi.vt.edu.
    The piml web-service retrieves pathogen background information, including sections on taxonomy, organism, epidemiology, host list, and lab work. The information is fully referenced, with links to PubMed abstracts when available. Currently, the pathogen documents are stored in a Xindice database, a database designed to store XML data (commonly referred to as a native XML database). When the user selects a pathogen, the corresponding pathogen ID is submitted to the Xindice database, and the returned pathogen document is displayed by the PIML viewer.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/pimllink.wsdl
    This web-service takes an organism name, tries to find a close match for it in the pathogen information database, and then retrieves the matching pathogen information from the PIML web-service. Currently it is used by the phylogenetic viewer, so that users can click on tree tips to view background information on those organisms when it is available in the database.
    Algorithm: The Pimllink web-service applies a string similarity metric to find the pathogen name that most closely matches the user-provided name, and then fetches the pathogen information for that best match. The metric rewards both common substrings and a common ordering of those substrings, and it considers not only the single longest common substring but other common substrings as well. It does this by counting how many adjacent character pairs the two strings share: because each character pair carries some information about the ordering of characters in the original string, both the characters and their order are taken into account.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/phylogenetic.wsdl
    http://206.225.85.121:8080/axis/services/phylogenetic?wsdl
    Wraps a database and a set of associated phylogenetic tree-construction tools to build phylogenetic trees for bacterial organisms and their genes. Users can also submit their own sequence for inclusion in the tree. The database and tools that this service wraps were (and continue to be) developed under the direction of Dr. Allan Dickerman at VBI.
  • http://pathport.vbi.vt.edu/services/wsdls/phylip.wsdl
    http://69.93.57.58:8080/axis/services/phylip?wsdl
    http://66.240.238.41:8080/axis/services/phylip?wsdl
    http://208.187.245.3:8080/axis/services/phylip?wsdl
    http://64.92.170.71:6565/axis/services/phylip?wsdl
    http://206.225.85.121:8080/axis/services/phylip?wsdl
    The Phylip web-service takes a set of sequences and generates a tree using the PHYLIP and PAML packages.
    The PHYLIP package contains many programs for inferring phylogenies, which apply different algorithms to different kinds of data. PROTPARS estimates phylogenies from protein sequences (input using the standard one-letter code for amino acids) using the parsimony method, in a variant that counts only those nucleotide changes that change the amino acid, on the assumption that silent changes are more easily accomplished. DNAPARS estimates phylogenies from nucleic acid sequences by the parsimony method; it allows use of the full IUB ambiguity codes, estimates ancestral nucleotide states, and treats gaps as a fifth nucleotide state. DNAML estimates phylogenies from nucleotide sequences by maximum likelihood. The model employed allows for unequal expected frequencies of the four nucleotides, unequal rates of transitions and transversions, and different (prespecified) rates of change in different categories of sites, with the program inferring which sites have which rates.
    The PAML package contains programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood (ML). Its program CODEML was formed by merging two older programs: codonml, which implements the codon substitution model of Goldman and Yang (1994) for protein-coding DNA sequences, and aaml, which implements models for amino acid sequences. The two are now distinguished by a variable named seqtype in the control file codeml.ctl: 1 for codon sequences and 2 for amino acid sequences.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/blastbt.wsdl
    Uses the Linux Beowulf cluster at VBI's Core Computational Facility to run BLASTN, BLASTP, and translating BLAST analyses on multiple organism databases simultaneously and to combine the answers before returning the results in BLAST XML format.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/timelogicblast.wsdl
    Uses the Timelogic Server at VBI's Core Computational Facility to run BLASTN, BLASTP, and translating BLAST analyses on multiple organism databases simultaneously and to combine the answers before returning the results in BLAST XML format.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/blastlocal.wsdl
    Uses the PathPort server to run BLASTN, BLASTP, and translating BLAST analyses on multiple organism databases simultaneously and to combine the answers before returning the results in BLAST XML format.
  • http://pathport.vbi.vt.edu/services/wsdls/msblast.wsdl
    MS BLAST is a specialized BLAST-based protocol developed for identifying proteins by sequence similarity searches using peptide sequences produced by the interpretation of tandem mass spectra.
  • http://pathport.vbi.vt.edu/services/wsdls/ssaha.wsdl
    The Ssaha web-service uses SSAHA (Sequence Search and Alignment by Hashing Algorithm) to rapidly find near-exact matches in DNA or protein databases at VBI. For DNA-level searches, either genome sequences or CDS sequences can be chosen as the target, and DNA searches can also be performed on the basis of synonymous codon translation. A protein sequence can be searched against a protein database or, via translation, against a nucleotide (genome or CDS) database. Results are mapped onto the Homolog model and can be viewed in the Homolog viewer.
  • http://pathport.vbi.vt.edu/services/wsdls/ssahasnp.wsdl
    ssahaSNP applies SSAHA (Sequence Search and Alignment by Hashing Algorithm) to high-throughput SNP detection: the high-quality region of the base sequence from each trace read is compared to the reference genome sequence. Both the query and reference sequences should have associated quality values, which ensures that SNPs are detected only in high-quality regions. The ssahaSNP protocol is used in The SNP Consortium. Like the Ssaha web-service, the ssahaSNP web-service uses the SSAHA algorithm to rapidly find near-exact matches in DNA or protein databases at VBI. For DNA-level searches, either genome sequences or CDS sequences can be chosen, and DNA searches can also be performed on the basis of synonymous codon translation. A protein sequence can be searched against a protein database or, via translation, against a nucleotide (genome or CDS) database.
  • http://pathport.vbi.vt.edu/services/wsdls/timelogicsmithwaterman.wsdl
    Uses VBI's recently acquired TimeLogic FPGA hardware system to run the Smith-Waterman algorithm on multiple organism databases.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/msa.wsdl
    http://69.93.57.58:8080/axis/services/msa?wsdl
    http://66.240.238.41:8080/axis/services/msa?wsdl
    http://208.187.245.3:8080/axis/services/msa?wsdl
    http://64.92.170.71:6565/axis/services/msagt?wsdl
    http://206.225.85.121:8080/axis/services/msa?wsdl
    Wraps EBI's ClustalW program for doing global multiple sequence alignments for either DNA or protein sequences. This web-service accepts multiple sequence data files. Supported input formats include FASTA, EMBL/SwissProt, and NBRF/PIR. Results are returned in MSAML format.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/water.wsdl
    http://66.240.238.41:8080/axis/services/water?wsdl
    http://208.187.245.3:8080/axis/services/water?wsdl
    http://64.92.170.71:6565/axis/services/watergt?wsdl
    http://206.225.85.121:8080/axis/services/water?wsdl
    Wraps the water local alignment program from the EMBOSS suite. Water uses the Smith-Waterman algorithm (modified for speed), a dynamic programming approach, to calculate the local alignment between two sequences. Local alignment methods are very useful for scanning databases or when matches between small regions of sequences (e.g., between protein domains) are needed. Results are returned in MSAML format.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/stretcher.wsdl
    http://66.240.238.41:8080/axis/services/stretcher?wsdl
    http://208.187.245.3:8080/axis/services/stretcher?wsdl
    http://64.92.170.71:6565/axis/services/stretchergt?wsdl
    http://206.225.85.121:8080/axis/services/stretcher?wsdl
    Wraps the stretcher global alignment program from the EMBOSS suite. Stretcher finds the best global alignment between two sequences. A global pairwise alignment assumes that the two sequences have diverged from a common ancestor, so the program tries to "stretch" the alignment over the whole length of both sequences, introducing gaps where necessary to show the alignment that best illustrates their similarities. Stretcher computes an optimal global alignment using a linear-space modification of the classic dynamic programming algorithm. Returns MSAML formatted results.
  • http://pathport.vbi.vt.edu/services/wsdls/blat.wsdl
    BLAT ("BLAST-Like Alignment Tool") is a tool that performs rapid mRNA/DNA and cross-species protein alignments. At the sensitivity settings typically used when comparing vertebrate sequences, it is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments, and 50 times faster for protein alignments. This web-service is dedicated to EST-to-genome alignment using BLAT. ESTs are a valuable resource in whole-genome annotation: aligning ESTs to genome sequences gives intron, exon, and boundary information and is far more accurate than gene prediction programs.
  • http://pathport.vbi.vt.edu/services/wsdls/hmmer.wsdl
    http://69.93.57.58:8080/axis/services/hmmer?wsdl
    The Hmmer web-service wraps HMMER to compare a protein sequence against an HMM database, such as TIGRFAMs or Pfam, and returns protein family information for the query sequence. For additional information on the HMMER software package, please refer to: hmmer
    For additional information on TIGRFAMs database, please refer to: Tigrfam
    For additional information on Pfam database, please refer to: Pfam
  • http://pathport.vbi.vt.edu/services/wsdls/rfam.wsdl
    http://69.93.57.58:8080/axis/services/rfam?wsdl
    The Rfam web-service queries the Rfam database and returns RNA family information. Covariance model searches are extremely compute-intensive: a small model (like tRNA) can search a sequence database at a rate of around 300 bases/sec, and compute time scales roughly with the fourth power of the length of the RNA, so larger models quickly become infeasible without significant compute resources. The Rfam web-service wraps the rfam_scan.pl script, which uses BLAST prescreening to increase search speed. For additional information on the Rfam database, please refer to: Rfam. For additional information about Infernal, please refer to: Infernal.
  • http://pathport.vbi.vt.edu/services/wsdls/cognitor.wsdl
    http://69.93.57.58:8080/axis/services/cognitor?wsdl
    The Cognitor web-service compares a protein sequence against the COG database and returns information about the COG to which the query protein belongs.
    For additional information on Cognitor and COG refer to: COGnitor
  • http://pathport.vbi.vt.edu/services/wsdls/blocks.wsdl
    http://69.93.57.58:8080/axis/services/blocks?wsdl
    The Blocks web-service compares a protein sequence against the BLOCKS database and returns information about the blocks hit by the query protein.
    For additional information on Blocks, please refer to: Blocks
  • http://pathport.vbi.vt.edu/services/wsdls/beta/mummer.wsdl
    http://64.92.170.71:6565/axis/services/mummergt?wsdl
    Wraps the MUMmer 3.0 program, developed by The Institute for Genomic Research (TIGR), for large-scale sequence comparisons. It rapidly aligns entire genomes and finds "Maximal Unique Matches" (MUMs) between two DNA or protein sequences. The central algorithm of MUMmer takes two input sequences and finds all subsequences longer than a specified minimum length that are identical between the two input sequences and their reverse complements. These matches are guaranteed to be maximal, in that they cannot be extended at either end without incurring a mismatch. The algorithm is implemented with a suffix-tree data structure, which permits very fast and memory-efficient comparison of the sequences. Currently, VBI hosts the following MUMmer-based web-services, each running the same algorithm but contacting a different database:
  • http://pathport.vbi.vt.edu/services/wsdls/beta/mummer.wsdl
    The MySQL database contains sequence and annotation information from NCBI's RefSeq and/or Genbank database.
  • http://pathport.vbi.vt.edu/services/wsdls/mummergus.wsdl
    The Oracle database was developed on the Genomics Unified Schema (GUS) platform and uses RefSeq as the genome data source.
  • http://pathport.vbi.vt.edu/services/wsdls/mummerpatric.wsdl
    The database contains annotation for organisms in the Patric project, one of the eight Bioinformatics Resource Centers (BRC) funded by the National Institutes of Health through the National Institute of Allergy and Infectious Diseases (NIAID).
  • http://pathport.vbi.vt.edu/services/wsdls/sean.wsdl
    The Sean web-service scans the aligned sequences one character at a time to find potential SNPs, comparing each character with the corresponding character in the consensus. If they differ, the sequence on either side of the position is checked against the same region of the consensus over the required window, which is 15 bp by default and configurable by the user. If both windows are identical, the position of the base change and the base are noted; the change is stored as a potential SNP only if at least one other identical base change is located at the same position in another sequence. The only other check ensures that the consensus character is also present in at least two sequences, because the consensus does not always contain the dominant base at a particular position. The results are returned in DAS/1 format.
  • http://pathport.vbi.vt.edu/services/wsdls/beta/glimmer.wsdl
    http://69.93.57.58:8080/axis/services/glimmer?wsdl
    http://66.240.238.41:8080/axis/services/glimmer?wsdl
    http://208.187.245.3:8080/axis/services/glimmer?wsdl
    http://64.92.170.71:6565/axis/services/glimmergt?wsdl
    http://206.225.85.121:8080/axis/services/glimmer?wsdl
    Wraps the Glimmer 2.10 program from The Institute for Genomic Research (TIGR) for gene prediction in genomic DNA sequences, especially the genomes of bacteria and archaea. It uses interpolated Markov models (IMMs) to identify coding regions and distinguish them from noncoding DNA. Glimmer consists of two main programs. The first is the training program, "build-imm", which takes an input set of sequences and builds and outputs the IMM for them; these sequences can be complete genes or just partial open reading frames (ORFs). The second program is Glimmer itself, which uses the IMM to identify putative genes in an entire genome. Glimmer automatically resolves conflicts between most overlapping genes by choosing one of them. It also identifies genes that are suspected to truly overlap and flags these for closer inspection by the user. These "suspect" gene candidates have been a very small percentage of the total for all the genomes analyzed thus far. Results are returned in DAS/1 format.
  • http://pathport.vbi.vt.edu/services/wsdls/glimmerm.wsdl
    http://69.93.57.58:8080/axis/services/glimmerm?wsdl
    http://66.240.238.41:8080/axis/services/glimmerm?wsdl
    http://208.187.245.3:8080/axis/services/glimmerm?wsdl
    http://64.92.170.71:6565/axis/services/glimmerm?wsdl
    http://206.225.85.121:8080/axis/services/glimmerm?wsdl
    Wraps the GlimmerM program, from The Institute for Genomic Research (TIGR), for doing gene prediction for eukaryotic genomic DNA sequences. "It is based on a dynamic programming algorithm that considers all combinations of possible exons for inclusion in a gene model and chooses the best of these combinations. The decision about what gene model is best is a combination of the strength of the splice sites and the score of the exons generated by an interpolated Markov model (IMM). The system has been trained for Arabidopsis thaliana and Aspergillus and should work well on closely related organisms." Results are returned in DAS/1 format.
  • http://pathport.vbi.vt.edu/services/wsdls/glimmerhmm.wsdl
    The glimmerhmm web-service wraps GlimmerHMM, a gene finder based on a Generalized Hidden Markov Model (GHMM). "Although the gene finder conforms to the overall mathematical framework of a GHMM, additionally it incorporates splice site models adapted from the GeneSplicer program and a decision tree adapted from GlimmerM. It also utilizes Interpolated Markov Models for the coding and noncoding models. Currently, GlimmerHMM's GHMM structure includes introns of each phase, intergenic regions, and four types of exons (initial, internal, final, and single)".
  • http://pathport.vbi.vt.edu/services/wsdls/tigrscan.wsdl
    http://69.93.57.58:8080/axis/services/tigrscan?wsdl
    http://66.240.238.41:8080/axis/services/tigrscan?wsdl
    http://208.187.245.3:8080/axis/services/tigrscan?wsdl
    http://64.92.170.71:6565/axis/services/tigrscan?wsdl
    http://206.225.85.121:8080/axis/services/tigrscan?wsdl
    The TIGRSCAN web-service wraps the native TIGRSCAN application to do gene prediction. TIGRSCAN is a gene finder based on the Generalized Hidden Markov Model framework.
  • http://pathport.vbi.vt.edu/services/wsdls/unveil.wsdl
    The unveil web-service is a gene identification web-service that analyzes genomic DNA sequences from a variety of organisms. It uses a hidden Markov model to identify non-overlapping gene models on either or both strands during a single pass over a given DNA sequence. Individual components of the model can be independently retrained using the Baum-Welch EM algorithm.
  • http://pathport.vbi.vt.edu/services/wsdls/snap.wsdl
    Wraps the Semi-HMM-based Nucleic Acid Parser (SNAP). SNAP predicts transcript models solely on the basis of the underlying genomic sequence and does not take any experimental evidence into account. SNAP protein reports are not available for all species, but SNAP performs better than GENSCAN in some species.
  • http://pathport.vbi.vt.edu/services/wsdls/rrnascan.wsdl
    http://69.93.57.58:8080/axis/services/rrnascan?wsdl
    The rRNAscan web-service identifies ribosomal RNA genes in genomic DNA or RNA sequences. Currently it detects 5S rRNA by first building matrices or profiles from multiple ribosomal RNA sequence alignments and then finding matches between the consensus sequence from the GRIBSKOV or HENIKOFF profile/matrix and the query sequence.
  • http://pathport.vbi.vt.edu/services/wsdls/trnascan.wsdl
    http://69.93.57.58:8080/axis/services/trnascan?wsdl
    http://66.240.238.41:8080/axis/services/trnascan?wsdl
    http://208.187.245.3:8080/axis/services/trnascan?wsdl
    http://64.92.170.71:6565/axis/services/trnascan?wsdl
    http://206.225.85.121:8080/axis/services/trnascan?wsdl
    Detects transfer RNA genes in DNA sequences using the tRNAscan-SE-v1.23 program developed by the Department of Genetics at Washington University's School of Medicine (St. Louis) and the Department of Genetics at Stanford University. The web-service works in three main phases:
    1. tRNAscan detects tRNAs by looking for short, well-conserved intragenic promoter sequences found in the TΨC and D arm regions of prototypic tRNAs.
    2. tRNAscan extracts the DNA subsequences identified as possible tRNAs and passes these segments to an RNA search program that employs the Cove program.
    3. tRNAscan takes confirmed tRNAs and runs another Cove program that displays RNA secondary structure.
    Returns DAS/1 formatted results.
  • http://pathport.vbi.vt.edu/services/wsdls/tfscan.wsdl
    http://66.240.238.41:8080/axis/services/tfscan?wsdl
    http://208.187.245.3:8080/axis/services/tfscan?wsdl
    http://64.92.170.71:6565/axis/services/tfscan?wsdl
    http://206.225.85.121:8080/axis/services/tfscan?wsdl
    Wraps the tfscan transcription factor binding site prediction program from the EMBOSS suite. The TRANSFAC Database is a database of eukaryotic cis-acting regulatory DNA elements and trans-acting factors, covering organisms from yeast to human. The SITE data from TRANSFAC contains information on individual, putative, regulatory protein binding sites, divided into the following taxonomic groups: Fungi, Insects, Plants, Vertebrates, and Other. Tfscan takes an input sequence, combined with a user-selected organism group, and performs a fast match of the TRANSFAC sequences against the input sequence. The result is a list of positions that match binding sites in the TRANSFAC SITE database, returned in DAS/1 format.
  • http://pathport.vbi.vt.edu/services/wsdls/repeatmasker.wsdl
    RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (by default, replaced by Ns). On average, almost 50% of a human genomic DNA sequence will currently be masked by the program.
  • http://pathport.vbi.vt.edu/services/wsdls/rankgene.wsdl
    The Rankgene web-service analyzes gene expression data, performs feature selection, and ranks genes by the predictive power of each gene in classifying samples into functional or disease categories.
  • http://pathport.vbi.vt.edu/services/wsdls/tico.wsdl
    The Tico web-service wraps the TiCo package, "a tool for postprocessing predictions of prokaryotic genes with the objective to improve the accuracy of annotated Translation Initiation Sites (TIS)." "Currently the tool can be used to analyze and reannotate predictions obtained by the program GLIMMER2 (Delcher et al., 1999). The underlying algorithm of TiCo is based on a completely unsupervised method for scoring potential translation starts. The method works without specific assumptions about characteristic sequence features such as Shine-Dalgarno motifs or start codon usage. In addition, empirical thresholds have been avoided in order to prevent the algorithm from specialization with respect to particular test data." For additional information on TiCo, please refer to: http://tico.gobics.de/index.jsp
  • http://pathport.vbi.vt.edu/services/wsdls/rsom.wsdl
    http://69.93.57.58:8080/axis/services/rsom?wsdl
    http://66.240.238.41:8080/axis/services/rsom?wsdl
    http://208.187.245.3:8080/axis/services/rsom?wsdl
    http://64.92.170.71:6565/axis/services/rsom?wsdl
    http://206.225.85.121:8080/axis/services/rsom?wsdl
    Uses an R-based program, GeneSom, which applies self-organizing maps (SOM) to cluster microarray data. "GeneSom applies SOM algorithm for gene clustering of microarray data. SOM is an unsupervised neural network learning algorithm for the analysis and organization of gene expression profiles. It puts a number of vectors into a high-dimensional input data space to place the data sets in an ordered fashion. It constructs a nonlinear projection of the data to a 'map' display that is used for visualizing similarities, relationships, and cluster structures. The same SOM display can be used for visualizing the relationships between data sets, such as the gene expression and the functional classes of gene."
  • http://pathport.vbi.vt.edu/services/wsdls/agnes.wsdl
    http://69.93.57.58:8080/axis/services/agnes?wsdl
    http://66.240.238.41:8080/axis/services/agnes?wsdl
    http://208.187.245.3:8080/axis/services/agnes?wsdl
    http://64.92.170.71:6565/axis/services/agnes?wsdl
    http://206.225.85.121:8080/axis/services/agnes?wsdl
    Uses an R-based program, Agnes, which applies agglomerative clustering to perform hierarchical cluster analysis. Unlike other agglomerative clustering methods such as hclust, agnes yields the agglomerative coefficient, which measures the amount of clustering structure found. Algorithm outline: at the beginning, each observation is a small cluster by itself. At each stage, the two nearest clusters are combined to form one larger cluster, and merging continues until only one large cluster containing all observations remains.
  • http://pathport.vbi.vt.edu/services/wsdls/kmean.wsdl
    http://69.93.57.58:8080/axis/services/kmean?wsdl
    http://66.240.238.41:8080/axis/services/kmean?wsdl
    http://208.187.245.3:8080/axis/services/kmean?wsdl
    http://64.92.170.71:6565/axis/services/kmean?wsdl
    http://206.225.85.121:8080/axis/services/kmean?wsdl
    Uses an R-based program, Kmeans, to perform non-hierarchical cluster analysis. Algorithm outline: this nonhierarchical method initially selects as many components (points) of the population as the final required number of clusters, choosing them so that they are mutually farthest apart. Next, it examines each remaining component in the population and assigns it to the cluster at minimum distance. The centroid's position is recalculated every time a component is added to a cluster, and this continues until all components are grouped into the final required number of clusters.
  • http://pathport.vbi.vt.edu/services/wsdls/rpca.wsdl
    http://69.93.57.58:8080/axis/services/rpca?wsdl
    http://66.240.238.41:8080/axis/services/rpca?wsdl
    http://208.187.245.3:8080/axis/services/rpca?wsdl
    http://64.92.170.71:6565/axis/services/rpca?wsdl
    http://206.225.85.121:8080/axis/services/rpca?wsdl
    Utilizes an R package based program, Princomp, to perform principal component analysis (PCA) on microarray data. PCA captures the variance in a dataset in terms of principal components: it reduces the dimensionality of the data to summarize the most important (i.e., defining) parts while simultaneously filtering out noise. Unlike the SVM, KNN, and LDA classification services, PCA is an unsupervised method: it uses no prior knowledge of gene function or sample classes, and is typically used to explore expression data or reduce its dimensionality before further analysis.
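A minimal sketch of PCA via the singular value decomposition, assuming rows are samples (arrays) and columns are variables (genes). It returns sample scores, the principal axes, and the fraction of total variance each retained component explains. R's `princomp` uses an eigen-decomposition of the covariance matrix instead, but the resulting axes agree up to sign. The function name `pca` is hypothetical.

```python
import numpy as np

def pca(X, n_components):
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]           # principal axes, one per row
    scores = Xc @ components.T               # sample coordinates on those axes
    var = s ** 2 / (len(X) - 1)              # variance along every axis
    ratio = var[:n_components] / var.sum()   # fraction of total variance explained
    return scores, components, ratio

scores, components, ratio = pca([[0, 0], [1, 2], [2, 4], [3, 6]], n_components=1)
```

Points lying exactly on a line need only one component, so its explained-variance ratio is 1.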
  • http://pathport.vbi.vt.edu/services/wsdls/rsvm.wsdl
    http://69.93.57.58:8080/axis/services/rsvm?wsdl
    http://66.240.238.41:8080/axis/services/rsvm?wsdl
    http://208.187.245.3:8080/axis/services/rsvm?wsdl
    http://64.92.170.71:6565/axis/services/rsvm?wsdl
    http://206.225.85.121:8080/axis/services/rsvm?wsdl
    Utilizes an R package based program, SVM, to perform classification on microarray data. SVM trains a support vector machine, and can also be used for general regression and density estimation. SVM is considered a supervised computer learning method because it exploits prior knowledge of gene function to identify unknown genes of similar function from expression data.
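To show what "training a support vector machine" means, here is a minimal linear SVM trained with the Pegasos stochastic sub-gradient method on the hinge loss. This is a swapped-in technique chosen for brevity; the R `svm` function is based on a different (libsvm) solver and supports kernels. The bias is handled by appending a constant feature, and the names and defaults (`lam`, `epochs`, `seed`) are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM; labels y must be +1 or -1."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])  # constant column acts as bias
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)            # decreasing step size
            if y[i] * (w @ X[i]) < 1:        # margin violated: hinge sub-gradient is non-zero
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                            # only the regulariser shrinks w
                w = (1 - eta * lam) * w
    return w

def predict_svm(w, X):
    X = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    return np.where(X @ w >= 0, 1, -1)

X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```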
  • http://pathport.vbi.vt.edu/services/wsdls/rknn.wsdl
    http://69.93.57.58:8080/axis/services/rknn?wsdl
    http://66.240.238.41:8080/axis/services/rknn?wsdl
    http://208.187.245.3:8080/axis/services/rknn?wsdl
    http://64.92.170.71:6565/axis/services/rknn?wsdl
    http://206.225.85.121:8080/axis/services/rknn?wsdl
    Utilizes an R package based program, KNN, to perform classification on microarray data. KNN (k-nearest neighbor classification) classifies a test set using a training set. For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.
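The voting rule above, including both tie-handling details, can be sketched directly. Exact floating-point equality is used to detect distance ties here; the R implementation may apply a tolerance. The name `knn_classify` and the `seed` parameter are hypothetical.

```python
import numpy as np

def knn_classify(train_X, train_y, test_X, k, seed=0):
    rng = np.random.default_rng(seed)
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    preds = []
    for x in np.asarray(test_X, dtype=float):
        d = np.linalg.norm(train_X - x, axis=1)     # Euclidean distance to every training vector
        kth = np.sort(d)[k - 1]                     # distance of the k-th nearest vector
        voters = train_y[d <= kth]                  # every vector tied at that distance also votes
        labels, counts = np.unique(voters, return_counts=True)
        winners = labels[counts == counts.max()]    # majority vote
        preds.append(rng.choice(winners))           # vote ties are broken at random
    return np.array(preds)

pred = knn_classify([[0], [1], [10], [11]], ["a", "a", "b", "b"], [[0.5], [10.5]], k=3)
```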
  • http://pathport.vbi.vt.edu/services/wsdls/rlda.wsdl
    http://69.93.57.58:8080/axis/services/rlda?wsdl
    http://66.240.238.41:8080/axis/services/rlda?wsdl
    http://208.187.245.3:8080/axis/services/rlda?wsdl
    http://64.92.170.71:6565/axis/services/rlda?wsdl
    http://206.225.85.121:8080/axis/services/rlda?wsdl
    Utilizes an R package based program, LDA, to perform classification on microarray data. Linear discriminant analysis (LDA) is a classical statistical approach for classifying samples of unknown classes. LDA is considered a supervised computer learning method because it exploits prior knowledge of gene function to identify unknown samples of similar function from expression data.
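A minimal sketch of classical linear discriminant analysis: estimate class means, class priors, and one pooled within-class covariance, then assign each sample to the class with the largest linear discriminant score. It illustrates the technique only; it is not the R `lda` code and does not reproduce its scaling or output format. Function names are hypothetical.

```python
import numpy as np

def lda_fit(X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Pooled within-class covariance shared by all classes (the "linear" in LDA).
    centred = X - means[np.searchsorted(classes, y)]
    cov = centred.T @ centred / (len(X) - len(classes))
    return classes, means, priors, np.linalg.pinv(cov)

def lda_predict(model, X):
    classes, means, priors, inv_cov = model
    X = np.asarray(X, dtype=float)
    # Discriminant score per class: x' S^-1 mu - 0.5 mu' S^-1 mu + log(prior)
    scores = (X @ inv_cov @ means.T
              - 0.5 * np.sum(means @ inv_cov * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]

model = lda_fit([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], ["a", "a", "a", "b", "b", "b"])
```

The known class labels of the training samples are what make this a supervised method.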
  • http://pathport.vbi.vt.edu/services/wsdls/diana.wsdl
    http://69.93.57.58:8080/axis/services/diana?wsdl
    http://66.240.238.41:8080/axis/services/diana?wsdl
    http://208.187.245.3:8080/axis/services/diana?wsdl
    http://64.92.170.71:6565/axis/services/diana?wsdl
    http://206.225.85.121:8080/axis/services/diana?wsdl
    The diana web-service wraps the diana application from R. It computes a divisive hierarchical clustering of the dataset. The diana algorithm is unusual in computing a divisive hierarchy, whereas most other software for hierarchical clustering is agglomerative.
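The divisive idea can be sketched as follows: repeatedly pick the cluster with the largest diameter and split it by growing a "splinter group". The splinter group is seeded with the object that has the largest average dissimilarity to the rest of its cluster; objects then move to it while they are, on average, closer to the splinter group than to the remaining objects. This follows the classic DIANA description as we understand it and is not the code behind the web-service; function names are hypothetical.

```python
import numpy as np

def diana_split(D, members):
    """Split one cluster (a list of indices into dissimilarity matrix D) in two."""
    rest = list(members)
    avg = [D[i, [j for j in rest if j != i]].mean() for i in rest]
    splinter = [rest.pop(int(np.argmax(avg)))]      # most "isolated" object starts the splinter group
    while len(rest) > 1:
        # Average dissimilarity to the remaining objects minus that to the splinter group.
        gains = [D[i, [j for j in rest if j != i]].mean() - D[i, splinter].mean() for i in rest]
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break                                   # no object prefers the splinter group
        splinter.append(rest.pop(best))
    return sorted(splinter), sorted(rest)

def diana(D):
    """Split the cluster with the largest diameter until only singletons remain."""
    D = np.asarray(D, dtype=float)
    clusters = [list(range(len(D)))]
    splits = []
    while any(len(c) > 1 for c in clusters):
        diam = [D[np.ix_(c, c)].max() if len(c) > 1 else -1.0 for c in clusters]
        a, b = diana_split(D, clusters.pop(int(np.argmax(diam))))
        clusters += [a, b]
        splits.append((a, b))
    return splits

pts = np.array([0.0, 1.0, 10.0, 11.0])
splits = diana(np.abs(pts[:, None] - pts[None, :]))
```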
  • http://pathport.vbi.vt.edu/services/wsdls/clust.wsdl
    http://69.93.57.58:8080/axis/services/clust?wsdl
    http://66.240.238.41:8080/axis/services/clust?wsdl
    http://208.187.245.3:8080/axis/services/clust?wsdl
    http://64.92.170.71:6565/axis/services/clust?wsdl
    http://206.225.85.121:8080/axis/services/clust?wsdl
    Wraps Cluster to perform hierarchical gene clustering on microarray data.
  • http://pathport.vbi.vt.edu/services/wsdls/hclust.wsdl
    http://69.93.57.58:8080/axis/services/hclust?wsdl
    http://66.240.238.41:8080/axis/services/hclust?wsdl
    http://208.187.245.3:8080/axis/services/hclust?wsdl
    http://64.92.170.71:6565/axis/services/hclust?wsdl
    http://206.225.85.121:8080/axis/services/hclust?wsdl
    The hclust web-service wraps the hclust function from R. It computes a hierarchical clustering of the dataset.
  • http://pathport.vbi.vt.edu/services/wsdls/multtest.wsdl
    http://69.93.57.58:8080/axis/services/multtest?wsdl
    http://66.240.238.41:8080/axis/services/multtest?wsdl
    http://208.187.245.3:8080/axis/services/multtest?wsdl
    http://64.92.170.71:6565/axis/services/multtest?wsdl
    http://206.225.85.121:8080/axis/services/multtest?wsdl
    Utilizes a Bioconductor package based program, Multtest, to perform t-tests and F-tests on microarray data.
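As an illustration of a per-gene t-test on microarray data, here is a sketch of the two-sample Welch t-statistic (unequal variances) computed for every row of an expression matrix. The layout (genes as rows, arrays as columns, group labels 0 and 1) and the sign convention (group 0 minus group 1) are assumptions for this example; the function name is hypothetical, and it computes statistics only, without multtest's multiple-testing adjustment.

```python
import numpy as np

def welch_t_per_gene(expr, groups):
    """Welch two-sample t-statistic for every row (gene) of an expression matrix."""
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    a, b = expr[:, groups == 0], expr[:, groups == 1]
    na, nb = a.shape[1], b.shape[1]
    va, vb = a.var(axis=1, ddof=1), b.var(axis=1, ddof=1)   # per-gene sample variances
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(va / na + vb / nb)

t = welch_t_per_gene([[1, 2, 3, 5, 6, 7]], [0, 0, 0, 1, 1, 1])
```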
  • http://pathport.vbi.vt.edu/services/wsdls/anova.wsdl
    http://69.93.57.58:8080/axis/services/anova?wsdl
    http://66.240.238.41:8080/axis/services/anova?wsdl
    http://208.187.245.3:8080/axis/services/anova?wsdl
    http://64.92.170.71:6565/axis/services/anova?wsdl
    http://206.225.85.121:8080/axis/services/anova?wsdl
    The anova web-service wraps the analysis of variance application from R. It fits an analysis of variance model by a call to 'lm' for each stratum and generates an ANOVA table.
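For the simplest case (one factor, no strata), the quantities in an ANOVA table can be computed directly, as in the sketch below. R fits the model through `lm` instead, but for a single factor the degrees of freedom, sums of squares, and F value should agree. The function name and the dictionary keys are hypothetical.

```python
import numpy as np

def one_way_anova(values, groups):
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean()
    # Between-group and within-group sums of squares.
    ss_between = sum(values[groups == g].size * (values[groups == g].mean() - grand_mean) ** 2
                     for g in labels)
    ss_within = sum(((values[groups == g] - values[groups == g].mean()) ** 2).sum()
                    for g in labels)
    df_between = len(labels) - 1
    df_within = values.size - len(labels)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return {"Df": (df_between, df_within), "Sum Sq": (ss_between, ss_within), "F value": f_value}

table = one_way_anova([1, 2, 3, 5, 6, 7], ["a", "a", "a", "b", "b", "b"])
```

With two groups, the F value equals the square of the pooled two-sample t-statistic.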
  • http://pathport.vbi.vt.edu/services/wsdls/beta/sam.wsdl
    http://64.92.170.71:6565/axis/services/sam?wsdl
    http://valar.bioinformatics.vt.edu:6565/axis/services/sam?wsdl
    The sam web-service wraps native programs in a Bioconductor package to perform Significance Analysis of Microarrays (SAM). It can perform one- and two-class analyses using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. The results of the sam web-service are returned as a subset of the MADM format.
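The "modified t-statistic" in two-class SAM adds a small positive constant s0 (the "fudge factor") to the gene-specific standard error, so genes with tiny variance do not dominate the ranking. The sketch below computes that relative difference per gene, assuming group labels 1 and 2 and a caller-supplied s0. In SAM itself, s0 is estimated from the data and significance is assessed by permutation, both omitted here. The function name is hypothetical.

```python
import numpy as np

def sam_d(expr, groups, s0):
    """SAM-style relative difference d = (mean2 - mean1) / (s + s0) for each gene (row)."""
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    x1, x2 = expr[:, groups == 1], expr[:, groups == 2]
    n1, n2 = x1.shape[1], x2.shape[1]
    # Pooled within-group sum of squares, as in the usual two-sample t denominator.
    ss = (((x1 - x1.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
          + ((x2 - x2.mean(axis=1, keepdims=True)) ** 2).sum(axis=1))
    s = np.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (x2.mean(axis=1) - x1.mean(axis=1)) / (s + s0)

d = sam_d([[1, 2, 3, 5, 6, 7]], [1, 1, 1, 2, 2, 2], s0=0.0)
```

With s0 = 0 this reduces to the ordinary pooled t-statistic; any positive s0 shrinks d toward zero.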
  • http://pathport.vbi.vt.edu/services/wsdls/geneidmap.wsdl
    The Geneidmap web-service queries the VBI gene alias database and allows users to map one gene ID to the other aliases of the same gene. The VBI gene alias database is an Oracle9i database hosted by VBI CCF. The data source is Affymetrix's probe annotation files (.csv files). Currently, the alias database contains 510,727 unique probe_set_ids from a variety of genomes, such as human, rat, mouse, plasmodium, E. coli, and yeast. The geneidmap web-service is called within PathPort's bioinformatics Group Suggestor and Microarray plugins.
  • http://pathport.vbi.vt.edu/services/wsdls/dnacuts.wsdl
    http://66.240.238.41:8080/axis/services/dnacuts?wsdl
    http://208.187.245.3:8080/axis/services/dnacuts?wsdl
    http://64.92.170.71:6565/axis/services/dnacuts?wsdl
    http://206.225.85.121:8080/axis/services/dnacuts?wsdl
    Digests a DNA sequence with any number of enzymes, chosen from a comprehensive list, using the EMBOSS restrict program. Restrict predicts restriction enzyme cleavage sites using REBASE. The REBASE database contains information on restriction enzymes, recognition sequences, cleavage sites, methylation specificity, commercial availability of the enzymes, and references (both published and unpublished observations). DNACuts allows the user to select specific enzymes using either default values (a minimum cut length of 4 bases and linear DNA) or user-supplied values.
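The core of a linear-DNA digest is to locate every recognition site, convert it to a cut position, and slice the sequence at the sorted cut positions. The sketch below uses a hypothetical two-enzyme table instead of REBASE and only scans the top strand, which is sufficient for palindromic sites such as EcoRI (G^AATTC) and BamHI (G^GATCC). It is not how EMBOSS restrict is implemented.

```python
import re

# Hypothetical mini enzyme table: name -> (recognition site, top-strand cut offset within the site).
ENZYMES = {"EcoRI": ("GAATTC", 1), "BamHI": ("GGATCC", 1)}

def digest_linear(seq, enzyme_names):
    """Return the fragments of a linear DNA sequence cut by the named enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        # Zero-width lookahead so overlapping occurrences are also found.
        for m in re.finditer(f"(?={site})", seq):
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

fragments = digest_linear("AAGAATTCTTGGATCCAA", ["EcoRI", "BamHI"])
```

For circular DNA, the last and first fragments would instead be joined into one.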
  • http://pathport.vbi.vt.edu/services/wsdls/beta/probedesign.wsdl
    http://69.93.57.58:8080/axis/services/probedesign?wsdl
    http://66.240.238.41:8080/axis/services/probedesign?wsdl
    http://208.187.245.3:8080/axis/services/probedesign?wsdl
    http://64.92.170.71:6565/axis/services/probedesigng?wsdl
    http://206.225.85.121:8080/axis/services/probedesign?wsdl
    Wraps the Primer3 program, developed by the Whitehead Institute, to choose optimal primers for PCR and DNA sequencing, and hybridization probes for microarray experiments.

    Primer3 uses the following criteria for selection:

    1. Oligonucleotide melting temperature, size, GC content, and primer-dimer formation
    2. PCR product size
    3. Positional constraints within the source sequence
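As a rough illustration of the first criterion, the sketch below filters candidate oligos by size, GC content, and melting temperature. It estimates Tm with the simple Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is only a crude guide for short oligos. Primer3 uses nearest-neighbor thermodynamic models instead. The threshold ranges and function names are hypothetical, not Primer3 defaults.

```python
def gc_content(oligo):
    oligo = oligo.upper()
    return (oligo.count("G") + oligo.count("C")) / len(oligo)

def wallace_tm(oligo):
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C) degrees C (rough estimate for short oligos)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

def passes_filters(oligo, size_range=(18, 27), gc_range=(0.4, 0.6), tm_range=(52, 64)):
    """Hypothetical size/GC/Tm acceptance test for a candidate primer."""
    n = len(oligo)
    return (size_range[0] <= n <= size_range[1]
            and gc_range[0] <= gc_content(oligo) <= gc_range[1]
            and tm_range[0] <= wallace_tm(oligo) <= tm_range[1])

ok = passes_filters("ATGCATGCATGCATGCATGC")
```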
  • http://pathport.vbi.vt.edu/services/wsdls/beta/yoda.wsdl
    http://69.93.57.58:8080/axis/services/yoda?wsdl
    http://66.240.238.41:8080/axis/services/yoda?wsdl
    http://208.187.245.3:8080/axis/services/yoda?wsdl
    http://64.92.170.71:6565/axis/services/yodagt?wsdl
    http://206.225.85.121:8080/axis/services/yoda?wsdl
    YODA (Yet-another Oligo Design Application) computes gene-specific oligonucleotides that are free of secondary structure for genome-scale oligonucleotide microarray construction. Selection is based on three major criteria: oligonucleotide melting temperature; specificity to a single target (or at least to the shortest list of possible targets); and the inability to fold into a stable secondary structure at the hybridization temperature.
  • http://pathport.vbi.vt.edu/services/wsdls/goanalysis.wsdl
    This web-service wraps the VBI GoAnalysis program, which identifies GO terms based on a list of gene IDs.
  • http://pathport.vbi.vt.edu/services/wsdls/signalp.wsdl
    Wraps SignalP 3.0, which predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method combines a prediction of cleavage sites with a signal peptide/non-signal peptide prediction, based on several artificial neural networks and hidden Markov models.
  • http://pathport.vbi.vt.edu/services/wsdls/lipop.wsdl
    The Lipop web-service predicts lipoproteins and discriminates between lipoprotein signal peptides, other signal peptides, and N-terminal membrane helices in Gram-negative bacteria. For additional information on Lipop, please refer to: http://www.cbs.dtu.dk/services/LipoP/
  • http://pathport.vbi.vt.edu/services/wsdls/interproscan.wsdl
    http://69.93.57.58:8080/axis/services/interproscan?wsdl
    http://66.240.238.41:8080/axis/services/interproscan?wsdl
    http://208.187.245.3:8080/axis/services/interproscan?wsdl
    http://64.92.170.71:6565/axis/services/interproscan?wsdl
    http://206.225.85.121:8080/axis/services/interproscan?wsdl
    Wraps EBI's InterProScan-v3.1 program, which queries multiple databases and brings their information together. Our web-service converts these results into DAS/1 format. Please note that this service can be VERY SLOW and that we have no control over the responsiveness of the databases that it contacts. For sequences of 5-10 kilobases, expect running times of 4-5 minutes; larger sequences require progressively longer running times.
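Several services here return DAS/1 formatted results. The sketch below reads features from a DAS/1-style features document with the standard library, assuming the common DASGFF > GFF > SEGMENT > FEATURE layout with TYPE, METHOD, START, END, and SCORE child elements. The exact elements PathPort emits are not stated here, so treat the sample document and the function name as hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical DAS/1-style response used only to exercise the parser.
SAMPLE = """<DASGFF><GFF version="1.0">
  <SEGMENT id="seq1" start="1" stop="500">
    <FEATURE id="f1" label="PF00001">
      <TYPE id="domain">domain</TYPE>
      <METHOD id="Pfam">Pfam</METHOD>
      <START>10</START><END>120</END><SCORE>1.2e-10</SCORE>
    </FEATURE>
  </SEGMENT>
</GFF></DASGFF>"""

def parse_das_features(xml_text):
    """Flatten DAS/1-style FEATURE elements into a list of dictionaries."""
    root = ET.fromstring(xml_text)
    features = []
    for seg in root.iter("SEGMENT"):
        for feat in seg.iter("FEATURE"):
            features.append({
                "segment": seg.get("id"),
                "id": feat.get("id"),
                "label": feat.get("label"),
                "type": feat.findtext("TYPE", default="").strip(),
                "method": feat.findtext("METHOD", default="").strip(),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
                "score": feat.findtext("SCORE", default="-").strip(),
            })
    return features

features = parse_das_features(SAMPLE)
```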
  • http://pathport.vbi.vt.edu/services/wsdls/mstag.wsdl
    The MsTag web-service wraps the MS-Tag tool from the UCSF Mass Spectrometry Facility, exposing a stripped-down MS-Tag input form for people who barely know the difference between a b-ion and a y-ion. If you don't know the difference between monoisotopic and average masses, please consult your local mass spectrometrist.
  • http://pathport.vbi.vt.edu/services/wsdls/msfit.wsdl
    The MS-Fit web-service wraps the MS-Fit tool, a peptide mass fingerprinting tool from the UCSF Mass Spectrometry Facility. MS-Fit tries to match a user's mass spectrometry data to protein sequences in an existing database and thus suggest the identity of the user's protein.
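Peptide mass fingerprinting rests on a simple computation: digest a candidate protein in silico, compute each peptide's mass, and compare those masses with the observed peaks. The sketch below assumes trypsin (cleave after K or R, but not before P), no missed cleavages, no modifications, and singly protonated [M+H]+ ions, using standard monoisotopic residue masses. It is far simpler than MS-Fit's scoring, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
        "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
        "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
        "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056
PROTON = 1.00728

def trypsin_digest(protein):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa not in ("", "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])
    return peptides

def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass: residues + water + proton."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON

def match_peaks(observed_mh, protein, tol=0.5):
    """Tryptic peptides of `protein` whose [M+H]+ lies within `tol` Da of an observed peak."""
    hits = []
    for pep in trypsin_digest(protein):
        mass = mh_plus(pep)
        if any(abs(mass - obs) <= tol for obs in observed_mh):
            hits.append((pep, round(mass, 4)))
    return hits

hits = match_peaks([204.13], "AKPRGK", tol=0.01)
```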
  • http://pathport.vbi.vt.edu/services/wsdls/proteinpredict.wsdl
    The Proteinpredict web-service uses the P2SL package, which infers protein targeting from the implicit motif frequency distribution of protein sequences. The targeting signal is modeled from the distribution of subsequence occurrences (implicit motifs) using self-organizing maps, and the boundaries among the classes are then determined by a set of support vector machines. P2SL is a hybrid computational system that predicts ER-targeted, cytosolic, mitochondrial, and nuclear protein localization classes. The results of the Proteinpredict web-service are returned as a subset of the DAS/1 format. For additional information on P2SL please refer to: http://staff.vbi.vt.edu/volkan/p2sl/
  • http://pathport.vbi.vt.edu/services/wsdls/tagident.wsdl
    The Tagident web-service wraps the TagIdent tool available at TagidentTool (formerly GuessProt). This web-service generates a list of proteins close to a given pI and Mw. It can also identify proteins from a short sequence tag of up to 6 amino acids. Searches can be restricted to a species by using the "species" option.
  • http://pathport.vbi.vt.edu/services/wsdls/psipred.wsdl
    The Psipred web-service predicts protein secondary structure. For additional information on Psipred, please refer to: http://bioinf.cs.ucl.ac.uk/psipred/
  • http://pathport.vbi.vt.edu/services/wsdls/memsat.wsdl
    The Memsat web-service predicts the secondary structure and topology of all-helical integral membrane proteins based on the recognition of topological models. For additional information on Memsat, please refer to: http://bioinf.cs.ucl.ac.uk/psipred/
  • http://pathport.vbi.vt.edu/services/wsdls/rbsfinder.wsdl
    The RBSFinder web-service predicts ribosome binding sites (RBS) in the upstream regions of genes annotated by Glimmer2. If there is no RBS-like pattern in this region, the program searches for a start codon with an RBS-like pattern in the same reading frame, upstream or downstream, and relocates the start codon accordingly. For additional information on RBSFinder refer to: http://www.tigr.org/software/
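To illustrate what "an RBS-like pattern in the upstream region" means, the sketch below scans a fixed window upstream of a start codon for the best match to a Shine-Dalgarno-like core, allowing a limited number of mismatches. The consensus AGGAGG, the 20-base window, and the mismatch limit are assumptions for this example. RBSFinder's actual pattern model and start-codon relocation logic are not reproduced, and the function name is hypothetical.

```python
SD_CORE = "AGGAGG"  # assumed Shine-Dalgarno-like core, for illustration only

def best_rbs_hit(seq, start_codon_pos, window=20, max_mismatches=1):
    """Return (position, mismatches) of the best SD-like match upstream of a start codon, or None."""
    upstream_start = max(0, start_codon_pos - window)
    region = seq[upstream_start:start_codon_pos].upper()
    best = None
    for i in range(len(region) - len(SD_CORE) + 1):
        mm = sum(a != b for a, b in zip(region[i:i + len(SD_CORE)], SD_CORE))
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (upstream_start + i, mm)   # keep the first, fewest-mismatch hit
    return best

hit = best_rbs_hit("TTTTTAGGAGGTTTTTTATG", start_codon_pos=17)
```

A start codon with no acceptable hit (a `None` result) is the situation in which RBSFinder, as described above, looks for an alternative start codon in the same frame.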

No Cost License Restricted Services:

  • http://pathport.vbi.vt.edu/services/wsdls/contig.wsdl
    Creates assembled contigs from trace files. Phred is used for base calling and reads trace data from chromatogram files in the SCF, ABI, and ESD formats. The Sanger Institute's Cross_Match program is used to remove contaminating vectors, and CAP3 performs the final assembly: it uses base quality values to construct an alignment of sequence reads and to generate a consensus sequence for each contig. CAP3 also uses a large number of forward-reverse constraints to locate and correct errors in the layout of sequence reads, allowing CAP3 to address assembly errors caused by repeats. Results are returned in DAS/1 format.
  • http://pathport.vbi.vt.edu/services/wsdls/genscan.wsdl
    Wraps the GENSCAN program for gene prediction. It analyzes genomic DNA sequences from a variety of organisms including humans and other vertebrates, invertebrates, and plants. For each sequence, the GENSCAN web-service determines the most likely gene structure under a pre-built probabilistic model of the gene structural and compositional properties of the genomic DNA for the given organism. The model accounts for many of the essential gene structural properties of genomic sequences, e.g., typical gene density, the typical number of exons per gene, and the distribution of exon sizes for different types of exons. It also incorporates many of the important compositional properties of genes, e.g., the reading frame-specific hexamer composition of coding regions vs. the (reading frame-independent) hexamer composition of introns and intergenic regions, and the position-specific signals of the TATA box, cap site, and polyadenylation signal. Importantly, novel models of the donor and acceptor splice sites are used, which capture potentially important dependencies between positions in these signals. For human and vertebrate sequences, separate sets of model parameters are used to account for the many substantial differences in gene density and structure observed in distinct C+G% compositional regions of the human genome and the genomes of other vertebrates. Results are returned in DAS/1 format.
  • http://pathport.vbi.vt.edu/services/wsdls/genemark.wsdl
    Wraps the GeneMark program for doing gene prediction. GeneMark is a family of gene prediction programs provided by Mark Borodovsky's Bioinformatics Group at the Georgia Institute of Technology. It predicts genes and intergenic regions using Hidden Markov models of coding and non-coding regions in prokaryotic or eukaryotic genomic DNA sequences. Results are returned in DAS/1 format.
  • http://pathport.vbi.vt.edu/services/wsdls/grailexp.wsdl
    Wraps the PERCEVAL application (Protein-coding exon, repetitive, and CpG-Island Evaluator) contained within the GrailEXP gene discovery suite for gene prediction. It analyzes genomic DNA sequences from a variety of eukaryotes. This web-service produces a list of Grail Exon Candidates, i.e., regions that the Grail neural network scores as potential exons. Results are returned in DAS/1 format.
  • http://pathport.vbi.vt.edu/services/wsdls/orpheus.wsdl
    Wraps the Orpheus system for bacterial gene prediction in large genomic fragments or completed genomes. The analysis starts with a database similarity search and identification of reliable gene fragments. The latter are used to derive statistical characteristics of protein-coding regions and ribosome-binding sites (RBS). At the final step, the complete set of genes in the analyzed genome is predicted, with special attention paid to correct gene start identification. Results are returned in DAS/1 format. NOTE: This program is EXTREMELY slow, requiring days to finish where the other gene prediction programs listed here take minutes.
  • http://pathport.vbi.vt.edu/services/wsdls/fgenesh.wsdl
    The FGENESH web-service is a general-purpose gene identification web-service that analyzes genomic DNA sequences from a variety of organisms including humans and other vertebrates, invertebrates, and plants. For each sequence, the FGENESH web-service determines the most likely gene structure under a probabilistic model of the gene structural and compositional properties of the genomic DNA for the given organism.

For Pay Restricted Services:

    NO SERVICES IN THIS CATEGORY

Private VBI Services:

    NO SERVICES IN THIS CATEGORY

Last modified on 11Jun2008