ToolBus/PathPort Web-Services
The PathPort project at VBI has implemented a number of web-services
that provide the core data access and analysis capabilities for the
system. These tools supply the information that is displayed
via the visualization plugins within the context of ToolBus.
Most of these web-services simply wrap existing programs (e.g., Primer3,
MUMmer 3.0) and databases (e.g., VBI's own bacteria phylogenetic and
annotated DNA databases). Many of these resources are open-source
and have no usage restrictions, while others are restricted.
Web-services can have restricted access because of the licensing terms
of the underlying algorithm/software. Some programs, such as GENSCAN,
have been acquired by VBI under the terms of an academic license, and
therefore cannot be made available outside of VBI.
Please read about VBI's AAA Ticket system
for information about how it works and how to get an account to enable you
to use these premium services.
IMPORTANT NOTE:
For those of you working from behind VERY restrictive firewalls,
please note that all VBI services are published on ports 6565 and 7575.
Data Services
Sequence Alignment, Similarity Search and Comparison
Gene Prediction
Statistic Analysis
Others
Freely Available Public Services
-
http://pathport.vbi.vt.edu/services/wsdls/beta/vbidas.wsdl
http://208.187.245.3:8080/axis/services/vbidas?wsdl
http://64.92.170.71:6565/axis/services/vbidas?wsdl
Makes available sequence and annotation information from VBI's version of
NCBI's RefSeq and/or GenBank database for virus, bacteria, archaea, plasmodia, mouse,
rat, mosquito, and fruit fly genomes. The MySQL relational database schema
was designed to follow the Distributed Annotation System (DAS) standard.
-
http://pathport.vbi.vt.edu/services/wsdls/embldas.wsdl
http://69.93.57.58:8080/axis/services/embldas?wsdl
http://66.240.238.41:8080/axis/services/embldas?wsdl
http://208.187.245.3:8080/axis/services/embldas?wsdl
http://64.92.170.71:6565/axis/services/embldas?wsdl
http://206.225.85.121:8080/axis/services/embldas?wsdl
Utilizes the European Bioinformatics Institute (EBI)'s XEMBL web-service
to retrieve organism DNA and annotations from the EMBL database. XEMBL returns
the results in BSML format, which are then converted into DAS/1 by embldas.
EBI's service can be somewhat slow, which slows this service's
response. Generally it should take no more than several minutes.
Please also note that this service's availability is dependent
upon the availability of EBI's service (over which we have no
control).
-
http://pathport.vbi.vt.edu/services/wsdls/beta/genometool.wsdl
http://64.92.170.71:6565/axis/services/genometool?wsdl
Built on top of the Vbidas search tool. Users can navigate chromosome
data and the corresponding contig assembly, followed by sequence and annotation
information from VBI's version of NCBI's RefSeq and/or GenBank database for virus, bacteria,
archaea, plasmodia, mouse, rat, mosquito, fruit fly and human genomes.
-
http://pathport.vbi.vt.edu/services/wsdls/guspathport.wsdl
The GUS web service provides genome sequence and feature annotations.
The database was developed on the Genomics Unified Schema (GUS) platform.
GUS is built on a strongly-typed relational schema and includes the
GUS Application Framework to assist in data acquisition and analysis tool development.
The GUS schema integrates genome, transcript, and proteome data of one or more organisms, gene regulation
and networks, ontologies, gene expression and inter-organism comparisons.
http://pathport.vbi.vt.edu/services/wsdls/guspathport.wsdl
Guspathport contacts an Oracle database that uses RefSeq as the genome data source.
http://pathport.vbi.vt.edu/services/wsdls/guspatric.wsdl
Guspatric contacts the database that contains annotations for organisms for the
Patric project, one of the eight
Bioinformatics Resource Centers (BRC) projects funded by the National Institutes of Health through
the National Institute of Allergy and
Infectious Diseases (NIAID).
-
http://pathport.vbi.vt.edu/services/wsdls/beta/pathway.wsdl
A database of protein interaction pathways involved in the infection process for pathogens
from both the CDC's and NIAID's A, B, and C pathogen lists. Currently only pathways for a
few pathogens are present, but VBI's researchers are continuing to gather and curate data
for this fully referenced and growing database. If you have specific questions, concerns, or
suggestions regarding this information please email us at
pronet@vbi.vt.edu
-
http://pathport.vbi.vt.edu/services/wsdls/beta/piml.wsdl
A database of background information on pathogens from both the
CDC's and NIAID's A, B, and C pathogen lists
(CDC list, NIAID list). Includes
taxonomic, epidemiological, laboratory work, and host information.
This fully referenced data has been gathered and curated by
VBI researchers. If you have specific questions, concerns, or
suggestions regarding this information please email us at
pathinfo@vbi.vt.edu
The piml web-service retrieves pathogen background information including sections on
taxonomy, organism, epidemiology, host-list, and labwork. The information
is fully referenced with links to PubMed abstracts when available.
Currently, the pathogens are documented in a Xindice database. Xindice is a database designed
to store XML data (commonly referred to as a native XML database). The user selects a specific
pathogen and the corresponding pathogen ID is submitted to the Xindice database, whereupon
the returned pathogen document is displayed by the PIML viewer.
-
http://pathport.vbi.vt.edu/services/wsdls/beta/pimllink.wsdl
This webservice takes an organism name and tries to find a close
match for it in the pathogen information database. It then retrieves the pathogen
information from the PIML webservice. Currently it is used
by the phylogenetic viewer so that users can click on tree tips to find
corresponding background information on those tips if available in the database.
Algorithm: The Pimllink webservice applies a
similarity metric to find the pathogen name that most closely
matches the user-provided pathogen name, then fetches the pathogen
information using this best-matching string.
PimlLink applies a string similarity metric that rewards
both common substrings and a common ordering of those substrings. In
addition, the algorithm considers not only the single longest common
substring, but other common substrings as well. The approach is to count
how many adjacent character pairs are contained in both strings.
Because each character pair carries some information about the original ordering,
both the characters and their order in the original string
are taken into account.
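The adjacent-pair idea described above can be sketched as follows. This is a minimal illustration only: the source does not give the exact formula PimlLink uses, so the scoring rule here (twice the number of shared pairs divided by the total number of pairs) and the function names letter_pairs, pair_similarity and closest_name are assumptions.

```python
def letter_pairs(text):
    """Return the adjacent character pairs of each whitespace-separated word."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def pair_similarity(a, b):
    """Score two strings in [0, 1] by the share of adjacent pairs they have in common.

    Assumed formula (not taken from the source): 2 * shared / (pairs_a + pairs_b).
    """
    pairs_a = letter_pairs(a)
    pairs_b = letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    remaining = list(pairs_b)
    shared = 0
    for pair in pairs_a:
        if pair in remaining:
            shared += 1
            remaining.remove(pair)  # each pair may be matched only once
    return 2.0 * shared / total


def closest_name(query, candidates):
    """Pick the candidate name with the highest pair similarity to the query."""
    return max(candidates, key=lambda name: pair_similarity(query, name))
```

For example, closest_name("Bacillus anthracs", [...]) would select "Bacillus anthracis" from a list of pathogen names despite the misspelling, because most adjacent pairs still agree.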
-
http://pathport.vbi.vt.edu/services/wsdls/beta/phylogenetic.wsdl
http://206.225.85.121:8080/axis/services/phylogenetic?wsdl
Wraps a sophisticated database and set of associated phylogenetic
construction tools to create phylogenetic trees for bacterial
organisms and their genes. You can even give your own sequence
for inclusion in the tree.
The database and tools this service makes available were (and continue to be)
developed under the direction of Dr. Allan Dickerman at VBI.
-
http://pathport.vbi.vt.edu/services/wsdls/phylip.wsdl
http://69.93.57.58:8080/axis/services/phylip?wsdl
http://66.240.238.41:8080/axis/services/phylip?wsdl
http://208.187.245.3:8080/axis/services/phylip?wsdl
http://64.92.170.71:6565/axis/services/phylip?wsdl
http://206.225.85.121:8080/axis/services/phylip?wsdl
The Phylip web-service takes a group of sequences and generates a tree using
the Phylip package and the PAML package.
The Phylip package contains many programs for inferring phylogenies. It carries out different
algorithms on different kinds of data.
PROTPARS estimates phylogenies from protein sequences (input using the standard one-letter code
for amino acids) using the parsimony method, in a variant which counts only those nucleotide
changes that change the amino acid, on the assumption that silent changes are more easily accomplished.
DNAPARS estimates phylogenies by the parsimony method using nucleic acid sequences. It allows use
of the full IUB ambiguity codes, and estimates ancestral nucleotide states. Gaps are treated as a
fifth nucleotide state.
DNAML estimates phylogenies from nucleotide sequences by maximum likelihood. The model employed
allows for unequal expected frequencies of the four nucleotides, for unequal rates of transitions
and transversions, and for different (prespecified) rates of changes in different categories of
sites, with the program inferring which sites have which rates.
The PAML package contains programs for phylogenetic analyses of DNA or protein sequences using
maximum likelihood (ML).
The program CODEML is formed by merging two old programs: codonml, which implements the codon
substitution method of Goldman and Yang (1994) for protein-coding DNA sequences, and aaml,
which implements models for amino acid sequences. These two are now distinguished by a
variable named seqtype in the control file codeml.ctl, that is, 1 for codon sequences and
2 for amino acid sequences.
-
http://pathport.vbi.vt.edu/services/wsdls/beta/blastbt.wsdl
Uses the Linux Beowulf cluster at
VBI's Core
Computational Facility to run BLASTN, BLASTP, and translating
BLAST analyses on multiple organism databases
simultaneously and to combine the answers before returning the
results in BLAST XML format.
-
http://pathport.vbi.vt.edu/services/wsdls/beta/timelogicblast.wsdl
Uses the Timelogic Server at VBI's Core Computational Facility to
run BLASTN, BLASTP, and translating BLAST analyses on multiple
organism databases simultaneously and to combine the answers
before returning the results in BLAST XML format.
-
http://pathport.vbi.vt.edu/services/wsdls/beta/blastlocal.wsdl
Uses the PathPort server to run BLASTN, BLASTP, and translating
BLAST analyses on multiple organism databases
simultaneously and to combine the answers before returning the
results in BLAST XML format.
-
http://pathport.vbi.vt.edu/services/wsdls/msblast.wsdl
MS Blast is a specialized BLAST-based protocol developed for identification of proteins
by sequence similarity searches using peptide sequences produced by the interpretation
of tandem mass spectra.
-
http://pathport.vbi.vt.edu/services/wsdls/ssaha.wsdl
The Ssaha web service uses
ssaha (Sequence Search and Alignment by Hashing Algorithm) to rapidly find near-exact matches in DNA or
protein databases at VBI. For sequence searches at the DNA level, a choice of genome sequences or
CDS sequences is provided. DNA sequence searches can also be performed according to
synonymous codon translation. Protein sequences can be searched against a protein database
or, on the basis of translation, a nucleotide (genome or CDS) database. Returned results are
mapped onto the Homolog model and are viewable in the Homolog viewer.
-
http://pathport.vbi.vt.edu/services/wsdls/ssahasnp.wsdl
ssahaSNP applies
ssaha (Sequence Search and Alignment by Hashing Algorithm)
to high-throughput SNP detection, in which the high-quality region of the base sequence
from each trace read is compared to a reference genome sequence. Both the query sequence
and the reference sequence should have associated quality values, which ensures that SNPs
are only detected in high-quality regions.
The ssahaSNP protocol is used in The SNP Consortium. The ssahaSNP web service
uses the ssaha algorithm to rapidly find near-exact matches in DNA or protein databases
at VBI. For sequence searches at the DNA level, a choice of genome sequences or CDS sequences is
provided. DNA sequence searches can also be performed according to synonymous codon translation.
Protein sequences can be searched against a protein database or, on the basis of translation,
a nucleotide (genome or CDS) database.
- http://pathport.vbi.vt.edu/services/wsdls/timelogicsmithwaterman.wsdl
Utilizes VBI's recently acquired
TimeLogic FPGA hardware system to run the Smith-Waterman
algorithm on multiple organism databases.
-
http://pathport.vbi.vt.edu/services/wsdls/beta/msa.wsdl
http://69.93.57.58:8080/axis/services/msa?wsdl
http://66.240.238.41:8080/axis/services/msa?wsdl
http://208.187.245.3:8080/axis/services/msa?wsdl
http://64.92.170.71:6565/axis/services/msagt?wsdl
http://206.225.85.121:8080/axis/services/msa?wsdl
Wraps EBI's
ClustalW program
for doing global multiple sequence alignments for either DNA or protein sequences.
This web-service accepts multiple sequence data files. Supported input formats include
FASTA, EMBL/SwissProt, and NBRF/PIR. Results are returned in MSAML format.
-
http://pathport.vbi.vt.edu/services/wsdls/beta/water.wsdl
http://66.240.238.41:8080/axis/services/water?wsdl
http://208.187.245.3:8080/axis/services/water?wsdl
http://64.92.170.71:6565/axis/services/watergt?wsdl
http://206.225.85.121:8080/axis/services/water?wsdl
Wraps the water local alignment program from the
EMBOSS
suite.
Water uses the Smith-Waterman algorithm (modified for speed enhancements),
a dynamic programming approach to calculate the local alignment between two sequences.
Local alignment methods are very useful for scanning databases or when matches between
small regions of sequences (e.g. between protein domains) are needed.
Results are returned in MSAML format.
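As an illustration of the dynamic programming approach described above, here is a minimal textbook Smith-Waterman sketch. It is not the EMBOSS water implementation (which the text notes is modified for speed); the linear gap penalty and the scoring values match=2, mismatch=-1, gap=-2 are assumptions chosen only for the example.

```python
def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of seq_a, aligned part of seq_b)."""

    def sub(i, j):
        # Substitution score for the characters ending at matrix cell (i, j).
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these assumed scores, aligning "TTTACGTTT" against "GGACGGG" picks out the shared "ACG" region and ignores the unrelated flanks, which is the behaviour that makes local alignment useful for short conserved regions.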
-
http://pathport.vbi.vt.edu/services/wsdls/beta/stretcher.wsdl
http://66.240.238.41:8080/axis/services/stretcher?wsdl
http://208.187.245.3:8080/axis/services/stretcher?wsdl
http://64.92.170.71:6565/axis/services/stretchergt?wsdl
http://206.225.85.121:8080/axis/services/stretcher?wsdl
Wraps the stretcher global alignment program from the
EMBOSS
suite.
Stretcher finds the best global alignment between two sequences.
In a global pairwise alignment it is assumed that the two sequences
have diverged from a common ancestor and the program should try to stretch the two sequences.
Stretcher introduces gaps where necessary, in order to show the alignment
over the whole length of the two sequences that best illustrates their similarities.
Stretcher calculates and finds an optimal global alignment with a modification of
the classic dynamic programming algorithm using linear space.
Returns MSAML formatted results.
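The linear-space idea can be sketched for the alignment score alone: only the previous row of the dynamic programming matrix is needed to compute the next one. This is a simplified illustration, not the stretcher implementation; the scoring values are assumptions, and recovering the full alignment in linear space requires an additional divide-and-conquer step that is not shown here.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score using two rows of memory.

    Scoring values are illustrative assumptions with a linear gap penalty.
    """
    # Row 0: aligning an empty prefix of seq_a against prefixes of seq_b.
    previous = [j * gap for j in range(len(seq_b) + 1)]
    for i in range(1, len(seq_a) + 1):
        current = [i * gap]  # column 0: prefix of seq_a against nothing
        for j in range(1, len(seq_b) + 1):
            diagonal = previous[j - 1] + (
                match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            )
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current  # the older row is no longer needed
    return previous[-1]
```

Memory use grows with the length of one sequence rather than with the product of both lengths, which is what makes whole-length alignment of long sequences practical.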
-
http://pathport.vbi.vt.edu/services/wsdls/blat.wsdl
BLAT, the "BLAST-Like Alignment Tool",
is a software tool which performs rapid mRNA/DNA and
cross-species protein alignments. It is more accurate and 500 times faster than popular
existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at
sensitivity settings typically used when comparing vertebrate sequences. This webservice
is dedicated to EST-to-genome alignment using BLAT. ESTs are a valuable resource in whole
genome annotation. Alignment of ESTs to the genome sequence can give you intron, exon
and boundary information and is far more accurate than gene prediction programs.
-
http://pathport.vbi.vt.edu/services/wsdls/hmmer.wsdl
http://69.93.57.58:8080/axis/services/hmmer?wsdl
The Hmmer web-service wraps HMMER to compare a protein sequence against an HMM
database (for example, the TIGRFAMs database or the Pfam database) and returns
the protein family information for the query sequence.
For additional information on the HMMER software package, please refer to:
hmmer
For additional information on the TIGRFAMs database, please refer to:
Tigrfam
For additional information on the Pfam database, please refer to:
Pfam
-
http://pathport.vbi.vt.edu/services/wsdls/rfam.wsdl
http://69.93.57.58:8080/axis/services/rfam?wsdl
The Rfam web-service queries the Rfam database and returns RNA family information.
Covariance model searches are extremely compute intensive. A small model (like tRNA)
can search a sequence database at a rate of around 300 bases/sec. The compute time
scales roughly to the 4th power of the length of the RNA, so larger models quickly
become infeasible without significant compute resources. The Rfam web-service wraps
the rfam_scan.pl script, which uses a prescreening BLAST step to increase search speed.
For additional information on the Rfam database, please
refer to: Rfam.
For additional information about Infernal, please refer to:
Infernal.
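To make the 4th-power scaling of covariance model searches concrete, the following rough sketch estimates the search rate for a model that is some multiple of the reference model's length. It only restates the arithmetic in the text; the function name and the idea of expressing model size as a length ratio are assumptions for illustration, and real timings depend on hardware and on the BLAST prescreen.

```python
def estimated_search_rate(reference_rate, length_ratio, exponent=4):
    """Estimate bases/sec for a model `length_ratio` times longer than the reference.

    Assumes compute time grows as length ** exponent, so the rate shrinks by the
    same factor. The default exponent of 4 follows the rough figure in the text.
    """
    return reference_rate / (length_ratio ** exponent)


# A model twice as long as the ~300 bases/sec reference is about 16 times slower.
rate_for_double_length = estimated_search_rate(300, 2)
```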
-
http://pathport.vbi.vt.edu/services/wsdls/cognitor.wsdl
http://69.93.57.58:8080/axis/services/cognitor?wsdl
The Cognitor web-service compares a protein sequence against the COG database, and
returns information on the COG that the query protein belongs to.
For additional information on Cognitor and COG refer to:
COGnitor
-
http://pathport.vbi.vt.edu/services/wsdls/blocks.wsdl
http://69.93.57.58:8080/axis/services/blocks?wsdl
The Blocks web-service compares a protein sequence against the BLOCKS database, and
returns information on the block hit for the query protein.
For additional information on Blocks, please refer to:
Blocks
-
http://pathport.vbi.vt.edu/services/wsdls/beta/mummer.wsdl
http://64.92.170.71:6565/axis/services/mummergt?wsdl
Wraps the MUMmer 3.0
program developed by The Institute for
Genomic Research (TIGR) for doing large sequence comparisons.
It rapidly aligns entire genomes, as well as finds "Maximal Unique Matches" (MUMs)
between two DNA or protein sequences.
The central algorithm of MUMmer takes two input sequences and finds all subsequences
longer than a specified minimum length that are identical between the two input sequences
and their reverse complement. These matches are guaranteed to be maximal,
in that they cannot be extended on either end without incurring a mismatch.
The algorithm is implemented using a suffix-tree based data structure,
which permits very fast and memory-efficient comparisons of the sequences.
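The notion of a maximal match can be illustrated with a brute-force sketch. This is not how MUMmer works internally (it uses a suffix-tree based structure, as noted above); the quadratic scan below is only meant to show what "cannot be extended on either end" means. It checks the forward strand only, does not filter for uniqueness, and the function name and (start_a, start_b, length) result layout are assumptions.

```python
def maximal_matches(seq_a, seq_b, min_length=3):
    """Return (start_a, start_b, length) for every maximal exact match.

    Brute-force illustration: forward strand only, no uniqueness filter.
    """
    matches = []
    for i in range(len(seq_a)):
        for j in range(len(seq_b)):
            if seq_a[i] != seq_b[j]:
                continue
            # Skip starts that could still be extended to the left.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of a sequence.
            length = 0
            while (
                i + length < len(seq_a)
                and j + length < len(seq_b)
                and seq_a[i + length] == seq_b[j + length]
            ):
                length += 1
            if length >= min_length:
                matches.append((i, j, length))
    return matches
```

For "ACGTTGCA" and "TTACGTAA" with a minimum length of 3, the only match reported is the shared "ACGT", starting at position 0 in the first sequence and position 2 in the second.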
Currently, VBI hosts the following mummer flavored webservices,
each running the same algorithm but contacting different databases.
-
http://pathport.vbi.vt.edu/services/wsdls/beta/mummer.wsdl
The MySQL database contains sequence and annotation information from
NCBI's RefSeq and/or Genbank database.
-
http://pathport.vbi.vt.edu/services/wsdls/mummergus.wsdl
The Oracle database was developed on the
Genomics Unified Schema (GUS) platform and uses RefSeq as
the genome data source.
-
http://pathport.vbi.vt.edu/services/wsdls/mummerpatric.wsdl
The database contains annotations for organisms for the Patric
project, one of the eight
Bioinformatics Resource Centers (BRC) projects funded by the National Institutes of Health through
the National Institute of Allergy and
Infectious Diseases (NIAID).
-
http://pathport.vbi.vt.edu/services/wsdls/sean.wsdl
The Sean webservice scans the aligned
sequences one character at a time to find potential SNPs by comparing each
character with the corresponding character in the consensus. If there is a difference,
the sequence on either side is checked against the same region in the consensus over the
required window. By default the window is set to 15 bp and is configurable by the user.
If the windows are identical, the position of the base change and the base are noted;
the change is stored as a potential SNP only if at least one other identical base change
is located at the same position in another sequence. The only other check is to ensure
that the consensus character is present in at least two sequences, as the
consensus does not always contain the dominant base at a particular position. The results
are returned in DAS/1 format.
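The scanning procedure described above can be sketched as follows. This is an interpretation of the prose, not the Sean source code: it assumes all sequences have the same length as the consensus, that windows are simply truncated at the sequence ends, and the function name and result layout (position, consensus base, variant base) are made up for the example.

```python
from collections import defaultdict


def find_candidate_snps(sequences, consensus, window=15):
    """Return (position, consensus_base, variant_base) for each candidate SNP."""
    # (position, variant base) -> indexes of the sequences carrying that change
    observed = defaultdict(list)
    for index, seq in enumerate(sequences):
        for pos, base in enumerate(seq):
            if base == consensus[pos]:
                continue
            left = slice(max(0, pos - window), pos)
            right = slice(pos + 1, pos + 1 + window)
            # Keep the change only if both flanking windows agree with the consensus.
            if seq[left] == consensus[left] and seq[right] == consensus[right]:
                observed[(pos, base)].append(index)

    snps = []
    for (pos, base), carriers in sorted(observed.items()):
        # The consensus base itself must also occur in at least two sequences.
        support = sum(1 for seq in sequences if seq[pos] == consensus[pos])
        if len(carriers) >= 2 and support >= 2:
            snps.append((pos, consensus[pos], base))
    return snps
```

A change seen in only one sequence is dropped, which filters out isolated sequencing errors at the cost of missing rare variants.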
-
http://pathport.vbi.vt.edu/services/wsdls/beta/glimmer.wsdl
http://69.93.57.58:8080/axis/services/glimmer?wsdl
http://66.240.238.41:8080/axis/services/glimmer?wsdl
http://208.187.245.3:8080/axis/services/glimmer?wsdl
http://64.92.170.71:6565/axis/services/glimmergt?wsdl
http://206.225.85.121:8080/axis/services/glimmer?wsdl
Wraps the
Glimmer 2.10
program, from The Institute for
Genomic Research (TIGR), for doing gene prediction
in genomic DNA sequences, especially the genomes of bacteria and archaea.
It uses interpolated Markov models (IMMs) to identify the coding regions
and distinguish them from noncoding DNA.
Glimmer consists of two main programs. The first is the training program,
"build-imm", which takes an input set of sequences and builds and outputs
the IMM for them. These sequences can be complete genes or just partial
open reading frames (ORFs).
The second program is Glimmer, which uses the IMM to identify putative genes
in an entire genome. Glimmer automatically resolves conflicts between most
overlapping genes by choosing one of them. It also identifies genes that
are suspected to truly overlap, and flags these for closer inspection by the user.
These "suspect" gene candidates have been a very small percentage of the total
for all the genomes analyzed thus far.
Results are returned in DAS/1 format.
-
http://pathport.vbi.vt.edu/services/wsdls/glimmerm.wsdl
http://69.93.57.58:8080/axis/services/glimmerm?wsdl
http://66.240.238.41:8080/axis/services/glimmerm?wsdl
http://208.187.245.3:8080/axis/services/glimmerm?wsdl
http://64.92.170.71:6565/axis/services/glimmerm?wsdl
http://206.225.85.121:8080/axis/services/glimmerm?wsdl
Wraps the GlimmerM
program, from The Institute for
Genomic Research (TIGR), for doing gene prediction
for eukaryotic genomic DNA sequences.
"It is based on a dynamic programming algorithm that considers all combinations of
possible exons for inclusion in a gene model and chooses the best of these combinations.
The decision about what gene model is best is a combination of the strength of
the splice sites and the score of the exons generated by an interpolated Markov model (IMM).
The system has been trained for Arabidopsis thaliana and Aspergillus and should work well
on closely related organisms."
Results are returned in DAS/1 format.
-
http://pathport.vbi.vt.edu/services/wsdls/glimmerhmm.wsdl
The glimmerhmm webservice wraps GlimmerHMM,
a gene finder program based on a Generalized Hidden Markov Model (GHMM).
"Although the gene finder conforms to the overall mathematical framework of a GHMM,
additionally it incorporates splice site models adapted from the GeneSplicer program
and a decision tree adapted from GlimmerM. It also utilizes Interpolated Markov Models
for the coding and noncoding models. Currently, GlimmerHMM's GHMM structure includes
introns of each phase, intergenic regions, and four types of exons (initial, internal,
final, and single)."
-
http://pathport.vbi.vt.edu/services/wsdls/tigrscan.wsdl
http://69.93.57.58:8080/axis/services/tigrscan?wsdl
http://66.240.238.41:8080/axis/services/tigrscan?wsdl
http://208.187.245.3:8080/axis/services/tigrscan?wsdl
http://64.92.170.71:6565/axis/services/tigrscan?wsdl
http://206.225.85.121:8080/axis/services/tigrscan?wsdl
The TIGRSCAN web-service wraps the native
TIGRSCAN application to do gene prediction. It is a gene finder based on the
Generalized Hidden Markov Model framework.
-
http://pathport.vbi.vt.edu/services/wsdls/unveil.wsdl
The unveil web-service is a gene identification web-service which analyzes genomic DNA
sequences from a variety of organisms. The webservice utilizes a Hidden Markov Model to
identify non-overlapping gene models on either or both strands during a single pass over
a given DNA sequence. Individual components of the model can be independently retrained
using the Baum-Welch EM algorithm.
-
http://pathport.vbi.vt.edu/services/wsdls/snap.wsdl
Wraps Semi-HMM-based Nucleic Acid Parser (SNAP).
SNAP predicts transcript models solely on the basis of the underlying genomic sequence and
does not take any experimental evidence into account. SNAP Protein Reports are not available
for all species, but SNAP performs better than GENSCAN in some species.
-
http://pathport.vbi.vt.edu/services/wsdls/rrnascan.wsdl
http://69.93.57.58:8080/axis/services/rrnascan?wsdl
The rRNAscan web-service identifies ribosomal RNA genes in genomic DNA or RNA
sequences. Currently it detects 5S rRNA by initially building matrices or profiles
from multiple ribosomal RNA sequence alignments and then finding matches between the
consensus sequence from the GRIBSKOV or HENIKOFF profile / matrix and the query
sequence.
-
http://pathport.vbi.vt.edu/services/wsdls/trnascan.wsdl
http://69.93.57.58:8080/axis/services/trnascan?wsdl
http://66.240.238.41:8080/axis/services/trnascan?wsdl
http://208.187.245.3:8080/axis/services/trnascan?wsdl
http://64.92.170.71:6565/axis/services/trnascan?wsdl
http://206.225.85.121:8080/axis/services/trnascan?wsdl
Detects transfer RNA genes in DNA sequences using the
tRNAscan-SE-v1.23
program developed by
the Department of Genetics at Washington University's School of
Medicine (St. Louis) and the Department of Genetics at Stanford
University.
The web-service works in three main phases:
- tRNAscan detects tRNA by looking for short, well conserved intragenic
promoter sequences found in the TΨC (T-psi-C) and D arm regions of prototypic tRNAs.
- tRNAscan extracts the DNA subsequences identified as possible tRNAs and
passes these segments to an RNA search program that employs the Cove program.
- tRNAscan takes confirmed tRNAs and runs another Cove program that
displays RNA secondary structure.
Returns DAS/1 formatted results.
-
http://pathport.vbi.vt.edu/services/wsdls/tfscan.wsdl
http://66.240.238.41:8080/axis/services/tfscan?wsdl
http://208.187.245.3:8080/axis/services/tfscan?wsdl
http://64.92.170.71:6565/axis/services/tfscan?wsdl
http://206.225.85.121:8080/axis/services/tfscan?wsdl
Wraps the tfscan transcription factor binding site prediction program from
the EMBOSS
suite.
The TRANSFAC Database is a database of eukaryotic cis-acting regulatory DNA elements
and trans-acting factors. It covers a range of organisms from yeast to human.
The SITE data from TRANSFAC contains information on individual, putative, regulatory protein
binding sites. It has been divided into the following taxonomic groups: Fungi, Insects, Plants,
Vertebrates, and Other. Tfscan takes an input sequence, combined with a user-selected organism,
and performs a fast match of the TRANSFAC sequences against the input sequence.
The result is a list of positions which match the binding sites in the TRANSFAC SITE database,
returned in DAS/1 format.
-
http://pathport.vbi.vt.edu/services/wsdls/repeatmasker.wsdl
RepeatMasker is a program that screens DNA sequences for interspersed repeats and
low complexity DNA sequences. The output of the program is a detailed annotation of
the repeats that are present in the query sequence as well as a modified version of
the query sequence in which all the annotated repeats have been masked (default:
replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently
will be masked by the program.
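The masking step itself is simple to picture. The sketch below replaces annotated repeat intervals with Ns and reports the masked fraction. It does not reproduce RepeatMasker's detection or its output format; the interval convention (0-based start, exclusive end) and the function names are assumptions for this example.

```python
def mask_repeats(sequence, repeats, mask_char="N"):
    """Replace each (start, end) interval with mask_char.

    Intervals are assumed to be 0-based with an exclusive end and are clipped
    to the sequence bounds.
    """
    masked = list(sequence)
    for start, end in repeats:
        for i in range(max(0, start), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


def masked_fraction(masked_sequence, mask_char="N"):
    """Fraction of positions carrying the mask character.

    Any Ns already present in the input are counted as well.
    """
    if not masked_sequence:
        return 0.0
    return masked_sequence.count(mask_char) / len(masked_sequence)
```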
-
http://pathport.vbi.vt.edu/services/wsdls/rankgene.wsdl
The Rankgene web-service analyzes gene expression data, performs feature selection, and ranks genes
based on the predictive power of each gene to classify samples into functional or disease categories.
-
http://pathport.vbi.vt.edu/services/wsdls/tico.wsdl
The Tico web service wraps the TiCo package, "a tool for postprocessing predictions of prokaryotic
genes with the objective to improve the accuracy of annotated Translation Initiation Sites (TIS)."
"Currently the tool can be used to analyze and reannotate predictions obtained by the program GLIMMER2
(Delcher et al., 1999). The underlying algorithm of TiCo is based on a completely unsupervised method for
scoring potential translation starts. The method works without specific assumptions about characteristic
sequence features such as Shine-Dalgarno motifs or start codon usage. In addition, empirical thresholds have
been avoided in order to prevent the algorithm from specialization with respect to particular test data."
For additional information on Tico, please refer to:
http://tico.gobics.de/index.jsp
-
http://pathport.vbi.vt.edu/services/wsdls/rsom.wsdl
http://69.93.57.58:8080/axis/services/rsom?wsdl
http://66.240.238.41:8080/axis/services/rsom?wsdl
http://208.187.245.3:8080/axis/services/rsom?wsdl
http://64.92.170.71:6565/axis/services/rsom?wsdl
http://206.225.85.121:8080/axis/services/rsom?wsdl
Utilizes an R package based
program,
GeneSom,
which uses self-organizing maps (SOM) to perform clustering for microarray data analysis.
"GeneSom applies SOM algorithm for gene clustering of microarray data.
SOM is an unsupervised neural network learning algorithm for the analysis and organization
of gene expression profiles. It puts a number of vectors into a high-dimensional input data space
to place the data sets in an ordered fashion. It constructs a nonlinear projection
of the data to a "map" display that is used for visualizing similarities, relationships,
and cluster structures. The same SOM display can be used for visualizing the relationships
between data sets, such as the gene expression and the functional classes of gene."
-
http://pathport.vbi.vt.edu/services/wsdls/agnes.wsdl
http://69.93.57.58:8080/axis/services/agnes?wsdl
http://66.240.238.41:8080/axis/services/agnes?wsdl
http://208.187.245.3:8080/axis/services/agnes?wsdl
http://64.92.170.71:6565/axis/services/agnes?wsdl
http://206.225.85.121:8080/axis/services/agnes?wsdl
Utilizes an R package based
program, Agnes,
which uses agglomerative clustering methods to perform hierarchical cluster data analysis.
Unlike other agglomerative clustering methods such as hclust, agnes yields the
agglomerative coefficient which measures the amount of clustering structure found.
Algorithm outline: At the beginning, each observation is a
small cluster by itself. Clusters are then merged until only one large cluster remains which
contains all observations. At each stage, the two nearest clusters are combined to form
one larger cluster.
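The outline above can be sketched in a few lines. This is an illustration of agglomerative merging in general, not the R agnes code: it assumes average linkage as the between-cluster distance, works on small in-memory point lists, and does not compute the agglomerative coefficient. The function names are made up for the example.

```python
import math
from itertools import combinations


def average_linkage(cluster_a, cluster_b, points):
    """Mean Euclidean distance between all cross-cluster pairs of points."""
    total = sum(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points):
    """Merge the two nearest clusters repeatedly; return the merge history.

    Each history entry is (cluster_a, cluster_b, distance), where clusters are
    tuples of indexes into `points`.
    """
    clusters = [(i,) for i in range(len(points))]  # every observation starts alone
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        distance = average_linkage(a, b, points)
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        merges.append((a, b, distance))
    return merges
```

The merge history is what a dendrogram displays: early merges at small distances followed by a late merge at a large distance indicate well separated groups.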
-
http://pathport.vbi.vt.edu/services/wsdls/kmean.wsdl
http://69.93.57.58:8080/axis/services/kmean?wsdl
http://66.240.238.41:8080/axis/services/kmean?wsdl
http://208.187.245.3:8080/axis/services/kmean?wsdl
http://64.92.170.71:6565/axis/services/kmean?wsdl
http://206.225.85.121:8080/axis/services/kmean?wsdl
Utilizes an R package based
program, Kmeans,
to perform non-hierarchical cluster data analysis.
Algorithm outline: This nonhierarchical method initially takes a number of components of
the population equal to the final required number of clusters. These initial
components are chosen such that the points are mutually farthest apart.
Next, it examines each remaining component in the population and assigns it to the
nearest cluster. The centroid's position is recalculated every time a
component is added to the cluster, and this continues until all the components are grouped into
the final required number of clusters.
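The outline above corresponds to a single-pass, sequential form of k-means, sketched below. It follows the prose rather than the R kmeans function, whose internal algorithm may differ. The greedy way of picking seeds that are far apart, and the names farthest_seeds and sequential_kmeans, are assumptions; the sketch also assumes k is no larger than the number of points.

```python
import math


def farthest_seeds(points, k):
    """Greedily pick k point indexes that are far from each other.

    Starts from the first point; each further seed maximizes its distance to
    the closest seed already chosen (one possible reading of "mutually farthest").
    """
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(points)) if i not in seeds]
        seeds.append(
            max(candidates, key=lambda i: min(math.dist(points[i], points[s]) for s in seeds))
        )
    return seeds


def sequential_kmeans(points, k):
    """Assign every point to its nearest centroid, updating centroids as points arrive."""
    seeds = farthest_seeds(points, k)
    centroids = [list(points[s]) for s in seeds]
    sizes = [1] * k
    labels = {s: cluster for cluster, s in enumerate(seeds)}
    for i, point in enumerate(points):
        if i in labels:
            continue  # seeds are already assigned
        nearest = min(range(k), key=lambda c: math.dist(point, centroids[c]))
        sizes[nearest] += 1
        # Running-mean update of the centroid after adding this point.
        centroids[nearest] = [
            mean + (x - mean) / sizes[nearest] for mean, x in zip(centroids[nearest], point)
        ]
        labels[i] = nearest
    return [labels[i] for i in range(len(points))], centroids
```

Because each point is visited only once, the result depends on the input order; iterative variants repeat the assignment step until the clusters stop changing.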
-
http://pathport.vbi.vt.edu/services/wsdls/rpca.wsdl
http://69.93.57.58:8080/axis/services/rpca?wsdl
http://66.240.238.41:8080/axis/services/rpca?wsdl
http://208.187.245.3:8080/axis/services/rpca?wsdl
http://64.92.170.71:6565/axis/services/rpca?wsdl
http://206.225.85.121:8080/axis/services/rpca?wsdl
Utilizes an R package based
program, Princomp,
to perform statistical principal component analysis on microarray data.
PCA is designed to capture the variance in a dataset in terms of principal components.
It reduces the dimensionality of the data to summarize the most important
(i.e. defining) parts while simultaneously filtering out noise.
PCA is a classical statistical approach for exploring samples of unknown classes.
Unlike supervised methods such as LDA or SVM, PCA is an unsupervised method: it does not use
prior knowledge of gene function, and similar samples are identified from the expression data alone.
-
http://pathport.vbi.vt.edu/services/wsdls/rsvm.wsdl
http://69.93.57.58:8080/axis/services/rsvm?wsdl
http://66.240.238.41:8080/axis/services/rsvm?wsdl
http://208.187.245.3:8080/axis/services/rsvm?wsdl
http://64.92.170.71:6565/axis/services/rsvm?wsdl
http://206.225.85.121:8080/axis/services/rsvm?wsdl
Utilizes an R package based
program, SVM,
to perform classification on microarray data.
SVM is used to train a support vector machine. It can be used to perform general regression,
classification, and density estimation as well.
SVM is considered a supervised computer learning method
because it exploits prior knowledge of gene function to identify unknown genes of
similar function from expression data.
-
http://pathport.vbi.vt.edu/services/wsdls/rknn.wsdl
http://69.93.57.58:8080/axis/services/rknn?wsdl
http://66.240.238.41:8080/axis/services/rknn?wsdl
http://208.187.245.3:8080/axis/services/rknn?wsdl
http://64.92.170.71:6565/axis/services/rknn?wsdl
http://206.225.85.121:8080/axis/services/rknn?wsdl
Utilizes an R package based
program, KNN,
to perform classification on microarray data.
The KNN (k-nearest neighbor classification) method classifies a test set using a training set.
For each row of the test set, the `k' nearest (in Euclidean distance) training set vectors
are found, and the classification is decided by majority vote, with ties broken at random.
If there are ties for the `k'th nearest vector, all candidates are included in the vote.
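The voting rule described above can be sketched as follows. This is a minimal illustration, not the R routine the service wraps; the function name and the `rng` parameter are assumptions made for this example.

```python
import math
import random
from collections import Counter

def knn_classify(train, labels, query, k, rng=random):
    """Classify one query vector by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training vector, nearest first.
    ranked = sorted(
        (math.dist(vec, query), label) for vec, label in zip(train, labels)
    )
    # Include every training vector tied with the k-th nearest one.
    cutoff = ranked[k - 1][0]
    votes = Counter(label for d, label in ranked if d <= cutoff)
    # Majority vote; ties between classes are broken at random.
    best = max(votes.values())
    winners = sorted(label for label, count in votes.items() if count == best)
    return rng.choice(winners)

train = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ["a", "a", "b", "b"]
prediction = knn_classify(train, labels, (0.2, 0.3), k=3)
```

Passing a seeded `random.Random` instance as `rng` makes the random tie-breaking reproducible.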
-
http://pathport.vbi.vt.edu/services/wsdls/rlda.wsdl
http://69.93.57.58:8080/axis/services/rlda?wsdl
http://66.240.238.41:8080/axis/services/rlda?wsdl
http://208.187.245.3:8080/axis/services/rlda?wsdl
http://64.92.170.71:6565/axis/services/rlda?wsdl
http://206.225.85.121:8080/axis/services/rlda?wsdl
Utilizes an R package based
program, LDA,
to perform classification on microarray data.
Linear discriminant analysis (LDA) is a classical statistical approach for classifying samples
of unknown classes. LDA is considered a supervised computer learning method because it
exploits prior knowledge of gene function to identify unknown samples of similar function
from expression data.
-
http://pathport.vbi.vt.edu/services/wsdls/diana.wsdl
http://69.93.57.58:8080/axis/services/diana?wsdl
http://66.240.238.41:8080/axis/services/diana?wsdl
http://208.187.245.3:8080/axis/services/diana?wsdl
http://64.92.170.71:6565/axis/services/diana?wsdl
http://206.225.85.121:8080/axis/services/diana?wsdl
The diana web-service wraps the
diana application
from the R package. It computes a divisive
hierarchical clustering of the dataset. The Diana algorithm is unique in computing a divisive hierarchy,
whereas most other software for hierarchical clustering is agglomerative.
-
http://pathport.vbi.vt.edu/services/wsdls/clust.wsdl
http://69.93.57.58:8080/axis/services/clust?wsdl
http://66.240.238.41:8080/axis/services/clust?wsdl
http://208.187.245.3:8080/axis/services/clust?wsdl
http://64.92.170.71:6565/axis/services/clust?wsdl
http://206.225.85.121:8080/axis/services/clust?wsdl
Wraps Cluster
to perform hierarchical gene clustering on microarray data.
-
http://pathport.vbi.vt.edu/services/wsdls/hclust.wsdl
http://69.93.57.58:8080/axis/services/hclust?wsdl
http://66.240.238.41:8080/axis/services/hclust?wsdl
http://208.187.245.3:8080/axis/services/hclust?wsdl
http://64.92.170.71:6565/axis/services/hclust?wsdl
http://206.225.85.121:8080/axis/services/hclust?wsdl
The hclust web-service wraps the Cluster application
from the R package. It computes a
hierarchical clustering of the dataset.
-
http://pathport.vbi.vt.edu/services/wsdls/multtest.wsdl
http://69.93.57.58:8080/axis/services/multtest?wsdl
http://66.240.238.41:8080/axis/services/multtest?wsdl
http://208.187.245.3:8080/axis/services/multtest?wsdl
http://64.92.170.71:6565/axis/services/multtest?wsdl
http://206.225.85.121:8080/axis/services/multtest?wsdl
Utilizes a Bioconductor package based
program, Multtest,
to perform statistical t-tests and F-tests on microarray data.
-
http://pathport.vbi.vt.edu/services/wsdls/anova.wsdl
http://69.93.57.58:8080/axis/services/anova?wsdl
http://66.240.238.41:8080/axis/services/anova?wsdl
http://208.187.245.3:8080/axis/services/anova?wsdl
http://64.92.170.71:6565/axis/services/anova?wsdl
http://206.225.85.121:8080/axis/services/anova?wsdl
The anova web-service wraps the Anova application
from the R package.
It fits an analysis of variance model by a call to 'lm' for each stratum and generates an ANOVA table.
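For orientation, the simplest case of an ANOVA table (a one-way layout with a single factor) can be computed by hand as sketched below. This is not what the R routine does internally, which fits linear models per stratum; the function name and the returned dictionary keys are assumptions made for this example.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom and F statistic for one factor."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_stat,
    }

table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F value indicates that the variation between group means is large relative to the variation within groups.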
-
http://pathport.vbi.vt.edu/services/wsdls/beta/sam.wsdl
http://64.92.170.71:6565/axis/services/sam?wsdl
http://valar.bioinformatics.vt.edu:6565/axis/services/sam?wsdl
The sam web-service wraps the native programs in a Bioconductor package to perform a Significance
Analysis of Microarrays (SAM). It is possible to perform one- and two-class analyses using either a
modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a
modified F-statistic. The results of the sam web-service are returned as a subset of the MADM format.
-
http://pathport.vbi.vt.edu/services/wsdls/geneidmap.wsdl
The Geneidmap web-service queries the VBI gene alias database and allows users to map one gene ID
to other aliases for the same gene.
The VBI gene alias database is an Oracle9i database hosted by VBI CCF.
The data source is Affymetrix's probe annotation files
(.csv files).
Currently, the alias database contains 510,727 unique probe_set_ids from a variety of genomes,
such as human, rat, mouse, plasmodium, E. coli, and yeast.
The geneidmap web-service is called within PathPort's bioinformatics
Group Suggestor and Microarray plugin.
-
http://pathport.vbi.vt.edu/services/wsdls/dnacuts.wsdl
http://66.240.238.41:8080/axis/services/dnacuts?wsdl
http://208.187.245.3:8080/axis/services/dnacuts?wsdl
http://64.92.170.71:6565/axis/services/dnacuts?wsdl
http://206.225.85.121:8080/axis/services/dnacuts?wsdl
Digests a DNA sequence using any number of enzymes chosen
from a comprehensive list using the
EMBOSS
restrict program.
Restrict predicts restriction enzyme cleavage sites using
REBASE.
The REBASE database contains information on restriction enzymes,
recognition sequences, cleavage sites, methylation specificity,
commercial availability of the enzymes, and references
(both published and unpublished observations).
DNACuts allows the user to select specific enzymes and to run with either default values
(a minimum cut length of 4 bases and linear DNA) or user-supplied values.
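The idea of a digest on linear DNA can be sketched as below. This is a toy illustration, not the EMBOSS restrict program or the DNACuts interface: the three-enzyme table stands in for REBASE, only the top strand is searched, and the function names and the `min_site_length` parameter (one reading of the "minimum cut length" default) are assumptions.

```python
# Recognition site and top-strand cut offset for three well-known enzymes:
# EcoRI G^AATTC, BamHI G^GATCC, HindIII A^AGCTT.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
}

def cut_positions(seq, enzymes, min_site_length=4):
    """Return the sorted top-strand cut positions for the chosen enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        if len(site) < min_site_length:
            continue
        # Scan for every (possibly overlapping) occurrence of the site.
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    return sorted(cuts)

def digest(seq, enzymes):
    """Split a linear sequence into fragments at every cut position."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

fragments = digest("AAGAATTCTTGGATCCAA", ["EcoRI", "BamHI"])
# fragments == ["AAG", "AATTCTTG", "GATCCAA"]
```

For circular DNA the first and last fragments would be joined into one, which this sketch does not handle.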
-
http://pathport.vbi.vt.edu/services/wsdls/beta/probedesign.wsdl
http://69.93.57.58:8080/axis/services/probedesign?wsdl
http://66.240.238.41:8080/axis/services/probedesign?wsdl
http://208.187.245.3:8080/axis/services/probedesign?wsdl
http://64.92.170.71:6565/axis/services/probedesigng?wsdl
http://206.225.85.121:8080/axis/services/probedesign?wsdl
Wraps the Primer3
program developed by the Whitehead Institute
to choose optimal primers for PCR and DNA sequencing, and hybridization probes
for microarray experiments.
Primer3 uses the following criteria for selection:
- Oligonucleotide melting temperature, size, GC content, and primer-dimer formation
- PCR product size
- Positional constraints within the source sequence
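To make the first criterion concrete, the sketch below screens a candidate oligonucleotide by size, GC content, and a rough melting temperature. It is not Primer3's algorithm: the Wallace rule (2 °C per A/T, 4 °C per G/C) is a crude approximation swapped in for illustration, the thresholds are hypothetical defaults, and primer-dimer formation, product size, and positional constraints are not modelled.

```python
def gc_percent(oligo):
    """Percentage of G and C bases in the oligonucleotide."""
    oligo = oligo.upper()
    return 100.0 * (oligo.count("G") + oligo.count("C")) / len(oligo)

def wallace_tm(oligo):
    """Rough melting temperature in degrees C using the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

def acceptable(oligo, size=(18, 27), gc=(40.0, 60.0), tm=(52, 65)):
    """Check size, GC content and Tm against (hypothetical) allowed ranges."""
    return (
        size[0] <= len(oligo) <= size[1]
        and gc[0] <= gc_percent(oligo) <= gc[1]
        and tm[0] <= wallace_tm(oligo) <= tm[1]
    )

candidate = "ATGCATGCATGCATGCATGC"
ok = acceptable(candidate)  # 20 bases, 50% GC, Wallace Tm of 60
```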
-
http://pathport.vbi.vt.edu/services/wsdls/beta/yoda.wsdl
http://69.93.57.58:8080/axis/services/yoda?wsdl
http://66.240.238.41:8080/axis/services/yoda?wsdl
http://208.187.245.3:8080/axis/services/yoda?wsdl
http://64.92.170.71:6565/axis/services/yodagt?wsdl
http://206.225.85.121:8080/axis/services/yoda?wsdl
YODA (Yet-another Oligo Design Application) computes gene-specific oligonucleotides
that are free of secondary structure for genome-scale oligonucleotide
microarray construction. Selection is based on three major criteria: oligonucleotide
melting temperature; specificity to a single target, or at least to the shortest list
of possible targets; and the inability to fold into a stable secondary structure at the
hybridization temperature.
-
http://pathport.vbi.vt.edu/services/wsdls/goanalysis.wsdl
This web-service wraps the VBI GoAnalysis Program that identifies a GoTerm
based on a list of gene IDs.
-
http://pathport.vbi.vt.edu/services/wsdls/signalp.wsdl
Wraps SignalP 3.0, which predicts the
presence and location of signal peptide cleavage sites in amino acid sequences from different organisms:
Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates
a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on
a combination of several artificial neural networks and hidden Markov models.
-
http://pathport.vbi.vt.edu/services/wsdls/lipop.wsdl
The Lipop web-service predicts lipoproteins and discriminates between lipoprotein
signal peptides, other signal peptides and N-terminal membrane helices in Gram-negative bacteria.
For additional information on Lipop, please refer to:
http://www.cbs.dtu.dk/services/LipoP/
-
http://pathport.vbi.vt.edu/services/wsdls/interproscan.wsdl
http://69.93.57.58:8080/axis/services/interproscan?wsdl
http://66.240.238.41:8080/axis/services/interproscan?wsdl
http://208.187.245.3:8080/axis/services/interproscan?wsdl
http://64.92.170.71:6565/axis/services/interproscan?wsdl
http://206.225.85.121:8080/axis/services/interproscan?wsdl
Wraps EBI's
InterProScan-v3.1
program which contacts multiple
databases bringing their information together. Our web-service
converts these results into DAS/1 format. Please note that this
service can be VERY SLOW and that we have no control over the
responsiveness of the databases that it contacts. For sequences
of 5-10 kilobases, expect running times of 4-5 minutes. Larger
sequences will require progressively longer running times.
Returns DAS/1 formatted results.
-
http://pathport.vbi.vt.edu/services/wsdls/mstag.wsdl
The MsTag web-service wraps the
MS-Tag tool.
It offers a stripped-down MS-Tag input form from the UCSF Mass Spectrometry Facility for people
who barely know the difference between a b-ion and a y-ion. If you don't know the
difference between monoisotopic and average masses, please consult your local mass spectrometrist.
-
http://pathport.vbi.vt.edu/services/wsdls/msfit.wsdl
The MS-Fit web-service wraps the
MS-Fit tool.
MS-Fit is a peptide mass fingerprinting tool from the UCSF Mass Spectrometry Facility that tries to fit
a user's mass spectrometry data to a protein sequence in an existing database and thus suggest
the identity of the user's protein.
-
http://pathport.vbi.vt.edu/services/wsdls/proteinpredict.wsdl
The Proteinpredict web-service uses the P2SL package, which infers protein targeting based on the implicit motif
frequency distribution of protein sequences. The targeting signal is modeled based on the distribution of
subsequence occurrences (implicit motifs) using self-organizing maps. The boundaries among the classes are
then determined by a set of support vector machines. P2SL is a hybrid computational system that predicts
the ER-targeted, cytosolic, mitochondrial and nuclear protein localization classes. The results of the
Proteinpredict web-service are returned as a subset of the DAS/1 format.
For additional information on P2SL please refer to:
http://staff.vbi.vt.edu/volkan/p2sl/
-
http://pathport.vbi.vt.edu/services/wsdls/tagident.wsdl
The Tagident web-service wraps the TagIdent tool available at
TagidentTool (formerly GuessProt).
This web-service generates a list of proteins close to a given pI and Mw.
It can also identify proteins by matching a short sequence tag of up to 6 amino acids.
The search can be restricted to a species by using the "species" option.
-
http://pathport.vbi.vt.edu/services/wsdls/psipred.wsdl
The Psipred web-service predicts protein secondary structure.
For additional information on Psipred, please refer to:
http://bioinf.cs.ucl.ac.uk/psipred/
-
http://pathport.vbi.vt.edu/services/wsdls/memsat.wsdl
The Memsat web-service predicts the secondary structure and topology of all-helix integral membrane
proteins based on the recognition of topological models.
For additional information on memsat, please refer to:
http://bioinf.cs.ucl.ac.uk/psipred/
-
http://pathport.vbi.vt.edu/services/wsdls/rbsfinder.wsdl
The RBSFinder web-service predicts ribosome binding sites (RBS) in the upstream regions of the genes
annotated by Glimmer2. If there is no RBS-like pattern in this region, the program searches for a start codon
with an RBS-like pattern in the same reading frame, upstream or downstream, and relocates the start codon
accordingly.
For additional information on RBSFinder refer to:
http://www.tigr.org/software/
No Cost License Restricted Services:
-
http://pathport.vbi.vt.edu/services/wsdls/contig.wsdl
Creates assembled contigs from trace files.
Phred is used for base calling;
it reads trace data from chromatogram files in the SCF, ABI, and ESD formats.
The Sanger Institute's
Cross_Match
program is used to remove contaminating vectors, and
CAP3 performs
the final assembly: it makes use of base quality values in constructing an alignment of
sequence reads and generating a consensus sequence for each contig.
CAP3 also uses a large number of forward-reverse constraints to locate and correct errors
in the layout of sequence reads, allowing CAP3 to address assembly errors caused by repeats.
Results are returned in DAS/1 format.
- http://pathport.vbi.vt.edu/services/wsdls/genscan.wsdl
Wraps the GENSCAN
program for doing gene prediction.
It analyzes genomic DNA sequences from a variety of organisms including
humans and other vertebrates, invertebrates, and plants. For each sequence,
the GENSCAN web-service determines the most likely gene structure under a
probabilistic model of the gene structural and compositional properties of
the genomic DNA for the given organism.
GENSCAN predicts genes based on a pre-built probabilistic model.
The probabilistic model accounts for many of the essential gene structural
properties of genomic sequences, e.g., typical gene density, the typical number
of exons per gene, the distribution of exon sizes for different types of exons.
GENSCAN also builds the model based on many of the important compositional properties
of genes, e.g., the reading frame-specific hexamer composition of coding regions
vs the (reading frame-independent) hexamer composition of introns and intergenic
regions, and on position-specific signals such as the TATA box, cap site, and
polyadenylation signal. Importantly, novel models of the donor and acceptor
splice sites are used which capture potentially important dependencies between
positions in these signals. For human and vertebrate sequences, separate sets of
model parameters are used which account for the many substantial differences in
gene density and structure observed in distinct C+G% compositional regions of the
human genome and the genomes of other vertebrates.
Results are returned in DAS/1 format.
- http://pathport.vbi.vt.edu/services/wsdls/genemark.wsdl
Wraps the
GeneMark
program for doing gene prediction.
GeneMark is a family of gene prediction programs provided by Mark Borodovsky's
Bioinformatics Group at the Georgia Institute of Technology.
It predicts genes and intergenic regions using Hidden Markov models of coding and
non-coding regions in prokaryotic or eukaryotic genomic DNA sequences.
Results are returned in DAS/1 format.
- http://pathport.vbi.vt.edu/services/wsdls/grailexp.wsdl
Wraps the PERCEVAL application (Protein-coding exon, repetitive, and CpG-Island Evaluator)
contained within the GrailEXP
gene discovery suite for doing gene prediction.
It analyzes genomic DNA sequences from a variety of eukaryotes.
This web-service produces a list of possible Grail Exon Candidates, each scored by the
Grail neural network as a potential exon.
Results are returned in DAS/1 format.
- http://pathport.vbi.vt.edu/services/wsdls/orpheus.wsdl
Wraps the Orpheus
system for doing bacterial gene prediction in large genomic fragments or completed genomes.
The analysis starts with a database similarity search and identification of
reliable gene fragments. The latter are used to derive statistical characteristics
of protein-coding regions and ribosome-binding sites (RBS). At the final step,
the complete set of genes in the analyzed genome is predicted, with special attention
paid to correct gene start identification.
Results are returned in DAS/1 format. NOTE: This program is
EXTREMELY slow, requiring days to finish where the previous
programs take minutes.
-
http://pathport.vbi.vt.edu/services/wsdls/fgenesh.wsdl
The FGENESH web-service is a general-purpose gene identification web-service that analyzes
genomic DNA sequences from a variety of organisms including humans and other vertebrates,
invertebrates, and plants. For each sequence, the FGENESH web-service determines the most
likely gene structure under a probabilistic model of the gene structural and compositional
properties of the genomic DNA for the given organism.
For Pay Restricted Services:
NO SERVICES IN THIS CATEGORY
Private VBI Services:
NO SERVICES IN THIS CATEGORY