Genome Workbench: Use ProSplign for Protein to Genomic Alignments


The ProSplign algorithm was developed at NCBI and is particularly adept at handling frameshifts and mRNA splicing events. This tutorial uses version 2.10.5 of Genome Workbench, which is the first version to include the ProSplign tool. The video closely follows a tutorial found on the Genome Workbench web page. Documentation for ProSplign can be found here: http://www.ncbi.nlm.nih.gov/sutils/static/prosplign/documentation.html

There are only three steps to running ProSplign within Genome Workbench. First, you import the genomic and protein sequences that you want to align. Then you display the genomic sequence in a Graphical Sequence View. And then you run ProSplign. I’ll go over these three steps now in more detail.

I’m going to use GenBank as my data source, so I’ll double-click on GenBank, then paste in a genomic accession.version number with a range, and a protein accession.version. I’ll click Next, and I’m going to Add to an existing project and just keep the name as is, then click Finish.

I now have my sequences in the Data folder, so I’ll right-click on the genomic record, choose Open New View, select Graphical Sequence View, and click Next. Here is that genomic accession in blue. This Scaffold track appears by default; you could, of course, remove it if you wanted to, but be sure that you keep, or add, the Alignments track.

I am now ready to run ProSplign, so in the Data folder I select the two sequences that I want to align, right-click and select Run Tool, then choose ProSPLIGN and click Next. Since ProSplign generates alignments in a pairwise fashion, you can have more than one genomic sequence or range. I have only one in my example, and I happen to know that the best alignment is on the minus strand of my genomic sequence, but if in doubt, select Both and the program will present the best alignment. For genomic sequences that have no introns, uncheck With introns. You can select the organism manually, but ProSplign automatically determines the genetic code to use from the organism associated with the sequence record. When ready, click Next. I’ll just click Finish, and there is the alignment.
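
If the strand option is unfamiliar, the idea can be sketched in a few lines of Python. This is only an illustration of what it means for a protein to match the minus strand; it is not ProSplign's algorithm, which also has to deal with introns and frameshifts. The function names and the toy sequences are made up for this example, and the sketch assumes the standard genetic code (NCBI translation table 1).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order.
# Assumption: the organism uses the standard code; others need another table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the minus-strand sequence, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate one reading frame; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Translate all three frames on both strands."""
    result = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            result[(strand, frame)] = translate(seq, frame)
    return result


def frames_containing(dna, peptide):
    """List the (strand, frame) pairs whose translation contains the peptide."""
    return [key for key, protein in six_frame_translations(dna).items()
            if peptide in protein]


# Toy data (hypothetical): the peptide is encoded on the minus strand.
coding = "GGATGAAAACCGCGTATTAAC"      # contains ATG AAA ACC GCG TAT TAA
genomic = reverse_complement(coding)  # what the plus-strand record would show
print(frames_containing(genomic, "MKTAY"))
```

In this toy case the peptide turns up only in a minus-strand reading frame, which is the kind of situation where you would pick the minus strand, or Both if you are not sure.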

I can expand this section, and even pop out this window to get a better view. I can zoom in to get a better look at a particular region. I will Zoom to Selection, but I could zoom all the way in to the sequence.

Okay, that should allow you to get started using ProSplign, but I do want to demonstrate one more feature. I’m going to put this window back into the main display, and run ProSplign again to show you how you can cancel an alignment that is running. My example ran very quickly, but there may be times when you start an alignment, then decide you want to cancel the process. So I will select the two sequences, choose to run ProSplign, click through the Next and Finish buttons, and now I see the job running in the Task View. To cancel, I’ll right-click on the job description and select Cancel Task, which opens a window notifying me that the job is finished; in other words, no alignments were created.

That ends this introduction to ProSplign.
You can send questions to the NCBI Helpdesk at: [email protected]. Click on this note to see a playlist of other Genome Workbench tutorials.