Enter sequence(s)

A sequence can be entered as raw sequence, an accession number, or a gi number (example accession: Y14934). You may include a definition line starting with ">" at the top to conform to FASTA format. You can also load sequences from a local file (make sure it is a plain text file). If the sequence is already in GenBank, you can simply enter its accession or gi number.

Multiple query sequences may be submitted. Each sequence must have a unique identifier. We suggest that you do not use whitespace in the identifier, because any characters after the first whitespace are excluded. We recommend submitting no more than 1000 sequences per search.
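The identifier rules above can be checked before submission. The following is a minimal sketch, not part of IgBLAST; the function name `check_fasta_queries` and its `max_sequences` parameter are hypothetical, and it assumes the identifier is the text between ">" and the first whitespace.

```python
def check_fasta_queries(text, max_sequences=1000):
    """Return a list of problems found in a multi-FASTA query string."""
    problems = []
    seen = set()
    count = 0
    for line in text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines are not inspected in this sketch
        count += 1
        header = line[1:].strip()
        # Assumption: only the text before the first whitespace is kept
        # as the identifier, so that is what must be unique.
        identifier = header.split()[0] if header else ""
        if not identifier:
            problems.append(f"record {count}: empty identifier")
        elif identifier in seen:
            problems.append(f"record {count}: duplicate identifier {identifier!r}")
        seen.add(identifier)
    if count > max_sequences:
        problems.append(
            f"{count} sequences exceed the suggested limit of {max_sequences}"
        )
    return problems
```

Here ">seq1 heavy" and ">seq1 light" would be reported as duplicates, because both reduce to the identifier "seq1" once the text after the whitespace is dropped.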

Min consecutive nucleotide matches for D gene

This sets the threshold for D gene detection: the minimum number of consecutive nucleotide matches required between the query sequence and a D gene. Choose the value based on your own criteria. The default is 5 nucleotides.
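To illustrate what a consecutive-match threshold means, the sketch below finds the longest run of identical bases between a query and a D gene over all ungapped offsets. It is only an illustration of the criterion; it is not how IgBLAST performs D gene alignment, and the function names are hypothetical.

```python
def longest_match_run(query, d_gene):
    """Longest run of identical bases over all ungapped offsets."""
    best = 0
    # Slide the D gene across the query at every relative offset.
    for offset in range(-len(d_gene) + 1, len(query)):
        run = 0
        for j, base in enumerate(d_gene):
            i = offset + j
            if 0 <= i < len(query) and query[i] == base:
                run += 1
                best = max(best, run)
            else:
                run = 0  # a mismatch or an overhang ends the run
    return best


def passes_d_threshold(query, d_gene, min_match=5):
    """True if the longest consecutive match reaches the threshold."""
    return longest_match_run(query, d_gene) >= min_match
```

With the default of 5, a D gene that shares only a 4-base stretch with the query would not be reported under this criterion.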

Mismatch penalty for D gene

A higher mismatch penalty (e.g., -4) favors D gene matches with higher similarity to the query sequence, although the matched regions are not necessarily long. A lower mismatch penalty (e.g., -1) favors longer D gene matches that do not necessarily have high similarity to the query sequence. A D gene search with a low mismatch penalty is also more prone to spurious matches caused by random nucleotide additions and somatic mutations.
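The trade-off can be seen with a small scoring sketch. It finds the best-scoring ungapped segment of two pre-aligned, equal-length strings. The match reward of +1 is an assumption made for illustration (this page does not state the reward IgBLAST uses), and `best_local_segment` is a hypothetical helper, not an IgBLAST function.

```python
def best_local_segment(query, d_gene, mismatch_penalty, match_reward=1):
    """Return (score, start, end) of the best ungapped segment.

    `query` and `d_gene` must already be aligned base-to-base and have
    the same length; `end` is exclusive.
    """
    if len(query) != len(d_gene):
        raise ValueError("sequences must have equal length")
    best = (0, 0, 0)
    score, start = 0, 0
    for i, (q, d) in enumerate(zip(query, d_gene)):
        score += match_reward if q == d else mismatch_penalty
        if score <= 0:
            # Restart the segment after a stretch that is not worth keeping.
            score, start = 0, i + 1
        elif score > best[0]:
            best = (score, start, i + 1)
    return best


query = "ACGTATGTAC"   # one mismatch against the D gene at position 6
d_gene = "ACGTACGTAC"
print(best_local_segment(query, d_gene, mismatch_penalty=-1))  # (8, 0, 10)
print(best_local_segment(query, d_gene, mismatch_penalty=-4))  # (5, 0, 5)
```

With a penalty of -1 the best segment spans all 10 bases and absorbs the mismatch. With -4 the segment stops at the 5 perfectly matching bases before it.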

Show amino acid translation

This translates your query and the top germline sequence, and aligns each amino acid to the second base of its codon. Mismatched amino acids in the germline sequence are colored.
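The layout described above, with each amino acid placed at the second base of its codon, can be sketched as follows. This is an illustration using the standard genetic code, not IgBLAST's own code; `translation_line` and its `frame` parameter are hypothetical, and unknown codons are shown as "X" by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translation_line(nucleotides, frame=0):
    """Return a line as long as the input, with each amino acid placed
    at the position of the second base of its codon."""
    seq = nucleotides.upper()
    line = [" "] * len(seq)
    for start in range(frame, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        line[start + 1] = CODON_TABLE.get(codon, "X")
    return "".join(line)


seq = "GAGGTGCAGCTG"
print(seq)                    # GAGGTGCAGCTG
print(translation_line(seq))  #  E  V  Q  L
```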

Focus on the V gene

This option finds the best matches for the V gene in your query sequence among additional non-germline databases (e.g., nr or genome). It has NO effect on searches against germline gene databases (see explanation below).

A typical rearranged query sequence includes a leader and the V, D, and J genes (sometimes the C region is also included). When a sequence is submitted for a BLAST search, similarity matching is performed over the entire query sequence. Unlike the germline V gene database, which contains only V gene sequences, other databases such as nr contain many rearranged sequences that also include a leader and the V, D, J, and C genes. As a result, the best hit from these databases does not necessarily have the best match to the query V gene; rather, it has the best match over the entire query sequence (for example, it may have very high similarity to the leader, D, J, or C genes in a query sequence but only a low match to the V gene). This is not a problem if the goal is to find the best overall matches to a query sequence. However, if the goal is to find the best matches to the V gene of a query sequence, one needs to isolate the V gene part of the query sequence manually and then use it for the search.

With the "Focus on the V gene" option on, the V gene part of a query sequence is automatically isolated (based on comparison to hits from the germline V gene database) and then used to search additional databases such as nr. This option should be disabled, however, if the intention is to find the best hits based on overall matches.
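The manual equivalent of this isolation step can be sketched as a simple slice. The sketch assumes you already know where the germline V gene hit starts and ends on the query, given as 1-based inclusive coordinates; `isolate_v_region` is a hypothetical helper, not an IgBLAST function.

```python
def isolate_v_region(query, v_start, v_end):
    """Return the part of `query` covered by the germline V gene hit.

    `v_start` and `v_end` are 1-based, inclusive query coordinates
    (an assumption of this sketch).
    """
    if not 1 <= v_start <= v_end <= len(query):
        raise ValueError("V gene coordinates fall outside the query")
    return query[v_start - 1:v_end]


# Toy query: 6-base leader, 9-base V gene part, 6 bases downstream.
query = "ATGAAA" + "GAGGTGCAG" + "TTTGGG"
v_part = isolate_v_region(query, 7, 15)
print(v_part)  # GAGGTGCAG
```

Only the returned sub-sequence would then be used for the search against the additional database.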

Program

Choose blastp for protein sequences and blastn for nucleotide sequences.

V domain delineation system

The V domain can be delineated using either the IMGT system (Lefranc et al., 2003) or the Kabat system (Kabat et al., 1991, Sequences of Proteins of Immunological Interest, National Institutes of Health Publication No. 91-3242, 5th ed., United States Department of Health and Human Services, Bethesda, MD). Domain annotation of the query sequence is based on pre-annotated domain information for the best-matched germline hit.

Databases

All IMGT germline databases are from the IMGT/V-QUEST reference directory sets. Sequences from several categories are available, including functional genes (F), open reading frame genes (ORF), pseudogenes whose protein translation frames are intact (in-frame P), and orphon genes that lie outside the normal immunoglobulin or T cell receptor gene loci.

All UNSWIg gene databases are from the UNSWIg germline repertoire.

NCBI human V genes:
This database consists of the "IMGT human V genes (F+ORF+in-frame P) including orphons" database plus a few pseudogenes that the IMGT database does not include. It contains the same human sequences as the "Ig germline V genes" database in the previous version of IgBLAST.


NCBI human V genes (old):
This is our earliest version of the human Ig germline V gene database, from before the human germline sequences from the IMGT database were added. It is the same as the "Ig germline V genes (old)" database in the previous version of IgBLAST.

NCBI mouse V genes, NCBI mouse D genes and NCBI mouse J genes: 
These are mouse germline sequences independently collected by NCBI.

See NCBI germline genes for details on NCBI germline gene collections.

Custom:
You can search your own database.  Your database should contain sequences in FASTA format.


Additional database (non-germline):
See NCBI BLAST page for details on all additional non-germline databases such as nr.

Organism for the query sequence

Specify the organism from which the query sequence comes. This allows the program to properly report the V domain delineation, the V-J frame status (e.g., in-frame or out-of-frame), and the translation of the query nucleotide sequence.

Expect

The statistical significance threshold for reporting matches against database sequences. Lower EXPECT thresholds are more stringent and report only high-similarity matches. Choose a higher EXPECT value (for example, 1 or more) if you expect low identity between your query sequence and the targets. Note that this option applies only to the additional database search; it has no effect on the germline gene database search.
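As a rough guide to how the threshold acts, the standard BLAST relation between a bit score S' and the expect value is E = m * n * 2^(-S'), where m is the query length and n is the database length. The sketch below applies that relation directly. It ignores the effective-length corrections that BLAST applies, so it will not reproduce reported E-values exactly, and the function names are hypothetical.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def passes_expect(bit_score, query_length, database_length, threshold):
    """True if a hit would be reported at the given EXPECT threshold."""
    return expect_value(bit_score, query_length, database_length) <= threshold


# A 30-bit hit for a 100-base query against a database of 2**20 bases.
e = expect_value(30, 100, 2 ** 20)
print(e)  # 0.09765625
```

In this example the hit is kept at an EXPECT threshold of 1 but dropped at 0.01, which is why a higher threshold is suggested when low identity is expected.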

Last modified: Fri Sep 25 12:59:56 EDT 2015