Variant Classifier
Frequently Asked Questions
If you do not find answers to your questions, please contact Kelvin Li at kli(AT)jcvi.org.
Questions
- When I try to execute the Show_Latest_Databases.pl, I get the PERL error Can’t locate Bio/EnsEMBL/DBSQL/DBConnection.pm in @INC(@INC contains: ...
- When I try to run any of your scripts, I get errors: <script name>.pl: Command not found.
- I am following the text file format for your input, but the results just don’t look right and I’m getting unexplainable warnings or errors.
- When I run Show_Latest_Database.pl, I get an error: Could not connect to database homo_sapiens_core_57_37b as user anonymous using [DBI:mysql:database=homo_sapiens_core_57_37b;host=ensembldb.ensembl.org;port=5306] ...
- When I run Extract_Coding_Info.pl, I get a warning/error like: Could not connect to database homo_sapiens_core_57_37b as user anonymous using [DBI:mysql:database=homo_sapiens_core_57_37b; ...
- When I run Extract_Coding_Info.pl, I get an error: For homo_sapiens_core_57_37b there is a difference in the software release (55) and the database release (57) ...
- The classification of non-synonymous substitutions is quite naïve. Will you integrate SIFT?
Answers
1. When I try to execute the Show_Latest_Databases.pl, I get the PERL error:
Can’t locate Bio/EnsEMBL/DBSQL/DBConnection.pm in @INC( @INC contains: ...
This means that you have not installed the Ensembl Perl API properly. Please refer to the Ensembl Perl API web page at:
http://www.ensembl.org/info/data/api.html
This web page provides comprehensive instructions for installing the necessary Perl modules. Due to the frequency of updates, it is best to contact the Ensembl help staff if you are having difficulty with your installation.
Specific steps for installation can be found at:
http://www.ensembl.org/info/docs/api/api_installation.html
You will essentially need to download the necessary packages, unpack them, and set up your environment variables. If you are not sure which packages to download, just download them all. They are not that big, and having them will save you time if you end up needing them for another project. After you unpack the files, it is critical that you set up your environment variables so that Perl can find the modules. If you have downloaded all the packages, unpacked all the files, and set your environment variables, and you are still getting the above error, then please contact Ensembl or your IT staff, as this is not an error that can be resolved by the authors of the VariantClassifier scripts.
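As a rough illustration of the environment-variable step, the sketch below adds module directories to PERL5LIB, the variable Perl consults when building @INC. The $HOME/src location, the bioperl-live directory name, and the add_perl5lib helper are assumptions made for this example; follow the Ensembl installation page for the exact packages and paths that match your release.

```shell
# Assumed layout: packages unpacked under $HOME/src (adjust to your own paths).
ENSEMBL_ROOT="${ENSEMBL_ROOT:-$HOME/src}"

# Hypothetical helper: append a directory to PERL5LIB unless already listed.
add_perl5lib() {
    case ":${PERL5LIB:-}:" in
        *":$1:"*) ;;                                   # already present, do nothing
        *) PERL5LIB="${PERL5LIB:+${PERL5LIB}:}$1" ;;   # append, adding ':' if needed
    esac
    export PERL5LIB
}

add_perl5lib "$ENSEMBL_ROOT/bioperl-live"      # directory name is an assumption
add_perl5lib "$ENSEMBL_ROOT/ensembl/modules"   # contains Bio/EnsEMBL/...

# Once the API is unpacked, this command exits silently if Perl can find the
# module named in the error message, and repeats the error if it cannot:
# perl -MBio::EnsEMBL::DBSQL::DBConnection -e 1
```

Placing these lines in your shell startup file keeps the setting across sessions. The syntax shown is for sh-compatible shells; csh and tcsh users would use setenv instead.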
Unfortunately, we are unable to provide a script that installs the Ensembl Perl API automatically along with the VariantClassifier software, because this may cause configuration errors in other programs you have installed or will install. Ensembl updates its API and databases with each data release, so it is important to keep track of the version you are using to ensure data consistency.
2. When I try to run any of your scripts, I get errors: “<script name>.pl: Command not found.”
This means that the VariantClassifier cannot find where your Perl executable is. Do the following:
At the command line type:
which perl
This will tell you where your shell thinks a version of Perl exists. If you see the message:
perl: Command not found.
then you will need to install Perl, or configure your PATH environment variable to include the directory where you know Perl has been installed.
Frequently, people will install Perl in /usr/local/bin/perl, and so our scripts point to that location by default. If “which perl” gives a different result, then you will need to modify the first line of all the Perl scripts to change from:
#!/usr/local/bin/perl
To the result of the “which perl” command.
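If there are many scripts to edit, the change to the first line can be scripted. The following is a minimal sketch, not something shipped with VariantClassifier: fix_shebang is a hypothetical helper name, and it assumes the scripts still start with the default /usr/local/bin/perl line.

```shell
# Hypothetical helper (not part of VariantClassifier): rewrite the default
# interpreter line of each given script to point at another Perl.
fix_shebang() {
    perl_path=$1
    shift
    for script in "$@"; do
        tmp=$(mktemp) || return 1
        # Only line 1 is touched, and only if it starts with the default path;
        # any flags after the path (such as -w) are left in place.
        if sed "1s|^#!/usr/local/bin/perl|#!${perl_path}|" "$script" > "$tmp"; then
            cat "$tmp" > "$script"   # overwrite in place, keeping file permissions
        fi
        rm -f "$tmp"
    done
}
```

From the directory holding the scripts, it could be called as: fix_shebang "$(which perl)" *.pl. Keeping a backup copy of the scripts before running it is a sensible precaution.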
3. I am following the text file format for your input, but the results just don’t look right and I’m getting unexplainable warnings or errors.
If you created your text file in Windows, it’s likely that Windows has ended each line with both the “Carriage Return” and “Line Feed” special characters. Since the input is parsed on tab characters and line feeds, the leftover carriage return may be throwing off the parser when you run the scripts in Linux or MacOS. If you do not have the command dos2unix already installed on your computer, download it, and run it on all files you have generated in Windows. This command will strip the carriage returns from the text file.
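Windows ends each line with a carriage return followed by a line feed, while Linux and MacOS use the line feed alone. If dos2unix is not available, standard tools can do a similar job. The sketch below is one possible stand-in: has_crlf and strip_cr are hypothetical helper names, and unlike dos2unix, strip_cr removes every carriage return in the file, not only those at line ends.

```shell
# Hypothetical helper: succeed if the file contains any carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Hypothetical helper: copy a file, dropping all carriage returns.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, has_crlf input.txt && strip_cr input.txt input.unix.txt would write a cleaned copy only when the file needs it; the file names here are placeholders.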
4. When I run Show_Latest_Database.pl, I get an error:
Could not connect to database homo_sapiens_core_57_37b as user anonymous using
[DBI:mysql:database=homo_sapiens_core_57_37b;host=ensembldb.ensembl.org;port=5306] as a locator:
Unknown database 'homo_sapiens_core_57_37b' at
/usr/local/devel/DAS/software/resequencing/ensembl_api/ensembl_55/ensembl/modules/Bio/EnsEMBL/DBSQL/DBConnection.pm line 274.
This is an unfortunate error that we don’t have an automatic workaround for yet. Show_Latest_Database.pl works by connecting to the database and then asking for a list of the databases that are online. The chicken-and-egg problem is that in order to connect to the Ensembl database server, we need to name a database to connect to. If that database is still online, the script works fine, but if it has been removed, this error will occur. To fix this problem, you will need to perform very minor surgery on the Show_Latest_Database.pl script: find the dbname string, in this case “homo_sapiens_core_57_37b”, within roughly the first 30 lines of the script, and replace it with the name of a current database. It doesn’t have to be a homo_sapiens database, but at the time that this code was originally written, human and mouse were the only two genomes in Ensembl. The specific database name doesn’t really matter, as long as it exists. You will have to go to the Ensembl web site to determine which builds/assemblies are currently available.
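The edit can be made in any text editor. For those who prefer the command line, here is a minimal sketch of the same substitution; replace_dbname is a hypothetical helper, and the new database name must be one you have confirmed is currently online. It assumes both names contain only letters, digits, and underscores, as Ensembl database names do.

```shell
# Hypothetical helper: swap one database name for another inside a script.
replace_dbname() {
    old=$1
    new=$2
    script=$3
    tmp=$(mktemp) || return 1
    # Replace every occurrence of the old name, then overwrite in place.
    sed "s/${old}/${new}/g" "$script" > "$tmp" && cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# If a mysql client is installed, the public server can also be asked which
# human core databases it hosts (assumes the host and port from the error):
# mysql -h ensembldb.ensembl.org -P 5306 -u anonymous \
#       -e "SHOW DATABASES LIKE 'homo_sapiens_core_%'"
```

A call would look like: replace_dbname homo_sapiens_core_57_37b "$NEW_DB" Show_Latest_Database.pl, where NEW_DB holds the name you picked.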
5. When I run Extract_Coding_Info.pl, I get an error:
Could not connect to database homo_sapiens_core_57_37b as user anonymous using [DBI:mysql:database=homo_sapiens_core_57_37b;host=ensembldb.ensembl.org;port=5306] as a locator:
Unknown database 'homo_sapiens_core_57_37b' at /usr/local/devel/DAS/software/resequencing/ensembl_api/ensembl_55/ensembl/modules/Bio/EnsEMBL/DBSQL/DBConnection.pm line 274.
Here, homo_sapiens_core_57_37b may be some other database you specified.
When you get this kind of error while running Extract_Coding_Info.pl, you have probably installed the Ensembl API correctly, but you are requesting annotation from a database that no longer exists, or you have improperly specified one that does currently exist. What you should do is try running Show_Latest_Database.pl again, with the -o option, to select a database that is still online.
6. When I run Extract_Coding_Info.pl, I get a warning/error like:
For homo_sapiens_core_57_37b there is a difference in the software release (55) and the database release (57).
You should update one of these to ensure that your script does not crash.
The version of the Ensembl API is closely tied to the build/assembly of the annotation. This is an Ensembl feature which, while out of our control, will likely affect your use of this script if enough time has passed between your API installation and your most recent run. What this warning is telling you is that the database you are using, release 57 (homo_sapiens_core_57_37b), does not match the release of the Ensembl API that you have installed, 55. The easiest solution to the problem is to download and reinstall the latest Ensembl API and use the most recent build/assembly of your genome of interest.
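The mismatch can be checked before a long run. The sketch below pulls the release number out of a core database name and compares it with an API release; db_release and releases_match are hypothetical helpers, and the name pattern (species, then core, then release, then assembly) is inferred from the example in the warning.

```shell
# Hypothetical helper: extract the release number from a core database name,
# e.g. homo_sapiens_core_57_37b -> 57. Prints nothing if the name does not fit.
db_release() {
    printf '%s\n' "$1" | sed -n 's/^.*_core_\([0-9][0-9]*\)_[0-9a-z]*$/\1/p'
}

# Hypothetical helper: succeed only when the API release equals the database release.
releases_match() {
    [ "$1" = "$(db_release "$2")" ]
}

# With the API installed, its release could be read like this (assumes the
# Bio::EnsEMBL::ApiVersion module that ships with the Ensembl core API):
# api=$(perl -MBio::EnsEMBL::ApiVersion -e 'print software_version()')
api=55   # value taken from the warning above

if releases_match "$api" homo_sapiens_core_57_37b; then
    echo "API and database releases agree"
else
    echo "API release $api does not match database release $(db_release homo_sapiens_core_57_37b)"
fi
```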
7. The classification of non-synonymous substitutions is quite naïve. Will you integrate SIFT?
Unfortunately, integrating SIFT into the VariantClassifier is beyond the scope of this software tool. We would, however, recommend visiting the SIFT website at http://www.weizhongli-lab.org and reading the articles that Dr. Ng and colleagues have written for a more accurate assessment of SNPs.