gff3_QC readme
Usage
gff3_QC.py [-h] [-g GFF] [-f FASTA] [-noncg] [-i] [-n ALLOWED_NUM_OF_N] [-t [CHECK_N_FEATURE_TYPES [CHECK_N_FEATURE_TYPES ...]]] [-o OUTPUT] [-v] [-s STATISTIC]
Testing environment
Python 3.x
Inputs
- GFF3: Specify the file name with the -g or --gff argument. Please note that this program requires every gene/pseudogene and mRNA/pseudogenic_transcript feature to have an ID attribute in column 9.
- Fasta file: Specify the file name with the -f or --fasta argument. This file must be the Fasta file that the GFF3 seqids and coordinates refer to. For more information, refer to the GFF3 specification.
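The ID requirement above can be pre-checked before running the program. The following is a minimal sketch and not part of gff3_QC: the function name `features_missing_id` is hypothetical, and it assumes a tab-delimited GFF3 file whose column 9 holds `key=value` pairs separated by semicolons.

```python
# Hypothetical helper (not part of gff3_QC): find gene/transcript-level
# features that lack the ID attribute required in column 9.
REQUIRED_ID_TYPES = {"gene", "pseudogene", "mRNA", "pseudogenic_transcript"}


def features_missing_id(gff3_lines):
    """Return (line_number, feature_type) for required features without an ID."""
    missing = []
    for line_num, line in enumerate(gff3_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # malformed line; out of scope for this sketch
        feature_type, attributes = columns[2], columns[8]
        if feature_type not in REQUIRED_ID_TYPES:
            continue
        keys = {
            pair.split("=", 1)[0].strip()
            for pair in attributes.split(";")
            if pair.strip()
        }
        if "ID" not in keys:
            missing.append((line_num, feature_type))
    return missing


example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tParent=gene1",
]
print(features_missing_id(example))  # [(3, 'mRNA')]
```

To check a real file, pass an open file handle instead of the example list, for instance `features_missing_id(open("example_file/example.gff3"))`.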
Outputs
- Error report for the input GFF3 file
  - Line_num: Line numbers of the problematic models found in the input GFF3 file.
  - Error_code: Error codes for the problematic models. Please refer to lib/ERROR/ERROR.py to see the full list of Error_code and the corresponding Error_tag.
  - Error_level: Severity levels of the error codes. Three levels are defined: Error (violates the GFF3 specification), Warning (might violate the GFF3 specification), and Info (likely not an error, but worth checking).
  - Error_tag: Details of the errors found in the problematic models. Please refer to lib/ERROR/ERROR.py to see the full list of Error_code and the corresponding Error_tag.
- Statistic report for the output files
  - Error_code: Error codes for the problematic models. Please refer to lib/ERROR/ERROR.py to see the full list of Error_code and the corresponding Error_tag.
  - Number of problematic models: The number of problematic models found for each Error_code.
  - Error_level: Severity levels of the error codes. Three levels are defined: Error (violates the GFF3 specification), Warning (might violate the GFF3 specification), and Info (likely not an error, but worth checking).
  - Error_tag: Details of the errors found in the problematic models. Please refer to lib/ERROR/ERROR.py to see the full list of Error_code and the corresponding Error_tag.
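The relationship between the two reports can be illustrated by tallying error-report records per Error_code. This is only a sketch of the idea, not the program's own implementation: the function name `summarize_report` is hypothetical, the records are given as in-memory tuples because the on-disk layout of the report is not described here, and the codes and tags are placeholders (the real ones are listed in lib/ERROR/ERROR.py).

```python
from collections import Counter


def summarize_report(rows):
    """Tally error-report records by Error_code.

    rows: iterable of (line_num, error_code, error_level, error_tag) tuples.
    Returns (error_code, count, error_level, error_tag) tuples sorted by code.
    """
    counts = Counter()
    details = {}
    for _line_num, code, level, tag in rows:
        counts[code] += 1
        # Keep the level and tag from the first record seen for each code.
        details.setdefault(code, (level, tag))
    return [(code, counts[code], *details[code]) for code in sorted(counts)]


# Placeholder codes and tags, for illustration only.
rows = [
    (5, "CODE_A", "Error", "placeholder tag A"),
    (9, "CODE_B", "Warning", "placeholder tag B"),
    (12, "CODE_A", "Error", "placeholder tag A"),
]
for record in summarize_report(rows):
    print(record)
```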
Quick start
gff3_QC -g example_file/example.gff3 -f example_file/reference.fa -o test -s statistic.txt
or
gff3_QC --gff example_file/example.gff3 --fasta example_file/reference.fa --output test --statistic statistic.txt
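When the check is part of a larger Python workflow, the same command can be assembled and launched with the standard `subprocess` module. This is a minimal sketch: the helper names `build_gff3_qc_command` and `run_gff3_qc` are hypothetical, and it assumes the `gff3_QC` executable is on the PATH, as in the commands above.

```python
import subprocess


def build_gff3_qc_command(gff, fasta, output="report.txt", statistic="statistic.txt"):
    """Assemble the gff3_QC argument list using the long option names."""
    return [
        "gff3_QC",
        "--gff", gff,
        "--fasta", fasta,
        "--output", output,
        "--statistic", statistic,
    ]


def run_gff3_qc(gff, fasta, **kwargs):
    """Run gff3_QC; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_gff3_qc_command(gff, fasta, **kwargs), check=True)


command = build_gff3_qc_command(
    "example_file/example.gff3", "example_file/reference.fa", output="test"
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.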
Optional arguments
- -h, --help
  - show this help message and exit
- -g GFF, --gff GFF
  - Genome annotation file, gff3 format
- -f FASTA, --fasta FASTA
  - Genome sequences, fasta format
- -noncg, --noncanonical_gene
  - gff3 file is not formatted in the canonical gene model format.
- -i, --initial_phase
  - Check whether initial CDS phase is 0 (default: no check)
- -n ALLOWED_NUM_OF_N, --allowed_num_of_n ALLOWED_NUM_OF_N
  - Max number of Ns allowed in a feature; anything more will be reported as an error (default: 0)
- -t [CHECK_N_FEATURE_TYPES [CHECK_N_FEATURE_TYPES ...]], --check_n_feature_types [CHECK_N_FEATURE_TYPES [CHECK_N_FEATURE_TYPES ...]]
  - Count the number of Ns in each feature of the specified type; multiple types may be specified, e.g. -t CDS exon (default: "CDS")
- -o OUTPUT, --output OUTPUT
  - output file name (default: report.txt)
- -s STATISTIC, --statistic STATISTIC
  - statistic file name (default: statistic.txt)
- -v, --version
  - show program's version number and exit
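The interplay of the -n and -t options can be sketched as follows. This illustrates the described behavior and is not the program's actual code: the function name `features_exceeding_n` and the in-memory data layout are hypothetical, coordinates are taken as 1-based and inclusive as in GFF3, and counting lowercase "n" as well is an assumption.

```python
def features_exceeding_n(features, sequences, allowed_num_of_n=0, check_types=("CDS",)):
    """Return features whose sequence contains more Ns than allowed.

    features: iterable of (seqid, feature_type, start, end) tuples,
              with 1-based inclusive coordinates.
    sequences: dict mapping seqid to its sequence string.
    """
    flagged = []
    for seqid, feature_type, start, end in features:
        if feature_type not in check_types:
            continue  # only the requested feature types are checked
        subseq = sequences[seqid][start - 1:end]
        n_count = subseq.upper().count("N")  # assumption: case-insensitive
        if n_count > allowed_num_of_n:
            flagged.append((seqid, feature_type, start, end, n_count))
    return flagged


sequences = {"chr1": "ACGTNNACGTACGTNACGT"}
features = [
    ("chr1", "CDS", 1, 8),
    ("chr1", "exon", 1, 19),
    ("chr1", "CDS", 9, 19),
]

# Defaults (-n 0, -t CDS): any N in a CDS is reported.
print(features_exceeding_n(features, sequences))
# Equivalent of -n 1: only the CDS with two Ns is reported.
print(features_exceeding_n(features, sequences, allowed_num_of_n=1))
```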