gff3_QC readme
Usage
gff3_QC [-h] [-g GFF] [-f FASTA] [-noncg] [-i] [-n ALLOWED_NUM_OF_N]
[-t [CHECK_N_FEATURE_TYPES [CHECK_N_FEATURE_TYPES ...]]] [-o OUTPUT] [-v] [-s STATISTIC]
Testing environment
Python 3.9+
Inputs
GFF3: Specify the file name with the -g or --gff argument. Please note that this program requires every gene/pseudogene and mRNA/pseudogenic_transcript feature to have an ID attribute in column 9.
Fasta file: Specify the file name with the -f or --fasta argument. This file must be the Fasta file that the GFF3 seqids and coordinates refer to. For more information, refer to the GFF3 specification.
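The ID requirement above can be checked before running the tool. The following is a minimal sketch, not part of gff3_QC itself: the function names `parse_attributes` and `find_features_missing_id` are hypothetical, and the code assumes a standard tab-delimited GFF3 file with nine columns.

```python
# Feature types that, per the note above, must carry an ID attribute.
REQUIRED_ID_TYPES = {"gene", "pseudogene", "mRNA", "pseudogenic_transcript"}


def parse_attributes(column9):
    """Split a GFF3 attribute string (column 9) into a dict of tag -> value."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attributes[tag.strip()] = value.strip()
    return attributes


def find_features_missing_id(gff3_lines):
    """Return (line_number, feature_type) for required features lacking an ID."""
    missing = []
    for line_number, line in enumerate(gff3_lines, start=1):
        if line.startswith("##FASTA"):
            break  # anything after this directive is embedded sequence
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # malformed line; out of scope for this sketch
        feature_type = columns[2]
        has_id = bool(parse_attributes(columns[8]).get("ID"))
        if feature_type in REQUIRED_ID_TYPES and not has_id:
            missing.append((line_number, feature_type))
    return missing
```

To check a file, pass the open file handle, for example `find_features_missing_id(open("example_file/example.gff3"))`.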
Outputs
Error report for the input GFF3 file
Line_num: Line numbers of the problematic models found in the input GFF3 file.
Error_code: Error codes of the problematic models. Please refer to lib/ERROR/ERROR.py for the full list of Error_code values and the corresponding Error_tag.
Error_level: Severity level of each error code. Three levels are defined: Error (violates the GFF3 specification), Warning (might violate the GFF3 specification), and Info (likely not an error, but worth checking).
Error_tag: Details of the errors found in the problematic models. Please refer to lib/ERROR/ERROR.py for the full list of Error_code values and the corresponding Error_tag.
Statistic report for the output files
Error_code: Error codes of the problematic models. Please refer to lib/ERROR/ERROR.py for the full list of Error_code values and the corresponding Error_tag.
Number of problematic models: Number of problematic models reported for each Error_code.
Error_level: Severity level of each error code. Three levels are defined: Error (violates the GFF3 specification), Warning (might violate the GFF3 specification), and Info (likely not an error, but worth checking).
Error_tag: Details of the errors found in the problematic models. Please refer to lib/ERROR/ERROR.py for the full list of Error_code values and the corresponding Error_tag.
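The relationship between the two reports can be illustrated with a short sketch that tallies an error report by Error_code. The file layout is an assumption, since this README does not state it: the sketch assumes a tab-delimited report with a header row naming the columns Line_num, Error_code, Error_level, and Error_tag. The function name `tally_error_codes` is hypothetical, and the codes in the usage example are placeholders rather than real gff3_QC codes.

```python
import csv
import io
from collections import Counter


def tally_error_codes(report_text):
    """Count report rows per (Error_code, Error_level) pair.

    Assumes a tab-delimited report whose header row names the
    columns listed above; adjust if the real layout differs.
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        counts[(row["Error_code"], row["Error_level"])] += 1
    return counts
```

For example, with placeholder codes:

```python
sample = (
    "Line_num\tError_code\tError_level\tError_tag\n"
    "12\tCODE_A\tError\texample tag\n"
    "40\tCODE_A\tError\texample tag\n"
    "57\tCODE_B\tWarning\tanother tag\n"
)
for (code, level), count in sorted(tally_error_codes(sample).items()):
    print(code, count, level, sep="\t")
```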
Quick start
gff3_QC -g example_file/example.gff3 -f example_file/reference.fa -o test -s statistic.txt
or
gff3_QC --gff example_file/example.gff3 --fasta example_file/reference.fa --output test --statistic statistic.txt
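When the check is part of a larger pipeline, the same command can be launched from Python. This is a minimal sketch that assumes `gff3_QC` is installed and on the PATH; the helper names `build_gff3_qc_command` and `run_gff3_qc` are hypothetical, and only options documented in this README are used.

```python
import subprocess


def build_gff3_qc_command(gff, fasta, output="report.txt", statistic="statistic.txt"):
    """Assemble the gff3_QC argument list using the long option names."""
    return [
        "gff3_QC",
        "--gff", gff,
        "--fasta", fasta,
        "--output", output,
        "--statistic", statistic,
    ]


def run_gff3_qc(gff, fasta, **kwargs):
    """Run gff3_QC; raises CalledProcessError if it exits with a non-zero status."""
    command = build_gff3_qc_command(gff, fasta, **kwargs)
    return subprocess.run(command, check=True)
```

Calling `run_gff3_qc("example_file/example.gff3", "example_file/reference.fa", output="test")` mirrors the quick-start command above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.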
Optional arguments
-h, --help
show this help message and exit
-g GFF, --gff GFF
Genome annotation file, gff3 format
-f FASTA, --fasta FASTA
Genome sequences, fasta format
-noncg, --noncanonical_gene
The GFF3 file is not in the canonical gene model format.
-i, --initial_phase
Check whether the initial CDS phase is 0 (default: no check)
-n ALLOWED_NUM_OF_N, --allowed_num_of_n ALLOWED_NUM_OF_N
Maximum number of Ns allowed in a feature; anything more is reported as an error (default: 0)
-t [CHECK_N_FEATURE_TYPES [CHECK_N_FEATURE_TYPES ...]], --check_n_feature_types [CHECK_N_FEATURE_TYPES [CHECK_N_FEATURE_TYPES ...]]
Count the number of Ns in each feature of the specified type. Multiple types may be specified, e.g. -t CDS exon (default: "CDS")
-o OUTPUT, --output OUTPUT
output file name (default: report.txt)
-s STATISTIC, --statistic STATISTIC
statistic file name (default: statistic.txt)
-v, --version
show program’s version number and exit
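The -n and -t options describe a per-feature count of Ns compared against a threshold. The idea can be sketched as below; this is an illustration and not the tool's actual implementation. The function names are hypothetical, coordinates are taken to be 1-based and inclusive as in GFF3, and counting lowercase n as well as uppercase N is an assumption.

```python
def count_n(sequence, start, end):
    """Count N bases in the 1-based, inclusive interval [start, end]."""
    # GFF3 coordinates are 1-based inclusive; Python slices are 0-based half-open.
    return sequence[start - 1:end].upper().count("N")


def exceeds_allowed_n(sequence, start, end, allowed_num_of_n=0):
    """True if the feature contains more Ns than the allowed maximum."""
    return count_n(sequence, start, end) > allowed_num_of_n
```

With the default threshold of 0, any feature that overlaps at least one N is flagged; raising the threshold tolerates short gaps inside features.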