Here are some resources you might find useful.
Here are some scripts we built. Use and modify as you want.
Please include a citation.
This script removes sequences from a FASTA file whose lengths fall outside a given range. The Python script takes four arguments: an input file containing sequences in FASTA format, the output file name, the minimum length, and the maximum length. You can add a --visual flag at the end to have matplotlib create a histogram of the sequence lengths.
Example: python filter-by-length.py sequences.fasta sequences-filtered.fasta 200 250 --visual
You can download the script here
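As a rough illustration of the length filter described above, here is a minimal sketch using only the Python standard library. It is not the downloadable script: the function names `read_fasta` and `filter_by_length` are hypothetical, and treating both length bounds as inclusive is an assumption.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len: int, max_len: int) -> List[Tuple[str, str]]:
    """Keep records with min_len <= length <= max_len (inclusive bounds are an assumption)."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Small in-memory example instead of a real file.
demo = """>seq1
MKT
>seq2
MKTAYIAKQR
>seq3
MKTAYIAKQRQISFVKSHFSRQ
""".splitlines()

kept = filter_by_length(read_fasta(demo), 5, 15)
for header, seq in kept:
    print(f">{header}\n{seq}")
```

To run the same logic on a file, pass an open file handle to `read_fasta` in place of `demo`.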
This script parses a file containing protein sequences in FASTA format and downloads information about them from EMBL or NCBI. The gathered information is then written to an Excel spreadsheet. The script may require some modifications for specific uses.
Example: python get_info.py sequences.fasta output.xlsx -emblebi -verbose
You can download the script here
This script reorders the rows of an Excel spreadsheet to match the order of sequences in a particular FASTA file (for example, one ordered according to the phylogenetic tree). It is useful for keeping the sequence alignment, the sequence metadata, and the phylogenetic tree consistent with one another.
Example: python reorder_metadata.py sequences.fasta input.xlsx output.xlsx
You can download the script here
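The reordering idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the downloadable script: metadata rows are represented as plain dictionaries rather than spreadsheet rows, the `id` key is hypothetical, and the sequence identifier is assumed to be the first whitespace-delimited token of each FASTA header.

```python
from typing import Dict, Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return sequence IDs in file order (first token of each header line)."""
    ids = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def reorder_rows(rows: List[Dict[str, str]], order: List[str], key: str = "id") -> List[Dict[str, str]]:
    """Return rows sorted to follow `order`; IDs without a matching row are skipped."""
    by_id = {row[key]: row for row in rows}
    return [by_id[seq_id] for seq_id in order if seq_id in by_id]


# FASTA file ordered as in the tree (hypothetical data).
fasta_lines = """>P3 some description
MKTA
>P1 another description
MKSA
>P2
MRTA
""".splitlines()

# Metadata in its original (spreadsheet) order.
metadata = [
    {"id": "P1", "organism": "A"},
    {"id": "P2", "organism": "B"},
    {"id": "P3", "organism": "C"},
]

reordered = reorder_rows(metadata, fasta_ids(fasta_lines))
print([row["id"] for row in reordered])
```

Skipping identifiers that have no metadata row is a design choice made for this sketch; raising an error instead may be preferable when the two files are expected to match exactly.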
This program, written in Python, identifies residues that are conserved within a clade but differ outside it, and assigns a significance score to these sites. The user must define which sequences belong to each clade. The following papers describe the method and its applications.
Author: Skylar Olson
Example: python general_entropy.py input.fasta GrDel | PsMultiplier 4 RSequence XP3423 gapScore .90 |
You can download the script here
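To give a feel for the kind of comparison described above, here is a generic sketch that flags alignment columns where a clade is fully conserved and its residue never appears outside the clade, alongside a per-column Shannon entropy helper. This is not the scoring used by the script and it does not compute a significance score; the function names are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and gaps are treated like any other character.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(residues: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def clade_specific_sites(clade: List[str], outgroup: List[str]) -> List[int]:
    """Return 0-based column indices conserved in `clade` and absent from `outgroup`."""
    sites = []
    # zip(*seqs) turns a list of aligned sequences into a sequence of columns.
    for i, (in_col, out_col) in enumerate(zip(zip(*clade), zip(*outgroup))):
        conserved = len(set(in_col)) == 1
        if conserved and in_col[0] not in out_col:
            sites.append(i)
    return sites


# Toy alignment (hypothetical data).
clade = ["MKTA", "MKTA", "MKSA"]
outgroup = ["MRTA", "MRSA", "MRTG"]

print(clade_specific_sites(clade, outgroup))
print(column_entropy("AAGG"))
```

In the toy alignment only the second column qualifies: the clade has K throughout while the outgroup has R. A real analysis would also need to handle gaps and weigh partial conservation, which this sketch leaves out.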