AAI Calculation
CompareM is a toolbox for comparative genomics. We use it to calculate the average amino acid identity (AAI) between genomes. It is installed in a conda environment within SCGC's anaconda3 module, so to access it, run:
module use /mod/scgc
module load anaconda3
source activate comparem
To see what comparem can do, type:
$ comparem -h
Output:
Common workflows:
aai_wf -> Calculate AAI between all pairs of genomes
(runs call_genes => similarity => aai)
classify_wf -> Identify similar genomes based on AAI values
(runs call_genes => similarity => classify)
Gene prediction:
call_genes -> Identify genes within genomes
Gene homology and genome similarity:
similarity -> Perform reciprocal sequence similarity search between proteins
aai -> Calculate AAI between all pairs of genomes
classify -> Identify similar genomes based on AAI value
Usage profiles:
aa_usage -> Calculate amino acid usage within each genome
codon_usage -> Calculate codon usage within each genome
kmer_usage -> Calculate kmer usage within each genome
stop_usage -> Calculate stop codon usage within each genome
Lateral gene transfer:
lgt_di -> Calculate dinuceotide (3rd,1st) usage of genes to identify putative LGT events
lgt_codon -> Calculate codon usage of genes to identify putative LGT events
Visualization and exploration:
diss -> Calculate the dissimilarity between usage profiles
hclust -> Perform hierarchical clustering
Use: comparem <command> -h for command specific help.
Feature requests or bug reports can be sent to Donovan Parks (donovan.parks@gmail.com)
or posted on GitHub (https://github.com/dparks1134/comparem).
For instructions on CompareM's AAI calculation workflow, type:
$ comparem aai_wf -h
Output:
usage: comparem aai_wf [-h] [-e EVALUE] [-p PER_IDENTITY] [-a PER_ALN_LEN]
[-x FILE_EXT] [--proteins] [--force_table FORCE_TABLE]
[--blastp] [--sensitive] [--keep_headers] [--keep_rbhs]
[--tmp_dir TMP_DIR] [-c CPUS] [--silent]
input_files output_dir
Calculate AAI between all pairs of genomes
positional arguments:
input_files genome files
output_dir output directory
optional arguments:
-h, --help show this help message and exit
-e, --evalue EVALUE e-value cutoff for identifying initial blast hits
(default: 0.001)
-p, --per_identity PER_IDENTITY
percent identity for defining homology (default: 30.0)
-a, --per_aln_len PER_ALN_LEN
percent alignment length of query sequence for
defining homology (default: 70.0)
-x, --file_ext FILE_EXT
extension of files to process (default: fna)
--proteins indicates the input files contain protein sequences
--force_table FORCE_TABLE
force use of specific translation table
--blastp use blastp instead of diamond
--sensitive use sensitive mode of DIAMOND
--keep_headers indicates FASTA headers already have the format
<genome_id>~<gene_id>
--keep_rbhs create file with reciprocal best hits
--tmp_dir TMP_DIR specify alternative directory for temporary files
(default: /tmp)
-c, --cpus CPUS number of CPUs to use (default: 1)
--silent suppress output
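Putting the options above together, a typical invocation might look like the sketch below. The directory names (./genomes, ./aai_output) and the CPU count are hypothetical placeholders, and the sketch assumes that input_files can be given as a directory of genome FASTA files with the .fna extension; check this against your own data before running. If comparem is not yet on your PATH (that is, the conda environment has not been activated), the script only prints the command instead of running it.

```shell
# Hypothetical locations; replace with your own paths.
genome_dir="./genomes"      # assumed: directory of nucleotide FASTA files ending in .fna
out_dir="./aai_output"      # directory where CompareM writes its results
cpus=4                      # default is 1; match this to the CPUs you have requested

# Flags are taken from the `comparem aai_wf -h` output shown above.
cmd="comparem aai_wf --file_ext fna --cpus $cpus $genome_dir $out_dir"

if command -v comparem >/dev/null 2>&1; then
    # The comparem environment is active, so run the workflow.
    $cmd
else
    # Environment not loaded; show what would be run.
    echo "comparem not found on PATH; after activating the environment, run: $cmd"
fi
```

If your inputs are already protein sequences, the help text indicates that adding --proteins (with a matching --file_ext value for your files) tells CompareM the input files contain proteins.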