ABI 2 FASTA Converter: Quick Guide for Converting ABI Files to FASTA
Converting ABI (Applied Biosystems) chromatogram files to FASTA sequence format is a common task in molecular biology workflows, whether for sequence submission, alignment, or downstream analysis. This quick guide explains what each format contains and why conversion matters, then gives a step-by-step workflow (including batch conversion), common options, and troubleshooting tips.
What are ABI and FASTA formats?
- ABI: Binary chromatogram files generated by Sanger sequencing instruments. They include raw trace data (electropherogram), base calls, quality scores, and metadata (sample name, instrument run info).
- FASTA: Plain-text sequence format containing nucleotide or protein sequences with a simple header line beginning with “>”. FASTA does not store trace data or quality scores; it holds only sequence information and an identifier.
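To make the FASTA layout concrete, here is a minimal sketch that builds one record by hand. The function name `format_fasta` and the 60-character line width are illustrative choices, not part of any library.

```python
def format_fasta(identifier: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record: a '>' header line followed by wrapped sequence lines."""
    # Split the sequence into fixed-width chunks; FASTA itself does not mandate a width.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + identifier + "\n" + "\n".join(chunks) + "\n"


# An 80-base toy sequence becomes a header plus lines of 60 and 20 bases.
print(format_fasta("sample01", "ACGT" * 20), end="")
```

Everything an ABI file holds beyond the identifier and the base calls (traces, quality scores, run metadata) has no place in this layout, which is why the conversion is lossy.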
Why convert ABI to FASTA?
- FASTA is required by many sequence-analysis tools (BLAST, multiple sequence alignment, phylogenetics).
- Removing trace data reduces file size and simplifies storage and sharing.
- Converting allows automated pipelines to process sequences without chromatogram-specific software.
Tools you can use
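- Command-line utilities: EMBOSS seqret and Biopython scripts can read ABI traces directly. seqtk operates on FASTA/FASTQ rather than ABI, so it is useful after conversion; sff-tools is for other formats.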
- GUI tools: FinchTV (viewing), SnapGene Viewer (export), Geneious (export), Chromas.
- Custom scripts: Biopython provides parsers for ABI files and easy FASTA output for automation.
Quick command-line example (Biopython)
Install Biopython if needed:
```bash
pip install biopython
```
Example Python script to convert a single ABI to FASTA:
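```python
from Bio import SeqIO

# Read the single record stored in the ABI trace file.
record = SeqIO.read("sample.ab1", "abi")
# Write the base-called sequence as FASTA.
SeqIO.write(record, "sample.fasta", "fasta")
```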
- Saves the base-called sequence and the name from the ABI header as the FASTA header.
Batch conversion (command line)
Using a short shell loop for a directory of .ab1 files:
```bash
mkdir -p fasta_output
for f in *.ab1; do
  out=fasta_output/"${f%.ab1}.fasta"
  # Pass the file names as arguments; they must come before the here-document.
  python - "$f" "$out" <<'PY'
from Bio import SeqIO
import sys
rec = SeqIO.read(sys.argv[1], "abi")
SeqIO.write(rec, sys.argv[2], "fasta")
PY
done
```
Or a simpler one-liner (requires Biopython):
```bash
for f in *.ab1; do python -c "import sys; from Bio import SeqIO; SeqIO.write(SeqIO.read(sys.argv[1], 'abi'), sys.argv[2], 'fasta')" "$f" "${f%.ab1}.fasta"; done
```
Common options and considerations
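- Header contents: A FASTA header is a single line, and many tools treat only the text before the first space as the identifier. Include sample ID, run date, or locus if needed, and join fields with underscores or pipes rather than spaces.
- Quality scores: FASTA does not carry quality scores. If you need quality, export FASTQ instead (Biopython can write ABI records with the "fastq" output format) or store quality data separately.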
- Trimming: ABI base calls may include low-quality ends; perform trimming (e.g., using Phred scores or trimming tools) before exporting for cleaner downstream results.
- Ambiguous bases: Bases called as N or ambiguous IUPAC codes will appear in FASTA; consider manual inspection when many Ns appear.
- Encoding: Ensure output uses UTF-8 and Unix line endings where required by downstream tools.
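The header advice above can be sketched as a small helper that joins metadata fields into a whitespace-free identifier. The function name `make_fasta_id`, the pipe separator, and the set of characters treated as safe are all assumptions for illustration; adjust them to whatever your downstream tools accept.

```python
import re


def make_fasta_id(*fields: str) -> str:
    """Join metadata fields into a single FASTA identifier without spaces."""
    cleaned = []
    for field in fields:
        # Replace runs of characters outside a conservative safe set with "_".
        token = re.sub(r"[^A-Za-z0-9.\-]+", "_", field.strip()).strip("_")
        if token:  # skip fields that are empty after cleaning
            cleaned.append(token)
    return "|".join(cleaned)


print(make_fasta_id("Sample 12", "2024-03-01", "16S rRNA"))
# Sample_12|2024-03-01|16S_rRNA
```

With Biopython, the result could be assigned to `record.id` (and `record.description` set to an empty string) before calling `SeqIO.write`, so that only the cleaned identifier appears in the header.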
Troubleshooting
- “Can’t parse ABI file”: file may be corrupted or from an unsupported instrument. Try opening in FinchTV to confirm integrity.
- Missing sample name in header: extract and set a custom header using Biopython before writing.
- Batch script fails on large datasets: process in chunks or use GNU parallel to speed up conversions.
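For batches where a few files may be corrupted, one option is a driver that records failures and keeps going instead of stopping at the first bad file. This is a minimal sketch: `convert_all` and its `convert_one` callback are hypothetical names, and the callback is where the actual ABI-to-FASTA call would go.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def convert_all(
    inputs: List[Path],
    out_dir: Path,
    convert_one: Callable[[Path, Path], object],
) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Convert every input file, collecting failures rather than aborting."""
    out_dir.mkdir(parents=True, exist_ok=True)
    converted: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    for src in inputs:
        dest = out_dir / (src.stem + ".fasta")
        try:
            convert_one(src, dest)
            converted.append(dest)
        except Exception as exc:  # e.g. an unreadable or truncated trace file
            failed.append((src, str(exc)))
    return converted, failed
```

With Biopython installed, `convert_one` could be something like `lambda s, d: SeqIO.convert(str(s), "abi", str(d), "fasta")`, and `inputs` could be `sorted(Path(".").glob("*.ab1"))`. The returned `failed` list then tells you which traces to open in a viewer.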
Example: trimming low-quality ends with Biopython
A minimal approach to trim Ns at sequence ends:
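```python
from Bio import SeqIO

rec = SeqIO.read("sample.ab1", "abi")
seq = str(rec.seq)
# Locate the span that remains after removing leading and trailing Ns.
start = len(seq) - len(seq.lstrip("N"))
end = len(seq.rstrip("N"))
# Slice the record (not the string) so per-base qualities stay in sync.
rec = rec[start:end]
SeqIO.write(rec, "sample.trimmed.fasta", "fasta")
```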
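For quality-based trimming, use a Phred-based method suited to Sanger reads; for example, reading the file with Biopython's "abi-trim" format instead of "abi" applies Mott's trimming algorithm. Trimmomatic targets NGS reads, and TrimAl trims alignments rather than raw reads, so neither is a natural fit here.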
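A custom Phred-based trimmer can be sketched in a few lines. The version below is a Mott-style approach written for illustration, not a copy of any library's implementation: each base scores `cutoff - P(error)`, and the stretch with the highest total score is kept. The function name `mott_trim_bounds` and the default cutoff of 0.05 are assumptions.

```python
from typing import Sequence, Tuple


def mott_trim_bounds(quals: Sequence[int], cutoff: float = 0.05) -> Tuple[int, int]:
    """Return (start, end) of the highest-scoring stretch; slice with seq[start:end].

    Each base scores cutoff - P(error), where P(error) = 10 ** (-Q / 10), so
    bases better than the cutoff add to the total and worse ones subtract.
    Returns (0, 0) when no base passes the cutoff.
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            # A non-positive running total cannot help any later segment.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


# Toy read: 5 poor bases, 30 good bases, 5 poor bases.
print(mott_trim_bounds([5] * 5 + [40] * 30 + [5] * 5))  # (5, 35)
```

For a record read with Biopython's "abi" format, the scores would come from `rec.letter_annotations["phred_quality"]`, and `rec[start:end]` would give the trimmed record.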
Best practices
- Keep original ABI files archived; they contain raw data useful for re-analysis.
- Add meaningful identifiers in FASTA headers.
- Perform quality trimming and inspection before submitting sequences to public databases.
- Validate converted FASTA files with a quick alignment or BLAST to confirm expected sequence.
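Before running an alignment or BLAST, a cheap structural check can catch obvious conversion problems. The sketch below is one possible check, not a full validator: `check_fasta` is a hypothetical helper that flags empty or duplicate identifiers, records without sequence, and characters outside the IUPAC nucleotide codes.

```python
from typing import Dict, List

IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def check_fasta(text: str) -> List[str]:
    """Return a list of problems found in FASTA text (empty if none were found)."""
    problems: List[str] = []
    seen: Dict[str, int] = {}  # identifier -> sequence length seen so far
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            if not current:
                problems.append(f"line {lineno}: empty identifier")
            elif current in seen:
                problems.append(f"line {lineno}: duplicate identifier {current!r}")
            seen.setdefault(current, 0)
        elif current is None:
            problems.append(f"line {lineno}: sequence data before any header")
        else:
            bad = set(line.upper()) - IUPAC_DNA
            if bad:
                problems.append(f"line {lineno}: unexpected characters {sorted(bad)}")
            seen[current] += len(line)
    for ident, length in seen.items():
        if length == 0:
            problems.append(f"record {ident!r}: no sequence")
    return problems
```

A call such as `check_fasta(open("sample.fasta").read())` returning an empty list only means the file is well formed; confirming that the sequence is the expected one still needs the alignment or BLAST step.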
Quick checklist
- Verify ABI file integrity (open in viewer)
- Install Biopython or chosen tool
- Convert single file and inspect FASTA header/sequence
- Batch convert remaining files
- Trim low-quality ends and remove Ns if needed
- Archive original ABIs
This guide gives the essentials to convert ABI chromatograms into FASTA quickly and reliably. Use the command-line examples for automation and follow best practices for quality control and header management.