Automate Exports with PGToTxt: Scripts, Tips, and Best Practices
Overview
PGToTxt is a tool/approach to export PostgreSQL table data into plain-text formats (CSV, TSV, fixed-width, or custom-delimited). Automating exports saves time, ensures reproducibility, and enables downstream processing (ETL, reporting, backups).
Key Features to Automate
- Scheduled exports (cron, systemd timers)
- Format options: CSV, TSV, pipe-delimited, fixed-width
- Column selection and ordering
- Incremental exports using timestamps or high-water marks
- Compression (gzip) and rotation
- Secure credential handling (environment variables, .pgpass)
Example Bash Script (cron-friendly)
```bash
#!/usr/bin/env bash
set -euo pipefail

# Config
PGHOST=localhost
PGPORT=5432
PGUSER=export_user
PGDATABASE=mydb
EXPORT_DIR=/var/exports/pgtotxt
TABLE=my_table
COLUMNS="id,name,created_at,amount"
WHERE="created_at >= NOW() - INTERVAL '1 day'"
OUTFILE="${EXPORT_DIR}/${TABLE}_$(date +%F).csv.gz"

mkdir -p "$EXPORT_DIR"

psql "host=$PGHOST port=$PGPORT user=$PGUSER dbname=$PGDATABASE" \
  -c "\copy (SELECT $COLUMNS FROM $TABLE WHERE $WHERE) TO STDOUT WITH CSV HEADER" \
  | gzip > "$OUTFILE"
```
Scheduling
- Use cron: add `0 2 * * * /path/to/export.sh` to run daily at 02:00.
- For better reliability, prefer systemd timers (restart policies, logging) in production.
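The cron line above has a systemd equivalent. A minimal service/timer pair might look like the following sketch; the unit names are illustrative and `/path/to/export.sh` is the same placeholder path used above:

```ini
# /etc/systemd/system/pgtotxt-export.service  (name is an example)
[Unit]
Description=Nightly PGToTxt export

[Service]
Type=oneshot
ExecStart=/path/to/export.sh

# /etc/systemd/system/pgtotxt-export.timer
[Unit]
Description=Run the PGToTxt export daily at 02:00

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with `systemctl enable --now pgtotxt-export.timer`. `Persistent=true` makes systemd run a missed export after downtime, which plain cron does not; `journalctl -u pgtotxt-export` gives you logging for free.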
Incremental Export Patterns
- High-water mark column (e.g., updated_at): store last max(updated_at) in a state file and export rows > last_value.
- Preserve transactional consistency: read the high-water mark and the row data within the same transaction (or the same snapshot) so rows committed mid-export are not silently skipped.
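The high-water-mark bookkeeping can be sketched in shell. In this runnable sketch the real `psql` query is stubbed out (the `query_max_updated_at` function and the `/tmp` state-file path are stand-ins), so only the state handling and the atomic update of the mark are shown:

```shell
#!/usr/bin/env bash
set -euo pipefail

STATE_FILE="/tmp/pgtotxt_hwm_demo"   # demo path; use a durable location in production

# Read the last high-water mark; fall back to the epoch on first run
LAST=$(cat "$STATE_FILE" 2>/dev/null || echo "1970-01-01T00:00:00")

# In production this would be a psql call, e.g.
#   psql ... -c "\copy (SELECT ... WHERE updated_at > '$LAST') TO STDOUT WITH CSV HEADER"
# Here we stub the "new max updated_at" query so the script is self-contained.
query_max_updated_at() { date -u +%FT%T; }

NEW=$(query_max_updated_at)

# Advance the mark only after the export succeeded, and write it atomically
printf '%s\n' "$NEW" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
echo "exported rows with updated_at > $LAST; new mark: $NEW"
```

Writing the mark last (and atomically, via temp file + `mv`) means a crash mid-export re-exports the same window rather than losing rows.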
Security & Credentials
- Prefer .pgpass with strict file permissions or environment variables injected by a secrets manager.
- Avoid hardcoding passwords in scripts.
- Restrict DB role to only SELECT privileges needed for export.
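A minimally privileged role might be set up as follows. This is a sketch, not a complete hardening guide; `export_user`, `mydb`, and `my_table` echo the example script above, and it must be run by a role with sufficient privileges:

```bash
# Set the password out of band (\password, or a secrets manager), never inline
psql -d mydb <<'SQL'
CREATE ROLE export_user LOGIN;
GRANT CONNECT ON DATABASE mydb TO export_user;
GRANT USAGE  ON SCHEMA public TO export_user;
GRANT SELECT ON public.my_table TO export_user;   -- SELECT only, nothing else
SQL
```

A role scoped this narrowly limits the blast radius if the export host or its credentials are ever compromised.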
Performance Tips
- Export only needed columns and use indexes in WHERE clauses.
- Use COPY/psql \copy to stream data efficiently.
- For large tables, export by range (partition or id-range) and parallelize.
- Disable autocommit in transactional pipelines when combining multiple steps.
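Range-based parallel export can be driven with `xargs -P`. In this runnable sketch the per-range `psql \copy` is replaced by a stub (`export_range` and the `/tmp` output directory are illustrative), so the fan-out pattern itself can be exercised:

```shell
#!/usr/bin/env bash
set -euo pipefail

OUT_DIR="/tmp/pgtotxt_parallel_demo"   # demo path
mkdir -p "$OUT_DIR"

# Export one id range. In production the body would be a psql \copy with
#   WHERE id >= $lo AND id < $hi
# Here it just records the range so the parallelization is testable.
export_range() {
  local lo=$1 hi=$2
  echo "rows ${lo}..$((hi - 1))" > "${OUT_DIR}/part_${lo}_${hi}.csv"
}
export -f export_range
export OUT_DIR

# Split ids 0..100000 into 25k chunks and run 4 exports in parallel
printf '%s %s\n' 0 25000 25000 50000 50000 75000 75000 100000 \
  | xargs -n2 -P4 bash -c 'export_range "$@"' _
```

Each range lands in its own part file, so a failed chunk can be retried alone and the parts can be concatenated (or loaded separately) downstream.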
File Handling & Retention
- Compress outputs (gzip, zstd) to save storage.
- Use predictable filenames with timestamps.
- Implement retention: delete files older than N days or move to cold storage.
- Verify checksums (sha256) for integrity when transferring.
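Checksumming and retention can be combined in a few lines. The demo below writes a stand-in export file so the commands are runnable as-is (the `/tmp` directory is a throwaway path, not the production export directory):

```shell
#!/usr/bin/env bash
set -euo pipefail

EXPORT_DIR="/tmp/pgtotxt_retention_demo"   # demo path
RETENTION_DAYS=14
mkdir -p "$EXPORT_DIR"

# Stand-in for a real export so the snippet is self-contained
echo "id,name" > "${EXPORT_DIR}/my_table_$(date +%F).csv"

# Record a checksum next to each export, then verify it (as a receiver would)
( cd "$EXPORT_DIR" && sha256sum ./*.csv > SHA256SUMS && sha256sum -c SHA256SUMS )

# Delete exports older than the retention window (-mtime counts in days)
find "$EXPORT_DIR" -name '*.csv' -mtime +"$RETENTION_DAYS" -delete
```

Shipping `SHA256SUMS` alongside the files lets the receiving side run the same `sha256sum -c` after transfer, catching truncated or corrupted copies.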
Error Handling & Monitoring
- Exit with nonzero status on failure and log stderr/stdout to rotating logs.
- Emit a small success/failure JSON or status file for orchestrators.
- Integrate alerting (email, Slack) for repeated failures.
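The status file for orchestrators can be emitted from an `EXIT` trap, so it is written on success and failure alike. The JSON shape and the `/tmp` path below are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

STATUS_FILE="/tmp/pgtotxt_status.json"   # demo path

# Write a machine-readable status record whether the job succeeds or fails,
# so a cron wrapper or orchestrator can inspect the result.
write_status() {
  local rc=$?
  local state
  if [ "$rc" -eq 0 ]; then state=success; else state=failure; fi
  printf '{"job":"pgtotxt_export","state":"%s","exit_code":%d,"finished_at":"%s"}\n' \
    "$state" "$rc" "$(date -u +%FT%TZ)" > "$STATUS_FILE"
}

# Run the export in a subshell so the EXIT trap fires as soon as it finishes
(
  trap write_status EXIT
  # ... real export steps (psql \copy, gzip, mv) would go here ...
  true
)
```

An orchestrator can then alert on `"state":"failure"` or on a stale `finished_at`, which also catches the job silently not running at all.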
Best Practices Checklist
- Automation: schedule with cron/systemd; add retries/backoff.
- Security: avoid embedded secrets; minimal privileges.
- Reliability: atomic writes (temp file + mv), checksums, retries.
- Performance: use COPY, limit columns, parallelize large exports.
- Observability: logs, metrics, alerts, and exported-file manifests.