
S3 (Amazon S3)

Query files stored in Amazon S3 buckets with automatic credential detection and format inference.

Usage

bash
loq -i:S3 "SELECT * FROM 's3://bucket/key.csv'"

S3 URLs

Single File

bash
loq -i:S3 "SELECT * FROM 's3://my-bucket/logs/access.csv'"

Glob Patterns

bash
# All CSV files in prefix
loq -i:S3 "SELECT * FROM 's3://my-bucket/logs/*.csv'"

# All files with date pattern
loq -i:S3 "SELECT * FROM 's3://my-bucket/logs/2024-01-*.csv'"

Prefix (All Objects)

bash
loq -i:S3 "SELECT * FROM 's3://my-bucket/logs/'"

Authentication

loq uses the standard AWS credential chain:

  1. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  2. Shared credentials file: ~/.aws/credentials
  3. AWS config file: ~/.aws/config
  4. IAM role: EC2 instance profile or ECS task role
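
To check which identity the chain resolves to, the AWS CLI (which follows the same credential chain) is a quick sanity check:

bash
# Shows the account and ARN of the credentials that will be used,
# assuming the CLI and loq resolve credentials identically
aws sts get-caller-identity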

Using Environment Variables

bash
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1

loq -i:S3 "SELECT * FROM 's3://bucket/file.csv'"

Using Profiles

bash
export AWS_PROFILE=production

loq -i:S3 "SELECT * FROM 's3://bucket/file.csv'"

IAM Role (EC2/ECS)

When running on EC2 or ECS, no configuration is needed; credentials are obtained automatically from the instance profile or task role.
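
If queries fail on an instance, you can verify that a role is actually attached by querying the instance metadata service (IMDSv2 shown here):

bash
# Request a session token, then list the role attached to this instance
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/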

Region Configuration

bash
# Via environment
export AWS_REGION=eu-west-1

# Or AWS_DEFAULT_REGION
export AWS_DEFAULT_REGION=eu-west-1
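
The region can also be stored per profile in ~/.aws/config, which the AWS SDK (and therefore loq, via the standard chain) reads automatically:

bash
# Persists the region for the 'production' profile in ~/.aws/config
aws configure set region eu-west-1 --profile production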

Supported Formats

S3 input supports multiple file formats:

Extension   Format            Auto-Detected
.csv        CSV               Yes
.json       JSON              Yes
.ndjson     NDJSON            Yes
.parquet    Parquet           Yes
.gz         Gzip compressed   Yes

Compression

Gzip-compressed files are automatically decompressed:

bash
loq -i:S3 "SELECT * FROM 's3://bucket/logs/access.csv.gz'"

Examples

Query CSV from S3

bash
loq -i:S3 "SELECT * FROM 's3://my-logs/access.csv' LIMIT 10"

Filter S3 Data

bash
loq -i:S3 "SELECT timestamp, status, message
                 FROM 's3://my-logs/app.ndjson'
                 WHERE status = 'error'
                 ORDER BY timestamp DESC"

Aggregate Across Files

bash
loq -i:S3 "SELECT status, COUNT(*) AS count
                 FROM 's3://my-logs/2024-01-*.csv'
                 GROUP BY status
                 ORDER BY count DESC"

Query Parquet

bash
loq -i:S3 "SELECT id, name, amount
                 FROM 's3://data-warehouse/sales.parquet'
                 WHERE amount > 1000"

Join S3 and Local Files

bash
loq "SELECT s.*, l.lookup_value
           FROM 's3://bucket/data.csv' s
           JOIN lookup.csv l ON s.key = l.key"

Cost Optimization

Use Filters Early

S3 charges for data transferred out of AWS. Filtering early reduces how much data a query moves and processes:

bash
# Transfers less data
loq -i:S3 "SELECT id, name FROM 's3://bucket/large.csv' LIMIT 100"

Use Specific Prefixes

A more specific prefix means fewer objects to list and fetch:

bash
# Good: specific prefix
loq -i:S3 "SELECT * FROM 's3://bucket/logs/2024/01/15/'"

# Less efficient: broad prefix
loq -i:S3 "SELECT * FROM 's3://bucket/logs/'"

Consider Parquet

Parquet is columnar and compressed, so queries read less data and cost less:

bash
loq -i:S3 "SELECT id, status FROM 's3://bucket/data.parquet'"

Output to S3

Currently, S3 is input-only. To save results to S3:

bash
# Save locally, then upload
loq -i:S3 -o:CSV --ofile:results.csv "SELECT * FROM 's3://bucket/data.csv'"
aws s3 cp results.csv s3://output-bucket/results.csv
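
If loq writes to stdout when --ofile is omitted (an assumption; the examples above always pass --ofile), the intermediate file can be skipped by streaming straight into the AWS CLI, since `aws s3 cp - <url>` uploads from stdin:

bash
# Assumes loq emits CSV to stdout when no --ofile is given
loq -i:S3 -o:CSV "SELECT * FROM 's3://bucket/data.csv'" \
  | aws s3 cp - s3://output-bucket/results.csv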

Common Patterns

Log Analysis

bash
loq -i:S3 "SELECT cs-uri-stem, COUNT(*), AVG(time-taken)
                 FROM 's3://logs-bucket/access-*.csv.gz'
                 GROUP BY cs-uri-stem
                 ORDER BY COUNT(*) DESC
                 LIMIT 20"

Data Pipeline

bash
# Query raw data, output processed
loq -i:S3 -o:JSON --ofile:processed.json \
          "SELECT
               id,
               UPPER(name) AS name,
               ROUND(price, 2) AS price
           FROM 's3://raw-data/products.csv'
           WHERE status = 'active'"

Cross-Account Access

Ensure your credentials have access to the bucket:

json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:ListBucket"],
  "Resource": [
    "arn:aws:s3:::other-account-bucket",
    "arn:aws:s3:::other-account-bucket/*"
  ]
}
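
If the other account grants access through a role rather than a bucket policy, a profile that assumes that role can be configured. The account ID and role name below are hypothetical:

bash
# Configure a profile that assumes a role in the other account
# (account ID and role name are placeholders)
aws configure set profile.crossacct.role_arn arn:aws:iam::123456789012:role/S3ReadRole
aws configure set profile.crossacct.source_profile default

export AWS_PROFILE=crossacct
loq -i:S3 "SELECT * FROM 's3://other-account-bucket/data.csv'"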

VPC Endpoints

For private S3 access from within a VPC, ensure an S3 VPC endpoint is configured. loq uses the standard AWS SDK, which respects VPC endpoint configuration.
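
To confirm an S3 endpoint exists in your VPC (service name shown for us-east-1; adjust for your region):

bash
# Lists S3 VPC endpoints visible to the account
aws ec2 describe-vpc-endpoints \
  --filters Name=service-name,Values=com.amazonaws.us-east-1.s3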

Troubleshooting

Access Denied

Check credentials and bucket policy:

bash
# Test with AWS CLI
aws s3 ls s3://bucket/prefix/
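
head-object can help distinguish a permissions problem from a missing key, since it typically returns 403 for denied access and 404 for a nonexistent object (bucket and key below are placeholders):

bash
# 403 Forbidden => check credentials/bucket policy; 404 Not Found => wrong key
aws s3api head-object --bucket bucket --key prefix/file.csv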

Region Mismatch

Ensure region is correct:

bash
export AWS_REGION=us-west-2
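
If you are unsure which region a bucket lives in, look it up directly:

bash
# Returns the bucket's region (LocationConstraint is null for us-east-1)
aws s3api get-bucket-location --bucket bucket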

Large Files

For very large files:

bash
# Use LIMIT during exploration
loq -i:S3 "SELECT * FROM 's3://bucket/huge.csv' LIMIT 1000"

Timeout Issues

Large files may take time to download. Consider:

  • Using more specific queries
  • Converting to Parquet for efficiency (see the sketch after this list)
  • Using AWS Athena for very large datasets
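
One way to do a one-off CSV-to-Parquet conversion is with a local tool such as DuckDB; this is just one option, and any CSV-to-Parquet converter works:

bash
# Download once, convert locally, and upload the Parquet copy
aws s3 cp s3://bucket/huge.csv huge.csv
duckdb -c "COPY (SELECT * FROM 'huge.csv') TO 'huge.parquet' (FORMAT PARQUET)"
aws s3 cp huge.parquet s3://bucket/huge.parquet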

No Objects Found

Check the prefix/pattern:

bash
# List objects to verify
aws s3 ls s3://bucket/logs/ --recursive

IAM Permissions

Minimum required permissions:

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket",
        "arn:aws:s3:::your-bucket/*"
      ]
    }
  ]
}
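
Assuming the JSON above is saved as policy.json, it can be attached with the AWS CLI; the role and policy names here are hypothetical:

bash
# Attach the minimal S3 read policy as an inline policy on a role
aws iam put-role-policy --role-name loq-reader \
  --policy-name loq-s3-read --policy-document file://policy.json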
