# S3 (Amazon S3)
Query files stored in Amazon S3 buckets with automatic credential detection and format inference.
## Usage

```bash
loq -i:S3 "SELECT * FROM 's3://bucket/key.csv'"
```

## S3 URLs
### Single File

```bash
loq -i:S3 "SELECT * FROM 's3://my-bucket/logs/access.csv'"
```

### Glob Patterns
```bash
# All CSV files in prefix
loq -i:S3 "SELECT * FROM 's3://my-bucket/logs/*.csv'"

# All files with date pattern
loq -i:S3 "SELECT * FROM 's3://my-bucket/logs/2024-01-*.csv'"
```

### Prefix (All Objects)

```bash
loq -i:S3 "SELECT * FROM 's3://my-bucket/logs/'"
```

## Authentication
loq uses the standard AWS credential chain:
- Environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
- Shared credentials file: `~/.aws/credentials`
- AWS config file: `~/.aws/config`
- IAM role: EC2 instance profile or ECS task role
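To confirm which identity the chain resolves to, ask AWS STS via the AWS CLI. Since the CLI walks the same credential chain, a failure here usually means loq cannot authenticate either:

```bash
# Prints the account ID, user ID, and ARN of the resolved credentials
aws sts get-caller-identity
```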
### Using Environment Variables

```bash
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1

loq -i:S3 "SELECT * FROM 's3://bucket/file.csv'"
```

### Using Profiles
```bash
export AWS_PROFILE=production
loq -i:S3 "SELECT * FROM 's3://bucket/file.csv'"
```

### IAM Role (EC2/ECS)
No configuration is needed; credentials are picked up automatically from the instance profile or task role.
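On EC2, one way to confirm that an instance profile is actually attached is to query the instance metadata service. This is the standard IMDSv2 flow, not a loq feature:

```bash
# Request an IMDSv2 session token, then list the role attached to this instance
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/
```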
## Region Configuration

```bash
# Via environment
export AWS_REGION=eu-west-1

# Or AWS_DEFAULT_REGION
export AWS_DEFAULT_REGION=eu-west-1
```

## Supported Formats
S3 input supports multiple file formats:
| Extension | Format | Auto-Detected |
|---|---|---|
| `.csv` | CSV | Yes |
| `.json` | JSON | Yes |
| `.ndjson` | NDJSON | Yes |
| `.parquet` | Parquet | Yes |
| `.gz` | Gzip compressed | Yes |
### Compression

Gzip-compressed files are automatically decompressed:

```bash
loq -i:S3 "SELECT * FROM 's3://bucket/logs/access.csv.gz'"
```

## Examples
### Query CSV from S3

```bash
loq -i:S3 "SELECT * FROM 's3://my-logs/access.csv' LIMIT 10"
```

### Filter S3 Data
loq -i:S3 "SELECT timestamp, status, message
FROM 's3://my-logs/app.ndjson'
WHERE status = 'error'
ORDER BY timestamp DESC"Aggregate Across Files
loq -i:S3 "SELECT status, COUNT(*) AS count
FROM 's3://my-logs/2024-01-*.csv'
GROUP BY status
ORDER BY count DESC"Query Parquet
loq -i:S3 "SELECT id, name, amount
FROM 's3://data-warehouse/sales.parquet'
WHERE amount > 1000"Join S3 and Local Files
loq "SELECT s.*, l.lookup_value
FROM 's3://bucket/data.csv' s
JOIN lookup.csv l ON s.key = l.key"Cost Optimization
### Use Filters Early

S3 charges for data transfer, so select only the columns and rows you need:

```bash
# Transfers less data than SELECT * over the whole file
loq -i:S3 "SELECT id, name FROM 's3://bucket/large.csv' LIMIT 100"
```

### Use Specific Prefixes
A more specific prefix means fewer objects to list and download:

```bash
# Good: specific prefix
loq -i:S3 "SELECT * FROM 's3://bucket/logs/2024/01/15/'"

# Less efficient: broad prefix
loq -i:S3 "SELECT * FROM 's3://bucket/logs/'"
```

### Consider Parquet
Parquet is columnar and compressed, so queries read far less data than the equivalent CSV:

```bash
loq -i:S3 "SELECT id, status FROM 's3://bucket/data.parquet'"
```

## Output to S3
Currently, S3 is input-only. To save results to S3:

```bash
# Save locally, then upload
loq -i:S3 -o:CSV --ofile:results.csv "SELECT * FROM 's3://bucket/data.csv'"
aws s3 cp results.csv s3://output-bucket/results.csv
```
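To skip the temporary file, the AWS CLI can upload from stdin. This sketch assumes loq writes query results to stdout when `--ofile` is omitted, which the examples elsewhere on this page suggest but do not state outright:

```bash
# Stream loq's CSV output straight into S3 without a local file
loq -i:S3 -o:CSV "SELECT * FROM 's3://bucket/data.csv'" \
  | aws s3 cp - s3://output-bucket/results.csv
```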
## Common Patterns

### Log Analysis

```bash
# Hyphenated column names are quoted so they are not parsed as subtraction
loq -i:S3 "SELECT \"cs-uri-stem\", COUNT(*), AVG(\"time-taken\")
           FROM 's3://logs-bucket/access-*.csv.gz'
           GROUP BY \"cs-uri-stem\"
           ORDER BY COUNT(*) DESC
           LIMIT 20"
```

### Data Pipeline
```bash
# Query raw data, output processed
loq -i:S3 -o:JSON --ofile:processed.json \
  "SELECT
     id,
     UPPER(name) AS name,
     ROUND(price, 2) AS price
   FROM 's3://raw-data/products.csv'
   WHERE status = 'active'"
```

### Cross-Account Access
Ensure your credentials have access to the bucket:

```json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:ListBucket"],
  "Resource": [
    "arn:aws:s3:::other-account-bucket",
    "arn:aws:s3:::other-account-bucket/*"
  ]
}
```

### VPC Endpoints
For private S3 access inside a VPC, make sure an S3 VPC endpoint is configured. loq uses the standard AWS SDK, which respects VPC endpoint routing.
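For reference, creating an S3 gateway endpoint with the AWS CLI looks like the following; the VPC and route-table IDs are placeholders, and this is standard AWS tooling rather than anything loq-specific:

```bash
# Route S3 traffic through a gateway endpoint so it stays inside the VPC
# (vpc-... and rtb-... are placeholder IDs)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0123456789abcdef0
```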
## Troubleshooting

### Access Denied

Check credentials and bucket policy:

```bash
# Test with the AWS CLI
aws s3 ls s3://bucket/prefix/
```
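If listing works for some prefixes but not others, inspecting the bucket policy can show what is being denied (this assumes your credentials are allowed to read the policy):

```bash
# Show the bucket policy document, if one is attached
aws s3api get-bucket-policy --bucket your-bucket
```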
### Region Mismatch

Ensure the region matches the bucket:

```bash
export AWS_REGION=us-west-2
```
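To find out which region a bucket actually lives in:

```bash
# Returns the bucket's region; LocationConstraint is null for us-east-1
aws s3api get-bucket-location --bucket your-bucket
```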
### Large Files

For very large files, sample the data before running full queries:

```bash
# Use LIMIT during exploration
loq -i:S3 "SELECT * FROM 's3://bucket/huge.csv' LIMIT 1000"
```

### Timeout Issues
Large files may take time to download. Consider:

- Using more specific queries
- Converting to Parquet for efficiency (see the sketch after this list)
- Using AWS Athena for very large datasets
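Since loq's S3 support is input-only, the Parquet conversion has to happen with another tool. One option, assuming DuckDB is installed and using placeholder file names, is:

```bash
# Convert a downloaded CSV to Parquet, then upload it for cheaper future queries
duckdb -c "COPY (SELECT * FROM 'huge.csv') TO 'huge.parquet' (FORMAT PARQUET)"
aws s3 cp huge.parquet s3://bucket/huge.parquet
```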
### No Objects Found

Check the prefix/pattern:

```bash
# List objects to verify
aws s3 ls s3://bucket/logs/ --recursive
```

## IAM Permissions
Minimum required permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket",
        "arn:aws:s3:::your-bucket/*"
      ]
    }
  ]
}
```
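To attach this as an inline policy, save it as policy.json and use the AWS CLI; the user and policy names below are placeholders:

```bash
# Attach the minimal read policy to an IAM user
aws iam put-user-policy \
  --user-name loq-reader \
  --policy-name loq-s3-read \
  --policy-document file://policy.json
```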