Skip to content

TEXTLINE / TEXT

Line-by-line text file parsing with optional regex pattern matching.

Overview

The TEXTLINE format reads text files line by line, treating each line as a row. You can optionally apply regex patterns with named capture groups to extract structured fields from unstructured text.

Usage

bash
# Basic line-by-line reading
loq -i:TEXTLINE "SELECT * FROM messages.txt"
loq -i:TEXT "SELECT * FROM app.log"

# With regex pattern for structured extraction (library usage)
# Note: Regex patterns are configured programmatically in library mode

Default Schema

When used without a regex pattern, TEXTLINE provides a single column:

FieldTypeDescription
TextSTRINGThe entire line content

Regex Pattern Mode

When configured with a regex pattern (in library mode), the schema is determined by named capture groups in the pattern. Each named group (?P<name>...) becomes a column.

Example Pattern

rust
// In library code
let mut input = TextLineInput::open("access.log")
    .unwrap()
    .with_pattern(r"(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<date>[^\]]+)\] \"(?P<request>[^\"]+)\"")
    .unwrap();
// Schema: [ip, date, request]

Features

Basic Mode

  • Each line becomes one row with a single "Text" column
  • Empty lines are included
  • Handles both Unix (\n) and Windows (\r\n) line endings
  • Simple and fast for line-based processing

Regex Mode

  • Extract structured fields using named capture groups
  • Lines that don't match the pattern are skipped
  • Missing capture groups return NULL values
  • All extracted fields are returned as strings

Options

OptionDescription
-i:TEXTLINESelect TEXTLINE format
-i:TEXTAlias for TEXTLINE

Examples

Basic Line Reading

bash
# Read all lines from a file
loq -i:TEXTLINE "SELECT Text FROM notes.txt"

# Count lines in a file
loq -i:TEXTLINE "SELECT COUNT(*) AS line_count FROM document.txt"

# View first 20 lines
loq -i:TEXTLINE "SELECT Text FROM file.txt LIMIT 20"

Filtering Lines

bash
# Find lines containing ERROR
loq -i:TEXT "SELECT Text FROM app.log WHERE Text LIKE '%ERROR%'"

# Find lines starting with specific prefix
loq -i:TEXT "SELECT Text FROM log.txt WHERE Text LIKE 'WARNING:%'"

# Count error occurrences
loq -i:TEXT "SELECT COUNT(*) FROM app.log WHERE Text LIKE '%ERROR%'"

String Operations

bash
# Extract parts of lines using SQL functions
loq -i:TEXT "SELECT UPPER(Text) FROM notes.txt"

# Get line length
loq -i:TEXT "SELECT Text, STRLEN(Text) AS length FROM file.txt"

# Extract substrings
loq -i:TEXT "SELECT SUBSTR(Text, 1, 50) AS preview FROM log.txt"

Aggregation

bash
# Group by line patterns
loq -i:TEXT "SELECT 
    CASE 
        WHEN Text LIKE 'ERROR%' THEN 'Error'
        WHEN Text LIKE 'WARN%' THEN 'Warning'
        ELSE 'Info'
    END AS level,
    COUNT(*) AS count
FROM app.log
GROUP BY level"

Multiple Files

bash
# Process multiple log files
loq -i:TEXT "SELECT Text FROM 'logs/*.log' WHERE Text LIKE '%ERROR%'"

# Recursive directory search
loq -i:TEXT -recurse:2 "SELECT Text FROM 'logs/*.txt'"

Output Formats

bash
# Export to JSON
loq -i:TEXT -o:JSON "SELECT Text FROM file.txt WHERE Text LIKE '%important%'"

# Format as table
loq -i:TEXT -o:DATAGRID "SELECT Text FROM notes.txt LIMIT 10"

# Save to SQLite
loq -i:TEXT -o:SQLITE --ofile:output.db "SELECT Text FROM file.txt"

Use Cases

Log Analysis

bash
# Find all error messages
loq -i:TEXT "SELECT Text FROM app.log WHERE Text LIKE '%ERROR%'"

# Count warnings by hour (assuming timestamp prefix)
loq -i:TEXT "SELECT 
    SUBSTR(Text, 1, 13) AS hour,
    COUNT(*) AS warnings
FROM app.log
WHERE Text LIKE '%WARN%'
GROUP BY hour
ORDER BY hour"

Text Processing

bash
# Remove blank lines
loq -i:TEXT "SELECT Text FROM file.txt WHERE TRIM(Text) != ''"

# Find long lines
loq -i:TEXT "SELECT Text FROM file.txt WHERE STRLEN(Text) > 100"

# Deduplicate lines
loq -i:TEXT "SELECT DISTINCT Text FROM file.txt"

Configuration Files

bash
# Extract comments
loq -i:TEXT "SELECT Text FROM config.txt WHERE Text LIKE '#%'"

# Find specific settings
loq -i:TEXT "SELECT Text FROM config.ini WHERE Text LIKE 'debug=%'"

Data Extraction

bash
# Find lines with IP addresses
loq -i:TEXT "SELECT Text FROM access.log 
WHERE Text LIKE '%[0-9]%.[0-9]%.[0-9]%.[0-9]%'"

# Extract email addresses (simple pattern)
loq -i:TEXT "SELECT Text FROM contacts.txt WHERE Text LIKE '%@%.%'"

Library Usage with Regex Patterns

For advanced structured extraction, use the library API with regex patterns:

rust
use logparser_formats::TextLineInput;

// Parse Apache-style access logs
let mut input = TextLineInput::open("access.log")
    .unwrap()
    .with_pattern(
        r#"(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<date>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+) (?P<size>\d+)"#
    )
    .unwrap();

// Schema: [ip, date, request, status, size]

Pattern Features

  • Named Groups: Use (?P<name>...) to create columns
  • Skip Non-Matches: Lines that don't match are automatically skipped
  • NULL Values: Missing optional groups return NULL
  • String Type: All captured values are strings (use SQL CAST to convert)

Pattern Examples

Key-Value Logs

rust
// Parse: "user:alice age:30 city:NYC"
let pattern = r"user:(?P<user>\w+) age:(?P<age>\d+) city:(?P<city>\w+)";

Syslog Format

rust
// Parse: "Jan 15 08:30:15 server app[1234]: message"
let pattern = r"(?P<month>\w+) (?P<day>\d+) (?P<time>[\d:]+) (?P<host>\S+) (?P<app>\w+)\[(?P<pid>\d+)\]: (?P<message>.*)";

Custom Application Logs

rust
// Parse: "[ERROR] 2024-01-15 Component: message"
let pattern = r"\[(?P<level>\w+)\] (?P<date>[\d-]+) (?P<component>\w+): (?P<message>.*)";

Performance Notes

  • Line-by-line parsing is efficient for large files
  • Regex patterns add overhead - use basic mode when possible
  • Filtering with WHERE is fast - applied during reading
  • Large files are streamed, not loaded into memory

Performance Tips

bash
# Good: Filter early to reduce data
loq -i:TEXT "SELECT Text FROM huge.log WHERE Text LIKE '%ERROR%'"

# Good: Use LIMIT during exploration
loq -i:TEXT "SELECT Text FROM huge.log LIMIT 100"

# Better: Use more specific format if available
# If the file is actually CSV, use -i:CSV instead
loq -i:CSV "SELECT * FROM data.csv"

Comparison with Other Formats

When to Use TEXTLINE

  • Unstructured log files
  • Simple line-based text files
  • Quick exploration of unknown formats
  • When structure doesn't match standard formats

When to Use Other Formats

FormatUse Instead Of TEXTLINE When...
CSVData is comma/tab-separated
JSONData is JSON or NDJSON
W3CIIS/Apache logs with standard format
NCSAStandard Apache access logs
SYSLOGRFC 3164/5424 syslog messages
XMLStructured XML documents
TEXTWORDNeed word-level parsing with positions

Notes

  • Each line becomes exactly one row in basic mode
  • Empty lines are included (filter with WHERE TRIM(Text) != '')
  • Line endings (\n and \r\n) are automatically stripped
  • Regex patterns are only available in library mode, not CLI
  • For structured logs, consider using NCSA, W3C, or SYSLOG formats

See Also

All rights reserved.