Linux Data Manipulation
Essential Linux commands for file creation and manipulation, including seq, rev, for loops, sed, and split for handling large data files.
Mark
Performance Testing Expert
The Linux command line is well suited to creating and manipulating files. Its text tools process files line by line, so they generally cope with very large files better than typical Windows alternatives.
Key Commands
SEQ Command
Creates number sequences. The full form is seq FIRST INCREMENT LAST (starting number, increment value, maximum value); with only two arguments the increment defaults to 1.
# Generate numbers 1 to 10
seq 1 10
# Generate numbers from 1 up to 100 in steps of 5 (1, 6, 11, ..., 96)
seq 1 5 100
# Generate with leading zeros
seq -w 01 10
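Two further options are often handy when generating test data. The following is a minimal sketch assuming GNU seq; the "user" prefix and the variable names are made up for illustration.

```shell
# -f applies a printf-style format to each number (here: zero-padded to 3 digits)
ids=$(seq -f "user%03g" 1 3)
echo "$ids"

# -s sets the separator, so the sequence lands on a single line
csv_line=$(seq -s "," 1 5)
echo "$csv_line"
```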
REV Command
Reverses strings. Useful for complex string manipulations when combined with other commands.
# Reverse a string
echo "hello" | rev
# Output: olleh
# Extract the text after the last hyphen and change 01 to 02
echo "filename-01.csv" | rev | cut -d'-' -f1 | rev | sed 's/01/02/'
# Output: 02.csv
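The rev, cut, rev pipeline above keeps only the text after the last hyphen. To rebuild the complete file name, the prefix can be captured the same way with -f2-. This is a sketch only; the sample name and the variable names are assumptions chosen for illustration.

```shell
name="my-data-01.csv"

# Everything before the last hyphen: reverse, drop the first field, reverse back
prefix=$(echo "$name" | rev | cut -d'-' -f2- | rev)

# Everything after the last hyphen: reverse, keep the first field, reverse back
suffix=$(echo "$name" | rev | cut -d'-' -f1 | rev)

# Change the number in the suffix only, then join the two parts again
new_name="$prefix-$(echo "$suffix" | sed 's/01/02/')"
echo "$new_name"
```

Reversing the string first is what makes "last field" addressable, since cut can only count fields from the left.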
FOR Command
Repeats a block of commands once for each item in a list, such as a number sequence or a set of matching files.
# Loop through 1 to 10
for i in $(seq 1 10); do
    echo "Processing file $i"
done
# Loop through files
for file in *.csv; do
    echo "Found: $file"
done
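One caveat with looping over a pattern: when no file matches, the shell passes the literal pattern (for example *.csv) into the loop. A minimal sketch of a guard against that, using a temporary directory and file names invented for the example:

```shell
loop_dir=$(mktemp -d)
touch "$loop_dir/a.csv" "$loop_dir/b.csv"

count=0
for file in "$loop_dir"/*.csv; do
    # Skip the unexpanded pattern if the directory holds no CSV files
    [ -e "$file" ] || continue
    count=$((count + 1))
    echo "Found: $file"
done
echo "Total: $count"
```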
SED Command
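Modifies file content as a stream of lines, which makes it extremely powerful for text transformations. The -i option used below rewrites the file in place; the examples assume GNU sed.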
# Add a header row to a CSV file
sed -i '1i column1,column2' filename01.csv
# Replace text in a file
sed -i 's/old_text/new_text/g' filename.csv
# Delete first line
sed -i '1d' filename.csv
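Because -i overwrites the original, it can be safer to keep a backup while experimenting. The sketch below assumes a sed that accepts a suffix attached to -i (GNU and BSD sed both do); the sample file and its contents are made up.

```shell
sed_dir=$(mktemp -d)
printf 'a,b\n1,2\n' > "$sed_dir/sample.csv"

# -i.bak edits sample.csv in place and saves the original as sample.csv.bak
sed -i.bak 's/a,b/id,value/' "$sed_dir/sample.csv"

head -n 1 "$sed_dir/sample.csv"       # edited header
head -n 1 "$sed_dir/sample.csv.bak"   # original header
```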
SPLIT Command
Fragments large files into smaller chunks. Essential for handling million-row datasets.
# Split file into 10,000-line chunks
split -l 10000 largefile.csv
# This creates output files: xaa, xab, xac, etc.
# Split with custom prefix
split -l 10000 largefile.csv chunk_
# This creates: chunk_aa, chunk_ab, chunk_ac, etc.
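Note that split cuts purely by line count, so only the first chunk of a CSV file keeps the header row. One possible way to repeat the header in every chunk is sketched below; the directory, file names, and sample data are assumptions for illustration.

```shell
split_dir=$(mktemp -d)

# Build a small sample file: one header row plus 25 data rows
{ echo "id,value"; seq 1 25 | sed 's/$/,x/'; } > "$split_dir/large.csv"

# Save the header, then split only the data rows (read from stdin via "-")
header=$(head -n 1 "$split_dir/large.csv")
tail -n +2 "$split_dir/large.csv" | split -l 10 - "$split_dir/chunk_"

# Prepend the header to each chunk and give it a .csv extension
for part in "$split_dir"/chunk_*; do
    { echo "$header"; cat "$part"; } > "$part.csv"
    rm "$part"
done
ls "$split_dir"
```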
Combining Commands
These commands become powerful when combined:
# Create numbered CSV files with headers
for i in $(seq -w 01 10); do
    echo "column1,column2,column3" > "file-$i.csv"
    echo "data1,data2,data3" >> "file-$i.csv"
done
# Process all CSV files
for file in *.csv; do
    sed -i '1i new_header' "$file"
done
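A loop that inserts a header adds another copy every time it runs. The sketch below shows one way to make the step safe to repeat by checking the first line before prepending. The add_header function, the directory, and the file names are hypothetical.

```shell
combo_dir=$(mktemp -d)
for i in $(seq -w 8 10); do
    echo "data1,data2,data3" > "$combo_dir/file-$i.csv"
done

# add_header HEADER FILE: prepend HEADER unless it is already the first line
add_header() {
    if [ "$(head -n 1 "$2")" != "$1" ]; then
        { echo "$1"; cat "$2"; } > "$2.tmp" && mv "$2.tmp" "$2"
    fi
}

for file in "$combo_dir"/*.csv; do
    add_header "column1,column2,column3" "$file"
    add_header "column1,column2,column3" "$file"   # second call changes nothing
done
wc -l "$combo_dir"/*.csv
```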