join assumes that the input data is sorted on the key
on which the join is going to take place.
In delimited data, the elements of a record are separated by a special 'delimiter' character. In CSV files, fields are typically delimited by commas (or sometimes by tabs).
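As a small illustration, two comma-delimited files keyed on their first field could be joined as sketched here. The file names j1 and j2 and their contents are made up for this example.

```sh
# Hypothetical sample data: field 1 is the key, and both files are
# already sorted on it.
printf 'a,1\nb,2\n' > j1
printf 'b,X\nc,Y\n' > j2

# Full outer join on field 1, printing the key plus field 2 of each file.
join -t , -a 1 -a 2 -o 0,1.2,2.2 j1 j2
# Prints:
# a,1,
# b,2,X
# c,,Y
```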
Explanation of options:
"-t ,"           Input and output field separator is "," (for CSV)
"-a 1"           Output a line for every line of j1 not matched in j2
"-a 2"           Output a line for every line of j2 not matched in j1
"-o 0,1.2,2.2"   Output field format specification
For the last option:
0 denotes the match (join) field (needed when using "-a", because an unpaired line has its key in only one of the two files)
1.2 denotes field 2 from file 1 ("j1")
2.2 denotes field 2 from file 2 ("j2")
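To see why 0 matters, the following sketch compares it with 1.1 (field 1 of file 1) as the first output field. The sample files are hypothetical.

```sh
# Hypothetical sample data, sorted on field 1.
printf 'a,1\nb,2\n' > j1
printf 'b,X\nc,Y\n' > j2

# With 1.1 in place of 0, a line that exists only in j2 loses its key,
# because file 1 has no matching line to take field 1 from.
join -t , -a 1 -a 2 -o 1.1,1.2,2.2 j1 j2
# Prints:
# a,1,
# b,2,X
# ,,Y

# With 0, the key is printed whichever file the line came from.
join -t , -a 1 -a 2 -o 0,1.2,2.2 j1 j2
# Prints:
# a,1,
# b,2,X
# c,,Y
```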
Using both "-a 1" and "-a 2" creates a full outer join, as in SQL.
This command must be given two and only two input files.
To join several files you can loop through them.
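A first step might look like the following sketch, which assumes two hypothetical sample files j1 and j2 and writes their full outer join to J.

```sh
# Hypothetical sample data, sorted on field 1.
printf 'a,1\nb,2\n' > j1
printf 'b,X\nc,Y\n' > j2

# Join the first two files and keep the result in J.
join -t , -a 1 -a 2 -o 0,1.2,2.2 j1 j2 > J
cat J
# Prints:
# a,1,
# b,2,X
# c,,Y
```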
File "J" is now the full outer join of "j1", "j2".
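A third file could then be folded into J in the same way. This sketch rebuilds J from hypothetical sample files so that it runs on its own. The name j3 and the temporary file J.tmp are assumptions made for the example.

```sh
# Rebuild the hypothetical inputs so this sketch runs on its own.
printf 'a,1\nb,2\n' > j1
printf 'b,X\nc,Y\n' > j2
printf 'a,p\nc,q\n' > j3
join -t , -a 1 -a 2 -o 0,1.2,2.2 j1 j2 > J

# J now has three fields, so the format lists 1.2 and 1.3 before the
# new column. Write to a temporary file first: redirecting straight to
# J would truncate it before join reads it.
join -t , -a 1 -a 2 -o 0,1.2,1.3,2.2 J j3 > J.tmp
mv J.tmp J
cat J
# Prints:
# a,1,,p
# b,2,X,
# c,,Y,q
```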
and so on through j4, j5…
For many files this is best done with a loop:
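One possible shape for such a loop is sketched below. It assumes every input file has exactly two comma-separated fields (key, value) and is sorted on the key. The file names, the counter n, and the temporary file J.tmp are illustrative choices.

```sh
# Hypothetical sample data, each file sorted on field 1.
printf 'a,1\nb,2\n' > j1
printf 'b,X\nc,Y\n' > j2
printf 'a,p\nc,q\n' > j3

cp j1 J     # start the accumulated result with the first file
n=2         # number of fields currently in J

for f in j2 j3; do
    # Build the output format: key, fields 2..n of J, field 2 of the new file.
    fmt=0
    i=2
    while [ "$i" -le "$n" ]; do
        fmt="$fmt,1.$i"
        i=$((i + 1))
    done
    fmt="$fmt,2.2"

    # Full outer join of the accumulated result with the next file.
    join -t , -a 1 -a 2 -o "$fmt" J "$f" > J.tmp
    mv J.tmp J
    n=$((n + 1))
done

cat J
# Prints:
# a,1,,p
# b,2,X,
# c,,Y,q
```

The format string has to grow by one field per pass, because each join adds one column to J.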
Sorted Data Note
join assumes that the input data has been sorted by the field to be joined.
See the section on sort for details.
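As a minimal sketch, unsorted input could be prepared with sort before joining. The file names u1 and u2 and their contents are hypothetical.

```sh
# Hypothetical unsorted inputs.
printf 'b,2\na,1\n' > u1
printf 'c,Y\nb,X\n' > u2

# Sort each file on field 1, using the same separator that join will use.
# sort and join should run under the same locale so that they agree on
# the ordering.
sort -t , -k 1,1 u1 > j1
sort -t , -k 1,1 u2 > j2

# Plain (inner) join on field 1: only keys present in both files appear.
join -t , j1 j2
# Prints:
# b,2,X
```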