bash - 1) Reordering one csv file based on another file header and 2) Merging one column of one csv file to another and remove duplicate - TagMerge
21) Reordering one csv file based on another file header and 2) Merging one column of one csv file to another and remove duplicate1) Reordering one csv file based on another file header and 2) Merging one column of one csv file to another and remove duplicate

1) Reordering one csv file based on another file header and 2) Merging one column of one csv file to another and remove duplicate

Asked 1 years ago
0
2 answers

You can do it all quite easily with Miller, which is available here as a static binary. Put the mlr executable somewhere in your PATH and you're done with the installation.


For starters, I'll assume that we're working with two CSV files with no inconsistency in the column names:

cat file1.csv
Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester
cat file2.csv
Department,First name,Last name,One-time password
Engineering,Oliver,Den,09ja61
Sales,Rachel,Booker,12se74

rename a specified field:
mlr --csv rename 'First name,Name' file2.csv
Department,Name,Last name,One-time password
Engineering,Oliver,Den,09ja61
Sales,Rachel,Booker,12se74

re-order the columns of file2.csv based on the header of file1.csv:
mlr --csv reorder -f "$(head -n 1 file1.csv)" file2.csv
One-time password,First name,Last name,Department
09ja61,Oliver,Den,Engineering
12se74,Rachel,Booker,Sales

add missing columns in file2.csv based on the header of file1.csv:
mlr --csv template -t file1.csv file2.csv
Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
,,09ja61,,Oliver,Den,Engineering,
,,12se74,,Rachel,Booker,Sales,

remove duplicates based on One-time password,First name,Last name unicity:
mlr --csv head -n 1 -g 'One-time password,First name,Last name' fileX.csv

concatenate file1.csv and file2.csv:
mlr --csv unsparsify file1.csv file2.csv
Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester
,,09ja61,,Oliver,Den,Engineering,
,,12se74,,Rachel,Booker,Sales,

concatenate file1.csv and file2.csv and remove the duplicates based on One-time password,First name,Last name unicity:

The command is composed of a chain of operations.

mlr --csv unsparsify then head -n 1 -g 'One-time password,First name,Last name' file1.csv file2.csv
Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester
,,09ja61,,Oliver,Den,Engineering,

Lastly, let's suppose that the column First name is called Name in file2.csv and that you want to concatenate file1.csv and file2.csv and remove the duplicates based on One-time password,First name,Last name unicity.

You can do it by prepending a rename operation to the previous command:

mlr --csv rename 'Name,First name' then unsparsify then head -n 1 -g 'One-time password,First name,Last name' file1.csv file2.csv

Source: link

0

Change “/mydir” to your desired working directory.
import os
import glob
import pandas as pd
os.chdir("/mydir")
Match the pattern (‘csv’) and save the list of file names in the ‘all_filenames’ variable. You can check out this link to learn more about regular expression matching.
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
Use pandas to concatenate all files in the list and export as CSV. The output file is named “combined_csv.csv” located in your working directory.
#combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')

Source: link

Recent Questions on bash

    Programming Languages