I'm posting this mainly to remind myself how to do this as I keep on forgetting (anyone with good *nix-foo won't learn anything here).
I've been running code for a long time gathering data for a paper I will one day perhaps have time to write. I analyse the csv file routinely thanks to a sleep command, and everything is synced in a Dropbox folder, so if I'm bored I can take a look at this kind of graph every now and then:
Anyway! That's not the point.
The point is that every now and then Dropbox will create conflicted copies:
First of all I need to gather those two csv files together:
cat Output_file_with_permute.csv Output_file_with_permute\ \(Vince\ Knight\'s\ conflicted\ copy\ 2013-03-06\).csv > fixed.csv
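If there were several conflicted copies, a glob would save escaping each filename by hand. This is a sketch, assuming nothing else in the directory matches the pattern:

```shell
# Hypothetical alternative: gather the original plus every conflicted copy.
# Assumes only the files we want to merge match this pattern.
cat Output_file_with_permute*.csv > fixed.csv
```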
We can check that we do indeed have all the rows together by using grep -c . to count the number of non-empty rows in each file:
cat Output_file_with_permute.csv | grep -c .
cat Output_file_with_permute\ \(Vince\ Knight\'s\ conflicted\ copy\ 2013-03-06\).csv | grep -c .
cat fixed.csv | grep -c .
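Incidentally, grep -c . counts non-empty lines rather than newline characters, which is why it's handy for csv files. A throwaway demo (demo.csv is a made-up filename) shows the difference from wc -l:

```shell
# demo.csv is a disposable example: three lines, one of them blank.
printf 'a\n\nb\n' > demo.csv
wc -l < demo.csv    # counts newline characters: 3
grep -c . demo.csv  # counts non-empty lines: 2
```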
The output is shown below. Note that 31499 is one short of 9702 + 21798 = 31500; this happens when the first file lacks a trailing newline, so cat glues its last row onto the first row of the next file:
Removing duplicates

This is really simple using sort and uniq:

sort fixed.csv | uniq > fixed_Output_file.csv

This sorts the file and pipes the result through the uniq command, which outputs each row only once.
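As an aside, sort can do the deduplication itself: the -u flag is equivalent to piping through uniq here:

```shell
# One-step equivalent of `sort fixed.csv | uniq`:
# sort the rows and keep only one copy of each.
sort -u fixed.csv > fixed_Output_file.csv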
If I count how many rows are in the new file:
cat fixed_Output_file.csv | grep -c .
I get 21797 rows, so it looks like the main file didn't have any rows that the conflicted copy was missing.
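If you want to see which rows were actually duplicated, rather than just counting them, uniq's -d flag prints only the repeated lines:

```shell
# Print each line that occurs more than once
# (uniq needs sorted input to spot repeats).
sort fixed.csv | uniq -d
```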
I've used all this before when I had code running on multiple machines, which obviously created a bunch of conflicted copies (because of how Dropbox handles simultaneous writes) with relevant data all over the place.
The final step is to simply clean all this up by removing the unwanted files:
mv fixed_Output_file.csv Output_file_with_permute.csv
rm fixed.csv
rm Output_file_with_permute\ \(Vince\ Knight\'s\ conflicted\ copy\ 2013-03-06\).csv
As I said above, the main reason I've written this post is to try and make sure I remember how to do this (I've had to google it every time I need to do it)...