From: Stefano Zacchiroli
Date: Wed, 4 Feb 2009 18:14:23 +0000 (+0100)
Subject: sort + fosdem post
X-Git-Url: http://git.upsilon.cc/?p=homepage.git;a=commitdiff_plain;h=4ab4b10e2e12d19e43be89c890111f4057096152

sort + fosdem post
---
diff --git a/blog/archives/2009/02.mdwn b/blog/archives/2009/02.mdwn
new file mode 100644
index 00000000..238bda07
--- /dev/null
+++ b/blog/archives/2009/02.mdwn
@@ -0,0 +1 @@
+[[template id=archive_month year="2009" month="02"]]
diff --git a/blog/posts/2009/02.mdwn b/blog/posts/2009/02.mdwn
new file mode 100644
index 00000000..ff29ca95
--- /dev/null
+++ b/blog/posts/2009/02.mdwn
@@ -0,0 +1 @@
+[[meta redir=archives/2009/02]]
diff --git a/blog/posts/2009/02/sort_gotcha.mdwn b/blog/posts/2009/02/sort_gotcha.mdwn
new file mode 100644
index 00000000..b959f068
--- /dev/null
+++ b/blog/posts/2009/02/sort_gotcha.mdwn
@@ -0,0 +1,42 @@
+# sorting text records on the first column only
+
+It turns out that [`join`](http://man.cx/join) is a strange
+beast. Let's say you have two files of textual records, using TAB as
+separator and containing different columns besides the first one,
+which is the record key. Using `join` you can do cool stuff like `join
+file1 file2` to obtain a join in the relational sense, by default on
+the first column.
+
+The [join manpage](http://man.cx/join) states that the two files must
+be sorted on the join field. This is of course to avoid complexity
+explosion. The annoying thing is that if you forget to do that, you
+will miss tuples without any warning, which is not good (the problem
+could at least be *spotted* without any complexity explosion...).
+So yes, if I'm writing this it's because, after having been hit by
+this, I finally read the fine manpage :-)
+
+But how to sort on the join field? It turns out that dear old `sort -u
+file | sponge file` (using [sponge](http://packages.debian.org/sid/moreutils)) is not
+enough. Why? Because by default it uses *all* fields to sort, so you
+end up sorting also on data values; needless to say, such an ordering
+is not stable across the two different files you want to sort in
+turn.
+
+OK then, let's try with `sort -u -k 1 file | sponge file`. Not quite,
+but close: with this syntax [`sort`](http://man.cx/sort) will *start*
+the key at field 1 (which is the default), but then extend it to the
+end of the line, still sorting on the other fields too, in the hope of
+doing you a favour.
+
+The magic line is then `sort -u -k 1,1 file | sponge file`, which
+tells `sort` to (damn) sort using only the first (damned) field and
+to stop there. Thank you `sort`. Have a nice day.
+
+I’m going to FOSDEM,
+the Free and Open Source Software Developers’ European Meeting
+**PS** I've been sick this week, but I'm definitely getting better, so
+I can finally announce that I'm going to FOSDEM 2009, yay, see you
+there!
+
+[[!tag lang/english planet/debian]]
diff --git a/blog/posts/2009/02/sort_gotcha/going-to-fosdem.png b/blog/posts/2009/02/sort_gotcha/going-to-fosdem.png
new file mode 100644
index 00000000..11fad7b6
Binary files /dev/null and b/blog/posts/2009/02/sort_gotcha/going-to-fosdem.png differ
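A minimal shell sketch of the recipe described in the post: sort two TAB-separated files on their first field only, then join them. The helper name `join_first_column`, the sample file names and data, and the choice of `LC_ALL=C` (so that `sort` and `join` compare keys byte by byte, in the same way) are assumptions made for illustration. Instead of `sponge` it uses `sort -o FILE FILE`, which POSIX allows to name one of the input files, so only standard tools plus `mktemp` are needed.

```shell
#!/bin/sh
# Sketch only. Assumptions: POSIX sh, POSIX-compatible sort/join, mktemp,
# and hypothetical sample data.
set -eu

# Assumption: use byte-wise collation so sort and join agree on key order.
LC_ALL=C
export LC_ALL

TAB=$(printf '\t')

# join_first_column FILE1 FILE2 (hypothetical helper)
# Sorts both TAB-separated files in place on field 1 only, then joins them.
join_first_column() {
    for f in "$1" "$2"; do
        # -t: TAB is the field separator.
        # -k 1,1: the key starts AND ends at field 1, so the values in
        # later fields play no part in the key.
        # -o may name the input file itself, so no sponge is needed.
        sort -t "$TAB" -k 1,1 -o "$f" "$f"
    done
    # join must split on TAB as well, otherwise it splits on blanks.
    join -t "$TAB" "$1" "$2"
}

# Usage example with two hypothetical record files.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'carol\t33\nalice\t30\nbob\t25\n' > "$tmpdir/ages"
printf 'bob\tRome\nalice\tParis\ndave\tOslo\n' > "$tmpdir/cities"
join_first_column "$tmpdir/ages" "$tmpdir/cities"
```

Running it prints `alice 30 Paris` and `bob 25 Rome` (TAB-separated). `carol` and `dave` have no partner in the other file and are dropped, as in a relational inner join. The sketch leaves out the post's `-u` on purpose: once a key is given, `sort -u` keeps only one line per distinct key, so `-u -k 1,1` would also collapse records that share the same first column.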