08 January 2013

307. Burning audio CDs from the command line in debian testing/wheezy

I've got a CD burner on a headless box, so burning audio CDs from the command line is a necessity.

I also normally end up burning flash videos that I've converted to mp3s, so I'll show that too. Most of this post is already covered (although not very well) here: http://verahill.blogspot.com.au/2012/01/debian-testing-64-wheezy-small-fixes.html

First, install the necessary programmes:
sudo apt-get install ffmpeg wodim mpg123


Converting flv to mp3
To batch-convert flv files to mp3, do
for i in *.flv; do ffmpeg -i "$i" -ar 44100 -ab 160k -ac 2 "${i%.flv}.mp3"; done

(quoting "$i" keeps filenames with spaces intact, and ${i%.flv} strips the old extension so you don't end up with names like video.flv.mp3)


Preparing the files
Rename your files to 01.mp3, 02.mp3 etc. to make the songs burn in that order (since you're passing *.wav to wodim below).
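If the alphabetical order of the files already happens to match the track order you want, a small loop can do the numbering for you (just a sketch; check first that none of the files is already named something like 01.mp3, or mv may clobber it):
n=1; for f in *.mp3; do mv "$f" "$(printf '%02d' "$n").mp3"; n=$((n+1)); done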

Convert the mp3s to wav files (you could've gone straight from flv to wav above):
for i in *.mp3; do mpg123 --rate 44100 --stereo --buffer 3072 --resync -w "`basename "$i" .mp3`".wav "$i"; done
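If you'd rather skip the mp3 step entirely, ffmpeg can write CD-ready (44100 Hz, stereo) wav files straight from the flv files:
for i in *.flv; do ffmpeg -i "$i" -ar 44100 -ac 2 "${i%.flv}.wav"; done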

Burn
Burn with
wodim -v -pad speed=1 dev=/dev/cdrw1 -dao -swab *.wav

assuming that /dev/cdrw1 is the correct device.
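If you're not sure which device that is, wodim can list the burners it finds:
wodim --devices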

Eject your CD when done:
eject cdrom1

Done.
[There's also no shortage of terminal music players, such as cplay.]

PS. You can burn anything you want from the command line using burn, e.g. an .iso file:
sudo apt-get install burn
sudo burn -I -n myiso.iso

If the device you want to burn on is /dev/cdrom1 instead of /dev/cdrom, you can change that in /etc/burn.conf.
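Since wodim is already installed it can of course burn an ISO directly too, skipping burn altogether (adjust the device name as before):
wodim -v dev=/dev/cdrw1 -dao myiso.iso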

07 January 2013

306. Insync with Google Drive and Google Docs on Debian Testing/Wheezy

The problem:
1. It would be nice to be able to use Google Docs as a replacement for Microsoft Word until Libre/OpenOffice catch up (post about that later) or the world switches to LaTeX, and
2. for that to happen there needs to be an easier way to sync documents between Google Docs and your hard drive than using email.

The closest thing to that is using Google Drive to keep documents synced, and opening them in Google Docs using your browser.

It's been more than half a year since Google promised that Google Drive would be available for Linux, and they have yet to actually release anything (here); it's starting to look like vaporware (here).

In lieu of an official solution, there are a few options. One is Grive -- which seems to work with Google Drive but not Google Docs -- and another is Insync, which isn't open source since it's owned by a start-up. It's the most promising and full-featured solution though, so we'll go with that.


There used to be gdocsfs, but it doesn't seem to be maintained.

The usual caveats about installing things from outside the repos apply, and even more so in this case since the source code is not available.

Setting up Insync
sudo apt-get install xdotool python-nautilus libxdo2 gir1.2-nautilus-3.0
mkdir -p ~/tmp/insync
cd ~/tmp/insync
wget http://s.insynchq.com/builds/insync-beta-gnome-cinnamon-common_0.9.34_amd64.deb
wget http://s.insynchq.com/builds/insync-beta-gnome_0.9.34_all.deb
sudo dpkg -i *.deb
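If dpkg complains about unmet dependencies at this point, letting apt sort them out usually fixes it:
sudo apt-get -f install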

If all went well you'll find Insync installed (move the mouse to the top-right corner, type insync and it should be there). Clicking on it opens a browser tab, in which you're asked to select the gmail account you wish to use.

You're next asked to allow Insync to do various things.

Confirm (you'll then get an email) and associate your machine with the account.

You should now have a new set of folders in $HOME:
/home/me/Insync/
`-- me@gmail.com

If you create a directory either in ~/Insync/me@gmail.com or in Google Drive using your browser, the directory should show up in both places (i.e. it's synced) -- assuming that you've got insync running:
insync --headless > /dev/null &

will keep it running in the background. Any doc file copied to the Insync folder will now be editable in Google Docs by pointing your browser to https://drive.google.com/#my-drive
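If you want insync to come back automatically when the headless box reboots, one option is a cron @reboot entry (a sketch -- it assumes insync is on cron's PATH; otherwise give the full path):
(crontab -l 2>/dev/null; echo '@reboot insync --headless > /dev/null') | crontab -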


Simple as that.

305. make -jN -- should N equal number of cores or N+1 cores? Optimal number of threads per core

Update: I repeated this test by compiling kernel 3.7.2 using different settings (http://verahill.blogspot.com.au/2013/01/321-compiling-kernel-372-on-debian.html) -- given the length of the compile and its reliance on CPU grunt, it is probably a better test case. It came out showing that N -- or even N-1 -- was better than over-committing.

The original commentator also offered this explanation:
Historically N+1 or even N*1.5 was used & worked better on memory / I-O constrained systems, where the available cache was used as a short-lasting one to feed the extra committed threads / processes while I-O was in progress.

As you've observed correctly, this is not the case on machines that have an abundance of RAM, where it acts as a long-lasting cache: no data that got written to disk will be read back, so spawning additional threads / processes has a detrimental effect on efficiency due to (much) more rescheduling / TLB shootdown interrupts.

In short: when the available RAM is larger than the total disk space needed for the build, N = number of logical CPUs; if not, N = logical CPUs + 1.

Setting the global environment variable (CONCURRENCY_LEVEL) for automated builds, instead of fixed values for -j, using the previously mentioned #export CONCURRENCY_LEVEL=`getconf _NPROCESSORS_ONLN` is always the safest bet, especially when using server-grade machines and high-speed 0-seek-time solid state disks...
I think the conclusion is the one offered above -- stick to N for optimal performance, unless you have a compelling reason not to. I should also emphasize that I don't have a background in computing of any sort, whereas the poster is a professional in the HPC field.

So if I'm allowed to paraphrase and make conclusions:
for a very short compile, like the one in this post, you may find that N+1 seemingly gives a better result since disk I/O plays a big part relative to the code generation (and whatever else a compiler does). For a longer, more 'normal' compilation, disk I/O plays a smaller part.

If your RAM is too small and you need to swap to disk repeatedly, then that obviously increases the disk I/O as well.

In the end, the penalty for over-committing (http://verahill.blogspot.com.au/2013/01/321-compiling-kernel-372-on-debian.html) is large enough that it's a better bet to just go for N threads.

I really shouldn't be surprised -- it's the same effect you see when launching a computational job: you do NOT want to launch more threads than cores.

Original post:
I got a comment recently regarding the number of threads that should be used for make:
make -j7 is the number of cores +1 

Stop copy paste nonsense.... sigh...

make -j1 will spawn 1 worker process
-j7 will spawn 7. 

#export CONCURRENCY_LEVEL=`getconf _NPROCESSORS_ONLN`

makes adding -jjob unnecessary 
on an i7 this is the same as -j8

When in doubt check top.....

So the question is whether, for N cores, you should spawn N threads or N+1. The poster has a valid point -- there's not that much data on what really is the best configuration, and while most people keep repeating the (mostly) accepted N+1 (or 1.5*N) wisdom, we really need more hard numbers.
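As an aside, you can make the job count track the machine it runs on instead of hard-coding it; getconf _NPROCESSORS_ONLN (mentioned in the comment above) and nproc (from coreutils) both report the number of online cores:
make -j"$(getconf _NPROCESSORS_ONLN)"
make -j"$(nproc)"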

So here's my real-world, unscientific benchmark: compiling Gromacs 4.5.5 on a six-core AMD Phenom II 1055T with 8 GB RAM and a slow 5400 rpm hard drive (disk I/O plays into things as well). I'm using gcc 4.7.2-4 and Debian Wheezy/Testing.

To get the data I used this script, maketest.sh:

#!/bin/sh
# usage: sh maketest.sh <number of make jobs>
make distclean
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/single/lib -L/opt/openblas/lib -lopenblas"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/single/include -I/opt/openblas/include"
./configure --disable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_sp --prefix=/opt/gromacs/gromacs-4.5.5
time make -j$1

which I called with e.g.
sh maketest.sh 6
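To collect all twelve data points unattended, the calls can be wrapped in a loop (the log file names are just a suggestion; time writes to stderr, hence the 2>&1):
for n in $(seq 1 12); do sh maketest.sh "$n" 2>&1 | tee "maketest_j$n.log"; done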

Admittedly, this is a fairly short build but it is a 'real' one.

Results:
N    Time (real)
1    9 m 52 s
2    5 m 18 s
3    3 m 48 s
4    3 m 02 s
5    2 m 24 s
6    2 m 16 s
7    2 m 05 s
8    2 m 06 s
9    2 m 07 s
10   2 m 07 s
11   2 m 08 s
12   2 m 09 s
Or as a plot:
The build time decreases roughly exponentially with the number of threads. The blue line is at 125 seconds, i.e. where the curve flattens out (dy/dx = 0).
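To recreate that kind of plot yourself, gnuplot works well; this sketch assumes you've saved the results as two numeric columns (N, seconds) in a hypothetical times.dat:
gnuplot -p -e "set xlabel 'make -jN'; set ylabel 'build time (s)'; plot 'times.dat' using 1:2 with linespoints"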
I'm actually quite surprised that N+1 turned out to be the best configuration, although in general it seems that you don't suffer any penalty for using more threads, so 1.5*N works just as well.

I also ran sar (sysstat; sar -u 1 180 |gawk '{print $3,$5,$8}' |tee n7.dat ) for -j7 to see how the load varies with time during make (I collected a little bit of data before and after make, hence the flat line at the end):
The black/blue (user/idle) lines are the interesting ones here.
The build is very evidently not perfectly parallel at all stages, and that will also affect the optimal number of threads/core.


Raw results
N=1
real    9m51.519s
user    6m43.316s
sys     0m44.092s
N=2
real    5m18.359s
user    7m3.548s
sys     0m46.112s
N=3
real    3m47.850s
user    7m22.732s
sys     0m47.064s
N=4
real    3m2.131s
user    7m56.068s
sys     0m41.744s
N=5
real    2m24.258s
user    7m53.140s
sys     0m34.928s
N=6
real    2m16.429s
user    8m15.088s
sys     0m27.160s
N=7
real    2m5.361s
user    7m50.200s
sys     0m28.280s
N=8
real    2m5.820s
user    7m52.380s
sys     0m27.548s
N=9
real    2m7.266s
user    7m54.344s
sys     0m28.340s
N=10
real    2m7.057s
user    7m56.628s
sys     0m27.872s
N=11
real    2m7.728s
user    7m58.276s
sys     0m27.332s
N=12
real    2m8.819s
user    8m0.600s
sys     0m27.544s