Showing posts with label parallel. Show all posts
Showing posts with label parallel. Show all posts

11 September 2012

232. Compile parallel (threaded) povray 3.7-rc6 on Debian Wheezy

Update 13 May 2013: This build won't work with v3.7-rc7 on debian wheezy if you have libjpeg62 installed. See http://verahill.blogspot.com.au/2013/05/413-povray-37-rc7-on-debian-wheezy.html.

Remove libjpeg62 and it works fine though.

Original post
Expanding my little cluster has got me thinking about additional uses for it. The primary purpose is obviously work i.e. MD simulations using gromacs and ab initio calcs using NWChem and Gaussian. I'm also testing it with John the Ripper to see how well the users of the linux box in the lab are choosing their passwords.

At that point I realised that it'd be sweet to have at least an OMP capable version of povray to speed things up when polishing figures for those elusive journal covers.

Debian testing currently uses v. 3.6.1 of povray but

  POV-Ray 3.6 does not support multithreaded rendering. POV-Ray 3.7 does.

So compile we will although v 3.7 is beta, so be aware.
sudo mkdir /opt/povray
sudo chown $USER /opt/povray

wget http://povray.org/redirect/www.povray.org/beta/source/povray-3.7.0.RC6.tar.gz
tar xvf povray-3.7.0.RC6.tar.gz
cd povray-3.7.0.RC6/
sudo apt-get install libboost-all-dev libpng-dev libjpeg-dev libtiff-dev build-essential libsdl-dev

Note: libboost-all-dev is big. It might be enough with libboost-thread-dev

./configure --prefix=/opt/povray --program-suffix=_3.7 COMPILED_BY="me@here"
===============================================================================
POV-Ray 3.7.0.RC5 has been configured.

Built-in features:
  I/O restrictions:          enabled
  X Window display:          disabled
  Supported image formats:   gif tga iff ppm pgm hdr png jpeg tiff
  Unsupported image formats: openexr

Compilation settings:
  Build architecture:  x86_64-unknown-linux-gnu
  Built/Optimized for: x86_64-unknown-linux-gnu (using -march=native)
  Compiler vendor:     gnu
  Compiler version:    g++ 4.7
  Compiler flags:      -pipe -Wno-multichar -Wno-write-strings -fno-enforce-eh-specs -s -O3 -ffast-math -march=native -pthread

Type 'make check' to build the program and run a test render.
Type 'make install' to install POV-Ray on your system.

The POV-Ray components will be installed in the following directories:
  Program (executable):       /opt/povray/bin
  System configuration files: /opt/povray/etc/povray/3.7
  User configuration files:   $HOME/.povray/3.7
  Standard include files:     /opt/povray/share/povray-3.7/include
  Standard INI files:         /opt/povray/share/povray-3.7/ini
  Standard demo scene files:  /opt/povray/share/povray-3.7/scenes
  Documentation (text, HTML): /opt/povray/share/doc/povray-3.7
  Unix man page:              /opt/povray/share/man
===============================================================================

The way it is configured we can keep our debian version of povray and install the newer version (povray_3.7)

make
make install

Seems like -geometry 1000x1000 doesn't work anymore. Instead use -H1000 -W1000

I've played around with it a little bit and it does parallel (threaded) execution nicely.

wget http://www.ms.uky.edu/~lee/visual05/povray/fourcube7.pov
./povray_3.7 -H1000 -W1000 fourcube7.pov +A0.1
takes 9 seconds on an AMD II X3. The standard, serial Debian version takes 21 seconds.

231. Compiling john the ripper: single/serial, parallel/OMP and MPI

Update: updated for v1.7.9-jumbo-7 since hccap2john in 1.7.9-jumbo-6 was broken

For no particular reason at all, here's how to compile John the Ripper on Debian Testing (Wheezy). It's very easy, and this post is probably a bit superfluous. The standard version only supports serial and parallel (OMP). See below for MPI.


The regular version: 

mkdir ~/tmp
cd ~/tmp
wget http://www.openwall.com/john/g/john-1.7.9.tar.gz
tar xvf john-1.7.9.tar.gz
cd john-1.7.9/src

If you don't edit the Makefile you build a serial/single-threaded binary.
If you want to build a threaded version for a single node with a multicore processor (OMP) do:
Edit Makefile and uncomment row 19 or 20

 18 # gcc with OpenMP
 19 OMPFLAGS = -fopenmp
 20 OMPFLAGS = -fopenmp -msse2
make clean linux-x86-64
cd ../run

You now have a binary called john in your ../run folder.


The Jumbo version:
If you want to build a distributed version with MPI (can split jobs across several nodes) you need the enhanced, community version:

sudo apt-get install openmpi-bin libopenmpi-dev
cd ~/tmp
wget http://www.openwall.com/john/g/john-1.7.9-jumbo-7.tar.gz
tar xvf john-1.7.9-jumbo-7.tar.gz 
cd john-1.7.9-jumbo-7/src

Edit the Makefile
  20 ## Uncomment the TWO lines below for MPI (can be used together with OMP as well)
  21 ## For experimental MPI_Barrier support, add -DJOHN_MPI_BARRIER too.
  22 ## For experimental MPI_Abort support, add -DJOHN_MPI_ABORT too.
  23 CC = mpicc -DHAVE_MPI
  24 MPIOBJ = john-mpi.o

and do
make clean linux-x86-64-native
cd ../run

I had a look at the passwords on one of our lab boxes -- it immediately discovered that someone had used 'password' as the password...


These test were run on my old AMD II X3 445. Processes which don't speed up with MP are highlighted in red. LM DES is borderline -- it's faster, but doesn't scale well.

Here's the single thread/serial version:
./john --test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     2906K c/s real, 2918K c/s virtual
Only one salt:  2796K c/s real, 2807K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     95564 c/s real, 95948 c/s virtual
Only one salt:  93593 c/s real, 93781 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw:    14094 c/s real, 14122 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    918 c/s real, 919 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... DONE
Short:  474316 c/s real, 475267 c/s virtual
Long:   1350K c/s real, 1356K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
Raw:    39843K c/s real, 39923K c/s virtual
Benchmarking: generic crypt(3) [?/64]... DONE
Many salts:     262867 c/s real, 263393 c/s virtual
Only one salt:  260121 c/s real, 260642 c/s virtual
Benchmarking: Tripcode DES [48/64 4K]... DONE
Raw:    369843 c/s real, 370584 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw:    99512K c/s real, 99712K c/s virtual
Here's the OMP version:
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     6706K c/s real, 2555K c/s virtual
Only one salt:  5015K c/s real, 2091K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     205670 c/s real, 85411 c/s virtual
Only one salt:  238524 c/s real, 86720 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw:    38400 c/s real, 13812 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    2306 c/s real, 845 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... DONE
Short:  474675 c/s real, 476581 c/s virtual
Long:   1332K c/s real, 1335K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
Raw:    49046K c/s real, 16785K c/s virtual
Benchmarking: generic crypt(3) [?/64]... DONE
Many salts:     721670 c/s real, 246640 c/s virtual
Only one salt:  699168 c/s real, 239605 c/s virtual
Benchmarking: Tripcode DES [48/64 4K]... DONE
Raw:    367444 c/s real, 369657 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw:    100351K c/s real, 100552K c/s virtual
And here's the MPI version:
mpirun -n 3 ./john --test
(note that this includes a great many more tests than the default version)
Benchmarking: Traditional DES [128/128 BS SSE2-16]... (3xMPI) DONE
Many salts:     8533K c/s real, 8707K c/s virtual
Only one salt:  7705K c/s real, 8110K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... (3xMPI) DONE
Many salts:     279808 c/s real, 282634 c/s virtual
Only one salt:  273362 c/s real, 276096 c/s virtual
Benchmarking: FreeBSD MD5 [128/128 SSE2 intrinsics 12x]... (3xMPI) DONE
Raw:    65124 c/s real, 65781 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... (3xMPI) DONE
Raw:    2722 c/s real, 2749 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... (3xMPI) DONE
Short:  1387K c/s real, 1415K c/s virtual
Long:   3880K c/s real, 3959K c/s virtual

Benchmarking: LM DES [128/128 BS SSE2-16]... (3xMPI) DONERaw:    114781K c/s real, 115940K c/s virtual

I don't quite understand the Kerberos results.



Other targets of interest are:

linux-x86-64-avx         Linux, x86-64 with AVX (2011+ Intel CPUs)
linux-x86-64-xop         Linux, x86-64 with AVX and XOP (2011+ AMD CPUs)
linux-x86-64             Linux, x86-64 with SSE2 (most common)
linux-x86-avx            Linux, x86 32-bit with AVX (2011+ Intel CPUs)
linux-x86-xop            Linux, x86 32-bit with AVX and XOP (2011+ AMD CPUs)
linux-x86-sse2           Linux, x86 32-bit with SSE2 (most common, if 32-bit)
linux-x86-mmx            Linux, x86 32-bit with MMX (for old computers)
linux-x86-any            Linux, x86 32-bit (for truly ancient computers)

The FX 8150 does AVX and XOP, while my 1055T doesn't.

The community version has more options:

linux-x86-64-native      Linux, x86-64 'native' (all CPU features you've got)
linux-x86-64-gpu         Linux, x86-64 'native', CUDA and OpenCL (experimental)
linux-x86-64-opencl      Linux, x86-64 'native', OpenCL (experimental)
linux-x86-64-cuda        Linux, x86-64 'native', CUDA (experimental)
linux-x86-64-avx         Linux, x86-64 with AVX (2011+ Intel CPUs)
linux-x86-64-xop         Linux, x86-64 with AVX and XOP (2011+ AMD CPUs)
linux-x86-64[i]          Linux, x86-64 with SSE2 (most common)
linux-x86-64-icc         Linux, x86-64 compiled with icc
linux-x86-64-clang       Linux, x86-64 compiled with clang
linux-x86-gpu            Linux, x86 32-bit with SSE2, CUDA and OpenCL (experimental)
linux-x86-opencl         Linux, x86 32-bit with SSE2 and OpenCL (experimental)
linux-x86-cuda           Linux, x86 32-bit with SSE2 and CUDA (experimental)
linux-x86-sse2[i]        Linux, x86 32-bit with SSE2 (most common, 32-bit)
linux-x86-native         Linux, x86 32-bit, with all CPU features you've got (not necessarily best)
linux-x86-mmx            Linux, x86 32-bit with MMX (for old computers)
linux-x86-any            Linux, x86 32-bit (for truly ancient computers)
linux-x86-clang          Linux, x86 32-bit with SSE2, compiled with clang
linux-alpha              Linux, Alpha
linux-sparc              Linux, SPARC 32-bit
linux-ppc32-altivec      Linux, PowerPC w/AltiVec (best)
linux-ppc32              Linux, PowerPC 32-bit
linux-ppc64              Linux, PowerPC 64-bit
linux-ia64               Linux, IA-64