12 January 2012

48. nvidia gt520 issues and solutions on debian testing

EDIT: see here for a Linux Mint Debian Edition take on the GT 520 nouveau issue -- http://community.linuxmint.com/tutorial/view/824

EDIT: Someone made a succinct how-to for nvidia driver installation on debian: http://blog.libremath.org/2012/04/07/debian-nvidia-quick-guide/ NOTE: site seems to be down.

--start here --
I recently bought an nvidia gt520 1 GB graphics card. To my surprise it turned out to be a bit of a pain to actually get it working properly.

Sadly, we don't always document all the steps when trying to get something to work, but here's roughly what I remember.

The problem:
I plugged the nvidia gt520 into the PCI Express slot, connected the VGA cable to the VGA socket on the new graphics card and started my computer. My setup autostarts gdm3. Everything went fine -- the boot messages were flashing by as per normal, then gdm3 started. And got stuck. I experienced two different types of hang -- either just a black screen, or a black screen with a single cursor indicator (a single _ in the top left corner).

Logging in remotely (had ssh server running) and looking at top I could see that gdm3 was using up 50+% cpu power. Leaving the system for half an hour didn't allow for any progress.

Also, even when I did ctrl+alt+f1 to bring up tty1 I would be forcibly returned to tty7 over and over again. Trying to fix anything was thus difficult. After doing ctrl+alt+f1 a few times and being thrown around it would stop responding and strange symbols would appear on the screen when trying to use the keyboard.

One last piece of information: my onboard graphics is nvidia as well, but this probably isn't relevant.

Logging in remotely I tried using the excellent smxi / sgfxi scripts (http://smxi.org/) to install the proper graphics drivers. I tried nouveau, debian-nvidia and nvidia-current. I also tried just deleting /etc/X11/xorg.conf and hoping for the best. None of it helped.

Diagnosis:
First I made sure gdm3 wasn't starting anymore so that the computer wouldn't hang and I'd be able to work in peace:
sudo vim /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
was changed to
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash text"
(there may be other things on the same line -- just add text inside the quotes)

Then to make the changes take effect,
sudo update-grub
and reboot
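
If you'd rather not edit the file by hand, the same change can be scripted. This is only a rough sketch: the function name add_text_param is made up for illustration, it assumes GNU sed (as shipped with debian), and it assumes the variable is called GRUB_CMDLINE_LINUX_DEFAULT, as in a stock /etc/default/grub. Try it on a copy of the file first.

```shell
#!/bin/sh
# Append the "text" parameter to GRUB_CMDLINE_LINUX_DEFAULT in the given file,
# unless it is already one of the parameters on that line.
# add_text_param is a hypothetical helper name, not part of grub.
add_text_param() {
    file="$1"
    # Nothing to do if "text" is already present as a separate word
    if grep -Eq '^GRUB_CMDLINE_LINUX_DEFAULT="([^"]* )?text( [^"]*)?"' "$file"; then
        return 0
    fi
    # Insert " text" just before the closing quote (sed -i is a GNU extension)
    sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 text"/' "$file"
}
```

Run it as root against /etc/default/grub (add_text_param /etc/default/grub), then do sudo update-grub and reboot as above.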

Second I tried unloading any modules

sudo rmmod nouveau
sudo rmmod nvidia

I edited /etc/modules and commented out nvidia, and made sure nouveau was there. I also edited /etc/modprobe.d/nvidia-kernel-common.conf and commented out blacklist nouveau.

I then tried installing the nouveau driver one last time
sudo sgfxi -N nouveau
and rebooted
After the reboot had completed dmesg| grep nouv gave me the clue I needed -- the drivers had failed to load! I don't remember what the exact message was, but it was all about failure.
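
A quick way to see which of the two drivers actually got loaded is to filter the output of lsmod. The helper below is a small sketch of that idea; the name loaded_gpu_modules is made up, and it only looks at the module names nouveau and nvidia used in this post.

```shell
#!/bin/sh
# Read lsmod-style output on stdin and print any GPU driver modules it lists.
# The first line of lsmod output is a header, so it is skipped.
# loaded_gpu_modules is a hypothetical helper name.
loaded_gpu_modules() {
    awk 'NR > 1 && ($1 == "nouveau" || $1 == "nvidia") { print $1 }'
}

# Usage:
#   lsmod | loaded_gpu_modules
```

If it prints nothing, neither module is loaded, which matches what dmesg was telling me at this point.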


Solution:
(also see first post below)

I removed the xorg.conf
sudo rm /etc/X11/xorg.conf
then
startx
The desktop started! But I found myself in fallback mode -- the graphics acceleration obviously wasn't working -- but that wasn't a surprise since the drivers had failed to load.

I then ran
sudo rmmod nouveau
sudo apt-get install nvidia-kernel-dkms nvidia-settings nvidia-smi nvidia-xconfig
sudo nvidia-xconfig
startx

It worked!

My autogenerated /etc/modprobe.d/nvidia-kernel-common.conf now looks like this again:
alias char-major-195* nvidia

options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=44 NVreg_DeviceFileMode=0660
# To enable FastWrites and Sidebus addressing, uncomment these lines
# options nvidia NVreg_EnableAGPSBA=1
# options nvidia NVreg_EnableAGPFW=1

# see #580894
blacklist nouveau

Remember to remove any mention of nouveau in /etc/modules.

You can change your /etc/default/grub back to the way it was again to make gdm start again every time.

Edit: Reading between the lines it seems that Squeeze may not have the proper drivers available for GT520 -- binary installation using smxi might be a good idea in that case: http://forums.debian.net/viewtopic.php?f=17&t=72876

Lengthy output follows:

Here's dmesg | grep nvidia

###############################
[    7.192358] nvidia: module license 'NVIDIA' taints kernel.
[    7.278115] nvidia 0000:02:00.0: PCI INT A -> Link[LNED] -> GSI 18 (level, low) -> IRQ 18
[    7.278122] nvidia 0000:02:00.0: setting latency timer to 64
###############################


Here's lspci -vvnn



###############################


02:00.0 VGA compatible controller [0300]: nVidia Corporation GF119 [GeForce GT 520] [10de:1040] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Giga-byte Technology Device [1458:3520]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 18
Region 0: Memory at df000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at d0000000 (64-bit, prefetchable) [size=128M]
Region 3: Memory at dc000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at ec00 [size=128]
[virtual] Expansion ROM at def80000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: nvidia

###############################


Here's lshw -C display (run as user)
###############################
WARNING: you should run this program as super-user.

  *-display            
       description: VGA compatible controller
       product: GF119 [GeForce GT 520]
       vendor: nVidia Corporation
       physical id: 0
       bus info: pci@0000:02:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:18 memory:df000000-dfffffff memory:d0000000-d7ffffff memory:dc000000-ddffffff ioport:ec00(size=128) memory:def80000-deffffff
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.
###############################


And here's the xorg.conf:


###############################


# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 290.10  (pbuilder@cake)  Wed Nov 23 11:33:47 UTC 2011

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubS


###############################




11 January 2012

47. A step on the way to compiling the omnibook acpi drivers on debian testing

First, see here:
http://sourceforge.net/projects/omnibook/
http://home.comcast.net/~rickrich1/toshiba-1115-s103/omnibook.txt

I have an old toshiba satellite a205 which has a fan that turns on at 50 degrees and turns off at 45 degrees. That range is much too narrow, so the fan starts up every two minutes or so -- having it turn off at 40 degrees would probably make more sense. To this end I wanted to see if I could get acpi fan support.

In the end I don't seem to have succeeded, but here's what I did manage to do, and what happened:

Go:
Install build-essential and the kernel headers for your kernel

git clone git://omnibook.git.sourceforge.net/gitroot/omnibook/omnibook
cd omnibook/

vim polling.c
comment out line 128:
//        cancel_rearming_delayed_workqueue(omnibook_wq, &omnibook_poll_work);

and 191 using // :
//            cancel_rearming_delayed_workqueue(omnibook_wq, &omnibook_poll_work);

Run:
sudo make load

Read through doc/INSTALL, then


WARNING:
Some say that loading with the wrong ectype can be bad for your computer. My guess is that things will be fine as long as you don't put the computer under heavy load while trying out the method below, so that you don't risk burning anything.

OK, time to try the shotgun approach:
var=1 && sudo rmmod omnibook && sudo modprobe omnibook ectype=$var && ls /proc/omnibook

Do this with values of var from 1-16. See which one gives the 'best' support. For me most things showed up for all ectypes between 1 and 10, but only ectype=1 shows fan_policy.
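
Instead of retyping the line sixteen times, the same thing can be wrapped in a loop. This is just a sketch of the command above: the function name try_ectypes and the DRY_RUN switch are my own additions, and the warning about wrong ectypes applies just as much here. With DRY_RUN=1 it only prints what it would do.

```shell
#!/bin/sh
# Load the omnibook module with each ectype from 1 to 16 in turn and list
# what appears under /proc/omnibook for each one.
# try_ectypes and DRY_RUN are hypothetical names used for this sketch.
try_ectypes() {
    for var in $(seq 1 16); do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only show the commands; do not touch the module
            echo "rmmod omnibook; modprobe omnibook ectype=$var; ls /proc/omnibook"
            continue
        fi
        echo "== ectype=$var =="
        # rmmod fails harmlessly if the module is not loaded yet
        sudo rmmod omnibook 2>/dev/null
        sudo modprobe omnibook ectype="$var" && ls /proc/omnibook
    done
}

# Usage:
#   DRY_RUN=1 try_ectypes   # print the commands only
#   try_ectypes             # actually reload the module 16 times
```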

Next do cat /proc/omnibook/fan , cat /proc/omnibook/display, cat /proc/omnibook/battery etc. to see whether the settings seem to correspond to reality.

ectypes
I can't find the original document which details the different ectypes and the corresponding laptop models. Your best guess is to do like I did above (just trying randomly) or to google for omnibook and ectype and see which model is closer to yours.

Making omnibook load on boot:
vim /etc/modules
add a line saying
omnibook

and then create a file called omnibook.conf under /etc/modprobe.d
In /etc/modprobe.d/omnibook.conf you put a single line (use whichever ectype worked for you):
options omnibook ectype=12

cat /proc/omnibook/fan_policy gives

Fan off temperature:         0 C
Fan on temperature:          0 C
Fan level 2 temperature:     0 C
Fan level 3 temperature:     0 C
Fan level 4 temperature:     0 C
Fan level 5 temperature:     0 C
Fan level 6 temperature:    10 C
Fan level 7 temperature:    108 C
Minimal temperature to set: 25 C
Maximal temperature to set: 95 C

Those are the same values as in fan_policy.c (in the source code we downloaded). It seems that the way to change the values is that you should recompile, which is easy enough but also a bit scary. Haven't played with it yet.


Here's tree /proc/omnibook :

/proc/omnibook
├── ac
├── battery
├── blank
├── display
├── dmi
├── fan
├── fan_policy
├── hotkeys
├── lcd
├── temperature
├── touchpad
└── version

0 directories, 12 files



Here's the dmesg | grep omni output:

[    8.792966] omnibook: Driver version 2.20090707-trunk.
[    8.792969] omnibook: Forced load with EC type 1.
[    8.793055] omnibook: Feature range f86be5c0 - f86beac0
[    8.793058] omnibook: Testing feature ac at address f86be5c0
[    8.793060] omnibook: Begin table match of ac feature.
[    8.793063] omnibook: Attempting backend ec init.
[    8.793066] omnibook: Returning table entry nr 0.
[    8.793068] omnibook: Match succeeded: continuing with ac.
[    8.793072] omnibook: Testing feature battery at address f86be600
[    8.793075] omnibook: Begin table match of battery feature.
[    8.793077] omnibook: Attempting backend ec init.
[    8.793079] omnibook: Returning table entry nr 0.
[    8.793082] omnibook: Match succeeded: continuing with battery.
[    8.793086] omnibook: Testing feature blank at address f86be640
[    8.793088] omnibook: Begin table match of blank feature.
[    8.793090] omnibook: Attempting backend i8042 init.
[    8.793093] omnibook: Returning table entry nr 1.
[    8.793095] omnibook: Match succeeded: continuing with blank.
[    8.793098] omnibook: LCD backlight turn off at console blanking is enabled.
[    8.793102] omnibook: Testing feature bluetooth at address f86be680
[    8.793105] omnibook: Testing feature cooling at address f86be6c0
[    8.793107] omnibook: Testing feature display at address f86be700
[    8.793110] omnibook: Begin table match of display feature.
[    8.793112] omnibook: Attempting backend ec init.
[    8.793114] omnibook: Returning table entry nr 2.
[    8.793116] omnibook: Match succeeded: continuing with display.
[    8.795163] omnibook: Testing feature dock at address f86be740
[    8.795166] omnibook: Testing feature dump at address f86be780
[    8.795168] omnibook: Testing feature fan at address f86be7c0
[    8.795171] omnibook: Begin table match of fan feature.
[    8.795173] omnibook: Attempting backend ec init.
[    8.795176] omnibook: Returning table entry nr 0.
[    8.795178] omnibook: Match succeeded: continuing with fan.
[    8.795182] omnibook: Testing feature fan_policy at address f86be800
[    8.795184] omnibook: Begin table match of fan_policy feature.
[    8.795187] omnibook: Attempting backend ec init.
[    8.795189] omnibook: Returning table entry nr 0.
[    8.795191] omnibook: Match succeeded: continuing with fan_policy.
[    8.795195] omnibook: Testing feature hotkeys at address f86be840
[    8.795197] omnibook: Begin table match of hotkeys feature.
[    8.795200] omnibook: Attempting backend i8042 init.
[    8.795202] omnibook: Returning table entry nr 0.
[    8.795204] omnibook: Match succeeded: continuing with hotkeys.
[    8.795207] omnibook: Enabling all hotkeys.
[    8.799296] omnibook: Testing feature dmi at address f86be880
[    8.799300] omnibook: dmi feature has no backend table, io_op not initialized.
[    8.799304] omnibook: Testing feature version at address f86be8c0
[    8.799307] omnibook: version feature has no backend table, io_op not initialized.
[    8.799311] omnibook: Testing feature lcd at address f86be900
[    8.799314] omnibook: Begin table match of lcd feature.
[    8.799317] omnibook: Attempting backend ec init.
[    8.799319] omnibook: Returning table entry nr 2.
[    8.799322] omnibook: Match succeeded: continuing with lcd.
[    8.799326] omnibook: Testing feature muteled at address f86be940
[    8.799329] omnibook: Testing feature key_polling at address f86be980
[    8.799332] omnibook: Testing feature temperature at address f86be9c0
[    8.799334] omnibook: Begin table match of temperature feature.
[    8.799337] omnibook: Attempting backend ec init.
[    8.799339] omnibook: Returning table entry nr 0.
[    8.799341] omnibook: Match succeeded: continuing with temperature.
[    8.799347] omnibook: Testing feature touchpad at address f86bea00
[    8.799350] omnibook: Begin table match of touchpad feature.
[    8.799352] omnibook: Attempting backend i8042 init.
[    8.799355] omnibook: Returning table entry nr 0.
[    8.799357] omnibook: Match succeeded: continuing with touchpad.
[    8.799361] omnibook: Testing feature wifi at address f86bea40
[    8.799363] omnibook: Testing feature throttling at address f86bea80
[    8.799366] omnibook: Enabled features: ac battery blank display fan fan_policy hotkeys dmi version lcd temperature touchpad.

46. A VERY rough approach to "benchmarking" gromacs (unscientific) on debian

Here's a comparison between different hardware and binaries which were built as described in http://verahill.blogspot.com/2012/01/debian-testing-64-wheezy-compiling.html

The simulation in question is a 100,000 step 100 ps simulation of a carbonate ion in water. Check back later for more details.

grompp -f carbonate.mdp -c carbonate.pdb -p carbonate.top -o carbonate.tpr
time mdrun -v -deffnm carbonate



Conclusions: 1) Double precision is slower than single precision by roughly 25-35 %. 2) On a single machine there's no gain in using mpi. 3) Sadly, it appears that the Intel i5-2400 X4 3.1GHz is more expensive than the AMD Phenom II X6 3.1GHz for a reason.



Machines:
Be -- Phenom II X6, 8Gb RAM.
B --   Athlon X3 3.1 GHz 3 core, 4Gb RAM
Ta -- Optiplex 990 (i5 2400 3.1 GHz 4 core, 8Gb RAM). This machine was running a full gnome3/gnome-shell desktop at the same time as the tests were carried out. Take the results with a grain of salt.

Using mdrun (no mpi, single precision)
-------------------------------
Run   Be (6)     Ta (4)   B ( 3)
-------------------------------
1       1m27s    1m1s    1m48s
2       1m28s    1m1s    1m46s
3       1m35s    1m1s    1m47s
4       1m32s    1m1s    1m47s
5       1m33s    1m1s    1m47s


Using mdrun_dd (no mpi, double precision)
-------------------------------
Run   Be (6)     Ta (4)   B ( 3)
-------------------------------
1       1m49s    1m15s    2m25s
2       1m47s    1m15s    2m26s
3       1m51s    1m15s    2m26s
4       1m59s    1m15s    2m24s
5       1m58s    1m15s    2m26s
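
To put a number on the single vs double precision difference, the timings in the two tables above can be compared directly. Here's a small sketch that does the arithmetic; the helper names to_seconds and slowdown_pct are made up, and the times have to be in the XmYs form used in the tables.

```shell
#!/bin/sh
# Convert a time written like 1m27s into seconds.
# to_seconds and slowdown_pct are hypothetical helper names.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ print $1 * 60 + $2 }'
}

# Print by how many percent the second time is slower than the first,
# rounded to a whole number.
slowdown_pct() {
    a=$(to_seconds "$1")
    b=$(to_seconds "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.0f\n", (b - a) / a * 100 }'
}

# Usage, with the run 1 timings (single precision first, then double):
#   slowdown_pct 1m27s 1m49s   # Be, prints 25
#   slowdown_pct 1m1s 1m15s    # Ta, prints 23
#   slowdown_pct 1m48s 2m25s   # B,  prints 34
```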



Using mdrun_mpi (mpi, single precision)

Machine: Be (Phenom II X6, 8Gb RAM).
(5 cores doesn't work)
---------------------------------
Cores/Run:   1            2      
---------------------------------
1                  4m11s   4m13s
2                  2m12s   2m15s
3                  1m46s   1m43s
4                  1m31s   1m31s
5                  ----------------
6                  1m28s   1m35s

Machine: Ta (Optiplex 990, i5 3.1 GHz 4 core, 8Gb RAM).
---------------------------------
Cores/Run:   1            2           3
--------------------------------
1                  3m20s   3m20s   3m20s
2                  1m39s   1m38s   1m40s
3                  1m12s   1m13s   1m12s
4                  1m01s   1m01s   1m00s


Machine: Athlon X3 3.1 GHz 3 core, 4Gb RAM.
---------------------------------
Cores/Run:   1            2           3
--------------------------------
1                  4m32s   4m33s   4m36s
2                  2m28s   2m28s   2m27s
3                  1m49s   1m50s   1m49s

Using mdrun_ddmpi (mpi, double precision):

Machine: Phenom II X6, 8Gb RAM.
---------------------------------
Cores/Run:   1            2    
---------------------------------
1                  5m23s   5m25s
2                  2m56s   2m54s
3                  2m11s   2m11s
4                  1m56s   1m57s
5                  -----------------
6                  1m51s   1m52s

Machine: Optiplex 990 (i5 3.1 GHz 4 core, 8Gb RAM).
---------------------------------
Cores/Run:   1            2           3
--------------------------------
1                  4m14s   4m13s   4m13s
2                  2m09s   2m09s   2m10s
3                  1m33s   1m33s   1m33s
4                  1m16s   1m16s   1m16s


Machine: Athlon X3 3.1 GHz 3 core, 4Gb RAM.
---------------------------------
Cores/Run:   1            2           3
--------------------------------
1                  5m01s   5m52s   5m50s
2                  3m17s   3m17s   3m18s
3                  2m31s   2m31s   2m31s
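
The mpi tables also show how well each machine scales with the number of cores. As a rough sketch (the function names are made up, and "efficiency" here simply means speedup divided by core count), the speedup relative to the one-core run can be worked out like this:

```shell
#!/bin/sh
# Convert a time written like 3m20s into seconds.
# to_seconds and mpi_scaling are hypothetical helper names.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ print $1 * 60 + $2 }'
}

# Arguments: one-core time, n-core time, n.
# Prints the speedup over one core and the efficiency (speedup / n) in percent.
mpi_scaling() {
    t1=$(to_seconds "$1")
    tn=$(to_seconds "$2")
    awk -v t1="$t1" -v tn="$tn" -v n="$3" \
        'BEGIN { s = t1 / tn; printf "speedup=%.2f efficiency=%.0f%%\n", s, s / n * 100 }'
}

# Usage, with the Ta single precision mpi timings for 1 and 4 cores:
#   mpi_scaling 3m20s 1m01s 4   # prints speedup=3.28 efficiency=82%
```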