Monday, January 4, 2016

Guide: Forcing low-power mode with NVidia cards on Linux

I googled for this for what seems like several hours, and only found wrong solutions. I figured it out myself in the end, so here's how to do it.

If you're like me, you're using Linux for work and Windows only for when you want to play a video game. In that scenario, I want my GPU to use low clocks when running Linux and run at the lowest temperatures possible. Coupled with the GTX 980 Ti I'm using, that translates to the fans of the cooler not spinning at all and thus the card becoming passively cooled. The fans of this particular card model only start spinning when the GPU reaches 60° Celsius.

To keep the card under the 60° threshold at all times, and thus passively cooled with 0 dB of fan noise, the lowest power profile must be used. By default, the card switches to maximum clock frequencies whenever there's load.

We can prevent that from happening by instructing the driver to:
  • Use the lowest power profile.
  • Use adaptive clock frequencies with the lowest power profile.
The second part is important, otherwise the GPU will use fixed clocks, which means it will get stuck at the highest clocks available on the lowest power profile. In this case, that would be 405MHz. However, we want the GPU to be able to go all the way down to 135MHz when idle and only switch to 405MHz when needed. Note that 405MHz is the highest it can go; we're forcing the lowest profile and thus the GPU will never use a 1392MHz clock.

So here's how to do it. In your X11 configuration directory, there should be a "xorg.conf.d" subdirectory. Usually that's "/etc/X11/xorg.conf.d/". In there, there should already be a file where the nvidia driver configuration resides. It looks like this:

Section "Device"
  Driver  "nvidia"
  [other options might exist here]
EndSection

The important thing is to find the 'Section "Device"' entry that has the 'Driver "nvidia"' entry in it. If no such file exists at all, simply create one. You can name it whatever you want as long as it ends with ".conf". In my case, I use "nvidia.conf".

We need to add a new option line to that "Device" section that forces the lowest power mode while keeping adaptive clocking working within that mode. Add this line:

Option  "RegistryDwords"   "PowerMizerEnable=0x1; PerfLevelSrc=0x3333; PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3"

So the section should now look like this:

Section "Device"
  Driver  "nvidia"
  Option  "RegistryDwords"   "PowerMizerEnable=0x1; PerfLevelSrc=0x3333; PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3"
  [other options might exist here - do not touch those]
EndSection
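
If you'd rather generate the option line than type it by hand, here is a minimal Python sketch. The function name and its parameters are my own invention for illustration; only the registry key names and values come from the line above.

```python
def registry_dwords_option(level=0x3, perf_level_src=0x3333):
    """Build the RegistryDwords option line for the nvidia "Device" section.

    `level` is the PowerMizer mode (0x3 = maximum power saving in this guide),
    `perf_level_src` is the clock policy value (0x3333 in this guide).
    """
    entries = [
        ("PowerMizerEnable", 0x1),
        ("PerfLevelSrc", perf_level_src),
        ("PowerMizerLevel", level),
        ("PowerMizerDefault", level),
        ("PowerMizerDefaultAC", level),
    ]
    # Each entry is rendered as Name=0x<hex>, separated by "; ".
    value = "; ".join(f"{name}={val:#x}" for name, val in entries)
    return f'Option  "RegistryDwords"   "{value}"'


if __name__ == "__main__":
    print(registry_dwords_option())
```

Called with its defaults, it reproduces the option line used in this guide; paste the result into the "Device" section.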

The various "PowerMizerLevel" entries specify which power mode to use. It doesn't matter how many power levels are shown in the nvidia-settings panel. The configuration file always deals with three:


  • Highest performance (mode 0x1)
  • Balanced (mode 0x2)
  • Lowest performance/maximum power saving (mode 0x3)


So we specify "0x3" as the mode to use under all circumstances.

If you've read other guides on the net, you've probably seen a "Coolbits" option. You do NOT need that. It has nothing to do with being able to specify power levels.

You also might have seen the "PerfLevelSrc" entry set to "0x2222" instead of "0x3333". That is also wrong. It turns out that this field consists of two bytes that specify whether to use adaptive or fixed clocks. 0x22 means fixed, and 0x33 means adaptive. The first byte specifies the clock policy when on battery, the second when on AC power (or it might be the other way around; I'm not sure). In any event, we want to use adaptive clocks for BOTH, so "0x3333" is the correct value and will allow the GPU to downclock to the minimum possible frequency instead of getting stuck at the highest frequency of the lowest power mode.
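
To make the two-byte layout concrete, here is a small Python sketch that splits a "PerfLevelSrc" value into its two bytes and names the policy of each. The function and the "unknown" fallback are hypothetical; and since I'm not sure which byte is battery and which is AC, it just calls them "first" and "second".

```python
# Policy byte values as described in this post.
POLICY = {0x22: "fixed", 0x33: "adaptive"}


def decode_perf_level_src(value):
    """Return (first_byte_policy, second_byte_policy) for a 16-bit value."""
    first = (value >> 8) & 0xFF   # high byte
    second = value & 0xFF         # low byte
    return POLICY.get(first, "unknown"), POLICY.get(second, "unknown")


if __name__ == "__main__":
    for value in (0x3333, 0x2222):
        print(f"{value:#06x} -> {decode_perf_level_src(value)}")
```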

And that's pretty much it. Restart the X server (or simply reboot if you don't know how to restart X), and the new settings should take effect. You should be able to observe the clocks in the nvidia-settings control panel. In my case, the card will use 135MHz as the minimum, and 405MHz as the maximum. It will never switch to a higher power profile.

Monday, June 4, 2012

Building Gentoo Linux with GCC 4.7 and LTO

GCC 4.7 was recently added to portage. It's still hard masked and not keyworded, but works fairly well. As with almost every new major release of GCC, a few packages fail to build, but the community is already working on it; most reported packages have been fixed already. I fixed the ones I encountered myself (kdocker and zsnes) and submitted patches to Gentoo's Bugzilla. Still, if you're using a package that hasn't been fixed yet and no workaround has been posted, you might want to hold off on switching your system to GCC 4.7 as the main compiler. (Just installing 4.7 will have no impact on your system, since the currently active version will remain the default.)

There's one feature in this release that stands out: the much improved link time optimization (LTO). What this does is hold off most of the optimization work until the final link step so that the optimizer can work across the whole program rather than on individual object files. LTO is activated using the -flto and -fuse-linker-plugin options.

As you can guess, this isn't your usual Gentoo ricer option (see -funroll-loops for that). It's a useful feature that helps the optimizer produce much better results. Most commercial compilers have had this for a long time, but GCC was lagging behind. Well, for now it's a ricer option, since it's unsupported by the Gentoo devs, but I fully expect it to become a supported option at some point as GCC keeps improving it. Building your whole system with LTO was still a hopeless endeavor with GCC 4.6. But in 4.7 it actually improved to the point where you can emerge -e world with it. And I did. And it works, even.

My Gentoo box is a KDE desktop system, with 1043 packages installed. Out of those 1043, only 33 could not be built with LTO. Quite a step up from GCC 4.6! Last time I experimented with it on 4.6, I gave up after about the 100th package failing to build. Worse, some packages would build but not run. So LTO in GCC 4.7 seems to really have improved a lot. There was only one package that would build but not run correctly (dev-libs/glib).

How to disable LTO

So here's what you do if you want to try this yourself. The most important thing to set up is a way to disable LTO for packages that don't work well with it. Fortunately, package.env makes this easy. First, create the file:

/etc/portage/env/no-lto.conf

With the following contents:

CFLAGS="${CFLAGS} -fno-lto -fno-use-linker-plugin"
CXXFLAGS="${CXXFLAGS} -fno-lto -fno-use-linker-plugin"
LDFLAGS="${LDFLAGS} -fno-lto -fno-use-linker-plugin"


You can now use that file for every package that breaks with LTO. You do that by listing the appropriate package atoms, followed by no-lto.conf in /etc/portage/package.env. Here's how that looks on my system:

sys-apps/sysvinit no-lto.conf
dev-lang/perl no-lto.conf
sys-libs/gpm no-lto.conf
dev-libs/elfutils no-lto.conf
>=dev-lang/python-3 no-lto.conf
sys-fs/e2fsprogs no-lto.conf
sys-apps/hdparm no-lto.conf
sys-apps/pciutils no-lto.conf
media-sound/wavpack no-lto.conf
media-libs/libpostproc no-lto.conf
dev-vcs/cvs no-lto.conf
x11-libs/qt-script no-lto.conf
media-libs/alsa-lib no-lto.conf
dev-util/dialog no-lto.conf
sys-apps/hwinfo no-lto.conf
dev-util/valgrind no-lto.conf
app-cdr/cdrtools no-lto.conf
dev-libs/boost no-lto.conf
media-video/libav no-lto.conf
app-text/aspell no-lto.conf
net-misc/nx no-lto.conf
app-text/dvisvgm no-lto.conf
x11-libs/wxGTK no-lto.conf
media-video/mplayer2 no-lto.conf
x11-libs/qt-webkit no-lto.conf
x11-libs/qt-declarative no-lto.conf
x11-base/xorg-server no-lto.conf
dev-tex/luatex no-lto.conf
app-misc/strigi no-lto.conf
kde-base/kdelibs no-lto.conf
kde-base/okular no-lto.conf
net-im/amsn no-lto.conf
sys-devel/llvm no-lto.conf
sys-devel/clang no-lto.conf
app-office/lyx no-lto.conf
dev-libs/glib no-lto.conf
sys-auth/polkit no-lto.conf
net-analyzer/nmap no-lto.conf

Now all of the above packages will be built without LTO. Go ahead and copy the above into your own package.env.
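
If you end up adding packages to that file often, a helper along these lines can do it without creating duplicates. This is only a sketch: the function name is made up, and it works on the file's text rather than touching /etc/portage/package.env directly, so reading and writing the file is left to you.

```python
def add_no_lto(package_env_text, atom, env_file="no-lto.conf"):
    """Return package.env text with `atom env_file` appended if it's missing."""
    lines = package_env_text.splitlines()
    for line in lines:
        fields = line.split()
        # A package.env line is an atom followed by one or more env files.
        if len(fields) >= 2 and fields[0] == atom and env_file in fields[1:]:
            return package_env_text  # already listed, nothing to do
    lines.append(f"{atom} {env_file}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = "sys-apps/sysvinit no-lto.conf\n"
    print(add_no_lto(text, "dev-lang/perl"), end="")
```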

Update: This list is somewhat old now; I suspect many of the above packages have already been fixed or current GCC versions are able to build them without errors.

How to enable LTO

To actually enable LTO, you need to change your make.conf and add -flto -fuse-linker-plugin to your CFLAGS/CXXFLAGS and LDFLAGS. Yes, it's also needed in LDFLAGS. Do not omit it!

In order to speed up LTO-enabled builds, you can pass the number of concurrent jobs to be performed by the linker to the -flto option. So on a quad core CPU, you'd use -flto=4 (with a BFS kernel) or -flto=5 (with a vanilla kernel). In general, use the same value as in your MAKEOPTS variable.
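
As an illustration of "use the same value as MAKEOPTS", here is a rough Python sketch that pulls the job count out of a MAKEOPTS string and turns it into the matching -flto option. The function name is hypothetical, and it only understands the short -jN / -j N form; anything else falls back to a plain -flto.

```python
import re


def lto_flag_from_makeopts(makeopts):
    """Derive -flto=N from the -jN value in a MAKEOPTS string."""
    match = re.search(r"-j\s*(\d+)", makeopts)
    if match is None:
        # No job count found: fall back to -flto without a job count.
        return "-flto"
    return f"-flto={match.group(1)}"


if __name__ == "__main__":
    print(lto_flag_from_makeopts("-j5"))
```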

Note that with LTO, you need to include your optimization flags in LDFLAGS as well. You can use the same optimizations you have in your CFLAGS, though you can use different ones if you really want to.

Keep in mind that you'll need at least sys-devel/binutils-2.21 for LTO to work correctly. If (for some weird reason) you're still on 2.20, it will not work, since GNU ld versions prior to 2.21 do not support linker plugins (-fuse-linker-plugin).
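
The version requirement boils down to a simple comparison. Here is a sketch in Python, assuming you already have the binutils version as a string (how you obtain it is up to you; the function name is made up):

```python
import re


def supports_linker_plugin(binutils_version):
    """True if the binutils version is at least 2.21."""
    match = re.match(r"(\d+)\.(\d+)", binutils_version)
    if match is None:
        raise ValueError(f"unrecognized version: {binutils_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuples compare element by element, so (2, 22) >= (2, 21) is True.
    return (major, minor) >= (2, 21)


if __name__ == "__main__":
    print(supports_linker_plugin("2.20"))
```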

For reference, here are the relevant entries from my own make.conf:

CFLAGS="-pipe -mtune=native -march=native -O2 -flto=4 -fuse-linker-plugin -fomit-frame-pointer -floop-interchange -floop-strip-mine -floop-block"
CXXFLAGS="${CFLAGS}"
LDFLAGS="-Wl,--as-needed -Wl,-O1 -Wl,--hash-style=gnu -Wl,--sort-common ${CFLAGS}"

You're now ready to emerge -e @system followed by emerge -e @world. Of course you need to make GCC 4.7 the default compiler first. You do that by using the gcc-config tool.

Dealing with breakage

Since you most probably will have a different set of packages installed on your system compared to me, you might encounter build failures. Those can either be caused by LTO or by GCC 4.7 in general. So if a package fails to emerge, add it to package.env with no-lto.conf and emerge --resume. If that fails again for the same package, then it's probably a GCC 4.7 problem. In that case, you can skip building that particular package with emerge --resume --skipfirst. Don't worry, the package will continue to work even with the rest of the system being built with GCC 4.7. GCC 4.7 is compatible with 4.6 and binaries (including libraries) can be built with one and work OK with the other.
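
The procedure above is really a two-step decision, which can be written down like this. It's just a Python sketch of the logic, with a made-up function name; it doesn't run emerge or edit any files for you.

```python
def next_step(package, no_lto_atoms):
    """Decide what to do after `package` failed to emerge.

    Returns (atoms_to_add_to_package_env, command_to_run).
    """
    if package not in no_lto_atoms:
        # First failure: assume LTO is the culprit and retry without it.
        return [package], "emerge --resume"
    # Already built without LTO and still failing: probably a GCC 4.7
    # problem, so skip the package and carry on with the rest.
    return [], "emerge --resume --skipfirst"


if __name__ == "__main__":
    print(next_step("app-office/lyx", set()))
```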

It would be nice if you filed a bug for packages that don't build with 4.7 and made that bug block bug 390247. Note: only file bugs for packages that fail with 4.7 without LTO. If a package only fails when LTO is enabled, don't file a bug; the Gentoo devs are not interested in LTO-related bugs at the moment.

Final thoughts

If you want to ask whether the system runs any faster now, I'm not going to answer that. I didn't run any benchmarks, and statements like "this and that program feel faster now" are subjective to begin with and very prone to placebo. Keep in mind that I didn't rebuild the whole system in order to get a faster system. The system was already fast enough. I mainly wanted to test whether LTO is ready for prime time. And it's very close to being ready. The next GCC version will surely improve compatibility even further. One day, package.env might be empty, even :-)