83 Performance
Arun edited this page 2022-07-22 23:17:05 +05:30

Rick uses a 400TB for multiverse data, and nnn to manage it ;)

Optimization techniques

Significant factors

  • quicksort with pre-filters to sort by filename, time, size etc.
  • 0 fragmentation - no byte loss while storing file name of directory entries
  • no copying of filename strings while sorting/filtering
  • entries not matching a filter are moved to the lowest slots to ignore later
  • selection handling through optimized bulk memory manipulation
  • increased number of open file descriptors
  • fast redraw only affected lines to avoid complete refresh
  • all the large buffers are aligned
  • frequently used structures optimized to facilitate SSE quad-instruction support
  • read-ahead request (to kernels supporting the feature)
  • fast routines to calculate and render file size
  • shifts instead of div and mul (modern compilers do this already)
  • optimized memory usage everywhere, buffer re-use
  • no floating point arithmetic
  • static routines
  • controlled binary size
  • O3 level compiler optimization
  • 0-warning statically-analyzed code (forced -Wall -Wextra -Werror in CI)
  • statically generated hash-table for guaranteed O(1) extension icon lookups

Rejected options

  • replace quicksort with a more aggressive algorithm (favor space over time complexity)
    • the difference is not usually perceptible by human beings
    • the random load option is also removed to emphasize this
  • non-standard calls like statx()/getdents()/getdents64()
    • platform-specific, nnn is POSIX-compliant
    • man page Description begins with "These are not the interfaces you are interested in."
  • optimize handling 10K+ entries in a dir
    • not a mass use case
    • performance with 10K files is good enough today
    • SSD/NVMe are the future
  • use lazy/background/threaded load
    • du, sort and type-to-nav program options require a stat() on all entries for correctness

Selection

Supporting multi-directory selection is technically challenging; nnn uses many optimizations to make it unnoticeable in most cases, but select-all (including filtered) or inversion while already having lots of files selected may take some time.

When tested on Motorola Photon Q (ARM CPU with 13.54 bogomips) and a directory with 10K files, even the most complicated cases we came up with finished within 10 seconds. If the earlier selection was small or continuous, both select all (full as well as on filter) and invert were instant.

Comparison

GNU libc + ncursesw vanilla

Stripped binary (or script) size & memory usage of nnn and some other popular FMs (which existed before nnn) while viewing a directory with 13.5K files (0 directories), sorted by size/du:

 BINSZ    VIRT    RES    SHR S  %MEM   COMMAND
  650K  139720  91220   8460 S   1.1   ranger
    1M   50496  15328   4076 S   0.2   vifm
    1M   72152  12468   7336 S   0.2   mc
  110K   15740   4348   2460 S   0.1   nnn -t d

GNU libc + ncursesw + upx custom

Results with aggressive make options, static binary and upx compression:

$ make O_NORL=1 O_NOMOUSE=1 O_NOLC=1 O_NOBATCH=1 O_NOSSN=1 O_NOFIFO=1 O_QSORT=1 O_NOUG=1 static strip
$ upx nnn-static
$ top
 BINSZ    VIRT    RES    SHR S  %MEM   COMMAND
  582K    3136   3056      4 S   0.0   nnn-static -cdDEnQrux -t d ~/images // 11K files
  582K    2428   2240      4 S   0.0   nnn-static -cdDEnQrux -t d /usr/bin // 1.5K files

musl libc + netbsd-curses vanilla

Results with 1950 files:

$ musl-gcc -O3 -DNORL -I../netbsd-curses/libcurses -o nnn-shared src/nnn.c -Wl,-Bsymbolic-functions -L/opt/netbsd-curses -lcurses -lterminfo
$ strip nnn-shared
$ musl-gcc -O3 -DNORL -I../netbsd-curses/libcurses -o nnn-static src/nnn.c -Wl,-Bsymbolic-functions -L/opt/netbsd-curses -lcurses -lterminfo -static
$ strip nnn-static
 BINSZ    VIRT    RES    SHR S  %MEM   COMMAND
  102K    1952   1580    868 S   0.0   nnn-shared -dn
  354K    1164    824    348 S   0.0   nnn-static -dn

nnn vs. ls

The stripped binary size of ls is 130.7K.

nnn takes less than 50% time to list a directory with 2083 files:

$ time nnn /usr/bin
0.00user 0.01system 0:00.02elapsed 42%CPU (0avgtext+0avgdata 3540maxresident)k
1608inputs+0outputs (3major+325minor)pagefaults 0swaps

$ export LS_COLORS='ex=00:su=00:sg=00:ca=00:'
$ time ls -l /usr/bin
0.01user 0.01system 0:00.05elapsed 47%CPU (0avgtext+0avgdata 3800maxresident)k
784inputs+0outputs (0major+303minor)pagefaults 0swaps

nnn roughly makes one-third the system calls made by ls (all readings taken at cold boot).

$ time strace -c nnn -d /usr/bin | wc -l
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 92.56    0.014030           7      1989           newfstatat
  4.95    0.000751         188         4           getdents
  1.19    0.000180           3        57         1 stat
  0.53    0.000081          81         1           inotify_add_watch
  0.15    0.000023           2        15           close
  0.11    0.000017           2        10           brk
  0.10    0.000015           1        14           openat
  0.08    0.000012           2         6           write
  0.07    0.000010           0        23         3 ioctl
  0.05    0.000007           0        15           fstat
  0.05    0.000007           7         1         1 unlink
  0.04    0.000006           1         8           read
  0.04    0.000006           0        14           mmap
  0.04    0.000006           6         1           inotify_rm_watch
  0.03    0.000005           5         1           sysinfo
  0.01    0.000002           2         1           lseek
  0.00    0.000000           0         2           lstat
  0.00    0.000000           0        10           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         9           rt_sigaction
  0.00    0.000000           0         9         7 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           inotify_init1
------ ----------- ----------- --------- --------- ----------------
100.00    0.015158                  2194        12 total
0.01user 0.05system 0:00.08elapsed 84%CPU (0avgtext+0avgdata 3524maxresident)k
4576inputs+0outputs (10major+545minor)pagefaults 0swaps
0

$ export LS_COLORS='ex=00:su=00:sg=00:ca=00:'
$ time strace -c ls -l /usr/bin | wc -l
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.88    0.011274           6      1989           lstat
 27.69    0.007281           4      1989      1989 lgetxattr
 20.74    0.005454           4      1548      1548 getxattr
  5.82    0.001529           3       442           readlink
  1.35    0.000355          89         4           getdents
  0.38    0.000100           3        31           write
  0.24    0.000064           2        33         5 openat
  0.18    0.000048           1        34           close
  0.17    0.000046           3        17           lseek
  0.16    0.000042           1        40           mmap
  0.16    0.000041           4        11           munmap
  0.11    0.000030           1        30           fstat
  0.04    0.000011           2         5           brk
  0.04    0.000011          11         1           mremap
  0.03    0.000007           0        17           read
  0.00    0.000000           0        20           mprotect
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2         2 ioctl
  0.00    0.000000           0        12        12 access
  0.00    0.000000           0         4           socket
  0.00    0.000000           0         4         4 connect
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         2 statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    0.026293                  6244      3562 total
0.02user 0.12system 0:00.13elapsed 106%CPU (0avgtext+0avgdata 3728maxresident)k
3752inputs+0outputs (7major+526minor)pagefaults 0swaps
1989

C vs. other languages

/* compare.c */

int main(void)
{
  for (int i = 0; i < 1000000; i++)
    printf("hello\n");

  return 0;
}

$ time -p ./compare
real 1.89
user 0.20 <<
sys 1.25
---------------------------------
# compare.sh
#!/bin/bash

for (( i = 0; i < 1000000; i++ ))
do
  echo hello
done

exit 0

$ time -p ./compare.sh
real 5.88
user 4.66 <<
sys 1.21