Discussion:
booting NetBSD/vax on real hardware
Charles Dickman
2014-05-25 22:06:15 UTC
After listening to Holm's travails trying to get NetBSD running on his
VAXstations, I did a little experimenting.

test vax: VAXstation 4000 Model 60 with 104MB RAM
boot server: Pentium Celeron PC running NetBSD 6.1.3, 100baseTX ethernet

booting NetBSD-6.1.4 uncompressed kernel (netbsd) from local root disk: 25s
booting NetBSD-6.1.4 compressed kernel (netbsd.gz) from local root disk: 17m 30s

$ time gzip -d netbsd.gz
11.65 real 6.83 user 1.49 sys

netbooting NetBSD-6.1.4 Install System compressed

boot.6.1.4: 21m 30s
boot.4.0.1: fails
boot.matt: 2m 3s

netbooting NetBSD-6.1.4 Install System uncompressed

boot.6.1.4: 54s
boot.5.1.2: 58s
boot.4.0.1: fails
boot.matt: 60s

netbooting NetBSD-1.5.3 Install System compressed

boot.mop.1.5.3: 1m 34s
boot.6.1.4: 3m 23s
boot.5.1.2: 3m 27s
boot.4.0.1: 1m 27s
boot.3.1.1: fails
boot.2.1: fails
boot.matt: fails

It is interesting how poor the decompression speed is in boot for the
later releases.

The boot that I have been using for netbooting for the last 5 years or
so is boot.matt, which I believe was supplied by Matt Thomas when the
boot included in the release was broken. It looks like this was about
the same time that the decompress code in standalone boot transitioned
from lib/libz to net/zlib. boot.matt is roughly 10 times faster than
boot.6.1.4 at netbooting the compressed 6.1.4 install kernel
(2m 3s versus 21m 30s).

ftp://ftp.netbsd.org/pub/NetBSD/misc/matt/boot
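
For reference, the ratios behind the "10 times" estimate can be worked out from the timings listed above. A minimal sketch in C; the function names are made up for illustration, and the numbers are just the measurements from this post:

```c
#include <stdio.h>

/* Convert a "Xm Ys" timing from the lists above into seconds. */
int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* How many times longer the slow boot took than the fast one. */
double slowdown(int slow_s, int fast_s)
{
    return (double)slow_s / (double)fast_s;
}

/* Print the ratios for the two compressed 6.1.4 measurements. */
void print_ratios(void)
{
    /* netboot, compressed 6.1.4 install kernel: boot.6.1.4 vs boot.matt */
    printf("netboot: boot.6.1.4 / boot.matt = %.1fx\n",
           slowdown(to_seconds(21, 30), to_seconds(2, 3)));

    /* local disk, 6.1.4 kernel: compressed vs uncompressed */
    printf("local disk: netbsd.gz / netbsd = %.1fx\n",
           slowdown(to_seconds(17, 30), to_seconds(0, 25)));
}
```

Calling print_ratios() prints both ratios; the first is the boot.6.1.4 versus boot.matt comparison for the compressed netboot case.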

-chuck
Dave McGuire
2014-05-25 23:06:14 UTC
Post by Charles Dickman
It is interesting how poor the decompression speed is in boot for the
later releases.
The boot that I have been using for netbooting for the last 5 years or
so is boot.matt which I believe was supplied by Matt Thomas when the
boot included in release was broken. It looks like this was about the
same time that the decompress code in standalone boot transitioned
from lib/libz to net/zlib. It is clearly 10 times faster than in
6.1.4.
ftp://ftp.netbsd.org/pub/NetBSD/misc/matt/boot
Is it possible that the caches are not enabled during the
decompression? Enabling them might be tough due to per-implementation
differences, but if there's room in the boot code for that stuff, that'd
make a HUGE difference.

-Dave
--
Dave McGuire, AK4HZ
New Kensington, PA
Anders Magnusson
2014-05-26 07:07:02 UTC
Post by Dave McGuire
Post by Charles Dickman
It is interesting how poor the decompression speed is in boot for the
later releases.
The boot that I have been using for netbooting for the last 5 years or
so is boot.matt which I believe was supplied by Matt Thomas when the
boot included in release was broken. It looks like this was about the
same time that the decompress code in standalone boot transitioned
from lib/libz to net/zlib. It is clearly 10 times faster than in
6.1.4.
ftp://ftp.netbsd.org/pub/NetBSD/misc/matt/boot
Is it possible that the caches are not enabled during the
decompression? Enabling them might be tough due to per-implementation
differences, but if there's room in the boot code for that stuff, that'd
make a HUGE difference.
The caches are not enabled during the boot process. The (somewhat
giant) slowness must be caused by something else.

-- Ragge
David Brownlee
2014-05-26 18:55:34 UTC
So the speed change was between 4.x and 5.x.
From a quick eyeball scan of the differences in sys/arch/vax/boot:
- RELOC=0x2f0000 -> 0x3f0000
- Setting nexaddr = bootrpb.adpphy if the latter is 0x20087800 in devopen.c
- caddr -> (char *)
- bcopy -> memcpy
- some small shuffling in start.S
- some licence updates

Of course the change could be outside there - libz and/or libsa spring
to mind. I wonder if the default compiler changed between 4 & 5.

It could even be a code change for which the new code is cache sensitive...

A brute-force way to track this down would be to check out and build
the bootblocks at a version midway between netbsd-4 and netbsd-5, and
then bisect until the offending change is isolated...
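
The bisection described above can be sketched as a binary search over an ordered list of revisions. This is only an illustration in C: the revision indices, first_bad_revision() and the slow_from() predicate are hypothetical. In practice the predicate would mean "build the bootblocks at that revision, netboot a compressed kernel, and check whether the boot is slow".

```c
#include <assert.h>

/* Predicate type: returns non-zero if the given revision shows the
 * slow boot. ctx carries whatever state the check needs. */
typedef int (*is_bad_fn)(int rev, void *ctx);

/*
 * Find the first bad revision, given that 'good' is known to be fast
 * and 'bad' is known to be slow. Each probe halves the remaining range,
 * so isolating one change among n takes about log2(n) builds.
 */
int first_bad_revision(int good, int bad, is_bad_fn is_bad, void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;      /* regression is at or before mid */
        else
            good = mid;     /* regression is after mid */
    }
    return bad;
}

/* Stand-in predicate for trying this out: every revision from
 * *(int *)ctx onward counts as slow. */
int slow_from(int rev, void *ctx)
{
    return rev >= *(int *)ctx;
}
```

With revision 0 standing for the netbsd-4 bootblocks and revision N for netbsd-5, first_bad_revision(0, N, ...) returns the index of the first slow revision, assuming the slowdown was introduced once and never reverted in between.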
Martin Husemann
2014-05-27 10:33:12 UTC
Post by Charles Dickman
boot included in release was broken. It looks like this was about the
same time that the decompress code in standalone boot transitioned
from lib/libz to net/zlib. It is clearly 10 times faster than in
6.1.4.
Spot on - not the decompress code, but the crc32() function in libsa.
The optimized one in libkern uses large tables, but that makes it fast
on VAX. The tiny one in libsa computes it all on the fly - and seems to
be dog slow on VAX.
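
To illustrate the trade-off between the two styles: a bit-at-a-time CRC-32 needs no table but runs eight shift/XOR steps per input byte, while a table-driven one spends memory on a lookup table and does one lookup per byte. The sketch below is a generic CRC-32 with the zlib polynomial, not the actual libsa or libkern source; all function names are made up, and it uses a single 256-entry table (1 KB), which may be smaller than what libkern uses.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0xEDB88320u  /* reflected CRC-32 polynomial, as used by zlib */

/* Table-free variant: eight shift/XOR steps for every input byte. */
uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ CRC32_POLY : crc >> 1;
    }
    return ~crc;
}

static uint32_t crc_table[256];
static int crc_table_ready;

/* Precompute the CRC of every possible byte value once. */
static void crc32_table_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ CRC32_POLY : c >> 1;
        crc_table[n] = c;
    }
    crc_table_ready = 1;
}

/* Table-driven variant: one lookup, one shift and two XORs per byte. */
uint32_t crc32_tabular(uint32_t crc, const unsigned char *buf, size_t len)
{
    if (!crc_table_ready)
        crc32_table_init();
    crc = ~crc;
    while (len--)
        crc = crc_table[(crc ^ *buf++) & 0xffu] ^ (crc >> 8);
    return ~crc;
}
```

Both functions follow the zlib calling convention (start with crc = 0 and feed the previous result back in for the next chunk) and return identical values; only the per-byte cost differs.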

I'm testing a change to switch back to the libkern one for VAX.

Martin
Martin Husemann
2014-05-27 11:27:51 UTC
Post by Martin Husemann
Spot on - not the decompress code, but the crc32() function in libsa.
The optimized one in libkern uses large tables, but that makes it fast
on VAX. The tiny one in libsa computes it all on the fly - and seems to
be dog slow on VAX.
I'm testing a change to switch back to the libkern one for VAX.
Unfortunately that didn't do the trick. But still we should have an
asm version of crc32 using the crc instruction. Anyone?

Martin
Anders Magnusson
2014-05-27 11:33:27 UTC
Post by Martin Husemann
Post by Martin Husemann
Spot on - not the decompress code, but the crc32() function in libsa.
The optimized one in libkern uses large tables, but that makes it fast
on VAX. The tiny one in libsa computes it all on the fly - and seems to
be dog slow on VAX.
I'm testing a change to switch back to the libkern one for VAX.
Unfortunately that didn't do the trick. But still we should have an
asm version of crc32 using the crc instruction. Anyone?
Hm, is the crc function available on all CPUs? So that we do not end up
emulating it on some hardware?

-- R
Paul Koning
2014-05-27 13:47:40 UTC
Post by Martin Husemann
Post by Martin Husemann
Spot on - not the decompress code, but the crc32() function in libsa.
The optimized one in libkern uses large tables, but that makes it fast
on VAX. The tiny one in libsa computes it all on the fly - and seems to
be dog slow on VAX.
I'm testing a change to switch back to the libkern one for VAX.
Unfortunately that didn't do the trick. But still we should have an
asm version of crc32 using the crc instruction. Anyone?
Hm, is the crc function available on all CPUs? So that we do not end up emulating it on some hardware?
The VAX architecture manual says it’s optional, and Appendix B seems to say that specifically it is omitted on the MicroVAXen. It might be interesting to use a fast large table version on those, and the machine instruction where it exists. (That assumes the large table version is actually fast; given machines with small caches that might not be true at least once the cache is enabled.)

paul
Mark Pizzolato - Info Comm
2014-05-27 11:37:00 UTC
Post by Martin Husemann
Spot on - not the decompress code, but the crc32() function in libsa.
The optimized one in libkern uses large tables, but that makes it fast
on VAX. The tiny one in libsa computes it all on the fly - and seems
to be dog slow on VAX.
I'm testing a change to switch back to the libkern one for VAX.
Unfortunately that didn't do the trick. But still we should have an asm version
of crc32 using the crc instruction. Anyone?
Hmmm... DEC changed away from using the CRC instruction as VMS versions progressed. This drastically improved VMS Backup performance where CRC was the bottleneck.

Additionally, CRC as an instruction is emulated on all but the older hardware, and I don't think instruction emulation is running this early in the boot process....

- Mark
