SysAdmin Blog | Alexander Bochmann

vSphere host profiles, "A specified parameter was not correct: portgroupName"

Alexander Bochmann Friday 30 of August, 2024

It's not often you run into a VMware error message that has almost no search engine hits, which we managed to do recently...

When trying to apply an existing host profile to a new host added to a cluster, vSphere errored out:

A general system error occurred: Batch host remediation failed.
A specified parameter was not correct: portgroupName

We later learned that the same error would appear when trying to reapply the host profile to a machine that had been in the cluster for quite some time...

I'll spare you the details, but as it turns out this happened ... because we were configuring vmkernel ports on dvSwitch portgroups in the host profile, and I had changed one of those port group names.

Knowing that, the error message suddenly makes sense 🙄

So instead of changing back the names, we went forward and updated all our host profiles with the new designations, which also meant reacknowledging host profile customizations that were referencing these port groups.

force serial console on an HP apollo 715/50 workstation

Alexander Bochmann Sunday 05 of May, 2024

I made the error of setting the console path to graphics in the BOOT_ADMIN console of an old HP apollo 715/50 workstation with no monitor connected (or at least none that is able to detect the system's VGA signal).

On more recent HP 9000 hardware, it seems to be possible to reset the console path to serial by pressing the TOC button after powering on with no keyboard and monitor connected, but as the NetBSD/hppa FAQ says, this has no effect on a 715.

As it turns out, there is another way though, and I haven't seen it documented anywhere: The 715/50 has a monitor selection switch for its onboard graphics adapter. It has one setting (both switches on SW1 down) that is labeled as 15" Color (Model 715/33 only) in the service manual.

With this setting, the system comes up with serial A as default with 9600/8/n/1, and it's possible to interrupt the boot process with <ESC>, select <a> to get into boot administration mode, and then change the console path back to serial from the BOOT_ADMIN> prompt:

PATH console rs232_a.9600.8.none
RESET

Screenshot from the HP 715 service manual showing options for the SW1 DIP switch

Windows 10 and WSL: Thousands of "HNS Container Networking" firewall rules

Alexander Bochmann Thursday 02 of May, 2024

My main Windows 10 PC, originally installed in 2018, has recently been having strange networking problems after powering on. For example, WSL would not start for minutes, and WireGuard took ages to activate.

I happened to find this general WSL troubleshooting article on the Microsoft knowledge base, which, about halfway down, mentions possible problems with "HNS Firewall rules" and has a PowerShell one-liner to remove some of those rules.

No idea why this was the first thing I tried out of the many options on that page, but as it turns out, my system had over 12,000 HNS Container Networking rules:

PS C:\Users\bochmann> Get-NetFirewallRule -name "HNS Container Networking*" | measure | select Count
Count
-----
12580

This seemed like a problem since there are only about 300 other firewall rules, not to mention that the command took quite some time to complete.

After testing on my notebook, which has a much more recent Windows install, it turns out that each reboot adds six of these rules, apparently provided that I shut down the system with shutdown /s /t 0 instead of using the Windows menu, which I usually do to force a "real" shutdown and thwart fast startup...

On the notebook, I just nuked all HNS firewall rules (not just those for UDP/53), to no apparent ill effect (needs to be run as Administrator):

wsl --shutdown
Get-NetFirewallRule -Name "HNS Container Networking*" | Remove-NetFirewallRule
hnsdiag delete all
Restart-Service -Force hns

...on the other PC, PowerShell tells me that the command will be running for another four hours.

Now I only need to find out why this happens in the first place.

manually applying patches from GitHub

Alexander Bochmann Sunday 09 of July, 2023

I wasn't previously aware that you can take any commit on the GitHub web interface and just add .diff to its URL to get a plain unified diff that can then be applied to code existing elsewhere with good old patch.

So it's not required to fiddle with git repos and forks and whatever to quickly apply a patch out of band (and then return to the upstream state later on with something like a git checkout --force ... that discards all the local changes).

Case in point: It was not initially clear when the recent Mastodon patches would be applied to the Hometown fork, but .diffs from relevant commits on the Mastodon repo applied to the code on my disk with minimal fuzz. So it was possible to quickly get into a state where my version had the most important patches without breaking the connection to Hometown upstream, and after the security fixes had landed there, I just checked that version out over my local changes.
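A minimal sketch of that workflow in shell, assuming curl and GNU patch are installed. The function names and the commit URL in the usage comment are placeholders for illustration, not taken from the Mastodon or Hometown repositories.

```shell
#!/bin/sh
# Build the .diff URL for a GitHub commit page URL
# (strips one trailing slash, then appends ".diff").
commit_diff_url() {
  printf '%s.diff\n' "${1%/}"
}

# Download the diff for a commit and apply it to the source tree in the
# current directory. A dry run comes first, so a diff that does not apply
# cleanly leaves the tree untouched.
apply_commit_diff() {
  diff_file=$(mktemp) || return 1
  if curl -fsSL -o "$diff_file" "$(commit_diff_url "$1")" \
     && patch -p1 --dry-run < "$diff_file" \
     && patch -p1 < "$diff_file"; then
    status=0
  else
    status=1
  fi
  rm -f "$diff_file"
  return "$status"
}

# Usage (placeholder URL), run from the top of the checked-out tree:
#   apply_commit_diff "https://github.com/<owner>/<repo>/commit/<commit-id>"
```

The -p1 strips the a/ and b/ path prefixes that git-style diffs carry.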

quick notes from installing OS/2 Warp 4 way too often

Alexander Bochmann Sunday 18 of June, 2023

I own an old Via EPIA board with a C3 CPU, and for some reason I thought casually installing OS/2 would be a good idea.

- I used install media copies from WinWorld
  - I have German language install media in original packaging, but turning up all the patch sets in German was too much effort, and a mixed-language OS is annoying
  - for some reason, the updated partitioning tool from the OS/2 Warp 4.52 installer failed on the IDE-to-SDcard adapter I was using
  - (after lots of tries I ditched that storage solution and used an actual IDE disk - somehow boot sector and partition table kept getting lost when using the SD adapter?)
  - fdisk from OS/2 Warp 4 worked without problems?
- in BIOS setup, configure "LBA" addressing scheme for the HDD
- OS/2 Warp 4 install CD is not bootable, you need install floppies (and a floppy drive)
  - the installer has no USB support, and USB floppy drives are not an option (even though elstel.org, linked below, claims that booting from a USB floppy should work)
  - (maybe some Via BIOS bug or something?)
  - downloaded patched install disks from elstel.org (those with Dani's IDE driver, last option in the list)
  - note these are the install disks, you also need a boot disk (I, uh, don't remember which one I used?)
  - also note that elstel.org links to patched bootable OS/2 Warp 4 install CDs (didn't try those)
- since the CDROM isn't bootable, I ended up using a SCSI drive behind a LSI/NCR/Symbios Logic 53C810 PCI SCSI card
  - there are many releases of the 53C810 driver, but symbios406.zip from os2site.com was the newest one that worked for me with the Warp 4 install disks (versions newer than 4.0.x will hang, older versions may report unknown firmware)
  - the 53C810 driver doesn't fit on the first install disk
  - do not delete unneeded driver files from the install disk, instead truncate them (also mentioned on elstel.org)
  - copying additional drivers from the install disks will fail when files are missing (will updating snoop.lst help?)
- do not use quick install, it will create a FAT partition (instead of HPFS)
- 2GB HPFS install partition is fine
- the EPIA C3 board has a 10/100 Via Rhine II, drivers on os2site, copy to an empty disk to install when enabling the TCP/IP stack
- Via Soundblaster emulation (when enabled in the BIOS) is a Soundblaster Pro
- after installation, I used this patchset from archive.org (note installation order mentioned in the TEXT file that's an additional download)
  - has FP17, TCPIP 4.3 and the MPTS updates, Java runtime (not JDK), Netscape Navigator, Scitech SNAP with the "free" code

Debian bullseye / Devuan chimaera openssl minimum TLS version

Alexander Bochmann Saturday 23 of April, 2022

I recently spent way too much time trying to find out why my mail server wasn't able to send mail to a system that apparently only supported TLSv1. None of the TLS options in the sendmail configuration made any difference.

Things started to click only after I noticed that connecting to the system in question via openssl s_client produced the same error message:

> openssl s_client -connect mail.some.domain:25 -starttls smtp
CONNECTED(00000003)
139770261177664:error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol:../ssl/statem/statem_lib.c:1957:

As it turns out, /etc/ssl/openssl.cnf in current Debian / Devuan has the following global configuration settings:

[system_default_sect]
MinProtocol = TLSv1.2
CipherString = DEFAULT@SECLEVEL=2

So yeah, anything using openssl that doesn't explicitly override that configuration will not be able to make TLS connections to systems that don't support TLSv1.2...

Changing the settings to MinProtocol = TLSv1 made it possible to deliver my mail.
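For testing against a single host, the minimum version can also be lowered for one command only, since the openssl command line tool reads its configuration from the file named in the OPENSSL_CONF environment variable. A minimal sketch, assuming the section layout of the Debian default file; the function name and the /tmp path are placeholders, and whether a given daemon honors OPENSSL_CONF is not covered here.

```shell
#!/bin/sh
# Write a minimal OpenSSL configuration to the given path. It follows the
# section layout of the Debian default file, but allows TLSv1 as the
# minimum protocol version.
write_legacy_tls_conf() {
  cat > "$1" <<'EOF'
openssl_conf = default_conf

[default_conf]
ssl_conf = ssl_sect

[ssl_sect]
system_default = system_default_sect

[system_default_sect]
MinProtocol = TLSv1
CipherString = DEFAULT@SECLEVEL=2
EOF
}

# Usage (placeholder path):
#   write_legacy_tls_conf /tmp/openssl-legacy.cnf
#   OPENSSL_CONF=/tmp/openssl-legacy.cnf \
#     openssl s_client -connect mail.some.domain:25 -starttls smtp
```

This keeps the system-wide default at TLSv1.2 for everything else.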

network interfaces renamed following Proxmox 7 upgrade

Alexander Bochmann Wednesday 24 of November, 2021

After upgrading my standalone Proxmox host from PVE 6 to 7, the interface names were suddenly changed back from "predictable" to the old ethX names. The setup is Proxmox on Debian, so when I initially set up the system, I manually installed Debian 10 first and then added the Proxmox 6 repositories and packages.

After some debugging it turned out there was an old systemd network configuration file that prevented systemd-udevd from starting up correctly:

systemd-udevd[xxxx]: /etc/systemd/network/99-default.link: No valid settings found in the [Match] section, ignoring file. To match all interfaces, add OriginalName=* in the [Match] section.

I currently have no idea where the file /etc/systemd/network/99-default.link came from (it doesn't have a package owner after the upgrade), but apparently its syntax is not valid for the systemd-udevd in Debian Bullseye. Removing the file solved the problem, and I'm now back to the interface names in the ifupdown2 configuration used by Proxmox (I rebooted the system to be sure it now comes up the right way).
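To look for similar leftovers before an upgrade, something like the following could list .link files that have no setting in a [Match] section, which is what the log message above complains about. This is a rough heuristic and not a reimplementation of the systemd parser; the function name is made up for this sketch.

```shell
#!/bin/sh
# Print every *.link file in the given directory that has no Key=Value
# line inside a [Match] section (including files without that section).
find_unmatched_links() {
  for f in "$1"/*.link; do
    [ -f "$f" ] || continue
    if awk '
      /^\[/ { in_match = ($0 ~ /^\[Match\]/); next }
      in_match && /^[A-Za-z]+[ \t]*=/ { keys++ }
      END { exit (keys > 0) }
    ' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage:
#   find_unmatched_links /etc/systemd/network
#   dpkg -S /etc/systemd/network/99-default.link   # check package ownership
```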

WireGuard on the OpenPandora

Alexander Bochmann Sunday 02 of May, 2021

introduction

WireGuard is a VPN system built on modern cryptography that provides for a comparatively simple setup and uses UDP as a transport, with moderate overhead. It "just works" for road warrior setups where one end doesn't have a stable address.

The OpenPandora is an ARM Linux pocket computer, first released around 2010, that uses an ancient OpenEmbedded Ångström as base OS, with a Linux 3.2 kernel that has quite a few device-specific modules that were never upstreamed.

A couple of weeks ago, I decided to try to combine the two, provided it wouldn't turn out to be too much of an effort. With that in mind, I looked at the wireguard-go userspace implementation instead of attempting to make the WireGuard linux-compat kernel module build against the outdated OpenPandora kernel.

Setting up a tunnel requires two WireGuard components:

- a WireGuard protocol implementation (like the kernel module or wireguard-go)
- a version of wireguard-tools that is used to provide a configuration to WireGuard

As for wireguard-go, I made a short attempt at trying to build golang on the Pandora itself, but hit the "too much effort" barrier pretty quickly. Fortunately, golang now provides for cross-compiling to supported platforms - but the Pandora is not one of those: The Pandora OS (SuperZaxxon) is built with the outdated "softfp" ARM binary ABI, which is backwards-compatible with ARM CPUs that don't have floating-point hardware, but is actually capable of using vfp and NEON in the backend, if supported by the compiler. The workaround here is to cross-compile with ARMv5 as target architecture, which produces a pure software floating point executable (that also works on softfp by design).

cross-building wireguard-go

I built wireguard-go on a Debian Buster host, and since buster-backports only provides go1.14, I couldn't use the most recent version (which currently requires go1.16) and went with wireguard-go 0.0.20210212 instead.

After checking out or unpacking the sources, building a binary is a simple matter of running make with the appropriate environment parameters:

env GOOS=linux GOARCH=arm GOARM=5 make

Just copy the resulting wireguard-go over to /usr/local/bin on your Pandora and make it executable.

compiling wireguard-tools

wireguard-tools has only a small set of build dependencies, the most important of which unfortunately isn't even mentioned: On Linux, you need a copy of the kernel headers that roughly matches the destination kernel.

Turns out that SuperZaxxon only ships the include files for the initial kernel (2.6), but not those for the last available kernel build. Also Linux 2.6 apparently doesn't provide some required functions, so my first attempt failed.

I ended up downloading the latest 3.2 kernel sources from the OpenPandora git.

When I compile software on the Pandora, I usually first try to use the cdevtools PND - it has an older gcc, but is generally more lightweight than the other option (Code::Blocks). So I start cdevtools, make a src/wireguard directory, and then download and unpack both wireguard-tools and the Pandora kernel sources in there.

In the wireguard-tools directory, go to src/ and run something like this:

env CFLAGS="-I`pwd`/../../pandora-kernel-pandora-3.2-c4c68a4/include -Os -mtune=cortex-a8 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -pipe" make

...and then, to install the resulting programs below /usr/local:

sudo env PREFIX=/usr/local WITH_WGQUICK=yes WITH_SYSTEMDUNITS=no make install

Pandora caveats

- SuperZaxxon does not autoload the tun module, so /dev/net/tun doesn't exist. (Ironically, it would be loaded if /dev/net/tun did exist and then something tried to access the device...)
- wg-quick uses some fancy bash i/o redirection which requires /dev/fd. Which is not there on the Pandora either, but it's easy to create, since it's just a symlink to /proc/self/fd.
- Do not use a VPN interface name that starts with "w" (like the default of wg0)! It triggers bugs in other scripts on the OpenPandora, for example loading of the WiFi firmware will fail after a resume from sleep.
- Add /usr/local/bin to the PATH of root so the binaries are found in their directory.
- A couple of the advanced wg-quick functions fail, mostly due to missing or outdated tools. One that I encountered was changing nameservers, but I assume anything that makes changes to the firewall configuration will be broken too. I did not try calling external commands from the wg-quick config file yet (which might serve as a workaround for some uses).
- Basic setup of a v4 tunnel with several routes has been tested successfully.
- IPv6 is completely untested.

I wrote a small wrapper script that creates a suitable environment for wg-quick invocation that's included as /usr/local/bin/wg-pandora in the tar file below:

#!/bin/sh

# wg-quick needs root privileges to create and configure the interface
if [ "$(id -u)" -ne 0 ]; then
  echo "[!] script needs to be run as root, use su or sudo"
  exit 1
fi

# the interface name is the only parameter
if [ -z "$1" ]; then
  echo "[!] please use the VPN interface name as parameter"
  echo "NOTE: do not use any device names starting with \"w...\" -"
  echo "      it will prevent Wifi reconfiguration on SuperZaxxon."
  exit 1
fi

if [ ! -f "/etc/wireguard/$1.conf" ]; then
  echo "[!] please create /etc/wireguard/$1.conf with a valid wg-quick configuration"
  exit 1
fi

if [ ! -e /dev/net/tun ]; then
  echo "[+] load tun kernel module"
  modprobe tun
fi

if [ ! -e /dev/fd ]; then
  echo "[+] create missing /dev/fd symlink"
  ln -s /proc/self/fd /dev/fd
fi

echo "[+] launching wg-quick"
/usr/local/bin/wg-quick up "$1"

exit 0

installation

Download wireguard-pandora-20210502.tar.gz and unpack to the root directory:
```
tar -C/ -xpf wireguard-pandora-20210502.tar.gz
```
Create a wg-quick configuration in /etc/wireguard (man pages are included in the download, but man is not installed on the Pandora by default).
Run /usr/local/bin/wg-pandora <if-name>. (Remember the note about interface names.)
You will need an existing WireGuard endpoint to connect to ;)
Manual setup using wg (see WireGuard quickstart) is also possible, as soon as the tun module has been loaded and wireguard-go is running.
There's a discussion thread over on the OpenPandora forums.
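For the configuration step, a skeleton wg-quick file could be generated along these lines. This is only a sketch: the function name, the addresses, the endpoint and the key placeholders are all made up and need to be replaced with the values of your own endpoint. A DNS line is left out on purpose, since changing nameservers is one of the wg-quick functions that fails on the Pandora.

```shell
#!/bin/sh
# Write a skeleton wg-quick configuration for the given interface name into
# the given directory (normally /etc/wireguard). Refuses interface names
# starting with "w", which cause trouble on SuperZaxxon.
write_wg_skeleton() {
  ifname=$1
  confdir=$2
  case "$ifname" in
    w*) echo "[!] interface names starting with \"w\" are not safe here" >&2
        return 1 ;;
  esac
  # The subshell keeps the restrictive umask local to the file creation.
  ( umask 077
    cat > "$confdir/$ifname.conf" <<'EOF'
[Interface]
# Placeholder values: replace with your own key and tunnel address.
PrivateKey = <private key of the Pandora>
Address = 10.0.0.2/24

[Peer]
PublicKey = <public key of the endpoint>
Endpoint = vpn.example.org:51820
AllowedIPs = 10.0.0.0/24
PersistentKeepalive = 25
EOF
  )
}

# Usage (as root):
#   write_wg_skeleton vpn0 /etc/wireguard
#   /usr/local/bin/wg-pandora vpn0
```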

Apache httpd, reverse proxy, and caching

Alexander Bochmann Tuesday 24 of November, 2020

There are tons of guides out there on either how to set up Apache httpd as a reverse proxy, or how to enable (disk) caching for content being served.

The web has surprisingly little information on how to combine both in a working manner, and to have Apache cache content that's being retrieved from a proxied backend.

Just using the default configuration and then dropping something like a CacheEnable disk into the <Location ...> that holds your proxy rules will not work: Nothing ever is written to the cache directory.

With debug logging you see either nothing at all or maybe a quick succession of AH00750: Adding CACHE_SAVE filter ... and AH00751: Adding CACHE_REMOVE_URL filter ... messages in the error.log.

So what's up? Likely your configuration is entirely correct, but you're missing one statement:

CacheQuickHandler off

It seems that with the default of CacheQuickHandler being enabled, proxied content never hits the quick handler phase that allows it to be processed for caching.

When CacheQuickHandler is disabled, everything just drops into place, though some fine tuning might be required.

The current configuration for my use case of caching media for my Mastodon instance that's being retrieved from a horribly sluggish Minio backend looks like this:

<IfModule mod_cache_disk.c>
        CacheQuickHandler off
        CacheRoot /var/cache/apache2/mod_cache_disk
        CacheMaxFileSize 10000000
        CacheDirLevels 2
        CacheDirLength 1
        CacheLock off
        CacheIgnoreCacheControl On
        CacheIgnoreQueryString On
        CacheStoreNoStore On
        CacheIgnoreHeaders Set-Cookie X-Amz-Request-Id
</IfModule>

...and then:

<Location "/">
        Require all granted
        ProxyPass http://<backend-address>:9000/
        ProxyPassReverse http://<backend-address>:9000/
        <IfModule mod_cache_disk.c>
               CacheEnable disk
        </IfModule>
</Location>
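To check whether responses are really served from the cache, mod_cache can add an X-Cache response header when CacheHeader on is set. That directive is not part of the configuration above, so treat it as an assumption of this sketch, as are the function name and the URL in the usage comment.

```shell
#!/bin/sh
# Read HTTP response headers on stdin and print the status word from the
# X-Cache header (for example HIT or MISS), or "none" if it is absent.
cache_status() {
  awk '
    { sub(/\r$/, "") }                      # drop the CR of CRLF line ends
    tolower($1) == "x-cache:" { print $2; found = 1; exit }
    END { if (!found) print "none" }
  '
}

# Usage (placeholder URL): issue a GET, discard the body, dump the headers.
#   curl -s -o /dev/null -D - https://media.example.org/some/file.png | cache_status
```

Requesting the same URL twice should then show whether the second response comes from the cache.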

25Gbit ethernet is complicated...

Alexander Bochmann Monday 10 of February, 2020

We just spent about a week trying to put a bunch of systems into production that had been ordered with 25Gbit fiber interfaces. We had planned to collect those on two of our Arista 7050CX3, using 100Gbit QSFP28 in 4 * 25Gbit mode and MPO breakout cables to 4 * LC for the 25Gbit SFP28 end. So we cable everything up, configure our LACP channels on both ends, and ... nothing. All of the links stay down.

They do show a signal on the transceiver though (at least on the switch side where we can look at optics information). A show interfaces et10/1-4 status says "notconnect" for all four subinterfaces. A show interfaces et10/1-4 phy displays an "errDisabled" on the phy layer. We are stumped.

Over the course of the next few days, we try several changes, to no avail. Directly connecting two Arista switches works though, as does a direct connection between two end hosts. We even swap everything down to 40G on the Arista side and 10G SFP+ in the end hosts, which turns out perfectly fine (so at least our cabling is correct).

At this point, support for the appliances we're trying to connect gives us credentials for shell access. It's a non-root user on what turns out to be a normal Linux system, but at least I can see that it comes with QLogic Corp. FastLinQ QL45000 Series 25GbE controllers (for a short moment we had suspected we had the wrong controllers), and I can get some information by using ethtool. One of those is that ethtool reports the host interfaces as "25GBASE-KR", which tells me nothing. Someone on IRC mentions that "-KR" denotes an "electrical backplane" connection. Armed with those two small bits of information, I hit the search engines, and find this useful table in a document on the Marvell web site:
Table from the Marvell document (image)

It's accompanied by the following text:

The –S short reach interfaces aim to support high-quality cables without
Forward Error Correction (FEC) to minimize latency. Full reach interfaces
aim to support the lowest possible cable or backplane cost and the longest
possible reach, which do require the use of FEC. FEC options include
BASE-R FEC (also referred to as Fire Code) and RS-FEC (also referred to
as Reed-Solomon).

There are two different, incompatible, error correction mechanisms on the bitstream layer of 25Gbit interfaces!? I didn't know that.

Since the default on Arista switches seems to be Reed-Solomon, and I don't have any way to configure a detail like that on the end host, we change the configuration on the Arista side:

interface et10/1-4
error-correction encoding fire-code

That's all. We do the same for three other interface groups, and all links work just as expected (except for one that apparently has a bad transceiver in the end host). I call off the screen-sharing session with Arista support planned for five minutes later.
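On a Linux host where root access is available, the FEC mode can be inspected with ethtool --show-fec and changed with ethtool --set-fec. A minimal sketch, assuming ethtool prints an "Active FEC encoding:" line (the exact output may differ between versions); the interface name and the helper function are placeholders, and none of this was run on the appliances described above.

```shell
#!/bin/sh
# Read "ethtool --show-fec" output on stdin and print the active encoding
# (for example RS, BaseR or Off).
active_fec() {
  awk -F': *' '/^Active FEC encoding/ { print $2; exit }'
}

# Usage (placeholder interface name):
#   ethtool --show-fec eth0 | active_fec
# Switching the host side to Fire Code instead of changing the switch:
#   ethtool --set-fec eth0 encoding baser
```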
