Wed Jun 15 11:20:14 +07 2022

Rebuilding the firewall

After the power failure of May 10th, 2022, the firewall would not restart.

In this report, I want to summarize the steps that I have taken to get the firewall working again.

Hard disk failure

One of the hard disks had failed. Because the disks are in a RAID 1 array, it was only a matter of replacing the failed disk with one salvaged from another machine. The RAID takes care of mirroring onto the new disk.

The steps to restore the file systems

When I tried to boot the firewall, it would hang at a message about rebuilding the LDAP database. My diagnosis was that some files had been corrupted by the power failure and needed to be restored.

To restore the file systems from the Amanda backups, I needed to boot from a live system that offers:

  • mounting and checking (fsck) the Linux ext partitions;
  • connecting to the Amanda server with ssh;
  • extracting the files with tar.
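A quick way to tell whether a given live system meets these requirements is to look for the commands themselves. A minimal sketch; using e2fsck (from Linux e2fsprogs) as the ext checker is an assumption, and the function name check_tools is hypothetical.

```shell
#!/bin/sh
# Sketch: report which of the given commands are missing from the current
# live system. e2fsck as the ext checker is an assumption (Linux e2fsprogs).
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_tools mount e2fsck ssh tar in a live shell prints each missing command and returns a non-zero status; on a live shell without ext tools, e2fsck is the one reported.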

Booting the machine from a USB key

That is the part that took me the longest to solve.

The server would boot from a FreeBSD-based installation USB key and I could launch a live shell, but the partitions on the hard disk were damaged: that live shell can mount an ext partition, but it has no fsck for that type of partition.

From the FreeBSD-based live shell, an ext partition is mounted with:

  mount -t ext2fs /dev/something /somewhere
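The ext2fs type name in that command is FreeBSD's; a Linux live system uses ext2, ext3 or ext4 instead (and can usually detect the type on its own). A minimal sketch that picks the name from uname -s; treating the Linux partitions as ext4 is an assumption, since the exact ext version is not recorded here, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: choose the ext file-system type name for mount(8) by OS.
# FreeBSD's ext2fs(5) driver also handles ext3/ext4 volumes; on Linux the
# partitions are assumed to be ext4.
ext_fstype() {
    case "$1" in
        FreeBSD) echo ext2fs ;;
        Linux)   echo ext4 ;;
        *)       return 1 ;;
    esac
}

# $1 = device node of the partition, $2 = mount point (e.g. /mnt)
mount_ext() {
    fstype=$(ext_fstype "$(uname -s)") || {
        echo "unsupported OS: $(uname -s)" >&2
        return 1
    }
    mount -t "$fstype" "$1" "$2"
}
```

Device names differ between the two systems (FreeBSD does not use /dev/sdX), so the device argument has to match the OS the live key runs.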

None of the other USB keys I tried would boot: Ubuntu, UBCD, Hiren's, Vyatta...

I finally installed Ventoy (ventoy.net) on a USB key and copied Hiren's BootCD 12.0 onto it. Note that I used a Windows machine to create the Ventoy key; Ventoy also provides a Linux installer, Ventoy2Disk.sh.

All the files I used are in /home/pc-application/ZeroShell.

For the next step, I also copied ZeroShell 3.9.5 onto the Ventoy key.
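Preparing such a key from a Linux machine could look like the following sketch. It assumes Ventoy's Linux package is unpacked in the current directory and the script runs as root; the device, mount point and ISO names are placeholders, and the helper names are hypothetical. Commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch (Linux host, run as root): install Ventoy on a USB key, then copy
# ISO images onto its data partition. Device, mount point and ISO names are
# placeholders. Commands are only printed unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = whole-disk device of the USB key, e.g. /dev/sdX (this erases it).
install_ventoy() {
    run sh Ventoy2Disk.sh -i "$1"
}

# $1 = mount point of the key's first (data) partition, then the ISO files.
# Ventoy lists the ISO images found on that partition in its boot menu.
copy_isos() {
    dest=$1
    shift
    for iso in "$@"; do
        run cp "$iso" "$dest/"
    done
}
```

The two steps are separate because the data partition only exists, and can only be mounted, after Ventoy2Disk.sh has finished and the key has been re-plugged.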

Restore the file systems

Booting from the Ventoy key, I could start Hiren's BootCD and choose Mini Linux.

In that Mini Linux environment, I could run fsck on the damaged partitions and restore the file systems.

There are 3 partitions on the disk:

  1. /dev/sda1, 10 GB, mounted on /boot
  2. /dev/sda2, 50 GB, mounted on /cdrom
  3. /dev/sda3, 75 GB, mounted on /DB

Note that / is a RAM disk.
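Checking the three partitions from Mini Linux can be scripted as below. This is a sketch assuming Mini Linux provides e2fsck and that none of the partitions is mounted; the function name is hypothetical. It prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: check the three ext partitions before restoring them.
# Assumptions: e2fsck is available and the partitions are NOT mounted.
# Prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}
PARTITIONS="/dev/sda1 /dev/sda2 /dev/sda3"

check_partitions() {
    for part in $PARTITIONS; do
        # -f forces a check even if the file system looks clean,
        # -y answers "yes" to every repair question.
        if [ "$DRY_RUN" = 1 ]; then
            echo "e2fsck -f -y $part"
        else
            e2fsck -f -y "$part"
        fi
    done
}
```

e2fsck exits with a non-zero status when it had to correct errors, so the loop deliberately carries on to the next partition instead of stopping.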

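Restoring is then a matter of recovering the data from Amanda, mounting each partition on /mnt in Mini Linux and issuing, for each partition, a command like the following (/path/to/recovered/partition stands for the directory on the Amanda server holding the recovered files):

  ssh root@192.41.170.11 'tar cf - -C /path/to/recovered/partition .' | (cd /mnt; tar xfBp -)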

Note that root must be allowed to log in to the Amanda server over ssh.
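The per-partition restore can be wrapped in a small loop. A sketch under stated assumptions: the recovered files sit on the Amanda server under a placeholder RECOVER_DIR, in hypothetical sub-directories boot, cdrom and DB; GNU tar runs on both ends; adding --numeric-owner (keep the archived user and group IDs instead of remapping them through the live system's /etc/group) is a suggestion, not what was used here. Commands are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: restore each partition from the Amanda server through ssh.
# Assumptions (hypothetical, adapt before use):
#   - recovered files are under RECOVER_DIR on the server, one
#     sub-directory per partition (boot, cdrom, DB);
#   - GNU tar on both ends; root may log in over ssh.
# Prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}
AMANDA=root@192.41.170.11
RECOVER_DIR=${RECOVER_DIR:-/path/to/recovered}

restore_partition() {
    device=$1
    name=$2
    # Stream the recovered tree, unpack it on /mnt keeping permissions (-p)
    # and the archived numeric owners (--numeric-owner), then unmount.
    cmd="mount $device /mnt && ssh $AMANDA 'tar cf - -C $RECOVER_DIR/$name .' | (cd /mnt && tar -xpBf - --numeric-owner) && umount /mnt"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$cmd"
    else
        sh -c "$cmd"
    fi
}

restore_all() {
    restore_partition /dev/sda1 boot
    restore_partition /dev/sda2 cdrom
    restore_partition /dev/sda3 DB
}
```

Printing first makes it easy to check the paths against what amrecover actually produced before running restore_all with DRY_RUN=0.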

Rebooting the firewall

When the 3 partitions had been restored, I tried to reboot the firewall, but the error about rebuilding LDAP persisted.

At the end of my rope, I tried to launch ZeroShell 3.9.5 from the Ventoy key. I was not sure what to expect, but I knew that ZeroShell can run without being installed on the hard disk. That newer version of ZeroShell managed to rebuild LDAP and finished booting. After that, it was just a matter of rebooting the firewall from its original disk.

So far, I don't know why that Dell PowerEdge R200 is so temperamental about the type of USB key it can boot from.

I am not sure what ZeroShell uses LDAP for, because I think I have disabled it in the configuration. I have added a script that dumps the LDAP database every two hours.
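The dump script itself is not reproduced here; the following is only a sketch of what such a script could look like, assuming ZeroShell's directory is OpenLDAP so that slapcat is available. The dump directory, retention period, script path and function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a periodic LDAP dump, assuming ZeroShell's directory is OpenLDAP
# so that slapcat(8) is available. Directory, retention and script path are
# hypothetical. Example root crontab entry (every two hours):
#   0 */2 * * * /usr/local/sbin/ldap-dump.sh run
DUMP_DIR=${DUMP_DIR:-/DB/ldap-dumps}
KEEP_DAYS=${KEEP_DAYS:-14}

# Name of this run's LDIF file, stamped with date and time.
dump_file() {
    echo "$1/ldap-$(date +%Y%m%d-%H%M).ldif"
}

dump_ldap() {
    mkdir -p "$DUMP_DIR" || return 1
    slapcat -l "$(dump_file "$DUMP_DIR")" || return 1
    # Remove dumps older than KEEP_DAYS days.
    find "$DUMP_DIR" -name 'ldap-*.ldif' -mtime +"$KEEP_DAYS" -exec rm -f {} +
}

# Only dump when called with "run", so the functions can be sourced safely.
if [ "${1:-}" = run ]; then
    dump_ldap
fi
```

Depending on how slapd is configured, slapcat may need its configuration passed with -f or -F.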

It seems that restoring the file systems has changed the group ownership of some files and directories. The only real effect I could see is on Amanda, in /Database/Links/opt/libexec/amanda/.
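One plausible cause, not confirmed here: when GNU tar extracts as root, it looks up the owner and group names stored in the archive in the local /etc/passwd and /etc/group and uses the archived numeric IDs only for names it cannot find, so files that passed through the Amanda server and Mini Linux can end up with IDs from those systems; GNU tar's --numeric-owner skips that lookup. A sketch for spotting files whose group ID no longer resolves to a name (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch: list, with numeric IDs, the files under a directory whose group ID
# has no name in the local group database. Defaults to the Amanda path above.
orphan_groups() {
    find "${1:-/Database/Links/opt/libexec/amanda}" -nogroup -exec ls -ldn {} +
}
```

This only catches groups that no longer resolve; a group silently remapped to another existing group has to be spotted by comparing with a healthy installation, and then fixed with chgrp.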

When booted from the USB key into ZeroShell 3.9.5, the firewall worked properly, but I had no access through SSH or the web interface. That is why I reverted to the previous version.


Posted by Olivier | Filed under: administration, firewall, backup