Diagnosing and Fixing GRUB Boot Failures in Linux

"Stability is the goal of IT operations, but anomalies are the daily reality."
Photo by Joshua Tsu / Unsplash

πŸ“Œ

Nothing is more frustrating than rebooting your Linux server only to be greeted with a GRUB rescue prompt. Bootloader failures are among the most common startup issues, but they are also some of the most recoverableβ€”if you know what to do.

If you see errors like:
πŸ”΄ error: no such partition
πŸ”΄ grub rescue> prompt
πŸ”΄ kernel panic - not syncing: VFS: Unable to mount root fs

πŸ“Œ Don't panic! In this guide, you will learn:
βœ… How GRUB works and why it fails
βœ… The difference between grub> and grub rescue>
βœ… Step-by-step fixes for GRUB misconfigurations, missing partitions, and bootloader corruption
βœ… Enterprise-grade troubleshooting strategies for fast recovery
βœ… Best practices to prevent GRUB failures in production

πŸ”œ Next in the series: Recovering from Kernel Panics & Initramfs Issues


πŸ” 1. What is GRUB and Why Does It Fail?

GRUB (GRand Unified Bootloader) is the default bootloader for most Linux distributions. It loads the Linux kernel into memory and hands over control to the OS.

πŸ’‘ Common Reasons for GRUB Failures

Failure Type Cause Error Message
GRUB misconfiguration grub.cfg is missing or incorrect error: no such partition
Bootloader corruption GRUB is missing from the MBR grub rescue>
Partition issues The root partition was deleted or changed error: unknown filesystem
Kernel/initrd missing Kernel or initramfs is missing from /boot kernel panic - not syncing

πŸ” 2. Diagnosing GRUB Boot Failures

πŸ’‘ If your Linux machine fails to boot, check the type of GRUB prompt you see:

βœ… grub> prompt β†’ Partial GRUB working, just missing config
βœ… grub rescue> prompt β†’ GRUB cannot find critical boot files
βœ… Black screen, stuck at BIOS β†’ Possible bootloader corruption

πŸ“Œ Step 1: List Available Partitions

From the grub> or grub rescue> prompt, run:

ls

πŸ“Œ Output Example:

(hd0) (hd0,msdos1) (hd0,msdos2) (hd0,msdos3)

πŸ’‘ This command lists all available disks and partitions. Identify where Linux is installed.


πŸ“Œ Step 2: Locate the GRUB Configuration

Try to find the /boot/grub2/ directory:

ls (hd0,msdos2)/boot/grub2/

πŸ“Œ Expected Output:

grub.cfg  grubenv  themes  fonts

πŸ’‘ If you find grub.cfg, GRUB exists but isn’t loading correctly.


πŸ” 3. Fixing GRUB Boot Failures

πŸ› οΈ Fix 1: Temporary Boot into Linux

If you reach the grub> prompt but GRUB isn't loading properly, you can manually boot into Linux:

set root=(hd0,msdos2)
linux /vmlinuz root=/dev/sda2 ro
initrd /initramfs.img
boot

πŸ“Œ Command Breakdown:

  • set root=(hd0,msdos2) β†’ Set the partition where Linux is installed
  • linux /vmlinuz root=/dev/sda2 ro β†’ Load the kernel
  • initrd /initramfs.img β†’ Load the initramfs
  • boot β†’ Start the OS

βœ… If this works, login and run the following to permanently fix GRUB:

grub2-mkconfig -o /boot/grub2/grub.cfg
grub2-install /dev/sda
reboot

πŸ› οΈ Fix 2: Restore GRUB from a Live CD

If grub rescue> appears, you may need to boot from a Linux Live USB and restore GRUB.

1️⃣ Boot from a Linux Live USB
2️⃣ Identify the root partition:

fdisk -l

3️⃣ Mount the partition and chroot into the system:

mount /dev/sda2 /mnt
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt

4️⃣ Reinstall GRUB:

grub2-install /dev/sda
grub2-mkconfig -o /boot/grub2/grub.cfg
exit
reboot

πŸ“Œ Expected Outcome: GRUB is restored, and the system boots normally.


πŸ› οΈ Fix 3: Check for Missing Kernel or Initramfs

If you get kernel panic - not syncing, check if the kernel and initramfs exist:

ls /boot

πŸ“Œ Expected Output:

vmlinuz-5.10.0 initramfs-5.10.0.img

πŸ’‘ If files are missing, regenerate initramfs and reinstall the kernel:

dracut -f
yum reinstall kernel
reboot

πŸ” 4. Enterprise Case Study: GRUB Failure in Production

πŸ“Œ Scenario:
A data center running RHEL 8 experienced sudden downtime on a critical server cluster. After a scheduled update, systems rebooted into GRUB rescue mode.

πŸ“Œ Investigation:

  • Engineers connected via iLO (HP) & DRAC (Dell)
  • Running ls in GRUB rescue showed missing partitions
  • Bootloader corruption detected after RAID firmware updates

πŸ“Œ Solution:
πŸ”Ή Engineers booted from a Live Rescue USB
πŸ”Ή Chrooted into the system and reinstalled GRUB
πŸ”Ή Fixed boot order in BIOS settings

πŸ“Œ Lesson Learned:
⚠️ Always test kernel and bootloader updates in staging environments first
⚠️ Keep offline recovery tools available for remote repair


πŸ” 5. Best Practices to Prevent GRUB Failures

πŸ“Œ To minimize downtime, follow these enterprise best practices:

βœ… Test bootloader updates in staging before deployment
βœ… Keep multiple kernel versions installed in case a rollback is needed
βœ… Regularly back up GRUB configuration (grub.cfg)
βœ… Use hardware-based remote management (IPMI/iLO/DRAC) for emergencies
βœ… Enable automatic boot monitoring via logs (journalctl -b -p err)


πŸ“Œ Summary

GRUB Failure Type Common Cause Solution
grub rescue> prompt GRUB bootloader missing Reinstall GRUB from Live USB
error: no such partition /boot partition missing Manually set root in GRUB
Kernel panic - not syncing Missing initramfs or kernel Rebuild initramfs (dracut -f)

πŸ’‘ Want to learn more? Check out the next article: "Recovering from Kernel Panics & Initramfs Issues" πŸš€


πŸ’¬ Join the Discussion!

πŸ’¬ Have you encountered GRUB boot failures in production?
πŸ’‘ What strategies do you use to quickly recover from boot issues?
πŸš€ Share your experience in the comments!

Read more