Repairing Linux Disks and Recovering Lost Partitions

"Stability is the goal of IT operations, but anomalies are the daily reality."

Disk failures and lost partitions can lead to boot failures, data loss, and extended downtime in Linux systems. Whether due to accidental deletion, filesystem corruption, or hardware failure, knowing how to diagnose and recover disk partitions is critical for system administrators and DevOps engineers.

📌 In this guide, you will learn:
✅ How to diagnose disk failures and missing partitions
✅ Step-by-step recovery methods using fdisk, parted, and testdisk
✅ An enterprise case study of real-world partition loss
✅ Best practices to prevent disk failures and data loss

🔜 Next in the series: Troubleshooting RAID Failures & Recovery Techniques


🔍 1. Understanding Linux Disk Failures & Lost Partitions

📌 Common Causes of Disk & Partition Loss

| Failure Type | Cause | Typical Error Message |
|---|---|---|
| Accidental Deletion | fdisk or parted used incorrectly | Partition table missing |
| Filesystem Corruption | Power failure, unclean shutdown | EXT4-fs error (device sda1) |
| Disk Failure | Bad sectors, aging storage device | I/O error or disk read failure |
| Bootloader Issues | Corrupt MBR or GPT | error: no such partition |

🔍 2. Diagnosing Disk & Partition Issues

📌 Step 1: Check Disk Health & Errors

Before attempting recovery, check if the disk is physically damaged:

🔹 Verify disk connectivity and partitions:

lsblk
sudo fdisk -l
sudo parted -l

📌 Expected Output Example:

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0   500G  0 disk
├─sda1   8:1    0   500M  0 part /boot
└─sda2   8:2    0    50G  0 part /

💡 If partitions do not appear, the partition table may be corrupted or deleted.
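
On a host with many disks, the check above can be filtered instead of read by eye. The following is a minimal sketch, assuming an lsblk version that supports the PKNAME column; the function name disks_without_partitions is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME TYPE PKNAME" rows (as printed by
# `lsblk -rno NAME,TYPE,PKNAME`) on stdin and prints every whole disk
# that has no partition beneath it.
disks_without_partitions() {
    awk '
        $2 == "disk" { disks[$1] = 1 }        # remember each whole disk
        $2 == "part" { has_part[$3] = 1 }     # PKNAME names the parent disk
        END {
            for (d in disks)
                if (!(d in has_part)) print d
        }
    ' | sort
}

# Usage (read-only, changes nothing on disk):
#   lsblk -rno NAME,TYPE,PKNAME | disks_without_partitions
```

A disk listed here is not necessarily broken: disks used whole (for example as an LVM physical volume) also have no partitions, so treat the output as a list of candidates to inspect.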

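🔹 Check the drive's SMART health status and error counters:

sudo smartctl -a /dev/sda

📌 Expected Output:

SMART overall-health self-assessment test result: PASSED

💡 If the result is FAILED, the drive is predicting its own failure and may be physically damaged. A PASSED result is not a guarantee, so also review the reallocated and pending sector counts in the attribute table.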
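
The overall verdict can hide early warning signs, so it can help to pull out the sector counters directly. This is a rough sketch, assuming an ATA/SATA drive whose `smartctl -A` table uses the common attribute names shown in the code; NVMe and SAS drives report differently, and the function name smart_sector_warnings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the ATA attribute table printed by
# `smartctl -A` on stdin and prints NAME=RAW for sector-health counters
# whose raw value (column 10) is above zero.
smart_sector_warnings() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
        $10 + 0 > 0 { print $2 "=" $10 }
    '
}

# Usage:
#   sudo smartctl -A /dev/sda | smart_sector_warnings
```

No output means none of the three counters is above zero. Any non-zero value is a reason to copy data off the drive before attempting repairs.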


🔍 3. Repairing Disk & Partition Issues

💡 Below are recovery methods for common disk partition failures.

🛠️ Fix 1: Recover Lost Partitions Using testdisk

If a partition was accidentally deleted, testdisk can help recover it.

1️⃣ Install testdisk:

sudo apt install testdisk   # Debian/Ubuntu
sudo yum install testdisk   # CentOS/RHEL (needs the EPEL repository)

2️⃣ Launch testdisk and select the affected disk:

sudo testdisk

3️⃣ Confirm the detected partition table type, then choose "Analyse" → "Quick Search"
4️⃣ Identify the lost partition and mark it for recovery
5️⃣ Choose "Write" to save the partition table, then reboot:

sudo reboot

📌 Expected Outcome: If successful, the lost partition will be restored.


🛠️ Fix 2: Repair Corrupt Filesystems with fsck

If the partition exists but is unreadable, use fsck to repair filesystem corruption. Run it only on a filesystem that is unmounted or mounted read-only; repairing a filesystem that is mounted read-write can cause further damage.

1️⃣ Run fsck from emergency mode or a Live USB:

fsck -y /dev/sda2

📌 Command Breakdown:

  • fsck → Checks and repairs filesystem errors
  • -y → Answers "yes" to every repair prompt

2️⃣ If the filesystem was checked while mounted read-only, remount it read-write:

mount -o remount,rw /dev/sda2

📌 Expected Outcome: If successful, the filesystem will be repaired and mounted correctly.
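
Whether the repair actually succeeded is reported in the fsck exit status, which is a sum of bit flags documented in fsck(8). A small sketch for decoding it follows; the function name describe_fsck_status is made up for illustration.

```shell
#!/bin/sh
# Decode an fsck exit status. The value is a bitwise OR of the flags
# listed in fsck(8), so several messages can apply at once.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1)) -ne 0 ]   && echo "errors corrected"
    [ $((status & 2)) -ne 0 ]   && echo "reboot recommended"
    [ $((status & 4)) -ne 0 ]   && echo "errors left uncorrected"
    [ $((status & 8)) -ne 0 ]   && echo "operational error"
    [ $((status & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ]  && echo "canceled by user"
    [ $((status & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}

# Usage:
#   fsck -y /dev/sda2; describe_fsck_status $?
```

A status that includes "errors left uncorrected" means -y was not enough, and the data should be copied elsewhere before further repair attempts.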


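🛠️ Fix 3: Restore Partition Table with parted

If the partition table is corrupted and testdisk cannot rebuild it, parted can recreate it. Note that mklabel writes a new, empty partition table: the data inside the old partitions stays on disk, but it becomes reachable again only if each partition is recreated with its original start and end. Treat this as a last resort.

1️⃣ Launch parted:

sudo parted /dev/sda

2️⃣ Check for partitions:

print

3️⃣ If the partition table is missing, recreate it, replacing the example boundaries with the original ones:

mklabel gpt
mkpart primary ext4 1MiB 100%
quit

📌 Expected Outcome: The disk has a valid partition table again. The filesystem is readable only if the new boundaries match the original layout.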
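
Recreating a partition only brings its data back when the start and end sectors match the original exactly. As a sketch, assuming a dump saved earlier with `sfdisk -d /dev/sda > backup.txt` is available, the exact-sector mkpart commands can be derived from it. The function name mkpart_from_sfdisk_dump is hypothetical, and "primary" is only a placeholder partition name.

```shell
#!/bin/sh
# Hypothetical helper: reads an `sfdisk -d` dump on stdin and prints one
# parted mkpart command per partition, using exact sector boundaries
# (end sector = start + size - 1).
mkpart_from_sfdisk_dump() {
    awk -F'[:,]' '
        /start=/ && /size=/ {
            start = ""; size = ""
            for (i = 2; i <= NF; i++) {
                field = $i
                gsub(/[ \t]/, "", field)            # drop the padding
                if (field ~ /^start=/) start = substr(field, 7)
                if (field ~ /^size=/)  size  = substr(field, 6)
            }
            if (start != "" && size != "")
                printf "mkpart primary %.0fs %.0fs\n", start, start + size - 1
        }
    '
}

# Usage (prints commands only, writes nothing to disk):
#   mkpart_from_sfdisk_dump < backup.txt
```

When such a dump exists, feeding it straight back with `sfdisk /dev/sda < backup.txt` is usually simpler than retyping boundaries in parted; the helper is mainly useful for reviewing what will be recreated.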


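🛠️ Fix 4: Reinstall GRUB on a Corrupt Bootloader

If GRUB fails to detect partitions, reinstall it:

1️⃣ Boot into a Linux Live USB
2️⃣ Mount the root partition:

mount /dev/sda2 /mnt
mount /dev/sda1 /mnt/boot   # only if /boot is a separate partition

3️⃣ Bind-mount the virtual filesystems and chroot into the system:

for d in dev proc sys; do mount --bind /$d /mnt/$d; done
chroot /mnt

4️⃣ Reinstall GRUB (RHEL/CentOS command names for a BIOS-booted system; on Debian/Ubuntu use grub-install and update-grub):

grub2-install /dev/sda
grub2-mkconfig -o /boot/grub2/grub.cfg

5️⃣ Exit the chroot and reboot the system:

exit
reboot

📌 Expected Outcome: If successful, GRUB will detect all partitions.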
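
The GRUB tool names differ between distribution families, which is easy to trip over inside a rescue chroot. A minimal sketch for picking the right prefix follows; the function name grub_command_prefix is an assumption for illustration, and it only checks which install tool is on PATH.

```shell
#!/bin/sh
# Hypothetical helper: prints "grub2" on systems that ship grub2-install
# (RHEL/CentOS style) or "grub" where only grub-install exists
# (Debian/Ubuntu style). Fails if neither tool is on PATH.
grub_command_prefix() {
    if command -v grub2-install >/dev/null 2>&1; then
        echo grub2
    elif command -v grub-install >/dev/null 2>&1; then
        echo grub
    else
        echo "no GRUB install tool found" >&2
        return 1
    fi
}

# Usage inside the chroot:
#   prefix=$(grub_command_prefix) && "${prefix}-install" /dev/sda
```

The configuration path differs as well (typically /boot/grub2/grub.cfg versus /boot/grub/grub.cfg), so the prefix alone does not cover every difference.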


🔍 4. Enterprise Case Study: Data Recovery After Partition Loss

📌 Scenario:
A financial services company accidentally deleted a key partition on a production database server.

📌 Symptoms:

  • The server refused to boot (error: no such partition)
  • Running fdisk -l showed missing partitions
  • The database was inaccessible, causing an outage

📌 Investigation:

  • Engineers used a Live USB to inspect the disk structure
  • testdisk detected the deleted partitions
  • The partition table was corrupt, preventing boot

📌 Solution:
🔹 Used testdisk to restore the deleted partition
🔹 Ran fsck -y /dev/sda2 to repair filesystem errors
🔹 Reinstalled GRUB to detect and boot into the restored partition

📌 Lessons Learned:
⚠️ Always back up partition tables before making changes
⚠️ Use LVM snapshots for rapid recovery
⚠️ Automate disk monitoring with smartctl


🔍 5. Best Practices to Prevent Disk Failures

📌 To minimize disk-related failures, follow these best practices:

✅ Enable disk health monitoring (smartctl -a /dev/sda)
✅ Schedule periodic fsck checks to catch filesystem corruption early
✅ Keep multiple backups of partition tables (sfdisk -d /dev/sda > backup.txt)
✅ Use RAID for redundancy in production environments
✅ Automate disk failure alerts with monitoring tools (Prometheus, Zabbix)
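
The partition-table backup is the cheapest of these practices to automate. Below is a minimal sketch built around the sfdisk -d command from the list; the function name backup_partition_table and the /root/partition-backups directory are assumptions, and the backups should live on a different disk than the one they describe.

```shell
#!/bin/sh
# Hypothetical helper: dumps the partition table of a device into a
# timestamped file and prints the file name. Needs root to read the device.
backup_partition_table() {
    dev=$1
    dir=$2
    out="$dir/$(basename "$dev")-$(date +%Y%m%d-%H%M%S).sfdisk"
    mkdir -p "$dir" || return 1
    if sfdisk -d "$dev" > "$out"; then
        echo "$out"
    else
        rm -f "$out"    # do not keep an empty or partial dump
        return 1
    fi
}

# Usage:
#   backup_partition_table /dev/sda /root/partition-backups
# Restore later with:
#   sfdisk /dev/sda < FILE
```

Running this from cron or a systemd timer before maintenance windows keeps a recent dump at hand without any manual step.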


📌 Summary

| Issue | Cause | Solution |
|---|---|---|
| Lost Partition | Accidental deletion | Restore using testdisk |
| Corrupt Filesystem | Improper shutdown | Run fsck -y /dev/sda2 |
| Missing Partition Table | MBR/GPT corruption | Recreate using parted (with the original boundaries) |
| Bootloader Not Detecting Partitions | GRUB corruption | Reinstall GRUB (grub2-install) |

💡 Want to learn more? Check out the next article: "Troubleshooting RAID Failures & Recovery Techniques" 🚀


💬 Join the Discussion!

💬 Have you experienced partition loss or disk failure in production?
💡 What strategies do you use to prevent data loss?
🚀 Share your experience in the comments!

