How to Check and Monitor Hard Drive Health Using SMART in Linux

"Stability is the goal of IT operations, but anomalies are the daily reality."

🚀

Monitoring the health of your hard drives is crucial for preventing unexpected disk failures. In Linux, you can use smartctl, a tool from the smartmontools package, to check your drive's SMART (Self-Monitoring, Analysis, and Reporting Technology) status.

📌 In this guide, you’ll learn:
✅ How to install and use smartctl to monitor disk health
✅ How to run self-tests and analyze SMART attributes
✅ How to interpret SMART logs to detect failing drives early
✅ Best practices to prevent data loss and ensure disk longevity


🛑 1. Why Monitor Hard Drive Health?

Many users ignore disk health until it's too late. Hard drives often show warning signs before they fail, such as:

  • 🔹 Slow performance
  • 🔹 Frequent system crashes
  • 🔹 Unusual clicking sounds
  • 🔹 Files disappearing or becoming corrupted

By using SMART diagnostics, you can detect potential issues early and take preventive measures before a catastrophic failure occurs.


🛠️ 2. Installing smartmontools (If Not Installed)

Before using smartctl, ensure that smartmontools is installed.

🔹 Install on Debian/Ubuntu

sudo apt update
sudo apt install smartmontools -y

🔹 Install on CentOS/RHEL

sudo yum install smartmontools -y

📌 Verify Installation:

smartctl --version

If you see version details, the installation was successful.
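If you set up several machines, you can script the choice between apt and yum. The sketch below uses a hypothetical helper, `smart_install_cmd`, which is not part of smartmontools. It maps the `ID` field of /etc/os-release to the matching install command and only prints that command, so you can review it before running it. It assumes the ID values debian, ubuntu, centos and rhel, and reports any other distribution as unsupported.

```bash
#!/usr/bin/env bash
# smart_install_cmd ID: print (not run) the smartmontools install command for
# a distribution ID as reported in /etc/os-release (hypothetical helper).
smart_install_cmd() {
  case "$1" in
    debian|ubuntu)
      echo "sudo apt install -y smartmontools" ;;
    centos|rhel)
      # On newer releases, yum is an alias for dnf, so this still works there.
      echo "sudo yum install -y smartmontools" ;;
    *)
      echo "unsupported distribution ID: $1" >&2
      return 1 ;;
  esac
}

# Usage: source /etc/os-release in a subshell so its variables stay contained.
# smart_install_cmd "$(. /etc/os-release && echo "$ID")"
```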


πŸ” 3. Checking If Your Disk Supports SMART

Before running diagnostics, check if your drive supports SMART:

sudo smartctl -i /dev/sdX

📌 Replace /dev/sdX with your actual disk (e.g., /dev/sda). Run lsblk to list the disks on your system.

✅ Example Output:

Model Family:     Toshiba X300
Device Model:     TOSHIBA HDWD130
Serial Number:    123456789ABC
User Capacity:    3,000,000,000,000 bytes
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

📌 If SMART is disabled, enable it using:

sudo smartctl -s on /dev/sdX
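To automate this check, you can parse the "SMART support is:" lines shown in the example output. This is a minimal sketch that assumes smartctl's English output wording. `smart_support_state` is a hypothetical helper: it reads `smartctl -i` output on stdin and prints enabled, disabled, or unsupported.

```bash
#!/usr/bin/env bash
# smart_support_state: read `smartctl -i` output on stdin and print
# "enabled", "disabled", or "unsupported" (hypothetical helper).
smart_support_state() {
  local info
  info=$(cat)
  if grep -q '^SMART support is: *Enabled' <<<"$info"; then
    echo enabled
  elif grep -q '^SMART support is: *Disabled' <<<"$info"; then
    echo disabled
  else
    echo unsupported
  fi
}

# Usage (run as root): enable SMART only if it is currently disabled.
# state=$(sudo smartctl -i /dev/sdX | smart_support_state)
# [ "$state" = disabled ] && sudo smartctl -s on /dev/sdX
```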

🛡️ 4. Checking Disk Health Status

To check whether your disk is healthy or failing, run:

sudo smartctl -H /dev/sdX

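✅ Possible Results:

  • PASSED – No attribute has crossed its failure threshold. This is a good sign but not a guarantee; a drive can still fail while reporting PASSED.
  • FAILED – The drive has already failed or predicts its own failure within the next 24 hours; back up your data and replace it as soon as possible.
  • UNKNOWN or no result – SMART is disabled, not supported, or not passed through by the disk's controller (common with USB enclosures).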

📌 Example Output (Healthy Disk):

SMART overall-health self-assessment test result: PASSED

📌 Example Output (Failing Disk):

SMART overall-health self-assessment test result: FAILED

If your disk fails the health check, back up your data immediately.
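In scripts, testing smartctl's exit status is more reliable than grepping for PASSED. smartctl returns a bitmask, and the EXIT STATUS section of the smartctl(8) man page documents what each bit means. The sketch below prints a short description for each bit that is set. The function name `decode_smartctl_status` is hypothetical, and each message paraphrases the man page rather than reproducing smartctl's own text.

```bash
#!/usr/bin/env bash
# decode_smartctl_status STATUS: explain each bit set in smartctl's exit
# status (meanings paraphrased from the EXIT STATUS section of smartctl(8)).
decode_smartctl_status() {
  local status=$1 bit
  local -a meanings=(
    "command line did not parse"
    "device open failed"
    "a SMART/ATA command failed, or SMART data had a checksum error"
    "SMART status check returned DISK FAILING"
    "a prefail attribute is at or below its threshold"
    "an attribute was at or below its threshold in the past"
    "the device error log contains errors"
    "the self-test log contains errors"
  )
  if (( status == 0 )); then
    echo "no problems reported"
    return 0
  fi
  for bit in {0..7}; do
    if (( status & (1 << bit) )); then
      echo "bit $bit: ${meanings[bit]}"
    fi
  done
}

# Usage (run as root); sudo passes smartctl's exit status through:
# sudo smartctl -H /dev/sdX; decode_smartctl_status $?
```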


📊 5. Retrieving Detailed SMART Information

For a deeper look at your drive’s health, use:

sudo smartctl -a /dev/sdX

This command provides:

  • 📌 Power-on hours (total time the disk has been powered on)
  • 📌 Current temperature (overheating risks)
  • 📌 Reallocated sectors (signs of a failing drive)
  • 📌 Error logs (errors the drive has recorded)

✅ Example: Check reallocated sectors, one of the most critical SMART attributes:

sudo smartctl -A /dev/sdX | grep Reallocated_Sector_Ct

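📌 Look at the last column (RAW_VALUE). If the Reallocated_Sector_Ct raw value is above 0, the drive has already remapped bad sectors and may be failing, especially if the count keeps growing.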
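Reading the raw value by eye works for one disk. For scripts, you can extract the RAW_VALUE column from the attribute table. This sketch assumes the ATA/SATA table layout printed by `smartctl -A`: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value. In that layout the raw value is the 10th whitespace-separated field. NVMe drives print a different health report, so the sketch does not apply to them. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# raw_value NAME: read the attribute table printed by `smartctl -A` on stdin
# and print the first token of the RAW_VALUE column (field 10) for NAME.
raw_value() {
  awk -v name="$1" '$2 == name { print $10; exit }'
}

# warn_reallocated: read `smartctl -A` output on stdin and warn if any
# sectors have been reallocated (hypothetical helper).
warn_reallocated() {
  local count
  count=$(raw_value Reallocated_Sector_Ct)
  if [ -z "$count" ]; then
    echo "Reallocated_Sector_Ct not reported"
    return 2
  elif [ "$count" -gt 0 ]; then
    echo "WARNING: $count reallocated sectors"
    return 1
  fi
  echo "OK: no reallocated sectors"
}

# Usage (run as root):
# sudo smartctl -A /dev/sdX | warn_reallocated
```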


🛠️ 6. Running a SMART Self-Test

To actively test your disk for errors, run:

sudo smartctl -t short /dev/sdX

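✅ Short test (usually about 2 minutes): checks the drive's electrical and mechanical functions and reads a small part of the surface.
✅ Long test (tens of minutes to several hours, depending on drive size): reads the entire surface. smartctl prints the estimated duration when the test starts: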

sudo smartctl -t long /dev/sdX
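When a test starts, smartctl prints an estimate such as "Please wait 2 minutes for test to complete." The sketch below extracts that number, so a script can wait the right amount of time before reading the results. It assumes that exact English wording. If the line is missing or worded differently, `test_wait_minutes` (a hypothetical helper) prints nothing, and the usage example falls back to 2 minutes.

```bash
#!/usr/bin/env bash
# test_wait_minutes: read the output of `smartctl -t short` or `-t long` on
# stdin and print N from the "Please wait N minutes for test to complete." line.
test_wait_minutes() {
  sed -n 's/^Please wait \([0-9][0-9]*\) minutes.*/\1/p'
}

# Usage (run as root): start a short test, wait the estimated time, then
# show the self-test log. Falls back to 2 minutes if no estimate was parsed.
# mins=$(sudo smartctl -t short /dev/sdX | test_wait_minutes)
# sleep $(( ${mins:-2} * 60 ))
# sudo smartctl -l selftest /dev/sdX
```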

📌 Check the test results:

sudo smartctl -l selftest /dev/sdX

✅ Example Output (No Errors):

SMART Self-test log structure revision number 1
Num  Test_Description  Status                  Remaining  LifeTime(hours)
# 1  Short offline    Completed without error       00%      1234

📌 "Completed without error" means the self-test found no problems on this run. It is a good sign, but keep watching the SMART attributes over time.

✅ Example Output (Errors Found):

# 1  Extended offline  Completed: read failure      90%      6789

📌 If read failures appear, the disk is unreliable and should be replaced.
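To check the outcome in a script, look at the most recent entry in the self-test log, which smartctl lists first as "# 1". This is a minimal sketch, and `latest_selftest_ok` is a hypothetical helper. It treats any status other than "Completed without error" as a failure, including a test that is still running. Adjust it if you poll while a long test is in progress.

```bash
#!/usr/bin/env bash
# latest_selftest_ok: read `smartctl -l selftest` output on stdin and check
# the most recent entry (the "# 1" row).
# Exit status: 0 = completed without error, 1 = anything else
# (including a test still in progress), 2 = no self-test recorded.
latest_selftest_ok() {
  local latest
  latest=$(grep -m1 '^# *1 ')
  case "$latest" in
    "")
      echo "no self-test recorded" >&2
      return 2 ;;
    *"Completed without error"*)
      return 0 ;;
    *)
      echo "latest self-test: $latest" >&2
      return 1 ;;
  esac
}

# Usage (run as root):
# sudo smartctl -l selftest /dev/sdX | latest_selftest_ok || echo "review the self-test log"
```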


📌 7. Common SMART Attributes Explained

SMART Attribute         Description                              Warning Signs
Reallocated_Sector_Ct   Number of bad sectors remapped           Raw value above 0 means the disk is deteriorating
Power_On_Hours          Total hours the disk has been running    Useful for lifespan estimation
Current_Pending_Sector  Unstable sectors waiting to be remapped  Any raw value above 0 is a warning; a rising count indicates imminent failure
Temperature_Celsius     Disk temperature                         Over 50°C can shorten lifespan

📌 Key Takeaway: If Reallocated_Sector_Ct or Current_Pending_Sector is increasing, your disk is at risk.
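The trend matters more than a single reading, so one approach is to save `smartctl -A` output periodically and compare snapshots. The sketch below assumes the ATA attribute table layout, with the raw value in field 10. It also assumes two snapshot files that you maintain yourself. The function name and the file paths in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# attr_increased NAME OLD_FILE NEW_FILE: succeed (exit 0) if the RAW_VALUE
# of attribute NAME is larger in NEW_FILE than in OLD_FILE. Both files are
# saved `smartctl -A` outputs; RAW_VALUE is taken from field 10.
attr_increased() {
  local name=$1 before after
  before=$(awk -v n="$name" '$2 == n { print $10; exit }' "$2")
  after=$(awk -v n="$name" '$2 == n { print $10; exit }' "$3")
  # Return 2 if the attribute is missing or not a plain integer in either file.
  [[ $before =~ ^[0-9]+$ && $after =~ ^[0-9]+$ ]] || return 2
  (( after > before ))
}

# Usage (run as root; paths are hypothetical):
# sudo smartctl -A /dev/sdX > /var/tmp/smart-sdX.new
# for attr in Reallocated_Sector_Ct Current_Pending_Sector; do
#   attr_increased "$attr" /var/tmp/smart-sdX.old /var/tmp/smart-sdX.new \
#     && echo "WARNING: $attr increased"
# done
# mv /var/tmp/smart-sdX.new /var/tmp/smart-sdX.old
```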


🛡️ 8. Preventing Hard Drive Failures

To extend disk lifespan and prevent sudden failures, follow these best practices:

✅ Monitor SMART regularly:

sudo smartctl -H /dev/sdX
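On a machine with several disks, smartctl can find them for you. `smartctl --scan` prints one line per detected device, together with the `-d` device type it would use, for example `/dev/sda -d sat # /dev/sda [SAT], ATA device`. The sketch below parses those lines and runs the health check on each device with the reported type. The function names are hypothetical, and the exact --scan output may vary between smartctl versions and controllers.

```bash
#!/usr/bin/env bash
# scan_entries: read `smartctl --scan` output on stdin and print
# "DEVICE TYPE" pairs, e.g. "/dev/sda sat".
scan_entries() {
  awk '$1 ~ /^\/dev\// && $2 == "-d" { print $1, $3 }'
}

# check_all_disks: run the overall health check on every scanned device,
# passing the device type reported by --scan. Run as root.
check_all_disks() {
  local dev dtype
  smartctl --scan | scan_entries | while read -r dev dtype; do
    echo "== $dev ($dtype)"
    # </dev/null keeps smartctl from consuming the loop's input.
    smartctl -H -d "$dtype" "$dev" </dev/null
  done
}

# Usage, from a root shell:
# check_all_disks
```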

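✅ Schedule regular health checks: Add a cron job to /etc/crontab that runs every Monday at 03:00. Writing to that file requires root, so pipe the line through sudo tee:

echo "0 3 * * 1 root /usr/sbin/smartctl -H /dev/sdX" | sudo tee -a /etc/crontab

📌 Cron emails the job's output only if local mail delivery is configured. For continuous monitoring and scheduled self-tests, smartmontools also includes the smartd daemon, configured through smartd.conf.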

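✅ Keep your system cool: Check the disk temperature:

sudo hddtemp /dev/sdX

📌 hddtemp is no longer maintained upstream and may be missing from newer distributions. If it is not available, read the temperature attribute with smartctl instead:

sudo smartctl -A /dev/sdX | grep -i temperature

📌 Ideal range: 30°C - 45°C.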
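If you log temperatures from a script, you can bucket them against the ranges above. This sketch uses this article's guidance as thresholds: 30°C - 45°C is ideal, and over 50°C can shorten lifespan. Drive vendors publish their own operating ranges, and those take precedence. `classify_temp` is a hypothetical helper. The usage line assumes the drive reports attribute 194, Temperature_Celsius; some drives report 190, Airflow_Temperature_Cel, instead.

```bash
#!/usr/bin/env bash
# classify_temp CELSIUS: bucket a drive temperature using this article's
# guidance (30-45 C ideal, over 50 C can shorten lifespan).
# Prints: cool (<30), ok (30-45), warm (46-50), hot (>50), or unknown.
classify_temp() {
  local t=$1
  if ! [[ $t =~ ^[0-9]+$ ]]; then
    echo unknown
    return 1
  fi
  t=$((10#$t))  # force base 10 in case of leading zeros
  if (( t > 50 )); then
    echo hot
  elif (( t > 45 )); then
    echo warm
  elif (( t >= 30 )); then
    echo ok
  else
    echo cool
  fi
}

# Usage (run as root):
# t=$(sudo smartctl -A /dev/sdX | awk '$2 == "Temperature_Celsius" { print $10; exit }')
# classify_temp "$t"
```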

✅ Back up important data: Copy it to a backup drive with rsync, and schedule the command with cron to automate it:

rsync -av /home/user/ /mnt/backup/

✅ Use separate partitions for system and data: Keeping /home on a separate partition can reduce the risk of total data loss.


📊 9. Summary

Issue                         Solution
Check if SMART is enabled     sudo smartctl -i /dev/sdX
View overall disk health      sudo smartctl -H /dev/sdX
Get detailed disk attributes  sudo smartctl -a /dev/sdX
Run a self-test               sudo smartctl -t short /dev/sdX
Check for bad sectors         sudo smartctl -A /dev/sdX | grep Reallocated_Sector_Ct
Monitor disk temperature      sudo hddtemp /dev/sdX
Prevent failures              Monitor regularly, keep the system cool, back up data

💬 Join the Discussion!

Do you regularly monitor your hard drive health?
Have you ever saved a failing disk using SMART diagnostics?

💬 Share your experience in the comments below! 🚀

👉 If you’re troubleshooting Linux disk issues, check out: How to Free Up Disk Space in Linux

