Day-04: Monitoring system metrics and processes with a Bash script / TWS Bash Blaze Challenge
Table of contents
- Why is monitoring processes important?
- Part 2 - Monitoring Metrics Script with Sleep Mechanism
- Task 1: Implementing Basic Metrics Monitoring
- Task 2: User-Friendly Interface
- Task 3: Continuous Monitoring with Sleep
- Task 4: Monitoring a Specific Service
- Task 5: Allow the User to Choose a Different Service
- Task 6: Error Handling
- Task 7: Documentation
Welcome to Day 4 of our Bash Scripting Challenge! Today, we're going to craft user-friendly scripts to keep an eye on system metrics and processes. Get ready to dive in as we display CPU usage, memory usage, and available disk space!
But wait, there's more! We'll also be monitoring specific processes and giving users the power to restart them if they've stopped working!
Why is monitoring processes important?
Monitoring finds problems early and helps you use resources better. By monitoring services, you can ensure that critical components are up and running as expected. Here are some important reasons to monitor processes in Bash scripts:
Resource management:
Monitoring processes allows you to keep track of resource usage such as CPU, memory, and disk utilization.
By monitoring processes, you can identify resource-intensive processes that may be consuming excessive system resources and causing performance issues.
This information helps you optimize resource allocation and ensure efficient utilization of system resources.
Performance analysis:
Monitoring processes enables you to analyze the performance of your system. By tracking the behavior of processes over time, you can identify patterns, bottlenecks, and areas for improvement.
For example, you can identify processes that consume a significant amount of CPU or memory and optimize them to reduce their impact on overall system performance.
Troubleshooting and debugging:
When issues arise in a system, monitoring processes can provide valuable insights into the root cause.
By monitoring the processes running on your system, you can identify stuck processes, track their resource usage, and diagnose any underlying problems. This information is crucial for troubleshooting issues and determining the appropriate course of action.
Security monitoring:
Monitoring processes can help you detect and respond to security-related events. By monitoring process activity, you can identify suspicious or unauthorized processes running on your system. Unusual or malicious processes can be a sign of a security breach or compromise.
Monitoring processes allows you to take proactive measures to investigate and mitigate potential security threats.
Automation and scripting:
Process monitoring is often a crucial component of automated tasks and scripts. For example, you might have a script that monitors the availability of certain critical processes and sends notifications or takes specific actions if any of those processes terminate unexpectedly.
Process monitoring enables you to automate system management tasks and ensure the smooth operation of your system.
⫸ TASKS:
Process Selection:
The script should accept a command-line argument to specify the target process to monitor. For example:
./monitor_process.sh <process_name>
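#!/bin/bash
# Check if the correct number of command-line arguments is provided
# ($# is a special Bash variable that holds the number of arguments passed to the script)
if [ $# -ne 1 ]; then
    echo "Usage: $0 <process_name>"
    exit 1
fi
# Name of the process to monitor, plus the restart bookkeeping used by the loop below
process_name="$1"
max_attempts=3
restart_attempts=0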
Process Existence Check:
Implement a function that checks if the specified process is currently running on the system.
If the process is running, print a message indicating its presence.
# Continuous loop to monitor and restart the process
while true
do
    # Check if the specified process is running
    # pgrep command is used to find or signal processes based on their names
    if pgrep -x "$process_name" >/dev/null; then
        echo "Process '$process_name' is running."
        # Reset restart attempts if the process is running
        restart_attempts=0
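The task asks for a function, so the inline pgrep check above can be wrapped in one. The sketch below does this. Two caveats apply. First, `pgrep -x` compares against the kernel's process name, which Linux truncates to 15 characters. Second, a systemd unit name is not always the process name: on Debian and Ubuntu, for example, the OpenSSH unit is `ssh.service` while the process is `sshd`. The function names here are illustrative and are not part of the challenge.

```shell
#!/bin/bash
# Sketch: the existence check wrapped in functions (names are illustrative).
# pgrep -x succeeds (exit status 0) only if a process with exactly this name exists.
is_process_running() {
    local process_name="$1"
    pgrep -x "$process_name" >/dev/null
}

# Print the result and return 0 (running) or 1 (not running)
check_process() {
    local process_name="$1"
    if is_process_running "$process_name"; then
        echo "Process '$process_name' is running."
        return 0
    fi
    echo "Process '$process_name' is not running."
    return 1
}
```

Because `check_process` returns a status, it can drive an `if` in the monitoring loop as well as print a message.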
Restarting the Process:
If the process is not running, implement a function that attempts to restart the process automatically.
Print a message indicating the attempt to restart the process.
Ensure the script does not enter an infinite loop while restarting the process. Limit the number of restart attempts.
    else
        # Check if restart attempts are within the limit
        if [ "$restart_attempts" -lt "$max_attempts" ]; then
            echo "Process $process_name is not running. Attempting to restart..."
            # Add the command to restart the process here
            sudo systemctl restart "$process_name"
            # Increment the restart attempts counter
            ((restart_attempts++))
        else
            echo "Maximum restart attempts reached. Please check the process manually."
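The limit on restart attempts can also be kept apart from systemctl itself. In the sketch below, a function receives the check and restart commands by name, so the retry logic can be reused and tested on its own. The function name `restart_with_limit`, the `RESTART_DELAY` variable and the argument order are all hypothetical choices. In a real script you would pass your own helper functions, for example one wrapping `pgrep -x` and one wrapping `sudo systemctl restart`.

```shell
#!/bin/bash
# Sketch: bounded restart loop (function and variable names are hypothetical).
#   $1 = maximum attempts, $2 = check command, $3 = restart command, $4 = target name
# Returns 0 as soon as the check passes, 1 if every attempt fails.
restart_with_limit() {
    local max_attempts="$1" check_cmd="$2" restart_cmd="$3" target="$4"
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        echo "Attempt $attempt/$max_attempts: restarting '$target'..."
        "$restart_cmd" "$target"
        # Give the process time to come up (RESTART_DELAY is an assumed knob, default 2 s)
        sleep "${RESTART_DELAY:-2}"
        if "$check_cmd" "$target"; then
            echo "'$target' is running again."
            return 0
        fi
    done
    echo "Maximum restart attempts reached for '$target'. Please check it manually."
    return 1
}
```

The non-zero return value on failure is a natural trigger for the notification step in the bonus task.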
Automation:
Provide instructions on how to schedule the script to run at regular intervals using a cron job or any other appropriate scheduling method.
# To schedule the script to run at regular intervals using a cron job:
# 1. Open the crontab configuration with: crontab -e
# 2. Add a line with the interval and the script path, for example, to run every 2 minutes:
*/2 * * * * /path/to/monitor_process.sh <process_name>
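The `while true` loop in the snippets above never exits. If that script is scheduled every 2 minutes, cron starts a new copy each time and the copies pile up. A cron-scheduled version should instead perform a single check and exit. A lock can also stop runs from overlapping when one of them is slow. The sketch below assumes the util-linux `flock` command is installed and uses a hypothetical lock-file path.

```shell
#!/bin/bash
# Sketch of a cron-friendly guard (assumes util-linux `flock` is installed).
# LOCK_FILE is a hypothetical path; any writable location works.
LOCK_FILE="${LOCK_FILE:-/tmp/monitor_process.lock}"

# Open the lock file on file descriptor 9 and try to take an exclusive lock
# without waiting (-n). If a previous cron run still holds it, skip this run.
exec 9>"$LOCK_FILE"
if ! flock -n 9; then
    echo "$(date '+%F %T') previous run still in progress, skipping." >&2
    exit 0
fi

# ...perform ONE check/restart cycle here and let the script exit;
# cron provides the repetition, so no `while true` loop is needed.
```

With this guard at the top, you can also keep a log of each run by redirecting output in the crontab entry, for example `*/2 * * * * /path/to/monitor_process.sh <process_name> >> /tmp/monitor_process.log 2>&1`.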
Documentation:
Include clear and concise comments in the script to explain its logic and functionality.
Write a separate document describing the purpose of the script, how to use it, and any specific considerations.
Bonus:
Implement a notification mechanism (e.g., email, Slack message) to alert administrators if the process requires manual intervention after a certain number of restart attempts.
# Send a notification if restart attempts reach the threshold
if [ "$restart_attempts" -ge "$notification_threshold" ]; then
    echo "Sending email notification to administrators..."
    subject="Process Monitoring Alert"
    message="Process '$process_name' failed to restart after $max_attempts attempts."
    # Send the email using the mail command (replace user@gmail.com with the administrator's address)
    echo "$message" | mail -s "$subject" "user@gmail.com"
fi
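The bonus task also mentions Slack. One option is a Slack incoming webhook, which accepts an HTTP POST whose JSON body contains a `text` field. The sketch below assumes you have created such a webhook and exported its URL as `SLACK_WEBHOOK_URL`, a variable name chosen here for illustration. The JSON escaping handles only backslashes, double quotes and newlines, which is enough for short alert messages like the one above.

```shell
#!/bin/bash
# Sketch: alert via a Slack incoming webhook.
# SLACK_WEBHOOK_URL is an assumption: create an incoming webhook in your Slack
# workspace and export its URL before running the script.

# Escape backslashes, double quotes and newlines so the message is valid JSON
json_escape() {
    local s="$1"
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    s=${s//$'\n'/\\n}
    printf '%s' "$s"
}

# Build the {"text": "..."} body that incoming webhooks expect
build_slack_payload() {
    printf '{"text": "%s"}' "$(json_escape "$1")"
}

# POST the message; returns non-zero if the URL is unset or the request fails
notify_slack() {
    local message="$1"
    if [ -z "${SLACK_WEBHOOK_URL:-}" ]; then
        echo "SLACK_WEBHOOK_URL is not set; skipping Slack alert." >&2
        return 1
    fi
    curl -fsS -X POST -H 'Content-type: application/json' \
        --data "$(build_slack_payload "$message")" "$SLACK_WEBHOOK_URL" >/dev/null
}
```

Inside the threshold check, `notify_slack "$message"` could run alongside or instead of the `mail` command.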
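#!/bin/bash
# Name of the systemd service to monitor, passed as the first argument
PROCESS_NAME="$1"
# Maximum number of restart attempts
MAX_RESTART_ATTEMPTS=3
# Function to check if the given service exists or not
# (list-unit-files prints matching units; grep confirms this unit is listed)
check_service_exists() {
    systemctl list-unit-files "$PROCESS_NAME.service" 2>/dev/null | grep -qF "$PROCESS_NAME.service"
}
# Function to check if the process is running (exit status 0 means active)
is_process_active() {
    systemctl is-active --quiet "$PROCESS_NAME"
}
# Function to restart the process
restart_process() {
    echo
    echo "Process '$PROCESS_NAME' is not running. Attempting to restart..."
    # Loop to check and restart the process
    for ((attempt=1; attempt<=MAX_RESTART_ATTEMPTS; attempt++)); do
        if is_process_active; then
            echo "Process '$PROCESS_NAME' is running properly now."
            break
        else
            if [ "$attempt" -lt "$MAX_RESTART_ATTEMPTS" ]; then
                sudo systemctl restart "$PROCESS_NAME"
                sleep 2
            else
                echo "Maximum restart attempts reached. Please check '$PROCESS_NAME' manually."
            fi
        fi
    done
}
# Validate the argument, then check the service and restart it if needed
if [ $# -ne 1 ]; then
    echo "Usage: $0 <process_name>"
    exit 1
fi
if ! check_service_exists; then
    echo "Service '$PROCESS_NAME' does not exist."
    exit 1
fi
if is_process_active; then
    echo "Process '$PROCESS_NAME' is running."
else
    restart_process
fi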
Save this script as process.sh and make it executable:
chmod +x process.sh
Then run it with: ./process.sh <process_name>
Part 2 - Monitoring Metrics Script with Sleep Mechanism
This project aims to create a Bash script that monitors system metrics like CPU usage, memory usage, and disk space usage. The script will provide a user-friendly interface, allow continuous monitoring with a specified sleep interval, and extend its capabilities to monitor specific services like Nginx.
Task 1: Implementing Basic Metrics Monitoring
Write a Bash script that monitors the CPU usage, memory usage, and disk space usage of the system. The script should display these metrics in a clear and organized manner, allowing users to interpret the data easily. The script should use the top, free, and df commands to fetch the metrics.
#!/bin/bash
# Function to monitor CPU usage
function monitor_cpu() {
cpu_usage=$(top -bn1 | awk '/Cpu\(s\):/ {print $2 + $4}')
echo "CPU Usage: $cpu_usage%"
}
# Function to monitor memory usage
function monitor_memory() {
memory_usage=$(free | awk '/Mem:/ {printf "%.2f", $3/$2 * 100}')
echo "Memory Usage: $memory_usage%"
}
# Function to monitor disk usage
function monitor_disk() {
disk_usage=$(df -h --output=pcent / | awk 'NR==2 {print $1}')
echo "Disk Usage: $disk_usage"
}
# Function to monitor running processes
function monitor_processes() {
# --no-headers keeps the ps header line out of the count
process_count=$(ps -e --no-headers | wc -l)
echo "Running Processes: $process_count"
}
# Main function
function main() {
echo "System Monitoring Script"
while true; do
echo "--------------------"
echo "System Metrics - $(date)"
monitor_cpu
monitor_memory
monitor_disk
monitor_processes
sleep 5
done
}
# Execute the main function
main
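A caveat on the CPU figure: in many versions of top, the first frame printed in batch mode is computed from counters accumulated since boot. As a result, `top -bn1` can report a long-term average rather than the current load. A common alternative on Linux is to read `/proc/stat` twice and compare the two samples. The sketch below does this; the function names and the 1-second default interval are arbitrary choices.

```shell
#!/bin/bash
# Sketch: CPU usage from two samples of /proc/stat (Linux only).
# The aggregate "cpu" line holds cumulative clock ticks:
#   cpu user nice system idle iowait irq softirq steal guest guest_nice
read_cpu_times() {
    local label user nice system idle iowait irq softirq steal rest
    read -r label user nice system idle iowait irq softirq steal rest < /proc/stat
    # Print "<idle ticks> <total ticks>"; guest time is already counted in user
    echo "$(( idle + iowait )) $(( user + nice + system + idle + iowait + irq + softirq + steal ))"
}

# Percentage of non-idle CPU time over the sampling interval (integer)
cpu_usage_percent() {
    local interval="${1:-1}"
    local idle1 total1 idle2 total2
    read -r idle1 total1 < <(read_cpu_times)
    sleep "$interval"
    read -r idle2 total2 < <(read_cpu_times)
    local d_total=$(( total2 - total1 )) d_idle=$(( idle2 - idle1 ))
    if (( d_total <= 0 )); then
        echo 0
        return
    fi
    echo $(( 100 * (d_total - d_idle) / d_total ))
}
```

Inside `monitor_cpu`, `echo "CPU Usage: $(cpu_usage_percent)%"` could replace the top pipeline. Note that each call then takes about one second, because of the sampling interval.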
Task 2: User-Friendly Interface
Enhance the script by providing a user-friendly interface that allows users to interact with the script through the terminal. Display a simple menu with options to view the metrics and an option to exit the script.
#!/bin/bash
# Continuous loop for the menu
while true; do
clear
echo "Monitoring Script Menu:"
echo "1. Display System Metrics"
echo "2. Monitor a Specific Service"
echo "3. Exit"
read -p "Enter your choice: " choice
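Bash also has a `select` builtin that builds this kind of menu for you. It numbers a list of options, prints the `PS3` prompt and stores the raw input in `REPLY`, which saves writing the `echo` lines by hand. The sketch below builds the same three-option menu. The `echo` placeholders stand in for the real metric and service functions, and `show_menu` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: the three-option menu built with the `select` builtin.
# select prints the numbered list and the PS3 prompt to stderr, reads a line,
# sets $opt to the chosen item (empty if the number is invalid) and $REPLY to the raw input.
show_menu() {
    local PS3="Enter your choice: "
    local options=("Display System Metrics" "Monitor a Specific Service" "Exit")
    local opt
    select opt in "${options[@]}"; do
        case "$opt" in
            "Display System Metrics")     echo "metrics selected" ;;  # placeholder for the metrics function
            "Monitor a Specific Service") echo "service selected" ;;  # placeholder for the service check
            "Exit")                       echo "Exiting the script."; return 0 ;;
            *)                            echo "Invalid choice '$REPLY'. Please try again." ;;
        esac
    done
}
```

Invalid numbers are caught by the `*)` branch, so the menu gets basic input validation for free.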
Task 3: Continuous Monitoring with Sleep
Introduce a loop in the script to allow continuous monitoring until the user chooses to exit. After displaying the metrics, add a "sleep" mechanism to pause the monitoring for a specified interval before displaying the metrics again. Allow the user to specify the sleep interval.
# Allow the user to specify the sleep interval
read -p "Enter sleep interval in seconds (0 to exit): " sleep_interval
# Exit the loop if sleep interval is 0
if [ "$sleep_interval" -eq 0 ]; then
echo "Exiting the script."
exit 0
fi
# Sleep for the specified interval
sleep "$sleep_interval"
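As written, a non-numeric answer makes the `[ ... -eq 0 ]` test print an "integer expression expected" error, and then `sleep` fails too. The sketch below adds a small validation loop using a hypothetical `get_sleep_interval` function. It keeps asking until it gets a whole number of seconds, then prints that number for the caller.

```shell
#!/bin/bash
# Sketch: prompt until the user enters a non-negative whole number of seconds.
# Prints the accepted value on stdout; returns 1 if input ends (EOF) first.
get_sleep_interval() {
    local interval
    while true; do
        read -r -p "Enter sleep interval in seconds (0 to exit): " interval || return 1
        if [[ "$interval" =~ ^[0-9]+$ ]]; then
            echo "$interval"
            return 0
        fi
        echo "Invalid interval '$interval'. Please enter a whole number of seconds." >&2
    done
}
```

Inside the menu loop, you could call it as `interval=$(get_sleep_interval) || exit 1`, exit when `$interval` is 0, and otherwise run `sleep "$interval"`.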
Task 4: Monitoring a Specific Service
Extend the script to monitor a specific service. Check if the service is running and display its status. If it is not running, provide an option for the user to start the service. Use systemctl (or another appropriate command) to check and control the service.
# Monitor a specific service
clear
read -p "Enter the name of the service to monitor: " service_name
if systemctl is-active "$service_name" >/dev/null
then
echo "$service_name is running."
else
echo "$service_name is not running."
read -p "Do you want to start $service_name? (y/n): " start_choice
if [ "$start_choice" = "y" ]
then
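echo "Starting $service_name..."
if ! sudo systemctl start "$service_name"; then
echo "Failed to start $service_name. Check 'systemctl status $service_name' for details."
fi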
fi
fi
;;
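Besides its exit status, `systemctl is-active` prints the unit's state, such as `active`, `activating`, `inactive` or `failed`. That lets the status message say more than just running or not running. The sketch below uses a hypothetical `report_service_state` function. For failed units, the recent log lines shown by `journalctl -u` are usually the next thing to check.

```shell
#!/bin/bash
# Sketch: report a service's state in words (function name is hypothetical).
# `systemctl is-active` prints the state and exits non-zero when the unit is not active;
# only the printed state is used here.
report_service_state() {
    local service_name="$1"
    local state
    state=$(systemctl is-active "$service_name" 2>/dev/null) || true
    case "$state" in
        active)     echo "$service_name is running." ;;
        activating) echo "$service_name is starting up." ;;
        failed)     echo "$service_name has failed. See: journalctl -u $service_name" ;;
        *)          echo "$service_name is not running (state: ${state:-unknown})." ;;
    esac
}
```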
Task 5: Allow the User to Choose a Different Service
Modify the script to give the user the option to monitor a different service of their choice. Prompt the user to enter the name of the service they want to monitor, and display its status accordingly.
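One way to handle this is a loop that keeps asking for service names until the user enters an empty line. The loop rejects names containing characters that cannot appear in a systemd unit name. In the sketch below, `monitor_services` is a hypothetical name, and the character check is a conservative subset of what systemd allows.

```shell
#!/bin/bash
# Sketch: check one service after another until the user enters an empty line.
monitor_services() {
    local service_name
    while read -r -p "Service to monitor (empty line to go back): " service_name; do
        # An empty answer ends the loop
        [ -z "$service_name" ] && break
        # Conservative character check: letters, digits and @ . _ : -
        if ! [[ "$service_name" =~ ^[A-Za-z0-9@._:-]+$ ]]; then
            echo "Invalid service name '$service_name'." >&2
            continue
        fi
        if systemctl is-active --quiet "$service_name"; then
            echo "$service_name is running."
        else
            echo "$service_name is not running."
        fi
    done
}
```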
Task 6: Error Handling
Implement error handling in the script to handle scenarios where commands fail or inputs are invalid. Display meaningful error messages to guide users on what went wrong and how to fix it.
*)
# Handle invalid input
echo "Invalid choice. Please select a valid option."
;;
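Another common failure is a missing command, for example a minimal container without `top` or `systemctl`. The sketch below checks for required commands up front using the `command -v` builtin. The `error` and `require_commands` helpers are hypothetical, and both send their messages to stderr.

```shell
#!/bin/bash
# Print an error message to stderr with a consistent prefix
error() {
    echo "Error: $*" >&2
}

# Verify that every command the script relies on is installed.
# Reports each missing command and returns 1 if any are missing.
require_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            error "required command '$cmd' not found. Install it and try again."
            missing=1
        fi
    done
    return "$missing"
}
```

At the top of the script, a line such as `require_commands top free df systemctl || exit 1` stops early with a clear message instead of failing halfway through the menu.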
Task 7: Documentation
Add comments within the script to explain the purpose of each function, variable, and section of the code. Provide a clear and concise README file explaining how to use the script, the available options, and the purpose of the script.
#!/bin/bash
# Function to display system metrics - CPU, Memory, Disk
display_metrics() {
echo "----------------------------------------"
echo "System Metrics - $(date)"
echo "----------------------------------------"
echo "CPU Usage:"
top -b -n 1 | grep '%Cpu' | awk '{print " User: " $2 "%, System: " $4 "%, Idle: " $8 "%"}'
echo "----------------------------------------"
echo "Memory Usage:"
free -m | grep 'Mem:' | awk '{print " Total: " $2 " MB, Used: " $3 " MB, Free: " $4 " MB"}'
echo "----------------------------------------"
echo "Disk Space Usage:"
df -hP | awk 'NR > 1 && $1 ~ /^\/dev\// {print " " $1 ": " $5 " used, " $4 " free"}'
echo "----------------------------------------"
}
# Function to display service status
display_service_status() {
local service_name=$1
echo "Service Status - $service_name"
systemctl status "$service_name" | grep -E 'Active:|Main PID:'
echo "----------------------------------------"
}
# Function to start a service
start_service() {
local service_name=$1
sudo systemctl start "$service_name"
}
# Function to check if a service is running
is_service_running() {
local service_name=$1
systemctl is-active "$service_name" >/dev/null 2>&1
}
# Main loop for continuous monitoring
while true; do
clear
echo "Choose an option:"
echo "1. Display System Metrics"
echo "2. Monitor a Specific Service"
echo "3. Exit"
read -p "Enter your choice: " choice
case $choice in
1)
display_metrics
;;
2)
read -p "Enter the name of the service to monitor: " service_name
display_service_status "$service_name"
if ! is_service_running "$service_name"; then
read -p "The service is not running. Do you want to start it? (y/n): " start_choice
if [ "$start_choice" = "y" ] || [ "$start_choice" = "Y" ]; then
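if start_service "$service_name"; then
echo "Service started."
else
echo "Failed to start $service_name. Check 'systemctl status $service_name' for details."
fi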
fi
fi
;;
3)
echo "Exiting the script."
exit 0
;;
*)
echo "Invalid choice. Please try again."
;;
esac
read -r -t 5 -p "Press Enter to continue (continues automatically after 5 seconds)..." # Wait up to 5 seconds before showing the menu again
done
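For the documentation task, the script can also describe itself. The sketch below adds a `--help` option that prints usage text from a here-document; `show_help` is a hypothetical name, and the wording mirrors the menu above. The code is meant to sit near the top of the script, before the main loop.

```shell
#!/bin/bash
# Sketch: built-in help text, printed with a here-document.
show_help() {
    cat <<EOF
Usage: ${0##*/} [-h|--help]

Interactive system monitoring menu:
  1  Display CPU, memory and disk usage
  2  Check a systemd service and optionally start it
  3  Exit

Starting a service runs sudo, so you may be asked for your password.
EOF
}

# Print help and exit before the menu starts if -h or --help was given
case "${1:-}" in
    -h|--help)
        show_help
        exit 0
        ;;
esac
```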