Day-04 : Monitoring system metrics and processes with a bash script / TWS Bash Blaze Challenge

Welcome to Day 4 of our Bash Scripting Challenge! Today, we're going to craft user-friendly scripts that keep an eye on system metrics and processes, showcasing CPU usage, memory usage, and available disk space!

But wait, there's more! We'll also be monitoring specific processes and giving users the power to restart them if they've stopped working!

Why is monitoring processes important?

Monitoring catches problems early and helps you use resources more efficiently. By monitoring services, you can make sure that critical components are up and running as expected. Here are some important reasons for monitoring processes in bash scripting:

  1. Resource management:

    • Monitoring processes allows you to keep track of resource usage such as CPU, memory, and disk utilization.

    • By monitoring processes, you can identify resource-intensive processes that may be consuming excessive system resources and causing performance issues.

    • This information helps you optimize resource allocation and ensure efficient utilization of system resources.

  2. Performance analysis:

    • Monitoring processes enables you to analyze the performance of your system. By tracking the behavior of processes over time, you can identify patterns, bottlenecks, and areas for improvement.

    • For example, you can identify processes that consume a significant amount of CPU or memory and optimize them to reduce their impact on overall system performance.

  3. Troubleshooting and debugging:

    • When issues arise in a system, monitoring processes can provide valuable insights into the root cause.

    • By monitoring the processes running on your system, you can identify stuck processes, track their resource usage, and diagnose any underlying problems. This information is crucial for troubleshooting issues and determining the appropriate course of action.

  4. Security monitoring:

    • Monitoring processes can help you detect and respond to security-related events. By monitoring process activity, you can identify suspicious or unauthorized processes running on your system. Unusual or malicious processes can be a sign of a security breach or compromise.

    • Monitoring processes allows you to take proactive measures to investigate and mitigate potential security threats.

  5. Automation and scripting:

    • Process monitoring is often a crucial component of automated tasks and scripts. For example, you might have a script that monitors the availability of certain critical processes and sends notifications or takes specific actions if any of those processes terminate unexpectedly.

    • Process monitoring enables you to automate system management tasks and ensure the smooth operation of your system.

    ⫸ TASKS:

    1. Process Selection:

      • The script should accept a command-line argument to specify the target process to monitor. For example: ./monitor_process.sh <process_name>.

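            #!/bin/bash
        
            # Check if the correct number of command-line arguments is provided
            # $# is a special variable in Bash that holds the number of arguments passed to a script
            if [ $# -ne 1 ]; then
                echo "Usage: $0 <process_name>"
                exit 1
            fi

            # Name of the process to monitor, taken from the first argument
            process_name="$1"
            # Restart bookkeeping used by the following steps
            restart_attempts=0
            max_attempts=3
            notification_threshold=3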
        
    2. Process Existence Check:

      • Implement a function that checks if the specified process is currently running on the system.

      • If the process is running, print a message indicating its presence.

            # Continuous loop to monitor and restart the process
            while true; do
                # Check if the specified process is running
                # pgrep finds processes by name; -x requires an exact name match
                if pgrep -x "$process_name" >/dev/null; then
                    echo "Process '$process_name' is running."
                    # Reset restart attempts if the process is running
                    restart_attempts=0
        
    3. Restarting the Process:

      • If the process is not running, implement a function that attempts to restart the process automatically.

      • Print a message indicating the attempt to restart the process.

      • Ensure the script does not enter an infinite loop while restarting the process. Limit the number of restart attempts.

                else
                    # Check if restart attempts are within the limit
                    if [ "$restart_attempts" -lt "$max_attempts" ]; then
                        echo "Process '$process_name' is not running. Attempting to restart..."

                        # Restart the process (assumes it is managed as a systemd service of the same name)
                        sudo systemctl restart "$process_name"

                        # Increment the restart attempts counter
                        ((restart_attempts++))
                    else
                        echo "Maximum restart attempts reached. Please check the process manually."
        
    4. Automation:

      • Provide instructions on how to schedule the script to run at regular intervals using a cron job or any other appropriate scheduling method.

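            # To schedule the script to run at regular intervals using a cron job:
            # 1. Open the crontab configuration with: crontab -e
            #    (root's crontab, via sudo crontab -e, lets systemctl restart run without a password prompt)
            # 2. Add a line with the interval and the script path, for example, to run every 2 minutes:
            */2 * * * * /path/to/monitor_process.sh <process_name>
            # Note: under cron the script should check once and exit; a script with an
            # endless while loop would start another copy on every run.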
        
    5. Documentation:

      • Include clear and concise comments in the script to explain its logic and functionality.

      • Write a separate document describing the purpose of the script, how to use it, and any specific considerations.

    6. Bonus:

      • Implement a notification mechanism (e.g., email, Slack message) to alert administrators if the process requires manual intervention after a certain number of restart attempts.

                        # Send notification if restart attempts reach the threshold
                        if [ "$restart_attempts" -ge "$notification_threshold" ]; then
                            echo "Sending email notification to administrators..."
                            subject="Process Monitoring Alert"
                            message="Process '$process_name' failed to restart after $max_attempts attempts."
                            # Send email using the mail command (needs a configured mail transfer agent)
                            # Replace user@gmail.com with the administrators' address
                            echo "$message" | mail -s "$subject" user@gmail.com
                        fi
        
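        #!/bin/bash
        # Usage: ./process.sh <process_name>
        PROCESS_NAME="$1"
        # Maximum number of restart attempts
        MAX_RESTART_ATTEMPTS=3

        # Function to check if the given service exists or not
        # (--no-legend hides the header and footer, so any output means the unit was found)
        check_service_exists() {
            systemctl list-unit-files --no-legend "$PROCESS_NAME.service" 2>/dev/null | grep -q .
        }

        # Function to check if the process is running
        # Uses the test's status (not exit) so it can be called directly inside an if
        is_process_active() {
            [[ "$(systemctl is-active "$PROCESS_NAME" 2>/dev/null)" == "active" ]]
        }

        # Function to restart the process
        restart_process() {
            echo
            echo "Process '$PROCESS_NAME' is not running. Attempting to restart..."
            # Loop to check and restart the process
            for ((attempt=1; attempt<=MAX_RESTART_ATTEMPTS; attempt++)); do
                if is_process_active; then
                    echo "Process '$PROCESS_NAME' is running properly now."
                    return 0
                fi
                echo "Restart attempt $attempt of $MAX_RESTART_ATTEMPTS..."
                sudo systemctl restart "$PROCESS_NAME"
                sleep 2
            done
            # Final check after the last restart attempt
            if is_process_active; then
                echo "Process '$PROCESS_NAME' is running properly now."
                return 0
            fi
            echo "Maximum restart attempts reached. Please check the process manually."
            return 1
        }

        # Validate the command-line argument
        if [ $# -ne 1 ]; then
            echo "Usage: $0 <process_name>"
            exit 1
        fi

        if ! check_service_exists; then
            echo "Service '$PROCESS_NAME' does not exist."
            exit 1
        fi

        if is_process_active; then
            echo "Process '$PROCESS_NAME' is running."
        else
            restart_process || exit 1
        fi
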
  • Save this script as process.sh and make it executable:

  • chmod +x process.sh

  • Execute the script with ./process.sh <process_name>
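The bounded restart logic from step 3 can also be written as a small reusable function. This is a minimal sketch, not the challenge's reference solution: the name `restart_with_limit`, the `RESTART_DELAY` variable, and the convention of passing the check and restart steps as function names are all assumptions.

```bash
#!/bin/bash
# Hypothetical helper: restart a process until a check passes, at most N times.
#   $1 = maximum number of restart attempts
#   $2 = name of a function/command that succeeds when the process is up
#   $3 = name of a function/command that restarts the process
# Returns 0 as soon as the check passes, 1 if it still fails after the last restart.
restart_with_limit() {
    local max_attempts=$1 check_cmd=$2 restart_cmd=$3 attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        if "$check_cmd"; then
            echo "Process is up (restarts used: $((attempt - 1)))."
            return 0
        fi
        echo "Restart attempt $attempt of $max_attempts..."
        "$restart_cmd"
        sleep "${RESTART_DELAY:-2}"   # give the process time to come up
    done
    # One last check after the final restart
    if "$check_cmd"; then
        echo "Process is up (restarts used: $max_attempts)."
        return 0
    fi
    echo "Maximum restart attempts reached. Please check the process manually." >&2
    return 1
}
```

For example, after defining `check() { systemctl is-active --quiet nginx; }` and `restart() { sudo systemctl restart nginx; }`, calling `restart_with_limit 3 check restart` tries at most three restarts of nginx, 2 seconds apart, and its non-zero return status is a natural trigger for the notification step.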


Part 2 - Monitoring Metrics Script with Sleep Mechanism

This project aims to create a Bash script that monitors system metrics like CPU usage, memory usage, and disk space usage. The script will provide a user-friendly interface, allow continuous monitoring with a specified sleep interval, and extend its capabilities to monitor specific services like Nginx.

Task 1: Implementing Basic Metrics Monitoring

Write a Bash script that monitors the CPU usage, memory usage, and disk space usage of the system. The script should display these metrics in a clear and organized manner, allowing users to interpret the data easily. The script should use the top, free, and df commands to fetch the metrics.

#!/bin/bash

# Function to monitor CPU usage
function monitor_cpu() {
  # In procps-ng top's "%Cpu(s):" line, $2 is user time and $4 is system time
  cpu_usage=$(top -bn1 | awk '/Cpu\(s\):/ {print $2 + $4}')
  echo "CPU Usage: $cpu_usage%"
}

# Function to monitor memory usage
function monitor_memory() {
  memory_usage=$(free | awk '/Mem:/ {printf "%.2f", $3/$2 * 100}')
  echo "Memory Usage: $memory_usage%"
}

# Function to monitor disk usage
function monitor_disk() {
  disk_usage=$(df -h --output=pcent / | awk 'NR==2 {print $1}')
  echo "Disk Usage: $disk_usage"
}

# Function to monitor running processes
function monitor_processes() {
  # --no-headers stops ps's column header line from being counted as a process
  process_count=$(ps -e --no-headers | wc -l)
  echo "Running Processes: $process_count"
}

# Main function
function main() {
  echo "System Monitoring Script"

  while true; do
    echo "--------------------"
    echo "System Metrics - $(date)"
    monitor_cpu
    monitor_memory
    monitor_disk
    monitor_processes
    sleep 5
  done
}

# Execute the main function
main
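As an alternative to parsing `top`'s output, whose column layout differs between versions, CPU usage can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. This is a Linux-only sketch under stated assumptions: the function names `cpu_percent_between` and `monitor_cpu_procstat` are hypothetical, and it counts idle plus iowait as idle time.

```bash
#!/bin/bash
# Compute CPU usage (%) from two samples of the aggregate "cpu" line in /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal guest guest_nice.
# Only the first eight are summed, because guest time is already counted in user/nice.
cpu_percent_between() {
    local label user nice system idle iowait irq softirq steal rest
    local total1 idle1 total2 idle2
    read -r label user nice system idle iowait irq softirq steal rest <<< "$1"
    total1=$((user + nice + system + idle + iowait + irq + softirq + steal))
    idle1=$((idle + iowait))
    read -r label user nice system idle iowait irq softirq steal rest <<< "$2"
    total2=$((user + nice + system + idle + iowait + irq + softirq + steal))
    idle2=$((idle + iowait))
    # Usage = share of the elapsed time that was not spent idle
    awk -v t="$((total2 - total1))" -v i="$((idle2 - idle1))" \
        'BEGIN { if (t <= 0) print "0.0"; else printf "%.1f\n", (t - i) * 100 / t }'
}

# Linux only: sample /proc/stat twice, $1 seconds apart (default 1).
monitor_cpu_procstat() {
    local first second
    first=$(grep '^cpu ' /proc/stat)
    sleep "${1:-1}"
    second=$(grep '^cpu ' /proc/stat)
    echo "CPU Usage: $(cpu_percent_between "$first" "$second")%"
}
```

Calling `monitor_cpu_procstat` in place of `monitor_cpu` inside `main` prints the same `CPU Usage:` line; the trade-off is that each call blocks for the sampling interval.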

Task 2: User-Friendly Interface

Enhance the script by providing a user-friendly interface that allows users to interact with the script through the terminal. Display a simple menu with options to view the metrics and an option to exit the script.


#!/bin/bash

# Continuous loop for the menu
while true; do
    clear
    echo "Monitoring Script Menu:"
    echo "1. Display System Metrics"
    echo "2. Monitor a Specific Service"
    echo "3. Exit"
    read -p "Enter your choice: " choice

Task 3: Continuous Monitoring with Sleep

Introduce a loop in the script to allow continuous monitoring until the user chooses to exit. After displaying the metrics, add a "sleep" mechanism to pause the monitoring for a specified interval before displaying the metrics again. Allow the user to specify the sleep interval.

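    # Allow the user to specify the sleep interval
    read -r -p "Enter sleep interval in seconds (0 to exit): " sleep_interval

    # Reject input that is not a whole number; otherwise the -eq test below
    # fails with "integer expression expected"
    if ! [[ "$sleep_interval" =~ ^[0-9]+$ ]]; then
        echo "Invalid interval '$sleep_interval'. Please enter a whole number of seconds."
        sleep 2   # leave the message on screen before the menu is redrawn
        continue
    fi

    # Exit the loop if sleep interval is 0
    if [ "$sleep_interval" -eq 0 ]; then
        echo "Exiting the script."
        exit 0
    fi

    # Sleep for the specified interval
    sleep "$sleep_interval"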
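A plain `sleep` cannot be cut short, so the user has to wait out the whole interval (or press Ctrl+C) before choosing to exit. One alternative, sketched here under the assumption that the script runs in an interactive terminal, is Bash's `read` with a timeout: it waits up to the interval but returns early on a key press. The function name `wait_or_quit` and the choice of `q` as the quit key are hypothetical.

```bash
#!/bin/bash
# Wait up to $1 seconds for a single key press.
# Returns 1 if the user pressed q or Q (meaning "stop monitoring"),
# and 0 on timeout, any other key, or end of input.
wait_or_quit() {
    local key
    if read -r -s -n 1 -t "$1" key && [[ "$key" == [qQ] ]]; then
        return 1
    fi
    return 0
}
```

Inside the monitoring loop, `wait_or_quit "$sleep_interval" || break` would stand in for `sleep "$sleep_interval"`; any other key simply refreshes the metrics early.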

Task 4: Monitoring a Specific Service

Extend the script to monitor a specific service. Check whether the service is running and display its status. If it is not running, give the user the option to start the service. Use systemctl (or another appropriate command) to check and control the service.

        2)
            # Monitor a specific service (option 2 of the menu)
            clear
            read -r -p "Enter the name of the service to monitor: " service_name
            if systemctl is-active --quiet "$service_name"; then
                echo "$service_name is running."
            else
                echo "$service_name is not running."
                read -r -p "Do you want to start $service_name? (y/n): " start_choice
                if [ "$start_choice" = "y" ]; then
                    echo "Starting $service_name..."
                    sudo systemctl start "$service_name"
                fi
            fi
            ;;
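The y/n prompt above accepts only a lower-case `y`. A slightly more forgiving helper is sketched below; `confirm` and `offer_start` are hypothetical names, the lower-casing syntax `${answer,,}` assumes Bash 4 or newer, and `offer_start` also reports whether `systemctl start` actually succeeded.

```bash
#!/bin/bash
# Ask a yes/no question on the terminal; succeed only for y/yes (any case).
confirm() {
    local answer
    read -r -p "$1 (y/n): " answer || return 1   # treat end of input as "no"
    answer=${answer,,}                           # lower-case (Bash 4+)
    [[ "$answer" == "y" || "$answer" == "yes" ]]
}

# Hypothetical wrapper: report a service's state and offer to start it.
offer_start() {
    local service=$1
    if systemctl is-active --quiet "$service"; then
        echo "$service is running."
    elif confirm "$service is not running. Do you want to start it?"; then
        echo "Starting $service..."
        if sudo systemctl start "$service"; then
            echo "$service started."
        else
            echo "Error: failed to start $service." >&2
            return 1
        fi
    fi
}
```

With these defined, the body of menu option 2 shrinks to reading `service_name` and calling `offer_start "$service_name"`.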

Task 5: Allow the User to Choose a Different Service

Modify the script to give the user the option to monitor a different service of their choice. Prompt the user to enter the name of the service they want to monitor, and display its status accordingly.
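Because the service name now comes straight from user input, it is worth rejecting obviously invalid names before handing them to systemctl. A minimal sketch with a hypothetical function name: systemd.unit(5) allows ASCII letters, digits, `:`, `-`, `_`, `.` and `\` in unit names, plus `@` for template instances. This check accepts all of those except the rarely typed backslash escape, and it also rejects a leading `-`, which systemctl would otherwise parse as an option.

```bash
#!/bin/bash
# Accept names such as nginx, nginx.service or getty@tty1.service.
# Allowed: ASCII letters, digits, ":", "-", "_", "." and "@" (a subset of what
# systemd permits; backslash escapes are deliberately not accepted here).
# A leading "-" is rejected so the name can never be parsed as a systemctl option.
is_valid_service_name() {
    local re='^[-A-Za-z0-9:_.@]+$'
    [[ -n "$1" && "$1" != -* && "$1" =~ $re ]]
}
```

For example, `until is_valid_service_name "$service_name"; do read -r -p "Enter a valid service name: " service_name; done` keeps prompting until the name passes the check.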

Task 6: Error Handling

Implement error handling in the script to handle scenarios where commands fail or inputs are invalid. Display meaningful error messages to guide users on what went wrong and how to fix it.

        *)
            # Handle invalid input
            echo "Invalid choice. Please select a valid option."
            ;;
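Handling invalid menu choices covers only one kind of error. For failing commands, two small helpers can make the messages more specific. This is a sketch with hypothetical names: `run_checked` reports which command failed and with what exit status, and `require_commands` uses the `command -v` builtin to confirm that the tools the script depends on are installed.

```bash
#!/bin/bash
# Run a command; if it fails, report which command failed and its exit status
# on stderr, then pass the same status back to the caller.
run_checked() {
    "$@"
    local status=$?
    if ((status != 0)); then
        echo "Error: '$*' failed with exit status $status." >&2
    fi
    return "$status"
}

# Check up front that every tool the script relies on is installed.
require_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Error: required command '$cmd' not found. Please install it first." >&2
            missing=1
        fi
    done
    return "$missing"
}
```

At the top of the monitoring script, `require_commands top free df systemctl || exit 1` stops early with a clear message, and `run_checked sudo systemctl start "$service_name"` reports a failed start instead of continuing silently.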

Task 7: Documentation

Add comments within the script to explain the purpose of each function, variable, and section of the code. Provide a clear and concise README file explaining how to use the script, the available options, and the purpose of the script.

#!/bin/bash
# Function to display system metrics - CPU, Memory, Disk
display_metrics() {
    echo "----------------------------------------"
    echo "System Metrics - $(date)"
    echo "----------------------------------------"
    echo "CPU Usage:"
    top -b -n 1 | grep '%Cpu' | awk '{print "  User: " $2 "%, System: " $4 "%, Idle: " $8 "%"}'
    echo "----------------------------------------"
    echo "Memory Usage:"
    free -m | grep 'Mem:' | awk '{print "  Total: " $2 " MB, Used: " $3 " MB, Free: " $4 " MB"}'
    echo "----------------------------------------"
    echo "Disk Space Usage:"
    # Skip df's header line and list only real block devices, labelled by mount point
    df -h | awk 'NR > 1 && $1 ~ /^\/dev\// {print "  " $6 " (" $1 "): " $5 " used, " $4 " free"}'
    echo "----------------------------------------"
}

# Function to display service status
display_service_status() {
    local service_name=$1
    echo "Service Status - $service_name"
    systemctl status "$service_name" | grep -E 'Active:|Main PID:'
    echo "----------------------------------------"
}

# Function to start a service
start_service() {
    local service_name=$1
    sudo systemctl start "$service_name"
}

# Function to check if a service is running
is_service_running() {
    local service_name=$1
    systemctl is-active "$service_name" >/dev/null 2>&1
}

# Main loop for continuous monitoring
while true; do
    clear
    echo "Choose an option:"
    echo "1. Display System Metrics"
    echo "2. Monitor a Specific Service"
    echo "3. Exit"
    read -p "Enter your choice: " choice

    case $choice in
        1)
            display_metrics
            ;;
        2)
            read -p "Enter the name of the service to monitor: " service_name
            display_service_status "$service_name"
            if ! is_service_running "$service_name"; then
                read -p "The service is not running. Do you want to start it? (y/n): " start_choice
                if [ "$start_choice" = "y" ] || [ "$start_choice" = "Y" ]; then
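                    if start_service "$service_name"; then
                        echo "Service started."
                    else
                        echo "Error: could not start '$service_name'. Check the name and your sudo permissions."
                    fi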
                fi
            fi
            ;;
        3)
            echo "Exiting the script."
            exit 0
            ;;
        *)
            echo "Invalid choice. Please try again."
            ;;
    esac

    read -r -t 5 -p "Press Enter to continue..."  # Wait for Enter, or continue automatically after 5 seconds
done