How to create alerts with log data

By Melori Arellano

Last update on March 12, 2024

Advanced

Introduction

Loki stores your logs and only indexes labels for each log stream. Using Loki with Grafana Alerting is a powerful way to keep track of what’s happening in your environment. You can create metric alerts based on content in your log lines to notify your team. Even better, you can add label data from the log message directly into your alert notification.

In this tutorial, you’ll:

  • Create a conditional alert using Loki.
  • Create a custom alert message template.
  • Configure an email notification that includes part of the log message.

Before you begin

Create an alert

In these steps you’ll create an alert and define an expression to evaluate. These examples use a classic condition.

Create a Grafana-managed alert

  1. Navigate in Grafana to Alerting, then to Alert Rules and click + New alert rule.

  2. Choose Grafana Managed Alert to create an alert that uses expressions.

  3. Select your Loki datasource from the drop-down.

  4. Switch the query editor to code mode using the toggle in the top right corner of the editor, then paste the query below:

    sum by (message)(count_over_time({filename="/var/log/web_requests.log"} != `status=200` | pattern `<_> <message> duration<_>` [10m]))

    This query counts the log lines whose status code is not 200 (OK) over the time interval indicated in brackets, then sums the results by message. It is evaluated as an instant query. It uses the LogQL pattern parser to add a new label called message that contains the level, method, url, and status from the log line.

    You can use the explain query toggle button for a full explanation of the query syntax. The optional log-generating script creates a sample log line similar to the one below:

    2023-04-22T02:49:32.562825+00:00 level=info method=GET url=test.com status=200 duration=171ms

    Note

    If you’re using your own logs, modify the LogQL query to match your own log message. Refer to the Loki docs to understand the pattern parser.
  5. Update the default expressions to match the values shown in the tables below:

    Box B - reduce expression

    Function          Sum
    Input             A
    Mode              Strict

    Box C - threshold expression

    Input             B
    Expression value  Is above 5
    Alert condition   This is the alert condition
  6. Expand Options and select Instant as the query type.

  7. Click preview to see a preview of the query result and alert evaluation.

  8. Expression B shows a table of labels and values returned. The message label captured the message string from the log line and the value shows the number of times that string occurred during the evaluation interval.

    labels                                                   values
    message=level=info method=GET url=test.com status=500    27
    message=level=info method=POST url=test.com status=500   1
  9. Configure your alert evaluation behavior.

    • Choose a folder or use +add new to add a new folder for this alert.
    • Select an existing evaluation group from the drop-down or create a new one if this is your first alert.
    • Set the for value to 0s so the alert will fire instantly.
    • Leave the settings under Configure no data and error handling on their default values.
  10. Add an annotation that refers to labels and values from the query result in your alert notification.

    • Choose +Add new in the drop down and type the annotation name AlertValues into the blank box.
    • In the blank text box paste {{ $labels.message }} has returned an error status {{$values.B}} times.
  11. Click the Save and exit button at the top of the alert definition page.
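
To make the evaluation above more concrete, here is a minimal Python sketch of what the query, the threshold expression, and the annotation produce for log lines shaped like the sample line. The function names and the regular expression are illustrative assumptions, not part of Loki or Grafana; the real evaluation happens inside Loki and the Grafana expression engine, and this sketch simply ignores lines that do not match the pattern.

```python
import re
from collections import Counter

# Rough stand-in for the pattern `<_> <message> duration<_>`:
# skip the first field (the timestamp), capture everything up to " duration".
PATTERN = re.compile(r"^\S+ (?P<message>.+?) duration")


def extract_message(line):
    """Return the text that would end up in the message label, or None."""
    match = PATTERN.match(line)
    return match.group("message") if match else None


def count_errors_by_message(lines):
    """Count lines without `status=200`, grouped by extracted message."""
    counts = Counter()
    for line in lines:
        if "status=200" in line:  # line filter: != `status=200`
            continue
        message = extract_message(line)
        if message is not None:
            counts[message] += 1
    return counts


def firing_messages(counts, threshold=5):
    """Keep only the series whose value is above the threshold."""
    return {msg: n for msg, n in counts.items() if n > threshold}


# Hypothetical sample data: 7 GET errors, 2 POST errors, 20 successful requests.
sample = (
    ["2023-04-22T02:49:32+00:00 level=info method=GET url=test.com status=500 duration=171ms"] * 7
    + ["2023-04-22T02:49:33+00:00 level=info method=POST url=test.com status=500 duration=90ms"] * 2
    + ["2023-04-22T02:49:34+00:00 level=info method=GET url=test.com status=200 duration=12ms"] * 20
)
counts = count_errors_by_message(sample)
for message, value in firing_messages(counts).items():
    # Same wording as the AlertValues annotation.
    print(f"{message} has returned an error status {value} times.")
```

In this sample only the GET series crosses the threshold, so only that series would produce an alert instance.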

Create a Loki managed alert

Loki managed alerts are stored and evaluated by Loki. They use LogQL for their expressions.

  1. Choose Mimir or Loki managed alert to create an alert using Loki.

  2. Select your Loki data source from the drop-down.

  3. The optional script will output a sample log line similar to this:

    2023-04-22T02:49:32.562825+00:00 level=info method=GET url=test.com status=200 duration=171ms
  4. Enter the alert query below if you’re using the sample logs or modify it for your own file path and condition.

    sum by (message)(count_over_time({filename="/var/log/web_requests.log"} != `status=200` | pattern `<_> <message> duration<_>` [5m])) > 5

    This query searches the interval period and counts the log lines with a status code that is not 200 (OK), then sums the results by message. It uses the LogQL pattern parser to add a new label called message that captures the level, method, url, and status from the log line.

    For Loki alerts, the interval must be specified in brackets rather than with a variable, and the alert threshold is added to the query. For this example, the interval is 5m and the alert fires if a message has more than 5 non-200 status log lines.

  5. Click preview alert to see a preview of the labels and value. Hover over the i icon under the info column to see the query values.

  6. Add an annotation that refers to labels and values from the query result in your alert notification.

    • Choose +Add new in the drop down and type the annotation name AlertValues into the blank box.

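    • In the blank text box, paste the following. Loki managed rules use Prometheus-style templating, so the value is referenced as $value rather than $values.B:

      {{ $labels.message }} has returned an error status {{ $value }} times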
  7. Click Save rule and exit at the top of the alert screen.
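
Before saving the rule, you may want to check what the query returns outside of Grafana. The sketch below shows one way to do that with the instant-query endpoint of Loki's HTTP API (/loki/api/v1/query). The base URL http://localhost:3100 and the helper names are assumptions for illustration; adjust them for your Loki instance and add authentication if your instance requires it.

```python
import json
import urllib.parse
import urllib.request

# Same query as in step 4, including the "> 5" threshold.
ALERT_QUERY = (
    'sum by (message)(count_over_time({filename="/var/log/web_requests.log"} '
    '!= `status=200` | pattern `<_> <message> duration<_>` [5m])) > 5'
)


def build_instant_query_url(base_url, logql):
    """Build the URL for a Loki instant query (GET /loki/api/v1/query)."""
    params = urllib.parse.urlencode({"query": logql})
    return f"{base_url.rstrip('/')}/loki/api/v1/query?{params}"


def summarize(response_body):
    """Turn a vector result into a list of (message, value) pairs."""
    result = json.loads(response_body)["data"]["result"]
    return [(s["metric"].get("message", ""), float(s["value"][1])) for s in result]


def run_query(base_url="http://localhost:3100"):  # assumed local Loki address
    """Send the alert query to Loki and return the summarized result."""
    url = build_instant_query_url(base_url, ALERT_QUERY)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return summarize(resp.read())
```

Calling run_query() against a reachable Loki instance returns one pair per message that currently exceeds the threshold; an empty list means the rule would not fire right now.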

Create a message template

  1. Add an alert message template and reference the annotation from your alert.

    • In Alerting under the Contact points tab:

      • Choose Grafana to use the built-in alertmanager

      • Click +Add template

      • Name the template mynotification

      • Add the snippet below to your alert template in the Content field. Notice that you will reference the annotation from your alert by name (.Annotations.AlertValues) to insert the annotation string into the alert notification:

        {{ define "myalert" }}
        [{{.Status}}] {{ .Labels.alertname }}
        {{ .Annotations.AlertValues }}
        {{ end }}
        {{ define "mymessage" }}
        {{ if gt (len .Alerts.Firing) 0 }}
            {{ len .Alerts.Firing }} firing:
            {{ range .Alerts.Firing }} {{ template "myalert" .}} {{ end }}
        {{ end }}
        {{ if gt (len .Alerts.Resolved) 0 }}
            {{ len .Alerts.Resolved }} resolved:
            {{ range .Alerts.Resolved }} {{ template "myalert" .}} {{ end }}
        {{ end }}
        {{ end }}
      • There are two sections to the notification template:

        1. The myalert template creates a single alert notification based on a specific alert.
        2. The mymessage template collects all of the grouped alerts that are firing or resolved and sends them in a single notification.
      • Save the template.

  2. Add the template to your contact point

    1. Navigate to Alerting > Contact points and edit the email contact point. If you’re using Grafana Cloud, SMTP is already enabled. For local installations, you’ll need to configure SMTP.

    2. Add an email address in the to field for the recipient.

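    3. Expand Optional Email Settings and refer to the template by adding the following to the body field. Use the name from the define block (mymessage), not the name you gave the template when you saved it:

      {{ template "mymessage" . }}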
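
To see how the two named templates fit together, the following Python sketch imitates their output for a group of alerts. It is only an approximation of Go template rendering (whitespace will differ from the real email), and the Alert class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    status: str  # "firing" or "resolved"
    labels: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)


def render_myalert(alert):
    """Counterpart of the "myalert" template: one entry per alert."""
    name = alert.labels.get("alertname", "")
    values = alert.annotations.get("AlertValues", "")
    return f"[{alert.status}] {name}\n{values}"


def render_mymessage(alerts):
    """Counterpart of the "mymessage" template: firing first, then resolved."""
    parts = []
    for status in ("firing", "resolved"):
        group = [a for a in alerts if a.status == status]
        if group:  # mirrors: if gt (len .Alerts.Firing) 0
            parts.append(f"{len(group)} {status}:")
            parts.extend(render_myalert(a) for a in group)
    return "\n".join(parts)


alerts = [
    Alert(
        "firing",
        {"alertname": "LokiAlertTest1"},
        {"AlertValues": "level=info method=GET url=test.com status=500 has returned an error status 12 times."},
    ),
]
print(render_mymessage(alerts))
```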

Tada! You’re finished! Grafana will email an alert with a message that looks similar to the one below. The format varies slightly depending on which type of alert you created (Loki managed or Grafana managed), but the contents should be the same:

1 firing: [firing] LokiAlertTest1 Error message level=info method=GET url=test.com status=500 has occurred 12 times.

Optional: Use Promtail with a sample log-generating script

This optional step uses a Python script to generate the sample logs used in this tutorial to create alerts.

  1. Install Promtail on your local machine and configure it to send logs to your Loki instance.
  2. Install Python 3 on your local machine if needed.
  3. Copy the Python script below and paste it into a new file named web-server-logs-simulator.py on your local machine.

#!/usr/bin/env python3

import datetime
import math
import random
import sys
import time


requests_per_second = 2
failure_rate = 0.05
get_post_ratio = 0.9
get_average_duration_ms = 500
post_average_duration_ms = 2000


while True:

    # Sleep for an exponentially distributed time with mean 1/requests_per_second seconds.
    d = random.expovariate(requests_per_second)
    time.sleep(d)
    if random.random() < failure_rate:
        status = "500"
    else:
        status = "200"
    if random.random() < get_post_ratio:
        method = "GET"
        duration_ms = math.floor(random.expovariate(1/get_average_duration_ms))
    else:
        method = "POST"
        duration_ms = math.floor(random.expovariate(1/post_average_duration_ms))
    timestamp = datetime.datetime.now(tz=datetime.timezone.utc).isoformat()
    print(f"{timestamp} level=info method={method} url=/ status={status} duration={duration_ms}ms")
    sys.stdout.flush()
  4. Give the script executable permissions.

In a terminal window on Linux-based systems, run the command:


chmod 755 ./web-server-logs-simulator.py
  5. Run the script.
  • Use tee to direct the script output to the console and the specified file path. For example, if Promtail is configured to monitor /var/log for .log files, you can direct the script output to the /var/log/web_requests.log file.

  • To avoid running the script with elevated permissions, create the log file manually and make only that file writable by your user. The file is created by root, so the ownership change also needs sudo.

    sudo touch /var/log/web_requests.log
    sudo chown "$USER" /var/log/web_requests.log
    python3 ./web-server-logs-simulator.py | tee -a /var/log/web_requests.log
  6. Verify that the logs are showing up in Grafana’s Explore view:
  • Navigate to Explore in Grafana.
  • Select the Loki datasource from the drop-down.
  • Check the toggle for builder | code in the top right corner of the query box and switch the query mode to builder if it’s not already selected.
  • Select the filename label from the drop-down and choose your web_requests.log file from the value drop-down.
  • Click Run Query.
  • You should see logs and a graph of log volume.
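
As a rough sanity check on whether the sample logs will trip the alert, the script's settings imply an expected number of error lines per query window. The arithmetic below is a back-of-the-envelope sketch that reuses the constants from the script; the function name is an assumption, and actual counts vary from run to run because the script is random.

```python
# Constants copied from the log-generating script.
requests_per_second = 2
failure_rate = 0.05
get_post_ratio = 0.9


def expected_errors(window_seconds):
    """Expected number of status=500 lines per method within a window."""
    total = requests_per_second * window_seconds * failure_rate
    return {"GET": total * get_post_ratio, "POST": total * (1 - get_post_ratio)}


for label, seconds in (("5m", 300), ("10m", 600)):
    errors = expected_errors(seconds)
    print(label, {method: round(n, 1) for method, n in errors.items()})
```

With these defaults, GET errors should on average sit well above the threshold of 5 in either window, while POST errors hover near or below it.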

Troubleshooting the script

If you don’t see the sample logs in Explore:

  • Check that the output file /var/log/web_requests.log exists and contains logs.
  • If the file is empty, check that you followed the steps above to create the file and change the permissions.
  • If the file contains logs, verify that Promtail is running and check that it is configured correctly.
  • In Grafana Explore, check that the time range is set to the last 5 minutes.
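
The first two checks above can be scripted. The sketch below is one possible approach, assuming the log path used in this tutorial; the function name and the 300-second freshness limit are illustrative choices, not part of Promtail or Grafana.

```python
import os
import time


def check_log_file(path, max_age_seconds=300):
    """Return a list of human-readable problems found with the log file."""
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    problems = []
    if os.path.getsize(path) == 0:
        problems.append(f"{path} is empty")
    # A stale modification time suggests the script is no longer writing.
    age = time.time() - os.path.getmtime(path)
    if age > max_age_seconds:
        problems.append(f"{path} was last written {age:.0f}s ago")
    return problems


for message in check_log_file("/var/log/web_requests.log") or ["log file looks fine"]:
    print(message)
```

If this reports no problems but Explore is still empty, the issue is more likely in the Promtail configuration or the selected time range.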

Optional: Use Docker Compose to create the tutorial environment

These optional steps walk you through installing Grafana, Loki, and Promtail with Docker Compose. You’ll also configure a log-generating script that generates the sample logs used in this tutorial to create alerts.

Pre-requisites

  1. Start a command line from a directory of your choice.
  2. From that directory, get a docker-compose.yaml file to run Grafana, Loki, and Promtail:

Bash


wget https://raw.githubusercontent.com/grafana/loki/v2.8.0/production/docker-compose.yaml -O docker-compose.yaml

Windows PowerShell


# Downloads the file to the Desktop
$client = New-Object System.Net.WebClient
$client.DownloadFile("https://raw.githubusercontent.com/grafana/loki/v2.8.0/production/docker-compose.yaml",
"C:\Users\$Env:UserName\Desktop\docker-compose.yaml")
  3. Run the containers.

docker compose up -d
  4. Create and edit a Python file that will generate logs.

Bash


touch web-server-logs-simulator.py && nano web-server-logs-simulator.py

Windows PowerShell


New-Item web-server-logs-simulator.py ; notepad web-server-logs-simulator.py
  5. Paste the following code into the file.

#!/usr/bin/env python3

import datetime
import math
import random
import sys
import time



requests_per_second = 2
failure_rate = 0.05
get_post_ratio = 0.9
get_average_duration_ms = 500
post_average_duration_ms = 2000


while True:

    # Sleep for an exponentially distributed time with mean 1/requests_per_second seconds.
    d = random.expovariate(requests_per_second)
    time.sleep(d)
    if random.random() < failure_rate:
        status = "500"
    else:
        status = "200"
    if random.random() < get_post_ratio:
        method = "GET"
        duration_ms = math.floor(random.expovariate(1/get_average_duration_ms))
    else:
        method = "POST"
        duration_ms = math.floor(random.expovariate(1/post_average_duration_ms))
    timestamp = datetime.datetime.now(tz=datetime.timezone.utc).isoformat()
    print(f"{timestamp} level=info method={method} url=/ status={status} duration={duration_ms}ms")
    sys.stdout.flush()
  6. Execute the log-generating Python script.

In a terminal window on Linux-based systems, run the command:


chmod 755 ./web-server-logs-simulator.py
  • Use tee to direct the script output to the console and the specified file path. For example, if Promtail is configured to monitor /var/log for .log files, you can direct the script output to the /var/log/web_requests.log file.

  • To avoid running the script with elevated permissions, create the log file manually and make only that file writable by your user. The file is created by root, so the ownership change also needs sudo.


sudo touch /var/log/web_requests.log
sudo chown "$USER" /var/log/web_requests.log
python3 ./web-server-logs-simulator.py | tee -a /var/log/web_requests.log

Running on Windows

Run PowerShell as administrator:


python ./web-server-logs-simulator.py | Tee-Object "C:\ProgramFiles\GrafanaLabs\grafana\var\log\web_requests.log"
  7. Verify that the logs are showing up in Grafana’s Explore view:
  • Navigate to Explore in Grafana.
  • Select the Loki datasource from the drop-down.
  • Check the toggle for builder | code in the top right corner of the query box and switch the query mode to builder if it’s not already selected.
  • Select the filename label from the drop-down and choose your web_requests.log file from the value drop-down.
  • Click Run Query.
  • You should see logs and a graph of log volume.

Troubleshooting the script

If you don’t see the logs in Explore, check these things:

  • Check that the output file /var/log/web_requests.log exists and contains logs.
  • If the file is empty, check that you followed the steps above to create the file and change the permissions.
  • If the file contains logs, verify that Promtail is running and check that it is configured correctly.
  • In Grafana Explore, check that the time range is set to the last 5 minutes.
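
If the file looks healthy but nothing appears in Explore, it can help to ask Loki directly which filename label values it has received. The sketch below uses the label-values endpoint of Loki's HTTP API (/loki/api/v1/label/<name>/values). The address http://localhost:3100 is an assumption based on Loki's default port, and the helper names are hypothetical.

```python
import json
import urllib.request


def label_values_url(base_url, label):
    """URL of Loki's label-values endpoint for the given label name."""
    return f"{base_url.rstrip('/')}/loki/api/v1/label/{label}/values"


def parse_label_values(response_body):
    """Extract the list of values from the JSON response body."""
    return json.loads(response_body).get("data") or []


def loki_has_file(filename, base_url="http://localhost:3100"):  # assumed address
    """Return True if Loki has recently ingested logs labeled with this filename."""
    url = label_values_url(base_url, "filename")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return filename in parse_label_values(resp.read())
```

For example, loki_has_file("/var/log/web_requests.log") returning False while the file keeps growing points toward Promtail rather than the script.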