- Mastering Elastic Stack
- Yuvraj Gupta, Ravi Kumar Gupta
Exploring Input Plugins
An input plugin is used to get data from a source or multiple sources and to feed data into Logstash. It acts as the first section, which is required in the Logstash configuration file. Some of the input plugins are described in the following sections.
stdin
The stdin plugin is a fairly simple plugin that reads data from standard input. It reads the data we enter in the console, which then acts as an input to Logstash. It is mostly used to validate whether Logstash has been installed properly and whether we are able to access it.
The basic configuration for stdin is as follows:
stdin { }
In this plugin, no additional settings are mandatory. If we use the preceding configuration, whatever we write in the console will be taken as input, without any additional parameters.
The additional configuration settings are as follows:
- add_field: This is used to add a field to the incoming data.
- codec: This is used to decode the incoming data and to interpret the format in which the data arrives. Possible values of codec are all of the codec plugins that are present.
- enable_metric: This is used to get metric values from each plugin for plugin reporting.
- id: This is used to provide a unique identifier to a plugin, which can be used to track information about the plugin and for debugging.
- tags: This is used to add a tag to the incoming data, which can be used for processing. For similar types of events, data from a given source can be tagged for processing. It is mostly used with conditionals, which help to perform various sets of actions as per the tags.
- type: This is used to add a type field for the incoming data. It is very useful when we have data coming in from different sources, wherein we can mention the type to differentiate the data among the sources. It is mainly used while filtering, wherein we can mention different logic for different sources of data using the type field.
The value type and default values for the settings are as follows:
Configuration example:
input {
  stdin {
    add_field => { "current_time" => "%{@timestamp}" }
    codec => "json"
    tags => ["stdin-input"]
    type => "stdin"
  }
}
In the preceding configuration, we have mentioned the codec as JSON; therefore, our input has to be in JSON format. When we enter anything in JSON, it will be parsed as per the key-value pairs, and a field called current_time will be added, whose value is that of the @timestamp metadata field. If we input anything in a format other than JSON, _jsonparsefailure will appear in tags, along with the stdin-input tag.
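The parse-or-tag behaviour described above can be approximated in a few lines of Python. This is an illustrative sketch of the idea, not Logstash's actual codec implementation:

```python
import json

def decode_event(line):
    """Mimic the json codec on a stdin input: parse key/value pairs
    from a JSON line, or tag the event with _jsonparsefailure."""
    event = {"tags": ["stdin-input"]}
    try:
        parsed = json.loads(line)
        if isinstance(parsed, dict):
            event.update(parsed)  # keys become event fields
        else:
            raise ValueError("not a JSON object")
    except (json.JSONDecodeError, ValueError):
        # Non-JSON input: keep the raw line and flag the failure
        event["message"] = line
        event["tags"].append("_jsonparsefailure")
    return event
```

For example, decode_event('{"user": "alice"}') yields an event with a user field, while decode_event('plain text') yields an event tagged with _jsonparsefailure.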
file
The file plugin is one of the most common plugins used for getting input from a file. It streams the data from a file or files, similar to using tail -f, but with additional capabilities. It is a powerful plugin, as it keeps track of any changes in the files, remembers the last position from which a file was read, sends updated data, detects file rotations, and provides the option to read a file from the beginning or the end.
It keeps information about the current position from which it fetched data from the files in a file called sincedb. By default, this file is placed in the $HOME directory, but the location can be changed using the sincedb_path setting. The frequency with which the sincedb file is written can be changed using the sincedb_write_interval setting.
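To make the position-tracking idea concrete, here is a small Python sketch of sincedb-style bookkeeping: read only the bytes appended since the last recorded offset, and periodically persist the offsets. This is a conceptual illustration (the JSON on-disk format is an assumption), not Logstash's actual sincedb implementation:

```python
import json
import os

class SinceDB:
    """Illustrative sketch of sincedb-style position tracking."""

    def __init__(self, path):
        self.path = path
        self.positions = {}  # maps log file path -> last byte offset read
        if os.path.exists(path):
            with open(path) as f:
                self.positions = json.load(f)

    def read_new_lines(self, logfile):
        """Return only the lines appended since the last recorded offset."""
        offset = self.positions.get(logfile, 0)
        with open(logfile) as f:
            f.seek(offset)
            lines = f.readlines()
            self.positions[logfile] = f.tell()
        return lines

    def flush(self):
        """Persist offsets to disk, as sincedb_write_interval would periodically do."""
        with open(self.path, "w") as f:
            json.dump(self.positions, f)
```

Because the offsets survive a restart, a new reader resumes where the previous one stopped instead of re-sending old lines.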
The basic configuration for file is as follows:
file { path => ... }
In this plugin, only the path setting is mandatory.
path
This specifies the location of the directory or the filename from which files will be read. You can provide the name of a directory, the name of a particular file, or filename patterns to match, and you can specify one or more patterns/locations.
Note
The path defined must be absolute, not relative.
The additional configuration settings are as follows:
- add_field: This is used to add a field to the incoming data.
- close_older: This is used to close an input file if it was last modified longer ago than the specified duration. This frees the file handle while the plugin keeps checking the file for changes. If a file is being tailed and no data arrives within the specified limit, the file is closed so that other files can be opened; it is queued and reopened whenever it is updated or modified.
- codec: This is used to decode the incoming data and to interpret the format in which the data arrives.
- delimiter: This is used as a separator to identify different lines.
- discover_interval: This determines how often the path will be expanded to search for new files created inside the path.
- enable_metric: This is used to get metric values from each plugin for plugin reporting.
- exclude: This is used to exclude any file or file patterns that should not be read as input.
- id: This is used to provide a unique identifier to a plugin, which can be used to track information about the plugin and for debugging.
- ignore_older: This is used to ignore files that have not been modified within the time mentioned in the setting. An ignored file will be read again if it is modified or updated with new content.
- max_open_files: This is used to define the maximum number of files that can be open at a time. If there are more files to be read than the specified number, use close_older to close the files that have not been modified lately. Setting this value very high can cause OS performance issues.
- sincedb_path: This is used to define the file location for writing sincedb files for tracking log files.
- sincedb_write_interval: This specifies the time interval at which sincedb files are written to record the current read position while tracking multiple log files.
- start_position: This determines whether to read the file from the beginning or the end. It is used only when the file is read for the first time and does not have an entry in the sincedb file. If your file contains older data, you can read it by specifying the start position as beginning.
- stat_interval: This specifies how often to check whether files have been modified. Increasing the interval leads to fewer system calls, but increases the time taken to detect changes in the log files.
- tags: This is used to add a tag to the incoming data, which can be used for processing. It is mostly used with conditionals, which help to perform various sets of actions as per the tags.
- type: This is used to add a type field for the incoming data. It is very useful when we have data coming in from different sources, wherein we can mention the type to differentiate the data among the sources. It is mainly used while filtering, wherein we can mention different logic for different sources of data using the type field.
The value type and default values for the settings are as follows:
Configuration example:
input {
  file {
    path => ["/var/log/elasticsearch/*", "/var/messages/*.log"]
    add_field => ["[location]", "%{longitude}"]
    add_field => ["[location]", "%{latitude}"]
    exclude => ["*.txt"]
    start_position => "beginning"
    tags => ["file-input"]
    type => "filelogs"
  }
}
In the preceding configuration, we have mentioned a couple of paths from which files will be read. Logstash will read all the files present within the elasticsearch directory, and all files ending with the .log extension within the messages folder. Assuming the events contain latitude and longitude, we add a location field that holds the values of longitude and latitude (the format in which Kibana reads geoip data). We exclude all text files, and we tell Logstash to read the files from the beginning, as they already contain data we want to read.
The preceding configuration can be divided into two file blocks with different path values, based on which we can specify different tags and different types, as shown in the following code snippet:
input {
  file {
    path => "/var/log/elasticsearch/*"
    tags => ["elasticsearch"]
    type => "elasticsearch"
  }
  file {
    path => "/var/messages/*.log"
    tags => ["messages"]
    type => "message"
  }
}
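The type and tags values set on an input are typically consumed by conditionals later in the pipeline. As a hedged sketch of that pattern (the added tag names are assumptions for illustration), a filter section could branch on type like this:

```
filter {
  if [type] == "elasticsearch" {
    # handle Elasticsearch logs one way
    mutate { add_tag => ["es-processed"] }
  } else if [type] == "message" {
    # handle system messages differently
    mutate { add_tag => ["msg-processed"] }
  }
}
```

Each branch can use entirely different filters (grok patterns, date parsing, and so on) for each source, which is the main reason to set a distinct type per input.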
udp
The udp plugin reads data as messages sent over the network using the UDP protocol. It listens on the port from which it reads events.
The basic configuration for udp is as follows:
udp { port => ... }
In this plugin, only the port setting is mandatory:
- port: This is the port on which Logstash will listen for incoming events or messages.
Note
Using a port number lower than 1024 requires root privileges.
The additional configuration settings are as follows:
- add_field: This is used to add a field to the incoming data.
- buffer_size: This is used to define the maximum packet size to read from the network.
- codec: This is used to decode the incoming data and to interpret the format in which the data arrives.
- enable_metric: This is used to get metric values from each plugin for plugin reporting.
- host: This is used to specify the host address on which Logstash will listen.
- id: This is used to provide a unique identifier to a plugin, which can be used to track information about the plugin and for debugging.
- queue_size: This is used to specify the maximum number of unprocessed packets that can be held in memory. If the number of packets exceeds queue_size, data will be lost.
- receive_buffer_bytes: This is used to specify the receive buffer size in bytes. If not set, the OS default value is used.
- tags: This is used to add a tag to the incoming data, which can be used for processing. It is mostly used with conditionals, which help to perform various sets of actions as per the tags.
- type: This is used to add a type field for the incoming data. It is very useful when we have data coming in from different sources, wherein we can mention the type to differentiate the data among the sources. It is mainly used while filtering, wherein we can mention different logic for different sources of data using the type field.
- workers: This is used to define the number of threads that will process packets at once.
The value types and default values for the settings are as follows:
Configuration example:
input {
  udp {
    host => "192.168.0.6"
    port => 5000
    workers => 4
  }
}
In the preceding configuration, we have mentioned the host and port on which Logstash will listen for events. We also specified workers as 4, which means four threads will process the packets in parallel.
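What the udp input does at its core can be approximated by a minimal UDP listener. This Python sketch is illustrative only (it is not how Logstash implements the plugin, and it handles one datagram at a time rather than a worker pool): bind to a host and port, then read datagrams of at most buffer_size bytes.

```python
import socket

def start_udp_listener(host="127.0.0.1", port=0):
    """Bind a UDP socket, as the udp input's host/port settings do.
    port=0 lets the OS pick a free port (useful for local testing)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def receive_event(sock, buffer_size=65536):
    """Block until one datagram arrives and return its decoded payload.
    buffer_size plays the role of the plugin's buffer_size setting."""
    data, _addr = sock.recvfrom(buffer_size)
    return data.decode("utf-8", errors="replace")
```

A real deployment would pair this with a codec to parse each payload; here the payload is returned as a plain string.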