- Mastering Elastic Stack
- Yuvraj Gupta, Ravi Kumar Gupta
Exploring Input Plugins
An input plugin is used to get data from a source or multiple sources and to feed data into Logstash. It acts as the first section, which is required in the Logstash configuration file. Some of the input plugins are described in the following sections.
stdin
The stdin plugin is a fairly simple plugin that reads data from standard input. It reads the data we enter in the console, which then acts as an input to Logstash. It is mostly used to validate whether Logstash has been installed properly and whether we are able to access it.
The basic configuration for stdin is as follows:
stdin { }
In this plugin, no additional settings are mandatory. If we use the preceding configuration, whatever we write in the console will be taken as input, without any additional parameters.
The additional configuration settings are as follows:
- add_field: This is used to add a field to the incoming data.
- codec: This is used to decode the incoming data and to interpret the format in which the data arrives. Possible values of codec are all of the codec plugins that are present.
- enable_metric: This is used to get metric values from each plugin for plugin reporting.
- id: This is used to provide a unique identifier to a plugin, which can be used to track information about the plugin and for debugging.
- tags: This is used to add a tag to the incoming data, which can be used for processing. For similar types of events, data from a given source can be tagged for processing. It is mostly used with conditionals, which help to perform various sets of actions as per the tags.
- type: This is used to add a type field for the incoming data. It is very useful when we have data coming in from different sources, wherein we can mention the type to differentiate the data among the sources. It is mainly used while filtering, wherein we can mention different logic for different sources of data using the type field.
The value type and default values for the settings are as follows:
Configuration example:
input {
  stdin {
    add_field => { "current_time" => "%{@timestamp}" }
    codec => "json"
    tags => ["stdin-input"]
    type => "stdin"
  }
}
In the preceding configuration, we have mentioned the codec as JSON; therefore, our input has to be in JSON format. When we enter anything in JSON, it will be parsed as per the key-value pairs, and a field called current_time will be added, whose value is that of the @timestamp metadata field. If we input anything in a format other than JSON, _jsonparsefailure will appear in tags, along with the stdin-input tag.
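The parse-or-tag behaviour described above can be approximated in a few lines of Python. This is an illustrative sketch of the idea, not Logstash's actual codec implementation:

```python
import json

def decode_event(line):
    """Mimic the json codec on a stdin input: parse key/value pairs
    from a JSON line, or tag the event with _jsonparsefailure."""
    event = {"tags": ["stdin-input"]}
    try:
        parsed = json.loads(line)
        if isinstance(parsed, dict):
            event.update(parsed)  # keys become event fields
        else:
            raise ValueError("not a JSON object")
    except (json.JSONDecodeError, ValueError):
        # Non-JSON input: keep the raw line and flag the failure
        event["message"] = line
        event["tags"].append("_jsonparsefailure")
    return event
```

For example, decode_event('{"user": "alice"}') yields an event with a user field, while decode_event('plain text') yields an event tagged with _jsonparsefailure.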
file
The file plugin is one of the most common plugins used for getting input from a file. It streams the data from a file or files, similar to using tail -f, but with additional capabilities. It is a powerful plugin, as it keeps track of any changes in the files, remembers the last position from which a file was read, sends updated data, detects file rotations, and provides the option to read a file from the beginning or the end.
It keeps information about the current position from which it fetched data from the files in a file called sincedb. By default, this file is placed in the $HOME directory, but the location can be changed using the sincedb_path setting. The frequency with which the sincedb file is written can be changed using the sincedb_write_interval setting.
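To make the position-tracking idea concrete, here is a small Python sketch of sincedb-style bookkeeping: read only the bytes appended since the last recorded offset, and periodically persist the offsets. This is a conceptual illustration (the JSON on-disk format is an assumption), not Logstash's actual sincedb implementation:

```python
import json
import os

class SinceDB:
    """Illustrative sketch of sincedb-style position tracking."""

    def __init__(self, path):
        self.path = path
        self.positions = {}  # maps log file path -> last byte offset read
        if os.path.exists(path):
            with open(path) as f:
                self.positions = json.load(f)

    def read_new_lines(self, logfile):
        """Return only the lines appended since the last recorded offset."""
        offset = self.positions.get(logfile, 0)
        with open(logfile) as f:
            f.seek(offset)
            lines = f.readlines()
            self.positions[logfile] = f.tell()
        return lines

    def flush(self):
        """Persist offsets to disk, as sincedb_write_interval would periodically do."""
        with open(self.path, "w") as f:
            json.dump(self.positions, f)
```

Because the offsets survive a restart, a new reader resumes where the previous one stopped instead of re-sending old lines.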
The basic configuration for file is as follows:
file { path => ... }
In this plugin, only the path setting is mandatory.
path
This specifies the location of the directory or the filename from which files will be read. You can provide the name of a directory, the name of a particular file, or filename patterns to match, and you can specify one or more patterns/locations.
Note
The path defined must be absolute, not relative.
The additional configuration settings are as follows:
- add_field: This is used to add a field to the incoming data.
- close_older: This is used to close an input file if it was last modified longer ago than the specified duration. This frees the file handle while the plugin keeps checking the file for changes. If a file is being tailed and no data arrives within the specified limit, the file is closed so that other files can be opened; it is queued and reopened whenever it is updated or modified.
- codec: This is used to decode the incoming data and to interpret the format in which the data arrives.
- delimiter: This is used as a separator to identify different lines.
- discover_interval: This determines how often the path will be expanded to search for new files created inside the path.
- enable_metric: This is used to get metric values from each plugin for plugin reporting.
- exclude: This is used to exclude any file or file patterns that should not be read as input.
- id: This is used to provide a unique identifier to a plugin, which can be used to track information about the plugin and for debugging.
- ignore_older: This is used to ignore files that have not been modified within the time mentioned in the setting. An ignored file will be read again if it is modified or updated with new content.
- max_open_files: This is used to define the maximum number of files that can be open at a time. If there are more files to be read than the specified number, use close_older to close the files that have not been modified lately. Setting this value very high can cause OS performance issues.
- sincedb_path: This is used to define the file location for writing sincedb files for tracking log files.
- sincedb_write_interval: This specifies the time interval at which sincedb files are written to record the current read position while tracking multiple log files.
- start_position: This determines whether to read the file from the beginning or the end. It is used only when the file is read for the first time and does not have an entry in the sincedb file. If your file contains older data, you can read it by specifying the start position as beginning.
- stat_interval: This specifies how often to check whether files have been modified. Increasing the interval leads to fewer system calls, but increases the time taken to detect changes in the log files.
- tags: This is used to add a tag to the incoming data, which can be used for processing. It is mostly used with conditionals, which help to perform various sets of actions as per the tags.
- type: This is used to add a type field for the incoming data. It is very useful when we have data coming in from different sources, wherein we can mention the type to differentiate the data among the sources. It is mainly used while filtering, wherein we can mention different logic for different sources of data using the type field.
The value type and default values for the settings are as follows:
Configuration example:
input {
  file {
    path => ["/var/log/elasticsearch/*", "/var/messages/*.log"]
    add_field => ["[location]", "%{longitude}"]
    add_field => ["[location]", "%{latitude}"]
    exclude => ["*.txt"]
    start_position => "beginning"
    tags => ["file-input"]
    type => "filelogs"
  }
}
In the preceding configuration, we have mentioned a couple of paths from which files will be read. Logstash will read all the files present within the elasticsearch directory, and all files ending with the .log extension within the messages folder. Assuming the events contain latitude and longitude, we add a location field that holds the values of longitude and latitude (the format in which Kibana reads geoip data). We exclude all text files, and we tell Logstash to read the files from the beginning, as they already contain data we want to read.
The preceding configuration can be divided into two file blocks with different path values, based on which we can specify different tags and different types, as shown in the following code snippet:
input {
  file {
    path => "/var/log/elasticsearch/*"
    tags => ["elasticsearch"]
    type => "elasticsearch"
  }
  file {
    path => "/var/messages/*.log"
    tags => ["messages"]
    type => "message"
  }
}
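The type and tags values set on an input are typically consumed by conditionals later in the pipeline. As a hedged sketch of that pattern (the added tag names are assumptions for illustration), a filter section could branch on type like this:

```
filter {
  if [type] == "elasticsearch" {
    # handle Elasticsearch logs one way
    mutate { add_tag => ["es-processed"] }
  } else if [type] == "message" {
    # handle system messages differently
    mutate { add_tag => ["msg-processed"] }
  }
}
```

Each branch can use entirely different filters (grok patterns, date parsing, and so on) for each source, which is the main reason to set a distinct type per input.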
udp
The udp plugin reads data as messages sent over the network using the UDP protocol. It listens on the port from which it reads events.
The basic configuration for udp is as follows:
udp { port => ... }
In this plugin, only the port setting is mandatory:
- port: This is the port on which Logstash will listen for incoming events or messages.
Note
Using a port number lower than 1024 requires root privileges.
The additional configuration settings are as follows:
- add_field: This is used to add a field to the incoming data.
- buffer_size: This is used to define the maximum packet size to read from the network.
- codec: This is used to decode the incoming data and to interpret the format in which the data arrives.
- enable_metric: This is used to get metric values from each plugin for plugin reporting.
- host: This is used to specify the host address on which Logstash will listen.
- id: This is used to provide a unique identifier to a plugin, which can be used to track information about the plugin and for debugging.
- queue_size: This is used to specify the maximum number of unprocessed packets that can be held in memory. If the number of packets exceeds queue_size, data will be lost.
- receive_buffer_bytes: This is used to specify the receive buffer size in bytes. If not set, the OS default value is used.
- tags: This is used to add a tag to the incoming data, which can be used for processing. It is mostly used with conditionals, which help to perform various sets of actions as per the tags.
- type: This is used to add a type field for the incoming data. It is very useful when we have data coming in from different sources, wherein we can mention the type to differentiate the data among the sources. It is mainly used while filtering, wherein we can mention different logic for different sources of data using the type field.
- workers: This is used to define the number of threads that will process packets at once.
The value types and default values for the settings are as follows:
Configuration example:
input {
  udp {
    host => "192.168.0.6"
    port => 5000
    workers => 4
  }
}
In the preceding configuration, we have mentioned the host and port on which Logstash will listen for events. We also specified workers as 4, which means four threads will process the packets in parallel.
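What the udp input does at its core can be approximated by a minimal UDP listener. This Python sketch is illustrative only (it is not how Logstash implements the plugin, and it handles one datagram at a time rather than a worker pool): bind to a host and port, then read datagrams of at most buffer_size bytes.

```python
import socket

def start_udp_listener(host="127.0.0.1", port=0):
    """Bind a UDP socket, as the udp input's host/port settings do.
    port=0 lets the OS pick a free port (useful for local testing)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def receive_event(sock, buffer_size=65536):
    """Block until one datagram arrives and return its decoded payload.
    buffer_size plays the role of the plugin's buffer_size setting."""
    data, _addr = sock.recvfrom(buffer_size)
    return data.decode("utf-8", errors="replace")
```

A real deployment would pair this with a codec to parse each payload; here the payload is returned as a plain string.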