_R_R_D_C_R_E_A_T_E(1)                        rrdtool                       _R_R_D_C_R_E_A_T_E(1)

NNAAMMEE
       rrdcreate - Set up a new Round Robin Database

SSYYNNOOPPSSIISS
       rrrrddttooooll ccrreeaattee _f_i_l_e_n_a_m_e [----ssttaarrtt|--bb _s_t_a_r_t _t_i_m_e] [----sstteepp|--ss _s_t_e_p]
       [----tteemmppllaattee|--tt _t_e_m_p_l_a_t_e_-_f_i_l_e] [----ssoouurrccee|--rr _s_o_u_r_c_e_-_f_i_l_e]
       [----nnoo--oovveerrwwrriittee|--OO] [----ddaaeemmoonn|--dd _a_d_d_r_e_s_s] [DDSS::_d_s_-_n_a_m_e[==_m_a_p_p_e_d_-_d_s_-
       _n_a_m_e[[[_s_o_u_r_c_e_-_i_n_d_e_x]]]]::_D_S_T::_d_s_t _a_r_g_u_m_e_n_t_s] [RRRRAA::_C_F::_c_f _a_r_g_u_m_e_n_t_s]

DDEESSCCRRIIPPTTIIOONN
       The create function of RRDtool lets you set up new Round Robin Database
       (RRRRDD) files.  The file is created at its final, full size and filled
       with _*_U_N_K_N_O_W_N_* data, unless one or more source RRRRDD files have been
       specified and they hold suitable data to "pre-fill" the new RRRRDD file.

   _ff_ii_ll_ee_nn_aa_mm_ee
       The name of the RRRRDD you want to create. RRRRDD files should end with the
       extension _._r_r_d. However, RRRRDDttooooll will accept any filename.

   ----ssttaarrtt||--bb _ss_tt_aa_rr_tt _tt_ii_mm_ee ((ddeeffaauulltt:: nnooww -- 1100ss))
       Specifies the time in seconds since 1970-01-01 UTC when the first value
       should be added to the RRRRDD. RRRRDDttooooll will not accept any data timed
       before or at the time specified.

       See also "AT-STYLE TIME SPECIFICATION" in rrdfetch for other ways to
       specify time.

       If one or more source files is used to pre-fill the new RRRRDD, the
       ----ssttaarrtt option may be omitted. In that case, the latest update time
       among all source files will be used as the last update time of the new
       RRRRDD file, effectively setting the start time.

   ----sstteepp||--ss _ss_tt_ee_pp ((ddeeffaauulltt:: 330000 sseeccoonnddss))
       Specifies the base interval in seconds with which data will be fed into
       the RRRRDD.  A scaling factor may be present as a suffix to the integer;
       see "STEP, HEARTBEAT, and Rows As Durations".

   ----nnoo--oovveerrwwrriittee||--OO
       Do not clobber an existing file of the same name.

   ----ddaaeemmoonn||--dd _aa_dd_dd_rr_ee_ss_ss
       Address of the rrdcached daemon.  For a list of accepted formats, see
       the --ll option in the rrdcached manual.

        rrdtool create --daemon unix:/var/run/rrdcached.sock /var/lib/rrd/foo.rrd I<other options>

   [[----tteemmppllaattee||--tt _tt_ee_mm_pp_ll_aa_tt_ee_--_ff_ii_ll_ee]]
       Specifies a template RRRRDD file to take step, DS and RRA definitions
       from. This allows one to base the structure of a new file on some
       existing file. The data of the template file is NOT used for pre-
       filling, but it is possible to specify the same file as a source file
       (see below).

       Additional DS and RRA definitions are permitted, and will be added to
       those taken from the template.

   ----ssoouurrccee||--rr _ss_oo_uu_rr_cc_ee_--_ff_ii_ll_ee
       One or more source RRRRDD files may be named on the command line. Data
       from these source files will be used to prefill the created RRRRDD file.
       The output file and one source file may refer to the same file name.
       This will effectively replace the source file with the new RRRRDD file.
       While there is the danger to loose the source file because it gets
       replaced, there is no danger that the source and the new file may be
       "garbled" together at any point in time, because the new file will
       always be created as a temporary file first and will only be moved to
       its final destination once it has been written in its entirety.

       Prefilling is done by matching up DS names, RRAs and consolidation
       functions and choosing the best available data resolution when doing
       so. Prefilling may not be mathematically correct in all cases (e.g. if
       resolutions have to change due to changed stepping of the target RRD
       and old and new resolutions do not match up with old/new bin boundaries
       in RRAs).

       In other words: A best effort is made to preserve data during
       prefilling.  Also, pre-filling of RRAs may only be possible for certain
       kinds of DS types. Prefilling may also have strange effects on Holt-
       Winters forecasting RRAs. In other words: there is no guarantee for
       data-correctness.

       When "pre-filling" a RRRRDD file, the structure of the new file must be
       specified as usual using DS and RRA specifications as outlined below.
       Data will be taken from source files based on DS names and types and in
       the order the source files are specified in. Data sources with the same
       name from different source files will be combined to form a new data
       source. Generally, for any point in time the new RRRRDD file will cover
       after its creation, data from only one source file will have been used
       for pre-filling. However, data from multiple sources may be combined if
       it refers to different times or an earlier named source file holds
       unknown data for a time where a later one holds known data.

       If this automatic data selection is not desired, the DS syntax allows
       one to specify a mapping of target and source data sources for
       prefilling. This syntax allows one to rename data sources and to
       restrict prefilling for a DS to only use data from a single source
       file.

       Prefilling currently only works reliably for RRAs using one of the
       classic consolidation functions, that is one of: AVERAGE, MIN, MAX,
       LAST. It might also currently have problems with COMPUTE data sources.

       Note that the act of prefilling during ccrreeaattee is similar to a lot of
       the operations available via the ttuunnee command, but using ccrreeaattee syntax.

   DDSS::_dd_ss_--_nn_aa_mm_ee[[==_mm_aa_pp_pp_ee_dd_--_dd_ss_--_nn_aa_mm_ee[[[[_ss_oo_uu_rr_cc_ee_--_ii_nn_dd_ee_xx]]]]]]::_DD_SS_TT::_dd_ss_tt _aa_rr_gg_uu_mm_ee_nn_tt_ss
       A single RRRRDD can accept input from several data sources (DDSS), for
       example incoming and outgoing traffic on a specific communication line.
       With the DDSS configuration option you must define some basic properties
       of each data source you want to store in the RRRRDD.

       _d_s_-_n_a_m_e is the name you will use to reference this particular data
       source from an RRRRDD. A _d_s_-_n_a_m_e must be 1 to 19 characters long in the
       characters [a-zA-Z0-9_].

       _D_S_T defines the Data Source Type. The remaining arguments of a data
       source entry depend on the data source type. For GAUGE, COUNTER,
       DERIVE, DCOUNTER, DDERIVE and ABSOLUTE the format for a data source
       entry is:

       DDSS::_d_s_-_n_a_m_e::{_G_A_U_G_E _| _C_O_U_N_T_E_R _| _D_E_R_I_V_E _| _D_C_O_U_N_T_E_R _| _D_D_E_R_I_V_E _|
       _A_B_S_O_L_U_T_E}::_h_e_a_r_t_b_e_a_t::_m_i_n::_m_a_x

       For COMPUTE data sources, the format is:

       DDSS::_d_s_-_n_a_m_e::_C_O_M_P_U_T_E::_r_p_n_-_e_x_p_r_e_s_s_i_o_n

       In order to decide which data source type to use, review the
       definitions that follow. Also consult the section on "HOW TO MEASURE"
       for further insight.

       GGAAUUGGEE
           is  for  things  like temperatures or number of people in a room or
           the value of a RedHat share.

       CCOOUUNNTTEERR
           is for continuous incrementing counters like the ifInOctets counter
           in a router. The CCOOUUNNTTEERR data source assumes that the counter never
           decreases, except when a counter overflows.   The  update  function
           takes  the  overflow into account.  The counter is stored as a per-
           second rate. When the counter  overflows,  RRDtool  checks  if  the
           overflow happened at the 32bit or 64bit border and acts accordingly
           by adding an appropriate value to the result.

       DDCCOOUUNNTTEERR
           the  same  as  CCOOUUNNTTEERR,  but  for  quantities  expressed as double-
           precision floating point number.  Could be used to track quantities
           that increment by non-integer numbers, i.e. number of seconds  that
           some  routine  has  taken  to  run,  total weight processed by some
           technology equipment etc.  The only substantial difference is  that
           DDCCOOUUNNTTEERR  can  either  be upward counting or downward counting, but
           not both at the same  time.   The  current  direction  is  detected
           automatically  on  the  second non-undefined counter update and any
           further change in the direction is considered  a  reset.   The  new
           direction  is  determined  and locked in by the second update after
           reset and its difference to the value at reset.

       DDEERRIIVVEE
           will store the derivative of the line going from the  last  to  the
           current  value  of  the data source. This can be useful for gauges,
           for example, to measure the rate of people entering  or  leaving  a
           room.  Internally,  derive  works  exactly like COUNTER but without
           overflow checks. So if your counter does not reset at 32 or 64  bit
           you might want to use DERIVE and combine it with a MIN value of 0.

       DDDDEERRIIVVEE
           the  same  as  DDEERRIIVVEE,  but  for  quantities  expressed  as double-
           precision floating point number.

           NNOOTTEE oonn CCOOUUNNTTEERR vvss DDEERRIIVVEE

           by Don Baarda <don.baarda@baesystems.com>

           If you cannot tolerate ever mistaking the occasional counter  reset
           for  a legitimate counter wrap, and would prefer "Unknowns" for all
           legitimate counter wraps and resets, always use DERIVE with  min=0.
           Otherwise,  using  COUNTER  with a suitable max will return correct
           values for all legitimate counter wraps, mark some  counter  resets
           as  "Unknown", but can mistake some counter resets for a legitimate
           counter wrap.

           For a  5  minute  step  and  32-bit  counter,  the  probability  of
           mistaking  a  counter reset for a legitimate wrap is arguably about
           0.8% per 1Mbps of maximum bandwidth. Note that this equates to  80%
           for  100Mbps  interfaces,  so  for  high bandwidth interfaces and a
           32bit counter, DERIVE with min=0 is probably preferable. If you are
           using a 64bit counter, just about any max  setting  will  eliminate
           the possibility of mistaking a reset for a counter wrap.

       AABBSSOOLLUUTTEE
           is for counters which get reset upon reading. This is used for fast
           counters  which  tend  to  overflow.  So  instead  of  reading them
           normally you reset them after every read to make sure  you  have  a
           maximum  time  available before the next overflow. Another usage is
           for things you count like number of messages since the last update.

       CCOOMMPPUUTTEE
           is for storing the result  of  a  formula  applied  to  other  data
           sources  in  the  RRRRDD.  This data source is not supplied a value on
           update, but rather its Primary Data Points (PDPs) are computed from
           the PDPs of the data sources according to the  rpn-expression  that
           defines  the  formula.  Consolidation  functions  are  then applied
           normally to the PDPs of the COMPUTE data source (that is  the  rpn-
           expression is only applied to generate PDPs). In database software,
           such data sets are referred to as "virtual" or "computed" columns.

       _h_e_a_r_t_b_e_a_t  defines  the maximum number of seconds that may pass between
       two updates of this data source before the value of the data source  is
       assumed to be _*_U_N_K_N_O_W_N_*.

       _m_i_n  and  _m_a_x  define  the expected range values for data supplied by a
       data source. If _m_i_n and/or _m_a_x are  specified  any  value  outside  the
       defined range will be regarded as _*_U_N_K_N_O_W_N_*. If you do not know or care
       about  min  and  max,  set them to U for unknown. Note that min and max
       always refer to the processed values of the DS. For  a  traffic-CCOOUUNNTTEERR
       type  DS  this would be the maximum and minimum data-rate expected from
       the device.

       _I_f _i_n_f_o_r_m_a_t_i_o_n _o_n _m_i_n_i_m_a_l_/_m_a_x_i_m_a_l _e_x_p_e_c_t_e_d _v_a_l_u_e_s _i_s _a_v_a_i_l_a_b_l_e_,  _a_l_w_a_y_s
       _s_e_t  _t_h_e  _m_i_n  _a_n_d_/_o_r _m_a_x _p_r_o_p_e_r_t_i_e_s_. _T_h_i_s _w_i_l_l _h_e_l_p _R_R_D_t_o_o_l _i_n _d_o_i_n_g _a
       _s_i_m_p_l_e _s_a_n_i_t_y _c_h_e_c_k _o_n _t_h_e _d_a_t_a _s_u_p_p_l_i_e_d _w_h_e_n _r_u_n_n_i_n_g _u_p_d_a_t_e_.

       _r_p_n_-_e_x_p_r_e_s_s_i_o_n defines the formula  used  to  compute  the  PDPs  of  a
       COMPUTE  data  source  from other data sources in the same <RRD>. It is
       similar to defining a CCDDEEFF argument for the graph command. Please refer
       to that manual page for  a  list  and  description  of  RPN  operations
       supported.  For  COMPUTE data sources, the following RPN operations are
       not supported: COUNT, PREV, TIME, and LTIME. In addition,  in  defining
       the RPN expression, the COMPUTE data source may only refer to the names
       of data source listed previously in the create command. This is similar
       to  the  restriction  that  CCDDEEFFs  must  refer  only  to DDEEFFs and CCDDEEFFs
       previously defined in the same graph command.

       When pre-filling the new RRRRDD file using one or more source RRRRDDs, the DS
       specification may hold an optional mapping  after  the  DS  name.  This
       takes  the form of an equal sign followed by a mapped-to DS name and an
       optional source index enclosed in square brackets.

       For example, the DS

        DS:a=b[2]:GAUGE:120:0:U

       specifies that the DS named _a should be pre-filled from the DS named  _b
       in the second listed source file (source indices are 1-based).

   RRRRAA::_CC_FF::_cc_ff _aa_rr_gg_uu_mm_ee_nn_tt_ss
       The  purpose  of  an  RRRRDD  is to store data in the round robin archives
       (RRRRAA). An archive consists of a number of data values or statistics for
       each of the defined data-sources (DDSS) and is defined with an RRRRAA line.

       When data is entered into an RRRRDD, it is first fit into  time  slots  of
       the  length  defined  with  the --ss option, thus becoming a _p_r_i_m_a_r_y _d_a_t_a
       _p_o_i_n_t.

       The data is also processed with the consolidation function (_C_F) of  the
       archive.  There  are  several  consolidation functions that consolidate
       primary data points via an aggregate function: AAVVEERRAAGGEE, MMIINN, MMAAXX, LLAASSTT.

       AVERAGE
           the average of the data points is stored.

       MIN the smallest of the data points is stored.

       MAX the largest of the data points is stored.

       LAST
           the last data points is used.

       Note that data aggregation inevitably leads to loss  of  precision  and
       information.  The trick is to pick the aggregate function such that the
       _i_n_t_e_r_e_s_t_i_n_g properties of your data  is  kept  across  the  aggregation
       process.

       The format of RRRRAA line for these consolidation functions is:

       RRRRAA::{_A_V_E_R_A_G_E _| _M_I_N _| _M_A_X _| _L_A_S_T}::_x_f_f::_s_t_e_p_s::_r_o_w_s

       _x_f_f The xfiles factor defines what part of a consolidation interval may
       be  made  up  from _*_U_N_K_N_O_W_N_* data while the consolidated value is still
       regarded as known. It is given as the ratio of allowed  _*_U_N_K_N_O_W_N_*  PDPs
       to  the  number  of  PDPs  in the interval. Thus, it ranges from 0 to 1
       (exclusive).

       _s_t_e_p_s defines how many of these _p_r_i_m_a_r_y _d_a_t_a _p_o_i_n_t_s are used to build a
       _c_o_n_s_o_l_i_d_a_t_e_d _d_a_t_a _p_o_i_n_t which then goes into  the  archive.   See  also
       "STEP, HEARTBEAT, and Rows As Durations".

       _r_o_w_s  defines  how  many generations of data values are kept in an RRRRAA.
       Obviously,  this  has  to  be  greater  than  zero.   See  also  "STEP,
       HEARTBEAT, and Rows As Durations".

AAbbeerrrraanntt BBeehhaavviioorr DDeetteeccttiioonn wwiitthh HHoolltt--WWiinntteerrss FFoorreeccaassttiinngg
       In  addition to the aggregate functions, there are a set of specialized
       functions that enable RRRRDDttooooll to provide data smoothing (via the  Holt-
       Winters  forecasting  algorithm),  confidence  bands,  and the flagging
       aberrant behavior in the data source time series:

       +o   RRRRAA::_H_W_P_R_E_D_I_C_T::_r_o_w_s::_a_l_p_h_a::_b_e_t_a::_s_e_a_s_o_n_a_l _p_e_r_i_o_d[::_r_r_a_-_n_u_m]

       +o   RRRRAA::_M_H_W_P_R_E_D_I_C_T::_r_o_w_s::_a_l_p_h_a::_b_e_t_a::_s_e_a_s_o_n_a_l _p_e_r_i_o_d[::_r_r_a_-_n_u_m]

       +o   RRRRAA::_S_E_A_S_O_N_A_L::_s_e_a_s_o_n_a_l                             _p_e_r_i_o_d::_g_a_m_m_a::_r_r_a_-
           _n_u_m[::ssmmooootthhiinngg--wwiinnddooww==_f_r_a_c_t_i_o_n]

       +o   RRRRAA::_D_E_V_S_E_A_S_O_N_A_L::_s_e_a_s_o_n_a_l                          _p_e_r_i_o_d::_g_a_m_m_a::_r_r_a_-
           _n_u_m[::ssmmooootthhiinngg--wwiinnddooww==_f_r_a_c_t_i_o_n]

       +o   RRRRAA::_D_E_V_P_R_E_D_I_C_T::_r_o_w_s::_r_r_a_-_n_u_m

       +o   RRRRAA::_F_A_I_L_U_R_E_S::_r_o_w_s::_t_h_r_e_s_h_o_l_d::_w_i_n_d_o_w _l_e_n_g_t_h::_r_r_a_-_n_u_m

       These RRRRAAss differ from the  true  consolidation  functions  in  several
       ways.   First,  each of the RRRRAAs is updated once for every primary data
       point.  Second, these RRRRAAss are interdependent.  To  generate  real-time
       confidence  bounds, a matched set of SEASONAL, DEVSEASONAL, DEVPREDICT,
       and either HWPREDICT or  MHWPREDICT  must  exist.  Generating  smoothed
       values of the primary data points requires a SEASONAL RRRRAA and either an
       HWPREDICT  or  MHWPREDICT  RRRRAA.  Aberrant  behavior  detection requires
       FAILURES, DEVSEASONAL, SEASONAL, and either HWPREDICT or MHWPREDICT.

       The predicted, or smoothed, values  are  stored  in  the  HWPREDICT  or
       MHWPREDICT RRRRAA. HWPREDICT and MHWPREDICT are actually two variations on
       the  Holt-Winters  method.  They  are  interchangeable. Both attempt to
       decompose data into three  components:  a  baseline,  a  trend,  and  a
       seasonal  coefficient.   HWPREDICT adds its seasonal coefficient to the
       baseline to  form  a  prediction,  whereas  MHWPREDICT  multiplies  its
       seasonal  coefficient  by  the  baseline  to  form  a  prediction.  The
       difference is noticeable when the baseline changes significantly in the
       course of a season; HWPREDICT will  predict  the  seasonality  to  stay
       constant  as  the  baseline  changes,  but  MHWPREDICT will predict the
       seasonality to grow or shrink in proportion to the baseline. The proper
       choice of method depends on the thing being  modeled.  For  simplicity,
       the rest of this discussion will refer to HWPREDICT, but MHWPREDICT may
       be substituted in its place.

       The  predicted  deviations  are  stored in DEVPREDICT (think a standard
       deviation which can be scaled to yield a confidence band). The FAILURES
       RRRRAA stores binary indicators. A 1  marks  the  indexed  observation  as
       failure;  that  is,  the  number of confidence bounds violations in the
       preceding window of observations met or exceeded a specified threshold.
       An example of using these RRRRAAss to graph confidence bounds and  failures
       appears in rrdgraph.

       The  SEASONAL  and DEVSEASONAL RRRRAAss store the seasonal coefficients for
       the Holt-Winters forecasting algorithm  and  the  seasonal  deviations,
       respectively.   There  is  one  entry per observation time point in the
       seasonal cycle. For example, if primary data points are generated every
       five minutes and the  seasonal  cycle  is  1  day,  both  SEASONAL  and
       DEVSEASONAL will have 288 rows.

       In  order  to simplify the creation for the novice user, in addition to
       supporting explicit creation of the  HWPREDICT,  SEASONAL,  DEVPREDICT,
       DEVSEASONAL,  and  FAILURES  RRRRAAss,  the RRRRDDttooooll create command supports
       implicit creation of the other four when HWPREDICT is  specified  alone
       and the final argument _r_r_a_-_n_u_m is omitted.

       _r_o_w_s  specifies  the  length  of the RRRRAA prior to wrap around. Remember
       that there is a one-to-one correspondence between primary  data  points
       and  entries in these RRAs. For the HWPREDICT CF, _r_o_w_s should be larger
       than the _s_e_a_s_o_n_a_l _p_e_r_i_o_d. If the DEVPREDICT RRRRAA is implicitly  created,
       the  default number of rows is the same as the HWPREDICT _r_o_w_s argument.
       If the FAILURES RRRRAA is implicitly created, _r_o_w_s  will  be  set  to  the
       _s_e_a_s_o_n_a_l  _p_e_r_i_o_d  argument of the HWPREDICT RRRRAA. Of course, the RRRRDDttooooll
       _r_e_s_i_z_e command is available if these defaults are  not  sufficient  and
       the creator wishes to avoid explicit creations of the other specialized
       function RRRRAAss.

       _s_e_a_s_o_n_a_l  _p_e_r_i_o_d  specifies  the  number  of  primary  data points in a
       seasonal cycle. If SEASONAL and  DEVSEASONAL  are  implicitly  created,
       this  argument  for  those  RRRRAAss  is  set  automatically  to  the value
       specified by HWPREDICT. If they are  explicitly  created,  the  creator
       should verify that all three _s_e_a_s_o_n_a_l _p_e_r_i_o_d arguments agree.

       _a_l_p_h_a  is  the  adaption  parameter  of  the  intercept  (or  baseline)
       coefficient in the Holt-Winters forecasting algorithm. See rrdtool  for
       a  description  of  this  algorithm.  _a_l_p_h_a must lie between 0 and 1. A
       value closer to 1 means that more  recent  observations  carry  greater
       weight  in  predicting  the baseline component of the forecast. A value
       closer  to  0  means  that  past  history  carries  greater  weight  in
       predicting the baseline component.

       _b_e_t_a  is  the  adaption  parameter  of  the  slope  (or  linear  trend)
       coefficient in the Holt-Winters forecasting algorithm.  _b_e_t_a  must  lie
       between  0  and  1 and plays the same role as _a_l_p_h_a with respect to the
       predicted linear trend.

       _g_a_m_m_a is the adaption parameter of the  seasonal  coefficients  in  the
       Holt-Winters   forecasting   algorithm   (HWPREDICT)  or  the  adaption
       parameter  in  the  exponential  smoothing  update  of   the   seasonal
       deviations.  It  must  lie  between  0  and  1.  If  the  SEASONAL  and
       DEVSEASONAL RRRRAAss are created implicitly, they will both have  the  same
       value  for _g_a_m_m_a: the value specified for the HWPREDICT _a_l_p_h_a argument.
       Note that because there is one seasonal coefficient (or deviation)  for
       each  time point during the seasonal cycle, the adaptation rate is much
       slower than the baseline. Each seasonal coefficient is only updated (or
       adapts) when the observed value occurs at the offset  in  the  seasonal
       cycle corresponding to that coefficient.

       If SEASONAL and DEVSEASONAL RRRRAAss are created explicitly, _g_a_m_m_a need not
       be  the  same  for  both.  Note  that _g_a_m_m_a can also be changed via the
       RRRRDDttooooll _t_u_n_e command.

       _s_m_o_o_t_h_i_n_g_-_w_i_n_d_o_w specifies the fraction of  a  season  that  should  be
       averaged  around  each point. By default, the value of _s_m_o_o_t_h_i_n_g_-_w_i_n_d_o_w
       is 0.05, which means each value in SEASONAL  and  DEVSEASONAL  will  be
       occasionally  replaced  by averaging it with its (_s_e_a_s_o_n_a_l _p_e_r_i_o_d*0.05)
       nearest neighbors.  Setting _s_m_o_o_t_h_i_n_g_-_w_i_n_d_o_w to zero will  disable  the
       running-average smoother altogether.

       _r_r_a_-_n_u_m  provides  the  links  between  related  RRRRAAss.  If HWPREDICT is
       specified alone and the other RRRRAAss are created implicitly,  then  there
       is  no  need  to  worry  about  this  argument.  If  RRRRAAss  are  created
       explicitly, then carefully pay attention to this argument. For each RRRRAA
       which includes this argument, there is a dependency  between  that  RRRRAA
       and another RRRRAA. The _r_r_a_-_n_u_m argument is the 1-based index in the order
       of RRRRAA creation (that is, the order they appear in the _c_r_e_a_t_e command).
       The dependent RRRRAA for each RRRRAA requiring the _r_r_a_-_n_u_m argument is listed
       here:

       +o   HWPREDICT _r_r_a_-_n_u_m is the index of the SEASONAL RRRRAA.

       +o   SEASONAL _r_r_a_-_n_u_m is the index of the HWPREDICT RRRRAA.

       +o   DEVPREDICT _r_r_a_-_n_u_m is the index of the DEVSEASONAL RRRRAA.

       +o   DEVSEASONAL _r_r_a_-_n_u_m is the index of the HWPREDICT RRRRAA.

       +o   FAILURES _r_r_a_-_n_u_m is the index of the DEVSEASONAL RRRRAA.

       _t_h_r_e_s_h_o_l_d  is the minimum number of violations (observed values outside
       the confidence bounds) within a window that constitutes a  failure.  If
       the FAILURES RRRRAA is implicitly created, the default value is 7.

       _w_i_n_d_o_w  _l_e_n_g_t_h  is  the number of time points in the window. Specify an
       integer greater than or equal to the threshold and less than  or  equal
       to  28.   The  time  interval  this  window  represents  depends on the
       interval between primary data points. If the FAILURES RRRRAA is implicitly
       created, the default value is 9.

SSTTEEPP,, HHEEAARRTTBBEEAATT,, aanndd RRoowwss AAss DDuurraattiioonnss
       Traditionally RRDtool specified PDP  intervals  in  seconds,  and  most
       other   values  as  either  seconds  or  PDP  counts.   This  made  the
       specification for databases rather opaque; for example

        rrdtool create power.rrd \
          --start now-2h --step 1 \
          DS:watts:GAUGE:300:0:24000 \
          RRA:AVERAGE:0.5:1:864000 \
          RRA:AVERAGE:0.5:60:129600 \
          RRA:AVERAGE:0.5:3600:13392 \
          RRA:AVERAGE:0.5:86400:3660

       creates a database of power values collected once per  second,  with  a
       five  minute  (300  second)  heartbeat,  and four RRRRAAs: ten days of one
       second, 90 days of one minute, 18 months of one hour, and ten years  of
       one day averages.

       Step,  heartbeat,  and  PDP  counts  and  rows may also be specified as
       durations, which are positive integers with a  single-character  suffix
       that  specifies  a scaling factor.  See "rrd_scaled_duration" in librrd
       for scale  factors  of  the  supported  suffixes:  "s"  (seconds),  "m"
       (minutes),  "h" (hours), "d" (days), "w" (weeks), "M" (months), and "y"
       (years).

       Scaled step and heartbeat  values  (which  are  natively  durations  in
       seconds)  are used directly, while consolidation function row arguments
       are divided by their step to produce the number of rows.

       With this feature the same specification as above can be written as:

        rrdtool create power.rrd \
          --start now-2h --step 1s \
          DS:watts:GAUGE:5m:0:24000 \
          RRA:AVERAGE:0.5:1s:10d \
          RRA:AVERAGE:0.5:1m:90d \
          RRA:AVERAGE:0.5:1h:18M \
          RRA:AVERAGE:0.5:1d:10y

TThhee HHEEAARRTTBBEEAATT aanndd tthhee SSTTEEPP
       Here is an explanation by Don Baarda on the inner workings of  RRDtool.
       It  may  help you to sort out why all this *UNKNOWN* data is popping up
       in your databases:

       RRDtool gets fed samples/updates at  arbitrary  times.  From  these  it
       builds  Primary  Data  Points (PDPs) on every "step" interval. The PDPs
       are then accumulated into the RRAs.

       The  "heartbeat"  defines  the  maximum  acceptable  interval   between
       samples/updates.   If   the  interval  between  samples  is  less  than
       "heartbeat", then an average rate is calculated and  applied  for  that
       interval.  If  the interval between samples is longer than "heartbeat",
       then that entire interval is considered "unknown". Note that there  are
       other  things  that  can  make a sample interval "unknown", such as the
       rate exceeding limits, or  a  sample  that  was  explicitly  marked  as
       unknown.

       The known rates during a PDP's "step" interval are used to calculate an
       average  rate  for  that  PDP. If the total "unknown" time accounts for
       more than hhaallff the "step", the entire PDP is marked as "unknown".  This
       means  that  a  mixture of known and "unknown" sample times in a single
       PDP "step" may or may not add up to enough "known" time  to  warrant  a
       known PDP.

       The  "heartbeat"  can  be short (unusual) or long (typical) relative to
       the "step" interval between PDPs. A short "heartbeat" means you require
       multiple samples per PDP, and if  you  don't  get  them  mark  the  PDP
       unknown.  A long heartbeat can span multiple "steps", which means it is
       acceptable to have multiple PDPs calculated from a  single  sample.  An
       extreme  example  of  this  might  be  a  "step"  of  5  minutes  and a
       "heartbeat" of one day, in which case a single sample  every  day  will
       result in all the PDPs for that entire day period being set to the same
       average rate. _-_- _D_o_n _B_a_a_r_d_a _<_d_o_n_._b_a_a_r_d_a_@_b_a_e_s_y_s_t_e_m_s_._c_o_m_>

              time|
              axis|
        begin__|00|
               |01|
              u|02|----* sample1, restart "hb"-timer
              u|03|   /
              u|04|  /
              u|05| /
              u|06|/     "hbt" expired
              u|07|
               |08|----* sample2, restart "hb"
               |09|   /
               |10|  /
              u|11|----* sample3, restart "hb"
              u|12|   /
              u|13|  /
        step1_u|14| /
              u|15|/     "swt" expired
              u|16|
               |17|----* sample4, restart "hb", create "pdp" for step1 =
               |18|   /  = unknown due to 10 "u" labeled secs > 0.5 * step
               |19|  /
               |20| /
               |21|----* sample5, restart "hb"
               |22|   /
               |23|  /
               |24|----* sample6, restart "hb"
               |25|   /
               |26|  /
               |27|----* sample7, restart "hb"
        step2__|28|   /
               |22|  /
               |23|----* sample8, restart "hb", create "pdp" for step1, create "cdp"
               |24|   /
               |25|  /

       graphics by _v_l_a_d_i_m_i_r_._l_a_v_r_o_v_@_d_e_s_y_._d_e.

HHOOWW TTOO MMEEAASSUURREE
       Here are a few hints on how to measure:

       Temperature
           Usually  you  have  some  type  of  meter  you  can read to get the
           temperature.  The temperature is not really connected with a  time.
           The  only  connection is that the temperature reading happened at a
           certain time. You can use the GGAAUUGGEE  data  source  type  for  this.
           RRDtool will then record your reading together with the time.

       Mail Messages
           Assume   you  have  a  method  to  count  the  number  of  messages
           transported by your mail server in a certain amount of time, giving
           you data like '5 messages in the last 65 seconds'. If you  look  at
           the count of 5 like an AABBSSOOLLUUTTEE data type you can simply update the
           RRD  with  the number 5 and the end time of your monitoring period.
           RRDtool will then record the number of messages per second.  If  at
           some   later  stage  you  want  to  know  the  number  of  messages
           transported in a day, you can get the average messages  per  second
           from  RRDtool for the day in question and multiply this number with
           the number of seconds in a  day.  Because  all  math  is  run  with
           Doubles, the precision should be acceptable.

       It's always a Rate
           RRDtool   stores   rates  in  amount/second  for  COUNTER,  DERIVE,
           DCOUNTER, DDERIVE and ABSOLUTE data.  When you plot the  data,  you
           will  get on the y axis amount/second which you might be tempted to
           convert to an absolute amount  by  multiplying  by  the  delta-time
           between  the  points. RRDtool plots continuous data, and as such is
           not appropriate for plotting absolute amounts as for example "total
           bytes" sent and received in a router. What  you  probably  want  is
           plot  rates  that you can scale to bytes/hour, for example, or plot
           absolute amounts with another tool that draws bar-plots, where  the
           delta-time  is clear on the plot for each point (such that when you
           read the graph you see for example GB on the y axis, days on the  x
           axis and one bar for each day).

EEXXAAMMPPLLEE
        rrdtool create temperature.rrd --step 300 \
         DS:temp:GAUGE:600:-273:5000 \
         RRA:AVERAGE:0.5:1:1200 \
         RRA:MIN:0.5:12:2400 \
         RRA:MAX:0.5:12:2400 \
         RRA:AVERAGE:0.5:12:2400

       This   sets   up  an  RRRRDD  called  _t_e_m_p_e_r_a_t_u_r_e_._r_r_d  which  accepts  one
       temperature value every 300 seconds. If no new  data  is  supplied  for
       more  than 600 seconds, the temperature becomes _*_U_N_K_N_O_W_N_*.  The minimum
       acceptable value is -273 and the maximum is 5'000.

       A few archive areas are also defined. The first stores the temperatures
       supplied for 100 hours (1'200 * 300 seconds = 100  hours).  The  second
       RRA  stores  the minimum temperature recorded over every hour (12 * 300
       seconds = 1 hour), for 100 days (2'400 hours). The third and the fourth
       RRA's  do  the  same  for  the   maximum   and   average   temperature,
       respectively.

EEXXAAMMPPLLEE 22
        rrdtool create monitor.rrd --step 300        \
          DS:ifOutOctets:COUNTER:1800:0:4294967295   \
          RRA:AVERAGE:0.5:1:2016                     \
          RRA:HWPREDICT:1440:0.1:0.0035:288

       This  example  is a monitor of a router interface. The first RRRRAA tracks
       the traffic flow in octets; the second RRRRAA  generates  the  specialized
       functions  RRRRAAss  for aberrant behavior detection. Note that the _r_r_a_-_n_u_m
       argument of HWPREDICT is missing, so the other RRRRAAss will implicitly  be
       created with default parameter values. In this example, the forecasting
       algorithm  baseline adapts quickly; in fact the most recent one hour of
       observations (each at 5 minute  intervals)  accounts  for  75%  of  the
       baseline prediction. The linear trend forecast adapts much more slowly.
       Observations  made  during  the  last day (at 288 observations per day)
       account for only  65%  of  the  predicted  linear  trend.  Note:  these
       computations  rely on an exponential smoothing formula described in the
       LISA 2000 paper.

       The  seasonal  cycle  is  one  day  (288  data  points  at  300  second
       intervals), and the seasonal adaption parameter will be set to 0.1. The
       RRD  file  will  store  5  days  (1'440  data  points) of forecasts and
       deviation predictions before wrap around. The file will store 1 day  (a
       seasonal cycle) of 0-1 indicators in the FAILURES RRRRAA.

       The  same  RRD  file  and  RRRRAAss are created with the following command,
       which explicitly creates all specialized  function  RRRRAAss  using  "STEP,
       HEARTBEAT, and Rows As Durations".

        rrdtool create monitor.rrd --step 5m \
          DS:ifOutOctets:COUNTER:30m:0:4294967295 \
          RRA:AVERAGE:0.5:1:2016 \
          RRA:HWPREDICT:5d:0.1:0.0035:1d:3 \
          RRA:SEASONAL:1d:0.1:2 \
          RRA:DEVSEASONAL:1d:0.1:2 \
          RRA:DEVPREDICT:5d:5 \
          RRA:FAILURES:1d:7:9:5

       Of  course,  explicit  creation  need  not replicate implicit create, a
       number of arguments could be changed.

EEXXAAMMPPLLEE 33
        rrdtool create proxy.rrd --step 300 \
          DS:Requests:DERIVE:1800:0:U  \
          DS:Duration:DERIVE:1800:0:U  \
          DS:AvgReqDur:COMPUTE:Duration,Requests,0,EQ,1,Requests,IF,/ \
          RRA:AVERAGE:0.5:1:2016

       This example is monitoring the average request duration during each 300
       sec interval for requests processed by a web proxy during the interval.
       In this case, the proxy exposes two counters, the  number  of  requests
       processed since boot and the total cumulative duration of all processed
       requests.  Clearly  these  counters  both have some rollover point, but
       using the DERIVE data source also handles the reset  that  occurs  when
       the web proxy is stopped and restarted.

       In  the  RRRRDD, the first data source stores the requests per second rate
       during the interval. The second data source stores the  total  duration
       of  all  requests  processed  during  the  interval divided by 300. The
       COMPUTE data source divides  each  PDP  of  the  AccumDuration  by  the
       corresponding  PDP  of  TotalRequests  and  stores  the average request
       duration. The remainder of the RPN expression  handles  the  divide  by
       zero case.

SSEECCUURRIITTYY
       Note  that  new  rrd  files will have the permission 0644 regardless of
       your umask setting. If a file with the same name previously exists, its
       permission settings will be copied to the new file.

AAUUTTHHOORRSS
       Tobias Oetiker <tobi@oetiker.ch>, Peter Stamfest <peter@stamfest.at>

1.9.0                             2024-07-29                      _R_R_D_C_R_E_A_T_E(1)
