Setting up the WXP Ingestor

Data Ingest Concepts

Most realtime meteorological data arrives at a site by asynchronous or synchronous transmission, over either a land line or a satellite link. To keep data from falling on the floor, it must be ingested on a computer and saved to disk.  The data ingestor therefore has four tasks:

  1. Read data from the feed -- The program reads bytes from the data feed.  If this is an asynchronous feed, the communication port is set up and bytes are read from it.  The feed can also be read from a file or from a network socket.
  2. Interpret the data -- Each feed type has a protocol defining how data is delivered.  In most cases, control characters such as SOH (ASCII 0x1), STX (ASCII 0x2) and ETX (ASCII 0x3) are used to delimit portions of a product.  Each product has a product header describing its contents, followed by the product contents.  The header generally follows a syntax such as the WMO header format.
  3. Select the data -- In many cases, not all data from a feed is desired.  As a result, the ingestor must be able to select products based on the product header.  Products that are not selected are discarded.
  4. Perform an action on the selected data -- In most cases, the action is to append the product to an existing data file.  The output data file can have a naming convention based on product type, and the products can be saved on set hourly boundaries.  The action can also be to run a task when the product arrives.
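
The four tasks above can be sketched as a small loop. This is only an illustration and not the WXP implementation: it assumes a simplified framing in which each product sits between SOH and ETX and its first line is the header, which is looser than the real WMO layout. The names split_products, ingest and action are hypothetical.

```python
import io

SOH, ETX = b"\x01", b"\x03"

def split_products(stream, chunk_size=4096):
    """Task 1 and 2: read bytes and yield what lies between SOH and ETX."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        while True:
            start = buffer.find(SOH)
            if start == -1:
                buffer = b""            # nothing but inter-product noise
                break
            end = buffer.find(ETX, start + 1)
            if end == -1:
                buffer = buffer[start:]  # product incomplete, keep reading
                break
            yield buffer[start + 1:end]
            buffer = buffer[end + 1:]

def ingest(stream, wanted_prefixes, action):
    """Task 3 and 4: select products by header prefix and act on them."""
    for product in split_products(stream):
        header, _, body = product.strip().partition(b"\n")
        header = header.strip().decode("ascii", "replace")
        if header.startswith(tuple(wanted_prefixes)):
            action(header, body)        # e.g. append to a data file

# Usage with an in-memory feed standing in for a serial port or socket.
feed = io.BytesIO(b"\x01SAUS70 KWBC 121400\r\nMETAR TEXT\x03"
                  b"\x01FPUS5 KIND 281512\r\nFORECAST\x03")
saved = []
ingest(feed, ("SA", "SP"), lambda header, body: saved.append((header, body)))
```

Here the action only collects the selected products in a list; a real action would append them to a file or start a command.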

WXP Ingestor

WXP comes with an ingestor that will handle the standard Family of Services (FOS) feeds including Domestic Data, Public Products, Domestic Data Plus, International Data and the High Resolution Data (Model GRIB) feed.  WXP will also handle data from NOAAPORT using the Unisys Weather Gateway system.  The ingestor handles only one data feed at a time, but the program can be run more than once to handle each feed.  To simplify this process, specific resources for each feed are set up in the resource file wxp.def:

#
#  Ingest resources
#
#	DDPLUS
ddplus.data_path: c:\wxp\ddplus
ddplus.filename: COM1
ddplus.in_file: ddp
ddplus.bull_file: dds.bul
ddplus.log_file: dds-%m%d.log
ddplus.message: out1
#	HDS
hds.data_path: c:\wxp\hds
hds.filename: COM2
hds.in_file: hds
hds.bull_file: hds.bul
hds.log_file: hds-%m%d.log
hds.message: out2
#	NOAAPORT
noaaport.filename: sock:5000
noaaport.parameter: log_unk,cntrl
noaaport.bull_file: noaaport.bul
noaaport.log_file: noaa-%m%d.log
noaaport.message: out2
noaaport.data_path: c:\wxp

The ingest program is then run twice:

ingest -na=ddplus -me=none
ingest -na=hds -me=none

or, for NOAAPORT:

ingest -na=noaaport -me=none
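
To make the idea of named resources concrete, the following sketch collects the entries that belong to one name from text in the format shown above. How WXP itself resolves resources is not described here, so this is an assumption for illustration only, and load_named_resources is a hypothetical helper.

```python
SAMPLE = r"""
#	DDPLUS
ddplus.data_path: c:\wxp\ddplus
ddplus.filename: COM1
#	HDS
hds.data_path: c:\wxp\hds
hds.filename: COM2
#	NOAAPORT
noaaport.filename: sock:5000
"""

def load_named_resources(text, name):
    """Return {resource: value} for lines of the form 'name.resource: value'."""
    resources = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comment lines
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue
        prefix, dot, resource = key.strip().partition(".")
        if dot and prefix == name:
            resources[resource] = value.strip()
    return resources

hds = load_named_resources(SAMPLE, "hds")
```

With this sample, the name hds selects COM2 as the input and c:\wxp\hds as the data path, which mirrors what running ingest with -na=hds is described as doing.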

The resource define:

Predefined Input Data Types

The WXP ingestor has several predefined data types:

Type    Description         Parameters      Comments
604     FAA 604             4800,odd,stop2  Processes FAA 604 header format
dds     Domestic Data       19200,even      WMO header format
pps     Public Products     19200,even      WMO header format
ddp     Domestic Data Plus  19200,even      WMO header format, feed is combination of domestic and public products
ids     International Data  9600,even       WMO header format
hds     High Res Data       19200,none      WMO header format, binary data feed with Alden extensions
kav     Kavouras Data       19200,none      WMO header format, Kavouras feed type
kav604  Kavouras 604 Data   19200,none      WMO header format, Kavouras feed type set up for FAA 604 data
gte     GTE Data            19200,even      WMO header format, GTE feed type
jma     JMA Asynch Data     1200,even       WMO header format, JMA feed type
jma64   JMA Sync Data       9600,none       WMO header format, JMA feed type, synchronous (HDLC)
wxp     WXP Data            19200,even      WMO header format, WXP feed type

If the data line does not fit these predefined types, the parameters can be specified:

   ingest -if=dds,9600,even

where you can specify the data type, baud rate and parity. 

NOTE: NOAAPORT uses the default characteristics.
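
As an illustration of how such a specification could be resolved against the predefined types, here is a minimal sketch. It covers only a few of the types, ignores stop bits, and assumes that any field left out falls back to the predefined value; that fallback is an assumption, and parse_in_file is a hypothetical name.

```python
# (baud rate, parity) for a few of the predefined types listed above.
PREDEFINED = {
    "dds": (19200, "even"),
    "ddp": (19200, "even"),
    "ids": (9600, "even"),
    "hds": (19200, "none"),
    "jma": (1200, "even"),
}

def parse_in_file(spec):
    """Split 'type[,baud[,parity]]' and fill gaps from the predefined table."""
    parts = spec.split(",")
    data_type = parts[0]
    baud, parity = PREDEFINED[data_type]
    if len(parts) > 1 and parts[1]:
        baud = int(parts[1])
    if len(parts) > 2 and parts[2]:
        parity = parts[2]
    return data_type, baud, parity

settings = parse_in_file("dds,9600,even")
```

For the example above this yields the dds type at 9600 baud with even parity instead of the predefined 19200 baud.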

Bulletin File

The ingest program uses a bulletin file to set up product processing.  The bulletin filename is specified with the bull_file resource. The file contains a list of headers, actions and commands to be performed:

header [action] [command...]
header [action] [command...]
...

The header is used as a selection filter that can contain standard wildcard characters for header pattern matching.  The action tells the ingest process exactly what to do with the product.  This can be saving the product to file or running a specified command on the product.  The command/file section is either the command to run or the file to save the product to.  This can contain a set of escape characters which are listed below.

Bulletin File Header Selection

The header can specify the exact header or a pattern against which headers are matched.  A pattern matches from the start of the header, so a short pattern selects every header that begins with it.  The headers listed in the file can use the following pattern characters:

. or ?           match a single character
- or *           match any sequence of characters
[letters]        match a character from the set
[^letters]       match any character except those from the set
(str1|str2...)   match any one of the listed strings
_                underscore matches a space
/data            match extra information

Some example header strings are:

AB           Anything that starts with AB
S[AP]        SA or SP
(W|AC|RG)    Starts with W or AC or RG
F[^O]        Anything that starts with F, second character NOT O
FQUS1_KIND   Full header specification with spaces as underscores.
*_KIND       Wildcard match on any product that ends with KIND.
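
One way to experiment with these patterns is to translate them into Python regular expressions. The sketch below is an approximation built from the rules and examples in this section, not the matching code used by ingest: it assumes matching starts at the beginning of the header, treats - and * as "any sequence of characters", and does not handle the /data form. The function names are hypothetical.

```python
import re

def header_pattern_to_regex(pattern):
    """Translate a bulletin file header pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":                      # copy a [set] or [^set] verbatim
            end = pattern.find("]", i)
            if end == -1:
                raise ValueError("unterminated [ in pattern")
            out.append(pattern[i:end + 1])
            i = end + 1
            continue
        if ch in ".?":
            out.append(".")                # single character
        elif ch in "-*":
            out.append(".*")               # any sequence of characters
        elif ch == "_":
            out.append(" ")                # underscore stands for a space
        elif ch in "(|)":
            out.append(ch)                 # alternation passes through
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))

def header_matches(pattern, header):
    # re.match anchors at the start only, giving "starts with" behaviour.
    return header_pattern_to_regex(pattern).match(header) is not None

selected = header_matches("S[AP]", "SAUS70 KWBC 121400")
```

A header such as FPUS5 KIND 281512 is selected by F[^O] and by *_KIND under these assumptions, while FOUS12 KWBC 121200 is rejected by F[^O].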

Bulletin Actions

The actions are:

>>       append to file with header
append   same as above
>        write to file with header, previous contents overwritten
write    same as above
#        write to file without header, previous contents overwritten
file     same as above
|        pipe product to listed command
pipe     same as above
@        run command when product complete
run      same as above

The action can be prefixed with an "R".  The R specifies that the product is saved as a raw file, without stripping control characters.
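
The action column can be decoded with a small lookup table. The following is a sketch of that idea only: parse_action is a hypothetical helper, and the time offset that may follow an action is not handled here.

```python
# Symbol and keyword forms map to the same action name.
ACTIONS = {
    ">>": "append", "append": "append",
    ">": "write", "write": "write",
    "#": "file", "file": "file",
    "|": "pipe", "pipe": "pipe",
    "@": "run", "run": "run",
}

def parse_action(token):
    """Return (action name, raw flag) for an action token such as '>>' or 'R>'."""
    raw = token.startswith("R") and token[1:] in ACTIONS
    if raw:
        token = token[1:]                 # drop the R prefix
    if token not in ACTIONS:
        raise ValueError(f"unknown action: {token!r}")
    return ACTIONS[token], raw

action, raw = parse_action("R>>")
```

In this sketch R>> decodes to an append with the raw flag set, while >> alone decodes to an append that strips control characters.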

Bulletin Commands and Filenames

The command can either be the file to save the product to or a command to run with the pipe or run actions. The command can have several escape characters:

The examples below are based on a system time of 1455Z on Jan 12, 1997 and the product header FPUS5 KIND 281512.

Wildcard  Explanation                            Example
@tag      Name convention tag
%Y        current system year                    1997
%y        current system year (last 2 digits)    97
%b        current system month (3 letters)       jan
%m        current system month                   01
%d        current system day                     12
%j        current system julian day              12
%h        current system hour                    14
%n        current system minute                  55
%pd       product day                            28
%ph       product hour                           15
%pn       product minute                         12
%T        product type                           FPUS5
%t        product type (lower case)              fpus5
%L        product locale                         KIND
%l        product locale (lower case)            kind
%D        data_path resource
%C        con_path resource
%R        raw_path resource
%G        grid_path resource
%W        watch_path resource
%I        image_path resource
%F        file_path resource

Some of the above wildcards can be preceded by a number.  For dates, the number is a modifier which rounds the value down to the nearest multiple of that number.  For example, "%6h" rounds down to the nearest 6 hour boundary.  For the previous example (hour 14), it results in the value 12.

For the product type and locale, this number is used in a substring operation.  The first digit of the number is the offset into the string, counting from 1, and the second digit is the number of characters to use.  For example, "%12T" results in "FP".  To get "IND", use "%23L".
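
The escapes and numeric modifiers can be tried out with the following sketch. It implements only a subset of the table (the date, product time, product type, locale and %D escapes), assumes two-digit zero padding for numeric values, and is not taken from the WXP source; expand is a hypothetical name.

```python
import re
from datetime import datetime

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def expand(template, now, header, data_path):
    """Expand a subset of the bulletin file escapes in a file name."""
    ptype, locale, ptime = header.split()
    numeric = {
        "m": now.month, "d": now.day, "h": now.hour, "n": now.minute,
        "pd": int(ptime[0:2]), "ph": int(ptime[2:4]), "pn": int(ptime[4:6]),
    }
    text = {"T": ptype, "t": ptype.lower(), "L": locale, "l": locale.lower()}

    def repl(match):
        number, code = match.group(1), match.group(2)
        if code in numeric:
            value = numeric[code]
            if number and int(number):     # round down to a multiple
                value -= value % int(number)
            return f"{value:02d}"
        if code in text:
            if len(number) == 2:           # digit 1: 1-based offset, digit 2: length
                start = int(number[0]) - 1
                return text[code][start:start + int(number[1])]
            return text[code]
        if code == "Y":
            return f"{now.year:04d}"
        if code == "y":
            return f"{now.year % 100:02d}"
        if code == "b":
            return MONTHS[now.month - 1]
        if code == "D":
            return data_path
        return match.group(0)              # escapes outside this subset stay as-is

    return re.sub(r"%(\d*)(pd|ph|pn|[A-Za-z])", repl, template)

now = datetime(1997, 1, 12, 14, 55)
name = expand("%D/%y%m%d%h_for.wmo", now, "FPUS5 KIND 281512", r"c:\wxp\ddplus")
```

With the system time and header used in the table, "%6h" expands to 12, "%12T" to FP and "%23L" to IND, matching the examples in the text.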

Time Offset

A time offset can be applied to the action.  This adjusts the system time prior to determining the filenames in the command section.  The minute offset is appended to the action.  Here are some examples:

S[AP]             >>-15 %D/%y%m%d%h_sao.wmo
SD                >>+07 %D/%y%m%d%h_rad.wmo
U[^AB]            >>-65 %D/%y%m%d%12h_upa.wmo    

In the first case, the time is offset 15 minutes backwards so that the 18Z data file will contain data ingested from 1745Z to 1845Z.  The second example offsets the time ahead 7 minutes so that the 18Z file will contain data ingested from 1807Z to 1907Z.  Finally, there is the case of the upper air data, which is saved in a 12 hourly file (%12h).  The time is offset backwards 65 minutes so that the 12Z file will contain data arriving from 1055Z to 2255Z.
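
The arithmetic behind these windows can be checked with a short sketch. The sign convention used here is inferred from the -15 and -65 examples above rather than confirmed against the ingest program: a negative offset moves the start of each file's collection window earlier by that many minutes. The function file_hour is hypothetical.

```python
from datetime import datetime, timedelta

def file_hour(arrival, offset_minutes, hour_step=1):
    """Return the file time (hour boundary) a product arriving at `arrival` uses."""
    # A -15 offset means a product arriving at 1745Z already counts as 18Z.
    shifted = arrival - timedelta(minutes=offset_minutes)
    hour = shifted.hour - shifted.hour % hour_step   # e.g. %12h rounds to 00 or 12
    return shifted.replace(hour=hour, minute=0, second=0, microsecond=0)

# Surface data with ">>-15": 1745Z is the first minute of the 18Z file.
surface = file_hour(datetime(1997, 1, 12, 17, 45), -15)
# Upper air data with ">>-65" and %12h: 1055Z is the first minute of the 12Z file.
upper_air = file_hour(datetime(1997, 1, 12, 10, 55), -65, hour_step=12)
```

Under this convention a product arriving at 1744Z still goes to the 17Z surface file, and an upper air product arriving at 2255Z goes to the 00Z file of the next day.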

Sample Data Setup

The WXP distribution has a standard file naming structure. These name conventions can be modified if needed.

Directory         Description                          Sample Name Convention

Domestic Data and Public Products
c:\wxp\ddplus     Climate Data                         %D/%y%m%d%h_cli.wmo
c:\wxp\ddplus     Earthquake Data                      %D/%y%m%d%h_eqk.wmo
c:\wxp\ddplus     Forecast Products                    %D/%y%m%d%h_for.wmo
c:\wxp\ddplus     Front Location Data                  %D/%y%m%d%h_frt.wmo
c:\wxp\ddplus     SHEF/Hydrological Data               %D/%y%m%d%h_hyd.wmo
c:\wxp\ddplus     Miscellaneous Data                   %D/%y%m%d%h_msc.wmo
c:\wxp\ddplus     Model Output Statistics              %D/%y%m%d%h_mod.wmo
c:\wxp\ddplus     MDR Radar Data                       %D/%y%m%d%h_rad.wmo
c:\wxp\ddplus     Severe Advisories                    %D/%y%m%d%h_sev.wmo
c:\wxp\ddplus     Weather Summaries                    %D/%y%m%d%h_sum.wmo
c:\wxp\ddplus     Surface SAOs, METARs                 %D/%y%m%d%h_sao.wmo
c:\wxp\ddplus     Synoptic Data                        %D/%y%m%d%h_syn.wmo
c:\wxp\ddplus     Tropical Advisories                  %D/%y%m%d%h_trp.wmo
c:\wxp\ddplus     Severe Weather Watches               %D/%y%m%d%h_wws.wmo
c:\wxp\ddplus     Upper Air Data                       %D/%y%m%d%h_upa.wmo

High Res GRIB Products
c:\wxp\hds        NGM Model Products                   %D/%y%m%d%h_ngm.grb
c:\wxp\hds        ETA Model Products                   %D/%y%m%d%h_eta.grb
c:\wxp\hds        Aviation Model Products (US Sector)  %D/%y%m%d%h_avus.grb

Satellite Products
c:\wxp\sat        GOES East Visible Images             %D/%y%m%d%h_vie.sat
c:\wxp\sat        GOES East Infrared Images            %D/%y%m%d%h_ire.sat
c:\wxp\sat        GOES East Water Vapor Images         %D/%y%m%d%h_wve.sat

Miscellaneous Products
c:\wxp\nids       NIDS Base Reflectivity Data          %D/%e/%Y%m%d%h%m.bref1
c:\wxp\nowrad     WSI NOWRad Products                  %D/%y%m%d%h%m.master
c:\wxp\nldn       NLDN Lightning Data                  %D/%y%j%h.nldn
c:\wxp\profiler   Profiler Data                        %D/%y%m%d%h.prf

Running the Ingestor

The ingestor must be run for each data type being ingested.  If you are ingesting DD+ and HDS, the ingest program is run twice:

ingest -na=ddplus -me=none &
ingest -na=hds -me=none &

This will run ingest based on the named resources for ddplus and hds which have been set up in the resource file (see above).  Resetting the message level to none is recommended for background ingest.  Without this, status information on the products will scroll on the screen.  If that output is wanted, running each ingestor in a separate window is recommended.

Data Management and the Scour Script

WXP provides a scour program for removing older data files to prevent new data from exhausting disk space.  Scour is normally run on each data directory and on the converted data directory, and should be run at least once an hour.  The following script should be run under cron at roughly 35 minutes past the hour.

c:\wxp\bin\scour c:\wxp\ddplus   msize=150
c:\wxp\bin\scour c:\wxp\hds      msize=200
c:\wxp\bin\scour c:\wxp\convert  msize=20

or for NOAAPORT:

c:\wxp\bin\scour c:\wxp\data msize=500
c:\wxp\bin\scour c:\wxp\text msize=250
c:\wxp\bin\scour c:\wxp\model msize=500
c:\wxp\bin\scour c:\wxp\convert msize=100     
c:\wxp\bin\scour c:\wxp\sat msize=750

NOTE: Make sure the total disk space reserved in the scour routines does not exceed the disk space on the computer. 

NOTE: Make sure the wxp account has permission to delete the data files. Otherwise, scour will not work properly.
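
The general idea of scouring a directory down to a size limit can be sketched as follows. This is not the WXP scour program: it assumes that a limit such as msize bounds the total size of the files in a directory and that the oldest files are removed first, neither of which is spelled out above. The function scour and its max_bytes parameter are hypothetical.

```python
from pathlib import Path

def scour(directory, max_bytes):
    """Delete the oldest files in `directory` until it holds at most max_bytes."""
    files = sorted(
        (p for p in Path(directory).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,   # oldest modification time first
    )
    total = sum(p.stat().st_size for p in files)
    removed = []
    for path in files:
        if total <= max_bytes:
            break
        total -= path.stat().st_size
        path.unlink()                      # needs delete permission on the file
        removed.append(path.name)
    return removed
```

The unlink call is where a missing delete permission would surface, which is why the account running the script must be allowed to remove the data files.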


For further information about WXP, email devo@ks.unisys.com
Last updated by Dan Vietor on July 28, 1998