Parsing Programs

This tutorial will take you through the basics of data parsing in WXP.  WXP provides a variety of parsing programs as tools to extract various types of data.

Basic Text Data Parsing

This is done with the "parse" program.  The parse program takes two inputs: a file and a WMO header.  The program assumes you know how data is organized by WMO header.  WXP does not maintain a mapping between specific data types and WMO headers because the headers change often and keeping a current database would be next to impossible.  For most data types, though, the WMO headers are fairly stable.

In this tutorial, we'll assume command-line-only operation.  This way we can investigate the various command line options.

The first example parses zone forecasts from Philadelphia.  The WMO header is "FPUS51 KPHI", but to use it on the command line, the space must be replaced with an underscore "_".  We'll first check the latest file:

   % parse -cu=la -ph=FPUS51_KPHI
   % _

The "-cu=la" specifies to use the latest file and "-ph=" is used to specify the WMO header.  We find out there is no zone forecast product in the latest file.  If we are curious which file was parsed, we can raise the message level to output more information.   WXP message levels are: print, mess, error, warn, out1, out2, out3, out4 and debug.  The list is sequential so specifying out2 will print all messages from print to out2.  For most programs, the default is out2 but for parsing programs, the default is warn.

   % parse -cu=la -ph=FPUS51_KPHI -me=out2
   RAW DATA FILE PARSER (Ver 5.014-LINUX-X11)

   Current filename: /noaaport/nwstg/text/01062106_for.wmo
   Searching for: FPUS51_KPHI...
   ** 21/6Z **
   Parsing: /noaaport/nwstg/text/01062106_for.wmo
   %

This tells us there are no matching products in the 01062106_for.wmo file.  This is often the case since most products are not sent every hour, or even every 6 hours (the interval used for forecast data files).
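
The sequential message levels described above can be modeled as an ordered list.  The following Python lines are only an illustrative sketch, not WXP code; the function name is_shown is made up for this example.

```python
# Message levels in the order given in this tutorial (lowest to highest).
LEVELS = ["print", "mess", "error", "warn", "out1", "out2", "out3", "out4", "debug"]

def is_shown(message_level, threshold):
    """Return True if a message at message_level is printed at the given threshold."""
    return LEVELS.index(message_level) <= LEVELS.index(threshold)

# With the parsing default of warn, out2 messages are hidden;
# raising the level with -me=out2 makes them visible.
for level in ("warn", "out2"):
    shown = [name for name in LEVELS if is_shown(name, level)]
    print(level, "->", ", ".join(shown))
```

This is why the "Current filename" and "Searching for" lines only appear once the level is raised to out2.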

In order to check back further in time, we need to specify a number of hours.

   % parse -cu=la -ph=FPUS51_KPHI -nh=-6
   ** FPUS51 KPHI 210055 ***
   ZFPPHI

   ZONE FORECAST PRODUCT...UPDATED
   NATIONAL WEATHER SERVICE MOUNT HOLLY NJ
   853 PM EDT WED JUN 20 2001

   PAZ054-055-211353-
   UPDATED
   CARBON PA-MONROE PA-
   853 PM EDT WED JUN 20 2001

   .TONIGHT...A 40 PERCENT CHANCE OF SHOWERS AND THUNDERSTORMS THIS
   EVENING...THEN MOSTLY CLOUDY. LOWS IN THE LOWER 60S. NORTHEAST WIND 5
   TO 10 MPH.
   .THURSDAY...PARTLY SUNNY...THEN A 60 PERCENT CHANCE OF SHOWERS AND
   THUNDERSTORMS IN THE AFTERNOON. HIGHS IN THE UPPER 70S. EAST WIND 10
   TO 15 MPH.
   .THURSDAY NIGHT...A 60 PERCENT CHANCE OF SHOWERS AND THUNDERSTORMS IN
   THE EVENING...THEN MOSTLY CLOUDY. LOWS IN THE LOWER 60S.
   .FRIDAY...A 70 PERCENT CHANCE OF SHOWERS AND THUNDERSTORMS. HIGHS IN
   THE MID 70S.

   .EXTENDED FORECAST...
   .FRIDAY NIGHT...THUNDERSTORMS LIKELY. LOWS NEAR 60.
   .SATURDAY...MOSTLY CLOUDY WITH A CHANCE OF SHOWERS AND THUNDERSTORMS.
   HIGHS IN THE LOWER 70S.
   .SUNDAY...MOSTLY CLOUDY WITH A CHANCE OF SHOWERS AND THUNDERSTORMS.
   LOWS IN THE UPPER 50S AND HIGHS 75 TO 80.
   .MONDAY...PARTLY CLOUDY. LOWS NEAR 60 AND HIGHS 80 TO 85.
   .TUESDAY...MOSTLY CLEAR. LOWS IN THE LOWER 60S AND HIGHS 80 TO 85.
   .WEDNESDAY...MOSTLY CLEAR. LOWS IN THE MID 60S AND HIGHS 80 TO 85.
  
   $$

   ...

The product continues, but I'll show only the start of the output.

As you can tell, some of these products are fairly lengthy, so being able to narrow the parse to more specific information within the product is handy.  To do this, we'll take advantage of the identifier resource.  This will start parsing within a product when it finds the specific identifier.  In the default usage, this will search for a keyword at the beginning of a line.  But since we're looking at zone forecasts, I would like to parse for a specific zone.  To do this, we precede the zone specification with a "%".

   % parse -cu=la -ph=FPUS51_KPHI -nh=-6 -id=%PAZ067
   ** FPUS51 KPHI 210055 ***
   PAZ067>069-NJZ009-010-012-015-211353-
   UPDATED
   BUCKS PA-CHESTER PA-HUNTERDON NJ-MERCER NJ-MIDDLESEX NJ-
   MONTGOMERY PA-SOMERSET NJ-
   853 PM EDT WED JUN 20 2001

   .TONIGHT...A 30 PERCENT CHANCE OF SHOWERS THIS EVENING...OTHERWISE
   PARTLY CLOUDY. LOWS IN THE UPPER 60S. SOUTHWEST WIND 5 TO 10 MPH.
   .THURSDAY...PARTLY SUNNY...THEN MOSTLY CLOUDY WITH A 50 PERCENT
   ...
   .TUESDAY...MOSTLY CLEAR. LOWS IN THE UPPER 60S AND HIGHS 85 TO 90.
   .WEDNESDAY...MOSTLY CLEAR. LOWS NEAR 70 AND HIGHS IN THE LOWER 90S.
  
   $$

   PAZ070-071-NJZ013-017>020-211353-
   UPDATED
   BURLINGTON NJ-CAMDEN NJ-DELAWARE PA-GLOUCESTER NJ-OCEAN NJ-
   PHILADELPHIA PA-WESTERN MONMOUTH NJ-
   853 PM EDT WED JUN 20 2001
   ...

WXP is smart enough to parse zone ranges (067>069).  So you could have searched for zone 68 and gotten this product.
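
To make the range notation concrete, here is a rough Python sketch of how a zone line could be expanded into individual zones.  It is not WXP's implementation; the function name expand_zones is hypothetical, and the sketch assumes the trailing six-digit group is a time group rather than a zone.

```python
import re

def expand_zones(zone_line):
    """Expand a zone line such as 'PAZ067>069-NJZ009-010-211353-' into zone IDs."""
    zones = []
    prefix = None
    for token in zone_line.strip().strip("-").split("-"):
        if re.fullmatch(r"\d{6}", token):
            break  # assumed to be the trailing time group, not a zone
        m = re.fullmatch(r"([A-Z]{2}Z)?(\d{3})(?:>(\d{3}))?", token)
        if not m:
            continue  # skip tokens this sketch does not understand, e.g. 'ALL'
        if m.group(1):
            prefix = m.group(1)  # a new state prefix such as 'PAZ' or 'NJZ'
        if prefix is None:
            continue  # bare number before any prefix: nothing to attach it to
        first = int(m.group(2))
        last = int(m.group(3)) if m.group(3) else first
        zones.extend(f"{prefix}{n:03d}" for n in range(first, last + 1))
    return zones

zones = expand_zones("PAZ067>069-NJZ009-010-012-015-211353-")
print(zones)
print("PAZ068" in zones)
```

Under these assumptions, PAZ068 is in the expanded list even though it never appears literally in the zone line.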

A problem with the above example is that it starts at that location but doesn't stop.  We need to add a parameter to tell parse to stop.  Zone forecasts terminate with a "$$", so parse provides an option to stop parsing at a dollar sign.

   % parse -cu=la -ph=FPUS51_KPHI -nh=-6 -id=%PAZ067 -pa=dollar
   ** FPUS51 KPHI 210055 ***  
   PAZ067>069-NJZ009-010-012-015-211353-
   UPDATED
   BUCKS PA-CHESTER PA-HUNTERDON NJ-MERCER NJ-MIDDLESEX NJ-
   MONTGOMERY PA-SOMERSET NJ-
   853 PM EDT WED JUN 20 2001

   .TONIGHT...A 30 PERCENT CHANCE OF SHOWERS THIS EVENING...OTHERWISE
   PARTLY CLOUDY. LOWS IN THE UPPER 60S. SOUTHWEST WIND 5 TO 10 MPH.
   .THURSDAY...PARTLY SUNNY...THEN MOSTLY CLOUDY WITH A 50 PERCENT
   ...
   .TUESDAY...MOSTLY CLEAR. LOWS IN THE UPPER 60S AND HIGHS 85 TO 90.
   .WEDNESDAY...MOSTLY CLEAR. LOWS NEAR 70 AND HIGHS IN THE LOWER 90S.
  
   ** FPUS51 KPHI 210319 ***
   PAZ067>069-NJZ009-010-012-015-ALL-211618-
   UPDATED
   BUCKS PA-CHESTER PA-HUNTERDON NJ-MERCER NJ-MIDDLESEX NJ-
   ...

But this leaves another problem.  Often there are several updates to a zone and rather than printing out one zone forecast, you get several updates.  So we need to specify to print only the last update:

   % parse -cu=la -ph=FPUS51_KPHI -nh=-6 -id=%PAZ067 -pa=dollar,last
   ** FPUS51 KPHI 210055 ***  
   PAZ067>069-NJZ009-010-012-015-211353-
   UPDATED
   BUCKS PA-CHESTER PA-HUNTERDON NJ-MERCER NJ-MIDDLESEX NJ-
   MONTGOMERY PA-SOMERSET NJ-
   853 PM EDT WED JUN 20 2001

   .TONIGHT...A 30 PERCENT CHANCE OF SHOWERS THIS EVENING...OTHERWISE
   PARTLY CLOUDY. LOWS IN THE UPPER 60S. SOUTHWEST WIND 5 TO 10 MPH.
   .THURSDAY...PARTLY SUNNY...THEN MOSTLY CLOUDY WITH A 50 PERCENT
   ...
   .TUESDAY...MOSTLY CLEAR. LOWS IN THE UPPER 60S AND HIGHS 85 TO 90.
   .WEDNESDAY...MOSTLY CLEAR. LOWS NEAR 70 AND HIGHS IN THE LOWER 90S.
  
  
   %

So we've gone through a complete parsing example.  The bottom line is if you know the WMO header, you can parse for almost any text product that is saved.
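
The three steps used above (start at an identifier, stop at "$$", keep only the last update) can be sketched in a few lines of Python.  This is a simplified illustration, not how parse works internally: the function names are hypothetical, the identifier is matched only as a literal prefix of a line (no zone-range handling), and the sample product text is made up.

```python
def extract_section(product_text, identifier):
    """Return the lines from the first line starting with identifier up to '$$'."""
    section = []
    started = False
    for line in product_text.splitlines():
        if not started:
            if not line.startswith(identifier):
                continue
            started = True
        if line.strip() == "$$":
            break  # end of this zone's forecast
        section.append(line)
    return "\n".join(section) if started else None

def last_section(products, identifier):
    """Scan products in arrival order and keep only the most recent match."""
    latest = None
    for text in products:
        found = extract_section(text, identifier)
        if found is not None:
            latest = found
    return latest

# Made-up sample data: an original product followed by a later update.
first = "PAZ054-055-211353-\nCARBON PA\n$$\n\nPAZ067>069-211353-\nBUCKS PA first\n$$\n"
update = "PAZ067>069-211618-\nBUCKS PA updated\n$$\n"
print(last_section([first, update], "PAZ067"))
```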

NOTE: The parse program makes assumptions about the input file type based on the WMO header.  In some cases, this won't work and you will have to specify the input file name convention as well.  This is done with the "in_file" resource.  For example:

   % parse -cu=la -if=for_txt -ph=FPUS51_KPHI -nh=-6 -id=%PAZ067 -pa=dollar,last

This will override the automatic WMO conversion to file type.  The "for_txt" specifies the name convention used to access the proper filename.  More information on name conventions will be in a later tutorial.
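
If you script these commands, the argument list can be assembled programmatically.  The Python helper below is a hypothetical convenience (build_parse_args is not part of WXP); it only builds the argument list from the options shown in this tutorial and does not run parse.

```python
def build_parse_args(header, hours=None, identifier=None, params=(), in_file=None):
    """Assemble a parse command line from the options used in this tutorial."""
    args = ["parse", "-cu=la"]                      # always use the latest file
    if in_file:
        args.append(f"-if={in_file}")               # explicit name convention
    args.append("-ph=" + header.replace(" ", "_"))  # space becomes underscore
    if hours is not None:
        args.append(f"-nh={hours}")                 # e.g. -6 to look back 6 hours
    if identifier:
        args.append(f"-id=%{identifier}")           # "%" prefix for a zone search
    if params:
        args.append("-pa=" + ",".join(params))      # e.g. dollar,last
    return args

print(" ".join(build_parse_args("FPUS51 KPHI", hours=-6, identifier="PAZ067",
                                params=["dollar", "last"], in_file="for_txt")))
```

Keeping the arguments as a list avoids shell quoting problems with characters such as "%" and "*" if the list is later passed to a process launcher.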

Let's go through another example.  Say we want a state weather roundup:

   % parse -cu=la -ph=ASUS41_KPHL -pa=last
   ** ASUS41 KPHL 210602 ***
   SWRNJ

   REGIONAL WEATHER ROUNDUP
   NATIONAL WEATHER SERVICE MOUNT HOLLY NJ
   200 AM EDT THU JUN 21 2001

   NOTE:  "FAIR" INDICATES FEW OR NO CLOUDS BELOW 12,000 FEET WITH NO
   SIGNIFICANT WEATHER AND/OR OBSTRUCTIONS TO VISIBILITY.

    * = STATION DOES NOT REPORT PRECIPITATION (E.G. RAIN, SNOW, ETC.)
           THUNDER OR FOG.

   NJZ015>026-210700-
   SOUTHERN NEW JERSEY
   CITY           SKY/WX    TMP DP  RH WIND       PRES   REMARKS
   POMONA         PTCLDY    72  67  84 SW8       30.12S
   RIO GRANDE*    PTCLDY    72  66  83 SW7       30.14S
   MILLVILLE      MOCLDY    72  71  97 W5        30.12F        
   ...

Again, you can use the "last" parameter to only print the latest report. 

You can use wildcard characters such as "." to match a single character and "*" to match multiple characters.  But be careful, since a broad pattern can return a ton of data.

To check what WMO headers match, rather than printing the entire product:

   % parse -cu=la -ph=FPUS5 -pa=hdr
   ** FPUS53 KMPX 210601 AMD ***
   ** FPUS55 KPUB 210607 ***
   ** FPUS53 KFSD 210618 AMD ***
   ** FPUS53 KGID 210629 AMD ***
   ** FPUS53 KLBF 210649 AMD ***
   ** FPUS53 KLSX 210657 AMD ***
   ** FPUS51 KGYX 210700 ***
   ** FPUS51 KRLX 210702 ***
   ** FPUS52 KMHX 210706 ***
   ** FPUS53 KDVN 210715 AMD ***
   ** FPUS53 KDVN 210717 AMD ***
   ** FPUS51 KAKQ 210722 ***
   ** FPUS52 KCHS 210725 ***
   ** FPUS54 KMRX 210727 ***
   ** FPUS51 KBUF 210730 ***
   ** FPUS51 KLWX 210733 ***
   ** FPUS51 KBGM 210735 ***
   ** FPUS51 KALY 210735 ***
   ** FPUS53 KLMK 210739 ***
   ** FPUS52 KCAE 210740 ***
   ** FPUS51 KCLE 210740 ***

This will give you a chance to check what's there and then parse for a specific product.  Another example:

   % parse -cu=la -ph='*_KPHI' -if=for_txt -pa=hdr
   ** FXUS61 KPHI 210703 ***
   ** FZUS51 KPHI 210729 ***
   ** FRUS41 KPHI 210735 ***
   ** FRUS41 KPHI 210735 ***
   ** FPUS41 KPHI 210742 ***
   ** FPUS51 KPHI 210742 ***

In this example, the "*" matches everything.  You will not get EVERY product from KPHI, only those within a specific file.  In addition, this will not work with the automatic WMO conversion.  So you must specify the input file type.
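
To get a feel for which headers a pattern would select, the wildcard rules can be imitated in Python.  This sketch rests on two assumptions not confirmed by WXP documentation: that "." matches exactly one character and "*" any run of characters, and that a pattern only needs to match the start of the header (which is consistent with "FPUS5" selecting FPUS51 through FPUS55 above).  The function name header_matches is hypothetical.

```python
import re

def header_matches(pattern, header):
    """Prefix-match a WMO header against a pattern with '.' and '*' wildcards."""
    header = header.replace(" ", "_")  # same substitution as on the command line
    regex = "".join(
        "." if ch == "." else ".*" if ch == "*" else re.escape(ch)
        for ch in pattern
    )
    return re.match(regex, header) is not None

headers = ["FPUS53 KMPX", "FPUS51 KGYX", "FXUS61 KPHI", "FPUS51 KPHI"]
print([h for h in headers if header_matches("FPUS5", h)])
print([h for h in headers if header_matches("*_KPHI", h)])
```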

Surface Data Parsing



For further information about WXP, email technical-support@weather.unisys.com
Last updated by Dan Vietor on June 20, 2001