HaleNet Data Analysis Procedures

T. Giambelluca
M. Nullet
Geography, U. Hawaii
28 August 2000

Note: special procedures are used to calculate wind direction statistics.

General:

  • The HaleNet data set consists mostly of hourly mean observations, each of which represents up to 360 individual measurements taken during each hour and averaged by the data logger. Values are reported for the hour ending at the given time; e.g., the 0200 (2 AM) statistic is derived from observations during 0100 to 0200 (1 AM to 2 AM).

  • Rainfall is totaled, rather than averaged, by the logger. Since 1992 for windward stations, and since 1999 for leeward stations, rainfall data are archived at 1-hour and 1-minute intervals. Prior to that, rainfall was archived only at the 1-hour interval.

  • Diurnal, daily, and monthly statistics of HaleNet observations are presented in tables and charts.

Diurnal statistics are mean hourly values over a particular month. For example, we present a diurnal table for station 153 for June 2000. For each measured element, e.g. relative humidity, the table has a column of 24 hourly means, each of which is derived from the 30 1-hour observations taken for each hour during the month of June.

Daily statistics are 24-hour mean values for each day of observation.

Monthly statistics are monthly means calculated as the mean of the 24 1-hour means (diurnal statistics) for a given month.

  • Before any statistics are generated, data are subjected to a thorough checking procedure (see below). Bad data are identified and excluded from subsequent statistical analysis.

  • Statistics are reported only when a sufficient number of "good" observations are available for the period in question. A Minimum Acceptable Reporting Percentage (MARP) must be met for a particular statistic to be generated. At present, for diurnal and monthly statistics, the MARP for all elements except rainfall (RF) is 33% (e.g. for a valid value at 1500 in the diurnal analysis for March, an element must have at least 11 of a possible 31 valid observations for that hour). For RF a MARP of 50% is required.

  • In the case of diurnal and daily analyses, there is also a MARP of 10% required to generate tables and charts for the period. For example, for the Apr-Jun daily analysis a minimum of 10 days (of a possible 91 days) for at least one element must have valid statistics or the table and charts are not generated.
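The MARP rule above can be illustrated with a minimal Python sketch. The function name meets_marp and the dictionary keys are hypothetical names chosen for this illustration; only the threshold values (33%, 50%, 10%) come from the text.

```python
def meets_marp(n_valid: int, n_possible: int, marp: float) -> bool:
    """Return True when the share of valid observations reaches the MARP."""
    if n_possible <= 0:
        return False
    return n_valid / n_possible >= marp


# Thresholds quoted in the text; the keys are illustrative labels only.
MARP = {"default": 0.33, "RF": 0.50, "period": 0.10}

# 1500 value for March: 11 of 31 valid observations is enough, 10 is not.
print(meets_marp(11, 31, MARP["default"]))
print(meets_marp(10, 31, MARP["default"]))

# Apr-Jun daily analysis: 10 valid days of a possible 91 meets the 10% rule.
print(meets_marp(10, 91, MARP["period"]))
```

With these thresholds, the minimum counts in the two examples above (11 of 31 and 10 of 91) follow directly from the percentage test.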

 


Data Checking Procedure:

We identify and flag bad or suspect data. Data are never discarded, allowing us to re-evaluate the status of any datum at a subsequent time. Flagged data are excluded from all statistical analyses.

  • During field visits, we note sensor or logger malfunctions, sensors out of position, or any interference with sensors. Based on field notes, some data may be flagged as bad. We also flag periods during which sensors were affected by checking and maintenance activities at the station.

  • We screen all data visually by displaying them as a time series plot. Each plot is examined at various time scales and in comparison with the time series of other sensors at the same station and the same sensors at adjacent stations. Unusual patterns or values outside typical ranges are flagged unless supported by other observations.

  • While this painstaking procedure eliminates most of the obviously bad data, some erroneous measurements are certain to remain unnoticed.
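The principle of flagging rather than discarding can be sketched as follows. This is only an illustration: the Observation record, its field names, and the sample values are assumptions, not the actual HaleNet archive format.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    hour_ending: str   # e.g. "1996-03-01 1500" (hypothetical label format)
    value: float
    flag: str = ""     # empty means good; any text marks the datum bad or suspect


def good_values(observations):
    """Return only unflagged values; flagged records remain in the archive."""
    return [o.value for o in observations if not o.flag]


# Made-up sample values for illustration.
obs = [
    Observation("1996-03-01 1400", 21.4),
    Observation("1996-03-01 1500", 85.0, flag="sensor maintenance"),
    Observation("1996-03-01 1600", 21.9),
]

print(len(obs))          # all three records are retained
print(good_values(obs))  # only unflagged values feed the statistics
```

Because the flagged record is kept, its status can be changed later simply by clearing the flag.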

 


Diurnal:

  • Except for wind direction and rainfall, all values given are the means of the individual hourly values for the indicated time. For example, if all the data for March 1996 are available, the 31 values for 1500 (one per day) are averaged to get the 1500 value for the month. If fewer than 31 values are available (i.e. some data are missing for whatever reason) and the MARP criterion is satisfied, the value for the month is the mean of the available data.

  • A special procedure is used to compute wind direction statistics.
  • The RF is also calculated as a mean as above, but may be adjusted if there are missing values (see "Tables" below).

  • What's available:

    • Tables:

      • one per station per month for 1988 - 2000.

      • If the station is missing all data for that month (for whatever reason) the file says something like "no data available" and gives the station, month, etc.

      • If a particular element is missing for the entire month it is not presented in the table (i.e. no "--" for each hour) nor in the legend.

      • If RF is reported (i.e. the MARP condition is satisfied), the last column of the table (heading "Obs%") indicates the percentage of the possible RF observations that were actually reported. This percentage is used to generate an adjusted RF value if the data are incomplete. The adjusted value is calculated as the mean of the measured values (note that 0 is a measured value) divided by the fraction of data actually reported. For example, assume the last 6 days of March 1996 are missing, i.e. 25 of a possible 31 values were recorded for the 1500 period, and assume that the mean of these 25 values is 4.1 mm. The adjusted RF for 1500, March 1996 would be given as 5.1 mm (4.1 / (25/31)). Since the raingage is generally a fairly reliable instrument, "Obs%" usually gives an indication of any station downtime (logger or other general malfunction affecting all observations at the site) during the reporting period, although it should not be treated as a definitive record of downtime.

      • saved as tab-delimited text files (*.txt)

    • Charts:

      • one per station per month per "element group" for 1988 - present.

      • An "element group" may be a single element (e.g. WD) or two or more "related" elements (e.g. SHF, SHF1, SHF2) on the same chart. They are distinguished by line color and line type.

      • a legend is given if there is more than one element on the chart, otherwise it is omitted.

      • if at least one element in an element group is available, the chart is generated and there is no indication of what might be missing.

      • if all data is missing for all elements in an element group that is supposed to be on a chart a like-named file is generated that says something like "No data available", gives the station name, month and the list of elements that are not available.

      • saved as pdf files (*.pdf)
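As a rough illustration of the diurnal calculations described above, the following Python sketch computes a diurnal mean and the adjusted RF value with its "Obs%". The function names are hypothetical, and the inputs are assumed to be lists containing only valid (unflagged) values for one hour of the day over one month.

```python
def diurnal_mean(values, n_possible, marp=0.33):
    """Mean of the valid values for one hour; None if the MARP is not met."""
    if n_possible <= 0 or len(values) / n_possible < marp:
        return None
    return sum(values) / len(values)


def adjusted_rainfall(values, n_possible, marp=0.50):
    """Return (measured mean, Obs%, adjusted mean), or None if MARP is not met."""
    if n_possible <= 0 or len(values) / n_possible < marp:
        return None
    fraction = len(values) / n_possible
    measured = sum(values) / len(values)
    # Adjusted value = measured mean divided by the fraction of data reported.
    return measured, 100.0 * fraction, measured / fraction


# Worked example from the text: 25 of 31 values for 1500, mean 4.1 mm.
measured, obs_pct, adjusted = adjusted_rainfall([4.1] * 25, 31)
print(round(obs_pct, 1))   # 80.6
print(round(adjusted, 1))  # 5.1
```

The example reproduces the 5.1 mm figure quoted for March 1996; a zero rainfall value counts as a measured value and would simply appear as 0.0 in the input list.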


Daily:

  • Except for WD, WSr and RF the reported daily value is the mean of all 24 of the hourly values for the day. 24 values are required (i.e. MARP = 100%) or the statistic is regarded as missing.

  • WD and WSr are calculated as per WD_question.doc and RF is a total.

  • What's available:

  • Tables:

    • one per station per quarter for 1988 - 2000.

    • If the station is missing all data for that quarter (for whatever reason) the file says "No data available" and gives the station name and period (e.g. "Apr96 - Jun96").

    • If a particular element is missing for the entire quarter it is not presented in the table (i.e. no "--" for each day) nor in the legend.

    • saved as tab-delimited text files (*.txt)

  • Charts:

    • one per station per quarter per "element group" for 1988 - 2000.

    • An "element group" may be a single element (e.g. WD) or two or more "related" elements (e.g. SHF, SHF1, SHF2) on the same chart. They are distinguished by line color and line type.

    • a legend is given if there is more than one element on the chart, otherwise it is omitted.

    • if at least one element in an element group is available, the chart is generated and there is no indication of what might be missing.

    • if all data is missing for all elements in an element group that is supposed to be on a chart a like-named file is generated that says something like "No data available", gives the station name, quarter and the list of elements that are not available.

    • saved as pdf files (*.pdf)
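The 100% reporting requirement for daily means can be sketched in a few lines of Python. The function name daily_mean and the use of None to mark a missing or flagged hour are assumptions made for this illustration; wind statistics and rainfall totals follow their own procedures and are not covered here.

```python
def daily_mean(hourly):
    """Mean of 24 hourly values (hours ending 0100 to 2400).

    None marks a missing or flagged hour. All 24 values are required
    (MARP = 100%); otherwise the daily statistic is treated as missing.
    """
    if len(hourly) != 24 or any(v is None for v in hourly):
        return None
    return sum(hourly) / 24


print(daily_mean([10.0] * 24))           # 10.0
print(daily_mean([10.0] * 23 + [None]))  # None: one hour is missing
```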


Monthly:

  • Except for WD, WSr and RF the reported monthly value is the mean of the 24 diurnal values (i.e. the mean of the 0100, 0200, ... 2400 values) for that month.

  • WD and WSr are calculated as per "WD_question.doc" and RF is a total.

  • What's available:

  • Tables:

    • one per station for the entire period of record.

    • each element has its own "sub-table," one mean for each month of the period of record.

    • each individual monthly value shown is the mean of the diurnal means for that element for the month. A monthly value is given only if all 24 diurnal values are available.

    • only those years in which the element was being measured are presented. For example, if the station period of record is June 1988 to present and a particular element was measured only from March 1992 through June 1996, only the years 1992 through 1996 are presented.

    • the last row of each "sub-table" (labeled "Mean") is the mean of the valid values for that month for the entire period of record. If there are fewer than two values for a particular month, this value is not given.

    • the last column of each "sub-table" (labeled "Ann") is the mean of the twelve monthly values for that year. 12 valid values are required or this value is considered missing.

    • the value in the lower right corner (row "Mean", column "Ann") is the mean of the twelve monthly means for the entire period of record. All twelve values are required.

    • the RF table is different:

      • the values for each month are totals rather than means
      • each year has 3 rows labeled "Meas," "Obs%" and "Est."
      • "Meas" is the total rainfall recorded for the month
      • "Obs%" is the percentage of the month that RF was actually measured as described under "Diurnal" above
      • "Est" is the estimated RF based on the RF recorded and "Obs%" (as described under "Diurnal" above).
      • the values in the last column (labeled "Ann") are the totals of the twelve monthly values for "Meas" and "Est" and the mean for "Obs%." Note that all twelve monthly values are required for "Meas" and "Est" or the statistic is considered missing.
      • the values in the last row of the "sub-table" (labeled "Mean") are the means of the estimated RF for that month for the entire period of record.
      • the value in the lower right (row "Mean", column "Ann") is the total of the twelve monthly means (sort of a mean annual total).
  • Charts:

    • one per station per year per element for the period of record

    • each chart shows the data for one element, Jan through Dec for that year and the "long-term mean" for that element. The "long-term mean" is the mean for each month for the period of record (as described for the row labeled "Mean" in the tables).

    • the RF charts are based on the estimated values for both the monthly total and the "long-term mean."
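The completeness rules for the non-RF monthly tables can be summarized in a short Python sketch. All function names are hypothetical, and None is assumed to mark a missing value; the RF sub-table, which uses totals and estimated values, is not covered.

```python
def mean_if_complete(values, required):
    """Mean of the values, but only if exactly `required` valid values exist."""
    if len(values) != required or any(v is None for v in values):
        return None
    return sum(values) / required


def monthly_mean(diurnal_means):
    """Monthly value: mean of the 24 diurnal means, all 24 required."""
    return mean_if_complete(diurnal_means, 24)


def annual_mean(monthly_means):
    """"Ann" column: mean of the twelve monthly values, all 12 required."""
    return mean_if_complete(monthly_means, 12)


def long_term_mean(values_for_month):
    """"Mean" row: mean of valid values for one calendar month, at least two."""
    valid = [v for v in values_for_month if v is not None]
    if len(valid) < 2:
        return None
    return sum(valid) / len(valid)


print(monthly_mean([float(h) for h in range(24)]))  # 11.5
print(annual_mean([1.0] * 11 + [None]))             # None: one month missing
print(long_term_mean([20.0, None, 22.0]))           # 21.0
```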


For further information contact Tom Giambelluca: thomas@hawaii.edu
Page last modified 09 September 2000