Data processing and file formats

This book contains two chapters. The first covers the processing that takes place to generate the data that is ultimately received by users of the CMDS. The second describes the format of these data files and the information that can be extracted from them.

1. Data formats and processing methodology: OLCI

Data levels

Within remote sensing and its applications, a series of levels is used to define the amount of processing that has been performed to produce a given dataset:

  • Level 0 is the rawest data format available and refers to full resolution data, as it comes from the instrument, with some processing applied to remove artefacts from data communication between the satellite and the ground stations.  It is unlikely you will work with this level of data, especially for more modern sensors, as it lacks ancillary information such as geo-referencing and time-referencing.

  • Level 1 is further divided into A and B.  Level 1A is the full resolution sensor data with time-referencing, ancillary information including radiometric and geometric calibration coefficients and georeferencing parameters computed and added to the file.  Level 1B is the stage following the application of the parameters appended to the files at L1A (such as instrument calibration coefficients). This level also includes quality and classification flags.

    • e.g. for ocean colour this is often referred to as the “top of atmosphere” radiance.

  • Level 2 refers to derived geophysical variables.  This will have required processing to remove the atmospheric component of the signal, as well as the application of algorithms to measurements to generate other products. This is the level at which many users will use the data, particularly if they are interested in event scale processes that require the highest time and space resolution available from the data stream.

    •  e.g. in the case of ocean colour, the core level 2 product is the water-leaving reflectance, whilst chlorophyll a is a commonly derived product. For OLCI these are available both at full resolution (300 m pixel size) and reduced resolution (1 km pixel size).

  • Levels 3 and 4 refer to binned and/or merged/filled versions of the level 2 data for a given spatial or temporal resolution. The level 2 data from the CMDS feed into a further component of Copernicus (the Copernicus Marine Environment Monitoring Service, CMEMS). These higher-level data can be useful for those wanting better spatial coverage, and those looking at marine processes that happen over longer time periods. In particular, merged data can be used to generate long time series from multiple sensors, which is essential for climate studies.
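
As a rough illustration of what Level 3 binning involves, the sketch below averages Level 2 pixel values into regular latitude/longitude cells. The function name `bin_to_grid`, the 1 degree cell size and the sample values are assumptions made for illustration only; operational processors use more elaborate binning schemes and quality-flag screening.

```python
import math
from collections import defaultdict

def bin_to_grid(pixels, cell_deg=1.0):
    """Average (lat, lon, value) samples into regular lat/lon cells.

    Returns a dict mapping (row, col) cell indices to the mean value.
    Samples whose value is None or NaN (e.g. flagged pixels) are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in pixels:
        if value is None or math.isnan(value):
            continue
        # Cell indices counted from the south-west corner (-90, -180).
        row = math.floor((lat + 90.0) / cell_deg)
        col = math.floor((lon + 180.0) / cell_deg)
        sums[(row, col)] += value
        counts[(row, col)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Hypothetical Level 2 chlorophyll samples: (latitude, longitude, mg m-3)
level2 = [
    (50.2, -4.1, 1.0),
    (50.7, -4.9, 3.0),
    (51.3, -4.5, 2.0),
    (50.5, -4.5, float("nan")),  # stands in for a flagged (e.g. cloudy) pixel
]
level3 = bin_to_grid(level2, cell_deg=1.0)
```

The first two samples fall in the same cell and are averaged, while the flagged sample is ignored; merging several sensors would amount to feeding pixels from more than one source into the same grid.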

The flow of data through these levels is illustrated in Figure 1.1 (taken from National Research Council, 2011).  Because each level builds on the processing that precedes it, the early stages (such as calibration and atmospheric correction) must be performed with great care to ensure usable products for those wishing to implement algorithms at level 2 or beyond.  The International Ocean Colour Coordinating Group (IOCCG) provides a wealth of information on the processing and uses of ocean colour data; the reports most relevant to data processing are IOCCG (1998, 2004, 2010).


FIGURE 1.1 Ocean colour radiance is used to derive products directly or indirectly. Secondary products are based on the primary products and ancillary data. These products are then used to address scientific and societal questions. Some satellite missions apply the vicarious calibration when processing Level 2 data. (CDOM: Coloured Dissolved Organic Matter; PAR: Photosynthetically Available Radiation; PIC: Particulate Inorganic Carbon; POC: Particulate Organic Carbon; K490: diffuse attenuation coefficient at 490 nm; HAB: Harmful Algal Bloom).


Processing methodology

Atmospheric correction

The objective of ocean colour sensors is to retrieve the spectral distribution of upwelling radiance just above the sea surface, which is termed the water-leaving radiance (Lw). However, the sensors actually measure the Top Of Atmosphere (TOA) radiance (Lt) and so the contribution resulting from processes such as the atmosphere’s scattering and absorption needs to be accounted for – termed Atmospheric Correction (AC).


Lt(λ) = Lr(λ) + La(λ) + Lra(λ) + t(λ)Lwc(λ) + T(λ)Lg(λ) + t(λ)t0(λ)cos(θ0)Lwn(λ)          (Eq. 1)


where Lr, La, Lra, Lwc, and Lg are radiance due to Rayleigh scattering, aerosol scattering, interaction between aerosols and molecules, white caps, and glint respectively. The terms t and t0 are diffuse transmittances of the atmosphere from the surface to the sensor and from the sun to the surface, T is the direct transmittance from surface to sensor,  θ0 is the solar zenith angle and Lwn(λ) is the normalised water leaving radiance (see equation 4).
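
The bookkeeping in Eq. 1 can be sketched in code as follows. This is only a rearrangement of the equation at a single wavelength, assuming every atmospheric and surface term is already known; in practice the aerosol terms have to be estimated, which is the hard part of atmospheric correction. The function names and all numerical values are hypothetical.

```python
import math

def toa_radiance(Lwn, Lr, La, Lra, Lwc, Lg, t, t0, T, theta0_deg):
    """Forward form of Eq. 1: build Lt from its components."""
    cos_theta0 = math.cos(math.radians(theta0_deg))
    return Lr + La + Lra + t * Lwc + T * Lg + t * t0 * cos_theta0 * Lwn

def solve_eq1_for_lwn(Lt, Lr, La, Lra, Lwc, Lg, t, t0, T, theta0_deg):
    """Eq. 1 rearranged for the normalised water-leaving radiance Lwn."""
    cos_theta0 = math.cos(math.radians(theta0_deg))
    non_water = Lr + La + Lra + t * Lwc + T * Lg
    return (Lt - non_water) / (t * t0 * cos_theta0)

# Made-up values in arbitrary but consistent radiance units.
terms = dict(Lr=6.0, La=1.5, Lra=0.2, Lwc=0.05, Lg=0.1,
             t=0.9, t0=0.85, T=0.8, theta0_deg=30.0)
Lt = toa_radiance(Lwn=0.8, **terms)
Lwn_recovered = solve_eq1_for_lwn(Lt, **terms)
water_fraction = (Lt - (terms["Lr"] + terms["La"] + terms["Lra"]
                        + terms["t"] * terms["Lwc"]
                        + terms["T"] * terms["Lg"])) / Lt
```

With these illustrative numbers the water term is a small fraction of Lt, so a small relative error in the subtracted terms becomes a large relative error in Lwn.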

Over the first few decades of ocean-colour remote-sensing, various atmospheric correction algorithms were developed and implemented for a growing suite of sensors including OCTS, POLDER-1, SeaWiFS, MODIS-Terra, MODIS-Aqua, MERIS, GLI, POLDER-2, and VIIRS.  This computation requires ancillary information such as an estimate of the surface atmospheric pressure (and in some implementations, the surface wind speed is also used).  The primary difference between these atmospheric correction algorithms is how the estimate of La(λ) + Lra(λ) in the visible wavelengths is derived from the estimate of La(λ) + Lra(λ) in the near infrared (NIR).  Recently, there have been attempts to apply more complicated techniques to the problem of atmospheric correction, including spectral optimization (Shanmugam and Ahn 2007, Kuchinke et al. 2009, Steinmetz et al. 2011) and neural networks (Schiller and Doerffer 1999, Brockmann et al. 2016), for particularly difficult areas (such as case 2 waters or regions of high glint).  In these methods the atmospheric and oceanic parameters are retrieved simultaneously rather than sequentially.

Atmospheric correction for ocean colour data remains challenging (IOCCG, 2010) as only about 10% of the radiance measured by a satellite instrument in visible blue and green wavelengths originates from the water surface.  This dictates that the sensors require a high signal to noise ratio (SNR), particularly for the ‘blue’ bands (~ 400 nm).

Once corrected for the influence of the atmosphere, the water-leaving radiance is converted to remote-sensing reflectance (Rrs, Eq. 2, where Es is the surface solar irradiance) or water-leaving reflectance (ρw, Eq. 3). There can then be a further conversion to normalised water-leaving radiance (Lwn(λ), Eq. 4, where E0 is the average extra-atmospheric solar irradiance), which corresponds to the radiance that would exist if there were no atmosphere and the sensor were at nadir (directly above the point being viewed).


Rrs(λ) = Lw(λ)/Es(λ)                    (Eq. 2)

ρw(λ) = π Lw(λ)/Es(λ)                 (Eq. 3)

Lwn(λ) = Rrs(λ)E0(λ)                    (Eq. 4)
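
Equations 2 to 4 translate directly into code. The sketch below is a minimal single-wavelength illustration; the function names and the input values are assumptions chosen for readability, not values taken from any OLCI product.

```python
import math

def remote_sensing_reflectance(Lw, Es):
    """Eq. 2: Rrs = Lw / Es (units sr-1)."""
    return Lw / Es

def water_leaving_reflectance(Lw, Es):
    """Eq. 3: rho_w = pi * Lw / Es (dimensionless)."""
    return math.pi * Lw / Es

def lwn_from_rrs(Rrs, E0):
    """Eq. 4: Lwn = Rrs * E0."""
    return Rrs * E0

# Illustrative inputs only: Lw in radiance units, Es and E0 in the
# corresponding irradiance units at the same wavelength.
Lw, Es, E0 = 0.5, 100.0, 180.0
Rrs = remote_sensing_reflectance(Lw, Es)
rho_w = water_leaving_reflectance(Lw, Es)
Lwn = lwn_from_rrs(Rrs, E0)
```

Note that Eq. 2 and Eq. 3 differ only by the factor π, so ρw can always be recovered from Rrs and vice versa.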


For OLCI, the ‘Baseline AC’ (BAC) algorithm is based on the algorithm developed for MERIS (Antoine and Morel, 1999), to ensure consistency between the two instruments' records. Because it was designed for Case 1 waters, whose spectral signature is dominated by phytoplankton pigments, a Bright Pixel AC (BPAC) is also integrated within it (Moore et al. 1999). The BPAC accounts for situations where the Near Infrared (NIR) water-leaving radiance is not negligible, i.e. highly scattering waters with a high Chl and/or Total Suspended Matter (TSM) concentration. The algorithm also provides the aerosol optical depth (T865) and Ångström exponent (A865), which are calculated as part of the AC process and indicate how well the AC has estimated and subtracted the aerosol contribution; if the AC is working correctly, these atmospheric by-products should not show contamination from marine features.

Alternatively, for turbid Case 2 waters that are significantly influenced by CDOM and/or TSM, there is an artificial neural network algorithm (C2RCC), developed from work on algorithms for MERIS (Doerffer and Schiller, 2007).


Primary and secondary products

Once the surface reflectances and radiances have been calculated they can be used to estimate geophysical parameters through the application of specific bio-optical algorithms e.g. estimates of phytoplankton biomass through determining the Chlorophyll-a (Chl) concentration.

Again taking OLCI as an example, the algal pigment concentration is estimated using the Ocean Colour for MERIS (OC4Me) algorithm developed by Morel et al. (2007).  As with the atmospheric correction, the best algorithm for a particular product of interest can depend upon the sensor used, the water type (such as case 1 or case 2), the product itself (some algorithms are only designed to produce Chl products, while others estimate inherent optical properties) and even the atmospheric correction scheme used.  Approaches range from simple band ratio algorithms to neural networks. Additionally, there have recently been examples of improved product performance through the use of blended products (Jackson et al. 2017), where multiple algorithms are implemented and the final product is a weighted average, based on some other information such as a spectral classification (Moore et al. 2014).
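
To make the two ideas above concrete, the sketch below shows a generic maximum band ratio polynomial and a weighted blend of two algorithm outputs. It is not the OC4Me implementation: the polynomial coefficients are placeholders invented for illustration, the reflectance values are made up, and the function names are hypothetical.

```python
import math

def max_band_ratio_chl(rrs_blue_bands, rrs_green, coeffs):
    """Generic band ratio estimate of Chl (mg m-3).

    log10(Chl) = sum_i coeffs[i] * x**i, with
    x = log10(max(blue reflectances) / green reflectance).
    """
    ratio = max(rrs_blue_bands) / rrs_green
    x = math.log10(ratio)
    log_chl = sum(a * x ** i for i, a in enumerate(coeffs))
    return 10.0 ** log_chl

def blend(estimates, weights):
    """Weighted average of several algorithm outputs for one pixel."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(e * w for e, w in zip(estimates, weights)) / total

# PLACEHOLDER coefficients, not those of any operational algorithm.
PLACEHOLDER_COEFFS = (0.3, -2.0)

clear_water = max_band_ratio_chl([0.008, 0.006, 0.004], 0.002, PLACEHOLDER_COEFFS)
green_water = max_band_ratio_chl([0.002, 0.003, 0.004], 0.004, PLACEHOLDER_COEFFS)

# Hypothetical class memberships (e.g. from a spectral classification)
# used as weights for two algorithm outputs at the same pixel.
blended = blend([1.0, 3.0], [0.75, 0.25])
```

With a negative slope coefficient, bluer water (a higher blue-to-green ratio) yields a lower Chl estimate, which is the qualitative behaviour expected of such algorithms.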


File Names

A lot of thought goes into file naming conventions so that they provide useful information to users without them even having to open the files.  Consider an OLCI file, for example:


In this example each underscore separates a field of information.

  • S3A is the mission (Sentinel 3A)

  • OL is the sensor (OLCI)

  • 1 is the processing level

  • EFR is the data type (“EFR___” = L1B TOA radiances at Full Resolution (FR), “ERR___” = L1B TOA radiances at Reduced Resolution (RR), “WFR___” = L2 FR water-leaving reflectance, ocean colour and atmospheric by-products, “WRR___” = L2 RR water-leaving reflectance, ocean colour and atmospheric by-products)

  • 20160509T103945 is the date and time (follows the T) of the start of data acquisition

  • 20160509T104245 is the date and time (follows the T) of the end of data acquisition

  • 20160509T124907 is the date and time (follows the T) of the file creation

  • 0180, 004, 051 and 1979 refer to duration, cycle number, relative orbit number and frame along track coordinate

  • MAR refers to the processing centre (MAR = Marine (EUMETSAT))

  • O means operational (a number of other possibilities include F for reference, D for development and R for reprocessing)

  • NR means “Near Real time” processing (the most rapidly available data released; other possibilities include NT - non time critical, which includes the most up to date meteorological data used for processing).

  • 001 is the baseline collection

  • Finally .SEN3 is the filename extension (which data viewing programs etc may use for opening the file correctly).
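
The fields listed above can be pulled apart programmatically. The sketch below is a minimal illustration: the example name is assembled from the field values given in the list (so it is a reconstruction, not a quoted product name), and the regular expression and field names are assumptions based on that list rather than an official specification. Note that the data type itself contains underscores, so a plain split on "_" would not work.

```python
import re
from datetime import datetime

# Pattern inferred from the fields described in the text (an assumption).
FILENAME_PATTERN = re.compile(
    r"^(?P<mission>[A-Z0-9]{3})_"
    r"(?P<sensor>[A-Z]{2})_"
    r"(?P<level>[0-9])_"
    r"(?P<data_type>[A-Z0-9_]{6})_"
    r"(?P<start>\d{8}T\d{6})_"
    r"(?P<stop>\d{8}T\d{6})_"
    r"(?P<created>\d{8}T\d{6})_"
    r"(?P<duration>\d{4})_"
    r"(?P<cycle>\d{3})_"
    r"(?P<relative_orbit>\d{3})_"
    r"(?P<frame>\d{4})_"
    r"(?P<centre>[A-Z0-9]{3})_"
    r"(?P<mode>[A-Z])_"
    r"(?P<timeliness>[A-Z]{2})_"
    r"(?P<baseline>\d{3})"
    r"\.SEN3$"
)

def parse_olci_name(name):
    """Split a product name into a dict of fields; raise if it does not match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised product name: {name}")
    fields = match.groupdict()
    for key in ("start", "stop", "created"):
        fields[key] = datetime.strptime(fields[key], "%Y%m%dT%H%M%S")
    fields["data_type"] = fields["data_type"].rstrip("_")  # drop padding
    fields["duration"] = int(fields["duration"])           # seconds
    return fields

# Name reconstructed from the field values listed above.
example = ("S3A_OL_1_EFR____20160509T103945_20160509T104245_"
           "20160509T124907_0180_004_051_1979_MAR_O_NR_001.SEN3")
info = parse_olci_name(example)
acquisition_seconds = (info["stop"] - info["start"]).total_seconds()
```

For this example the difference between the start and end times is 180 seconds, which agrees with the duration field (0180).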


File types

The file types listed below are the most common storage formats for all Sentinel-3 products, irrespective of sensor. While the discussion below is written from an ocean colour perspective, the information remains valid for both the SLSTR and altimetry domains.


NetCDF (Network Common Data Form)

These are self-describing files that are commonly used for array-oriented scientific data.  This is probably the most common file type you will come across while working with ocean colour data.


HDF (Hierarchical Data Format)

This includes a set of file formats (e.g. HDF4, HDF5) designed to store and organize large amounts of data. The netCDF-4 format is built on HDF5, which gives netCDF users benefits not available in the classic netCDF format, such as much larger files and multiple unlimited dimensions.


XML (Extensible Markup Language)

These files use a markup language that allows both human and machine readability of document encoding for textual data.  Although XML was designed for use with documents, the language is widely used for the representation of data structures.  You will find that folders of OLCI data can contain an XML file that allows all the various products, flags and metadata to be loaded into a program such as SNAP in a single operation.
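
To show how such an XML file can act as an index to the rest of the folder, the sketch below lists the data files referenced by a small manifest. The XML snippet is a simplified, hypothetical stand-in written for this example (real manifests are longer and use XML namespaces), and the element names and file names in it are assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical manifest content for illustration only.
SAMPLE_MANIFEST = """<manifest>
  <dataObjectSection>
    <dataObject ID="Oa01_radianceData">
      <byteStream mimeType="application/x-netcdf">
        <fileLocation href="./Oa01_radiance.nc"/>
      </byteStream>
    </dataObject>
    <dataObject ID="geoCoordinatesData">
      <byteStream mimeType="application/x-netcdf">
        <fileLocation href="./geo_coordinates.nc"/>
      </byteStream>
    </dataObject>
  </dataObjectSection>
</manifest>"""

def list_data_files(manifest_text):
    """Return the href of every fileLocation element, in document order."""
    root = ET.fromstring(manifest_text)
    return [loc.get("href") for loc in root.iter("fileLocation")]

data_files = list_data_files(SAMPLE_MANIFEST)
```

A viewer can follow each listed path to open the corresponding data file, which is how a single XML file can give access to a whole product folder.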