lidarpy.preprocessing

This module provides lidarpy.preprocessing.preprocess, which applies the following corrections to the lidar raw data:

  • Dark measurement [AN only]
  • Dead time correction [PC only] (see the sketch after this list)
  • Trigger delay [PC only] [tbi]
  • Zero bin [AN only]
  • Background subtraction
  • Merge of polarized channels
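For reference, photon-counting dead time is often handled with the textbook non-paralyzable correction N_true = N_meas / (1 - N_meas * tau). The sketch below only illustrates that formula; it is not necessarily what the module's own dead-time routine does, and the variable names are placeholders.

```python
import numpy as np

def dead_time_correct_nonparalyzable(rate_mhz: np.ndarray, tau_ns: float) -> np.ndarray:
    """Textbook non-paralyzable dead-time correction (illustrative only)."""
    tau_us = tau_ns * 1e-3          # dead time in microseconds
    # rate [MHz] == counts per microsecond, so rate * tau is dimensionless
    return rate_mhz / (1.0 - rate_mhz * tau_us)
```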

The following files contain the depolarization calibration factors and the GHK parameters:

  • lidarpy.preprocessing.depolarization_calibration_mhc
  • lidarpy.preprocessing.depolarization_calibration_alh
  • lidarpy.preprocessing.depolarization_calibration_vlt

Be sure these files are updated.
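For orientation, a minimal call might look like the sketch below; the file name and channel list are placeholders, and only keyword arguments that appear in the preprocess signature further down are used.

```python
from lidarpy.preprocessing import preprocess

# Hypothetical measurement file and channel names; adapt to your station.
ds = preprocess(
    "mhc_rs_20210705.nc",
    channels=["532xpa", "532xpp"],
    crop_ranges=(0, 20000),
    apply_dc=True,         # dark current
    apply_dt=True,         # dead time (PC channels)
    apply_bg=True,         # background subtraction
    apply_bz=True,         # zero bin
    gluing_products=True,  # also build glued channels
)
print(ds)
```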

Gluing:

There are some gluing implementations already. The main two are:

  • lidarpy.preprocessing.gluing_de_la_rosa: Calculates a windowed correlation coefficient at every point within a range. It then selects the point with the highest correlation and, with a linear regression, scales the analog channel to photon-counting units. It works, but lacks consistency: for a good adjustment the window has to be large, and it works even better if the adjustment window is even larger. For instance:

win_an = rcs_an_whole[idx, gl_bin - int(hw * 1.5) : gl_bin + hw * 5]
win_pc = rcs_pc_whole[idx, gl_bin - int(hw * 1.5) : gl_bin + hw * 5]

For instance, with the default configuration, the day 2021/07/05 from MULHACEN has a gluing inconsistency at time profile 217: neighbouring profiles are a bit higher, resulting in a visible vertical line in the quicklook. This is thought to happen because, with larger adjustment windows, the regression captures the scaling (the a and b parameters of the linear regression) better despite the noise at higher altitudes.
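To make the procedure above concrete, here is a rough single-profile sketch of the correlation-window idea; it is not the package's implementation, and rcs_an, rcs_pc, hw and the search limits are stand-ins.

```python
import numpy as np
from scipy import stats

def glue_profile(rcs_an, rcs_pc, hw, lo, hi):
    """Illustrative gluing of one profile (not lidarpy's own code)."""
    # 1) windowed correlation coefficient at every candidate bin
    corr = np.full(rcs_an.size, -np.inf)
    for b in range(max(lo, hw), min(hi, rcs_an.size - hw)):
        corr[b] = np.corrcoef(rcs_an[b - hw:b + hw], rcs_pc[b - hw:b + hw])[0, 1]
    gl_bin = int(np.nanargmax(corr))

    # 2) linear regression over a larger adjustment window: scale AN to PC units
    sl = slice(max(gl_bin - int(hw * 1.5), 0), gl_bin + hw * 5)
    fit = stats.linregress(rcs_an[sl], rcs_pc[sl])
    an_in_pc = fit.slope * rcs_an + fit.intercept

    # 3) glue: scaled analog below the gluing bin, photon counting above
    return np.concatenate([an_in_pc[:gl_bin], rcs_pc[gl_bin:]])
```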

  • lidarpy.preprocessing.lidar_gluing_bravo_aranda: Calculates a windowed linear regression at every point within a range for both channels, chooses the point where this is minimal, and glues using another window larger than the regression one. It also works better with larger adjustment windows.
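And a similarly rough sketch of the second approach, reading "the point where this is minimum" as the bin where a windowed linear fit between the two channels leaves the smallest residual (names and windows are again illustrative):

```python
import numpy as np

def find_gluing_bin(rcs_an, rcs_pc, hw, lo, hi):
    """Pick the bin whose windowed AN-vs-PC linear fit has the smallest residual (sketch)."""
    best_bin, best_res = None, np.inf
    for b in range(max(lo, hw), min(hi, len(rcs_an) - hw)):
        x = rcs_an[b - hw:b + hw]
        y = rcs_pc[b - hw:b + hw]
        # first-order least-squares fit and its summed squared residual
        _, res, *_ = np.polyfit(x, y, 1, full=True)
        res = res[0] if res.size else 0.0
        if res < best_res:
            best_bin, best_res = b, res
    return best_bin
```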
# [how to use reStructuredText --> https://www.sphinx-doc.org/es/master/usage/restructuredtext/index.html]

from .lidar_preprocessing import preprocess
from . import gluing_de_la_rosa, lidar_gluing_bravo_aranda

__all__ = ["preprocess", "gluing_de_la_rosa", "lidar_gluing_bravo_aranda"]

__doc__ = """
This module provides `lidarpy.preprocessing.preprocess`, which applies the following corrections to the lidar raw data:

* Dark measurement [AN only]
* Dead time correction [PC only]
* Trigger delay [PC only] [tbi]
* Zero bin [AN only]
* Background subtraction
* Merge of polarized channels

The following files contain the depolarization calibration factors and the GHK parameters:

* `lidarpy.preprocessing.depolarization_calibration_mhc`
* `lidarpy.preprocessing.depolarization_calibration_alh`
* `lidarpy.preprocessing.depolarization_calibration_vlt`

.. warning::
   Be sure these files are updated.

### Gluing:

There are some gluing implementations already. The main two are:
- `lidarpy.preprocessing.gluing_de_la_rosa`: Calculates a windowed correlation coefficient at every point within a range.
   It then selects the point with the highest correlation and, with a linear regression, scales the analog channel to photon-counting units.
   It works, but lacks consistency. For a good adjustment the window has to be large, and it works even better if the adjustment window is even larger. For instance:
   ```
   win_an = rcs_an_whole[idx, gl_bin - int(hw * 1.5) : gl_bin + hw * 5]
   win_pc = rcs_pc_whole[idx, gl_bin - int(hw * 1.5) : gl_bin + hw * 5]
   ```
   For instance, with the default configuration, the day 2021/07/05 from MULHACEN has a gluing inconsistency at time profile 217.
   Neighbouring profiles are a bit higher, resulting in a visible vertical line in the quicklook.
   This is thought to happen because, with larger adjustment windows, the regression captures the scaling (the a and b parameters of the linear regression) better
   despite the noise at higher altitudes.

- `lidarpy.preprocessing.lidar_gluing_bravo_aranda`: Calculates a windowed linear regression at every point within a range for both channels.
   Chooses the point where this is minimal and glues using another window larger than the regression one.
   It also works better with larger adjustment windows.

"""
def preprocess(
    file_or_path: Path | str,
    channels: list[str] | None = None,
    crop_ranges: tuple[float, float] | None = (0, 20000),
    background_ranges: tuple[float, float] | None = None,

    # 1) FIRST: possible rebinning (range-resolution reduction)
    apply_bin: bool = False,
    bin_factor: int = 2,

    # 3) MAIN PHYSICAL/INSTRUMENTAL CORRECTIONS
    apply_dc: bool = True,
    apply_dt: bool = True,
    apply_bg: bool = True,
    apply_bz: bool = True,
    apply_ov: bool = False,
    save_dc: bool = False,
    save_bg: bool = False,
    gluing_products: bool = False,
    force_dc_in_session: bool = False,
    overlap_path: Path | str | None = None,

    # 5) LAST: smoothing
    apply_sm: bool = False,
    smooth_mode: str | list[str] | None = None,

    # DEBUG
    debug: bool = False,
    debug_max_range: float = 2000.0,

    **kwargs,
) -> xr.Dataset:
    """
    Lidar preprocessing pipeline – **ordered**.

    The idea of this pipeline is: do **geometry/resolution changes first**,
    then **select the channels** you really want to process, then **apply
    all instrumental/physical corrections in a strict order**, and only
    then do **post-processing** (height, cropping, gluing) and finally
    **smoothing** (because smoothing must see the final, corrected signals).

    Processing order
    ----------------
    1. (Optional) **Rebin / range rescale** (`apply_bin`, `bin_factor`)
       - This changes the vertical resolution and MUST be done before
         any correction that depends on the range grid (DC, BG, overlap).
       - If the main dataset is rebinned, every auxiliary dataset
         used later (DC, overlap) must be rebinned to the same resolution.

    2. **Channel selection + INFO update**
       - Decide which channels to keep.
       - Add any required/product channels defined in the system INFO.
       - Drop variables for channels you do not want to process.
       - Update dataset from metadata/header, if present.

    3. **Main corrections (strict order!)**
       - DC  (dark current)          → removes dark offset, usually analog
       - DT  (dead-time correction)  → for photon-counting channels
       - BG  (background subtraction)→ removes far-range night-sky background
       - BZ  (bin-zero correction)   → shifts profiles to correct laser–detector delay
       - OV  (overlap correction)    → corrects geometric overlap losses
       The order DC → DT → BG → BZ → OV is important because:
         * DC must be removed before everything else.
         * DT must act on the *raw photon count*.
         * BG must act on a *physically meaningful* profile.
         * BZ shifts the profile in range, so it must be applied
           when the signal is already corrected.
         * OV is usually the last purely range-dependent correction.

    4. **Post-processing**
       - Crop to the desired range (for plots/products).
       - Optionally glue analog & photon-counting channels.
       - Add height coordinate from range, taking zenith angle into account.

    5. (Optional) **Smoothing**
       - Apply only at the end, on the fully corrected product.
       - Several modes are available: gaussian, moving, sliding, ATLAS, adaptive_sliding.
       - Gaussian smoothing uses `scipy.ndimage.gaussian_filter1d` **along `range`**.

    Debugging
    ---------
    If `debug=True`, the function will print a small snapshot of the signals
    (time=0, range=0–`debug_max_range`) **after every step** above, in the
    same order that is documented here. That way you can verify that each
    correction is doing what you expect.

    Parameters
    ----------
    file_or_path : Path | str
        NetCDF file to be opened.
    channels : list[str] | None
        Channels to keep/process. If None, all channels in the file are used.
    crop_ranges : (float, float) | None
        Range interval to keep *after* corrections.
    background_ranges : (float, float) | None
        Altitude interval for background estimation; if None, the values
        from the dataset attributes are used.
    apply_* : bool
        Switches for every step in the pipeline.
    apply_sm : bool
        If True, smoothing is applied as the *last* step.
    smooth_mode : str | list[str] | None
        One or multiple smoothing modes to apply, in order.
        Passed to `apply_smooth`.
    debug : bool
        If True, show signal snapshots after every step.
    debug_max_range : float
        Upper limit (m) of range shown in debug snapshots.

    Returns
    -------
    xr.Dataset
        Fully preprocessed lidar dataset.
    """
    p = Path(file_or_path)
    if not p.exists():
        raise ValueError(f"{file_or_path} not found")

    # -------------------------------------------------------------
    # Read lidar info from filename and associated YAML
    # -------------------------------------------------------------
    lidar_nick, _, _, _, _, date = filename2info(p.name)
    global INFO
    INFO = read_yaml_from_info(lidar_nick, date)

    # open dataset (no lazy chunks here, we want to modify it later)
    # Load fully into memory to avoid keeping file handles open and
    # calling into netCDF/HDF5 libraries from multiple threads later.
    dataset = xr.open_dataset(p, chunks={}, engine="netcdf4").load()

    if debug:
        _debug_dump_signals(dataset, "0-open", channels, max_range=debug_max_range)

    # =============================================================
    # 1) REBIN (FIRST STEP)
    # =============================================================
    if apply_bin and bin_factor > 1:
        logger.info(
            f"Applying range bin rescale: factor={bin_factor} → "
            f"{bin_factor * BIN_RES_M:.2f} m"
        )
        # sum for count channels (*p), mean for the rest
        var_aggs = _build_var_aggs_for_binning(dataset)

        dataset = bin_rescale(
            dataset,
            bin_factor,
            default_agg="mean",
            var_aggs=var_aggs,
            bin_coord="center",  # or "left"/"right" if you prefer edge axes
        )
    else:
        # store the current resolution even when no rebinning is applied
        dataset.attrs["rebin_factor"] = 1
        dataset.attrs["range_resolution_m"] = float(BIN_RES_M)

    if debug:
        _debug_dump_signals(dataset, "1-rebin", channels, max_range=debug_max_range)

    # =============================================================
    # 2) CHANNEL SELECTION / INFO UPDATE
    # =============================================================
    if channels is None:
        # process all channels found in the file
        channels = dataset.channel.values.tolist()
        raw_channels = channels
    else:
        # ensure we also keep the "required" channels defined in the info file
        channels = add_required_channels(lidar_nick, channels, date)
        # product channels must not be dropped, but raw signals might be
        raw_channels = [
            ch for ch in channels if ch not in INFO["product_channels"].keys()
        ]
        dataset = drop_unwanted_channels(dataset, keep_channels=raw_channels)

    # update variables/attrs from INFO->header, if present
    if "header" in INFO.keys():
        dataset = update_from_info(dataset, INFO["header"])

    if debug:
        _debug_dump_signals(dataset, "2-ch-selection", channels, max_range=debug_max_range)

    # =============================================================
    # 3) MAIN CORRECTIONS – in fixed order
    # =============================================================
    # 3.1 Dark current
    if apply_dc:
        dataset = apply_dark_current_correction(
            dataset,
            rs_path=p,
            save_dc=save_dc,
            force_dc_in_session=force_dc_in_session,
        )
    else:
        dataset.attrs["dc_corrected"] = str(False)

    if debug:
        _debug_dump_signals(dataset, "3.1-after-DC", channels, max_range=debug_max_range)

    # 3.2 Dead-time (photon-counting channels)
    if apply_dt:
        dataset = apply_dead_time_correction(dataset)
    else:
        dataset.attrs["dt_corrected"] = str(False)

    if debug:
        _debug_dump_signals(dataset, "3.2-after-DT", channels, max_range=debug_max_range)

    # 3.3 Background (far-range) subtraction
    if apply_bg:
        dataset = apply_background_correction(
            dataset,
            background_ranges=background_ranges,
            save_bg=save_bg,
        )
    else:
        dataset.attrs["bg_corrected"] = str(False)

    if debug:
        _debug_dump_signals(dataset, "3.3-after-BG", channels, max_range=debug_max_range)

    # 3.4 Bin-zero (range shift) correction
    if apply_bz:
        dataset = apply_bin_zero_correction(dataset, rs_path=p)
    else:
        dataset.attrs["bz_corrected"] = str(False)

    if debug:
        _debug_dump_signals(dataset, "3.4-after-BZ", channels, max_range=debug_max_range)

    # 3.5 Overlap correction (optional)
    if apply_ov:
        dataset = apply_overlap_correction(dataset, overlap_path)
        dataset.attrs["ov_corrected"] = "True"
    else:
        dataset.attrs["ov_corrected"] = "False"

    if debug:
        _debug_dump_signals(dataset, "3.5-after-OV", channels, max_range=debug_max_range)

    # =============================================================
    # 4) POST PROCESSING
    # =============================================================
    # 4.1 Crop to user-defined range
    dataset = apply_crop_ranges_correction(dataset, crop_ranges=crop_ranges)

    if debug:
        _debug_dump_signals(dataset, "4.1-after-crop", channels, max_range=debug_max_range)

    # 4.2 Optional: merge / glue detection modes (analog + pc → g)
    if gluing_products:
        dataset = apply_detection_mode_merge(dataset)
        if debug:
            # show also newly created glued channels
            _debug_dump_signals(dataset, "4.2-after-gluing", None, max_range=debug_max_range)

    # 4.3 Add height coordinate (range * cos(zenith))
    dataset = add_height(dataset)

    if debug:
        _debug_dump_signals(dataset, "4.3-after-height", channels, max_range=debug_max_range)

    # =============================================================
    # 5) SMOOTHING (LAST)
    # =============================================================
    if apply_sm:
        logger.info(f"Applying smoothing: mode={smooth_mode}")
        dataset = apply_smooth(
            dataset,
            smooth_mode=smooth_mode,
            debug=debug,
            debug_max_range=debug_max_range,
            **kwargs,
        )
    else:
        logger.info("No smoothing applied.")

    if debug:
        _debug_dump_signals(dataset, "5-after-smoothing", channels, max_range=debug_max_range)

    return dataset

Lidar preprocessing pipeline – ordered.

The idea of this pipeline is: do geometry/resolution changes first, then select the channels you really want to process, then apply all instrumental/physical corrections in a strict order, and only then do post-processing (height, cropping, gluing) and finally smoothing (because smoothing must see the final, corrected signals).

Processing order

  1. (Optional) Rebin / range rescale (apply_bin, bin_factor)

    • This changes the vertical resolution and MUST be done before any correction that depends on the range grid (DC, BG, overlap).
    • If the main dataset is rebinned, every auxiliary dataset used later (DC, overlap) must be rebinned to the same resolution.
  2. Channel selection + INFO update

    • Decide which channels to keep.
    • Add any required/product channels defined in the system INFO.
    • Drop variables for channels you do not want to process.
    • Update dataset from metadata/header, if present.
  3. Main corrections (strict order!)

    • DC (dark current) → removes dark offset, usually analog
    • DT (dead-time correction) → for photon-counting channels
    • BG (background subtraction) → removes far-range night-sky background
    • BZ (bin-zero correction) → shifts profiles to correct laser–detector delay
    • OV (overlap correction) → corrects geometric overlap losses
    The order DC → DT → BG → BZ → OV is important because:
      • DC must be removed before everything else.
      • DT must act on the raw photon count.
      • BG must act on a physically meaningful profile.
      • BZ shifts the profile in range, so it must be applied when the signal is already corrected.
      • OV is usually the last purely range-dependent correction.
  4. Post-processing

    • Crop to the desired range (for plots/products).
    • Optionally glue analog & photon-counting channels.
    • Add height coordinate from range, taking zenith angle into account.
  5. (Optional) Smoothing

    • Apply only at the end, on the fully corrected product.
    • Several modes are available: gaussian, moving, sliding, ATLAS, adaptive_sliding.
    • Gaussian smoothing uses scipy.ndimage.gaussian_filter1d along range.
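Since Gaussian smoothing is stated to use scipy.ndimage.gaussian_filter1d along range, a bare-bones version of that last step could look like the sketch below; the variable name "signal", the sigma value and the dimension name are assumptions, not the actual apply_smooth interface.

```python
import xarray as xr
from scipy.ndimage import gaussian_filter1d

def smooth_gaussian_along_range(ds: xr.Dataset, var: str = "signal", sigma: float = 3.0) -> xr.Dataset:
    """Apply a 1-D Gaussian filter along the 'range' dimension only (illustrative)."""
    out = ds.copy()
    axis = ds[var].get_axis_num("range")   # smooth along range, never along time/channel
    out[var] = (ds[var].dims, gaussian_filter1d(ds[var].values, sigma=sigma, axis=axis))
    return out
```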

Debugging

If debug=True, the function will print a small snapshot of the signals (time=0, range=0–debug_max_range) after every step above, in the same order that is documented here. That way you can verify that each correction is doing what you expect.
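The helper _debug_dump_signals itself is not shown in this file; something along these lines would produce that kind of snapshot (the "signal" variable and the printed statistics are guesses, not the real output):

```python
def debug_snapshot(ds, label, channels=None, max_range=2000.0):
    """Print a small near-range summary of each channel after a pipeline step (sketch)."""
    sub = ds.isel(time=0).sel(range=slice(0, max_range))
    for ch in (channels or ds.channel.values.tolist()):
        sig = sub["signal"].sel(channel=ch)
        print(f"[{label}] {ch}: min={float(sig.min()):.3g} "
              f"max={float(sig.max()):.3g} mean={float(sig.mean()):.3g}")
```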

Parameters

file_or_path : Path | str
    NetCDF file to be opened.
channels : list[str] | None
    Channels to keep/process. If None, all channels in the file are used.
crop_ranges : (float, float) | None
    Range interval to keep *after* corrections.
background_ranges : (float, float) | None
    Altitude interval for background estimation; if None, the values from the dataset attributes are used.
apply_* : bool
    Switches for every step in the pipeline.
apply_sm : bool
    If True, smoothing is applied as the last step.
smooth_mode : str | list[str] | None
    One or multiple smoothing modes to apply, in order. Passed to apply_smooth.
debug : bool
    If True, show signal snapshots after every step.
debug_max_range : float
    Upper limit (m) of range shown in debug snapshots.

Returns

xr.Dataset
    Fully preprocessed lidar dataset.