lidarpy.preprocessing
This module provides a function lidarpy.preprocessing.preprocess which applies the following corrections to the lidar raw data (a usage sketch follows the list):
- Dark measurement [AN only]
- Dead time correction [PC only]
- Trigger delay [PC only] [tbi]
- Zero bin [AN only]
- Background subtraction
- Merge of polarized channels
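A minimal usage sketch (the full signature of preprocess is listed further down this page); the file path and channel names are placeholders:

```python
from lidarpy.preprocessing import preprocess

# Hypothetical measurement file and channel names -- adapt to your system.
dataset = preprocess(
    "measurements/mhc_rs_20210705.nc",   # placeholder path
    channels=["532xpa", "532xpp"],       # placeholder analog/photon-counting pair
    crop_ranges=(0, 20000),              # keep the first 20 km of range
    apply_dc=True,                       # dark measurement [AN only]
    apply_dt=True,                       # dead time correction [PC only]
    apply_bg=True,                       # background subtraction
    apply_bz=True,                       # zero bin [AN only]
    gluing_products=True,                # also build glued channels
)
```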
The following files contain the depolarization calibration factors and the GHK parameters:
- lidarpy.preprocessing.depolarization_calibration_mhc
- lidarpy.preprocessing.depolarization_calibration_alh
- lidarpy.preprocessing.depolarization_calibration_vlt
Make sure these files are up to date.
Gluing:
There are already several gluing implementations. The main two are:
lidarpy.preprocessing.gluing_de_la_rosa: Calculates a windowed correlation coefficient at every point within a range. Then it selects the highest one and, with a linear regression, scales the analog channel to photon-counting units. It works, but lacks consistency. For a good adjustment the window has to be large, and it works even better if the adjustment window is even larger. For instance:

```python
win_an = rcs_an_whole[idx, gl_bin - int(hw * 1.5) : gl_bin + hw * 5]
win_pc = rcs_pc_whole[idx, gl_bin - int(hw * 1.5) : gl_bin + hw * 5]
```

For instance, with the default configuration the day 2021/07/05 from MULHACEN has a gluing inconsistency at time profile 217. Its neighbours are a bit higher, resulting in a visible vertical line in the quicklook. This presumably happens because, with larger adjustment windows, the regression captures the scale (the a and b parameters of the linear regression) better despite the noise at higher altitudes.
lidarpy.preprocessing.lidar_gluing_bravo_aranda: Calculates a windowed linear regression at every point for both channels within a range. It chooses the point where this is minimum and performs the gluing with another window, larger than the one used for the linear regression. It works better with larger adjustment windows. A generic sketch of this regression-based gluing follows.
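Both methods ultimately rescale the analog channel with a linear fit against the photon-counting channel and join the two signals. A generic, hypothetical sketch of that core step (function and variable names are placeholders; this is not the code of either module):

```python
import numpy as np

def glue_profiles(sig_an: np.ndarray, sig_pc: np.ndarray,
                  fit_slice: slice, glue_slice: slice) -> np.ndarray:
    """Scale the analog signal to photon-counting units via a linear fit
    over ``fit_slice``, then blend towards the PC signal over ``glue_slice``.
    Illustrative only: the window search and quality checks of the real
    implementations are not reproduced."""
    # pc ~ a * an + b inside the fitting window
    a, b = np.polyfit(sig_an[fit_slice], sig_pc[fit_slice], deg=1)
    an_scaled = a * sig_an + b

    glued = an_scaled.copy()
    # linear cross-fade from the scaled analog to the PC signal
    n = glue_slice.stop - glue_slice.start
    w = np.linspace(0.0, 1.0, n)
    glued[glue_slice] = (1 - w) * an_scaled[glue_slice] + w * sig_pc[glue_slice]
    glued[glue_slice.stop:] = sig_pc[glue_slice.stop:]
    return glued
```

For a single profile one would call, e.g., glued = glue_profiles(rcs_an, rcs_pc, slice(400, 600), slice(450, 550)).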
````python
# [how to use reStructuredText --> https://www.sphinx-doc.org/es/master/usage/restructuredtext/index.html]

from .lidar_preprocessing import preprocess
from . import gluing_de_la_rosa, lidar_gluing_bravo_aranda

__all__ = ["preprocess", "gluing_de_la_rosa", "lidar_gluing_bravo_aranda"]

__doc__ = """
This module provides a function `lidarpy.preprocessing.preprocess` which applies the following corrections to the lidar raw data:

* Dark measurement [AN only]
* Dead time correction [PC only]
* Trigger delay [PC only] [tbi]
* Zero bin [AN only]
* Background subtraction
* Merge of polarized channels

The following files contain the depolarization calibration factors and the GHK parameters:

* `lidarpy.preprocessing.depolarization_calibration_mhc`
* `lidarpy.preprocessing.depolarization_calibration_alh`
* `lidarpy.preprocessing.depolarization_calibration_vlt`

.. warning::
    Make sure these files are up to date.

### Gluing:

There are already several gluing implementations. The main two are:

- `lidarpy.preprocessing.gluing_de_la_rosa`: Calculates a windowed correlation coefficient at every point within a range.
  Then it selects the highest one and, with a linear regression, scales the analog channel to photon-counting units.
  It works, but lacks consistency. For a good adjustment the window has to be large, and it works even better if the adjustment window is even larger. For instance:
  ```
  win_an = rcs_an_whole[idx, gl_bin - int(hw * 1.5) : gl_bin + hw * 5]
  win_pc = rcs_pc_whole[idx, gl_bin - int(hw * 1.5) : gl_bin + hw * 5]
  ```
  For instance, with the default configuration the day 2021/07/05 from MULHACEN has a gluing inconsistency at time profile 217.
  Its neighbours are a bit higher, resulting in a visible vertical line in the quicklook.
  This presumably happens because, with larger adjustment windows, the regression captures the scale
  (the a and b parameters of the linear regression) better despite the noise at higher altitudes.

- `lidarpy.preprocessing.lidar_gluing_bravo_aranda`: Calculates a windowed linear regression at every point for both channels within a range.
  It chooses the point where this is minimum and performs the gluing with another window, larger than the one used for the linear regression.
  It works better with larger adjustment windows.
"""
````
```python
def preprocess(
    file_or_path: Path | str,
    channels: list[str] | None = None,
    crop_ranges: tuple[float, float] | None = (0, 20000),
    background_ranges: tuple[float, float] | None = None,

    # 1) FIRST: possible rebinning (range-resolution reduction)
    apply_bin: bool = False,
    bin_factor: int = 2,

    # 3) MAIN PHYSICAL/INSTRUMENTAL CORRECTIONS
    apply_dc: bool = True,
    apply_dt: bool = True,
    apply_bg: bool = True,
    apply_bz: bool = True,
    apply_ov: bool = False,
    save_dc: bool = False,
    save_bg: bool = False,
    gluing_products: bool = False,
    force_dc_in_session: bool = False,
    overlap_path: Path | str | None = None,

    # 5) LAST: smoothing
    apply_sm: bool = False,
    smooth_mode: str | list[str] | None = None,

    # DEBUG
    debug: bool = False,
    debug_max_range: float = 2000.0,

    **kwargs,
) -> xr.Dataset:
    """
    Lidar preprocessing pipeline – **ordered**.

    The idea of this pipeline is: do **geometry/resolution changes first**,
    then **select the channels** you really want to process, then **apply
    all instrumental/physical corrections in a strict order**, and only
    then do **post-processing** (height, cropping, gluing) and finally
    **smoothing** (because smoothing must see the final, corrected signals).

    Processing order
    ----------------
    1. (Optional) **Rebin / range rescale** (`apply_bin`, `bin_factor`)
       - This changes the vertical resolution and MUST be done before
         any correction that depends on the range grid (DC, BG, overlap).
       - If the main dataset is rebinned, every auxiliary dataset
         used later (DC, overlap) must be rebinned to the same resolution.

    2. **Channel selection + INFO update**
       - Decide which channels to keep.
       - Add any required/product channels defined in the system INFO.
       - Drop variables for channels you do not want to process.
       - Update dataset from metadata/header, if present.

    3. **Main corrections (strict order!)**
       - DC (dark current)           → removes dark offset, usually analog
       - DT (dead-time correction)   → for photon-counting channels
       - BG (background subtraction) → removes far-range night-sky background
       - BZ (bin-zero correction)    → shifts profiles to correct laser–detector delay
       - OV (overlap correction)     → corrects geometric overlap losses
       The order DC → DT → BG → BZ → OV is important because:
       * DC must be removed before everything else.
       * DT must act on the *raw photon count*.
       * BG must act on a *physically meaningful* profile.
       * BZ shifts the profile in range, so it must be applied
         when the signal is already corrected.
       * OV is usually the last purely range-dependent correction.

    4. **Post-processing**
       - Crop to the desired range (for plots/products).
       - Optionally glue analog & photon-counting channels.
       - Add height coordinate from range, taking zenith angle into account.

    5. (Optional) **Smoothing**
       - Apply only at the end, on the fully corrected product.
       - Several modes are available: gaussian, moving, sliding, ATLAS, adaptive_sliding.
       - Gaussian smoothing uses `scipy.ndimage.gaussian_filter1d` **along `range`**.

    Debugging
    ---------
    If `debug=True`, the function will print a small snapshot of the signals
    (time=0, range=0–`debug_max_range`) **after every step** above, in the
    same order that is documented here. That way you can verify that each
    correction is doing what you expect.

    Parameters
    ----------
    file_or_path : Path | str
        NetCDF file to be opened.
    channels : list[str] | None
        Channels to keep/process. If None, all channels in the file are used.
    crop_ranges : (float, float) | None
        Range interval to keep *after* corrections.
    background_ranges : (float, float) | None
        Altitude interval for background estimation; if None, the values
        from the dataset attributes are used.
    apply_* : bool
        Switches for every step in the pipeline.
    apply_sm : bool
        If True, smoothing is applied as the *last* step.
    smooth_mode : str | list[str] | None
        One or multiple smoothing modes to apply, in order.
        Passed to `apply_smooth`.
    debug : bool
        If True, show signal snapshots after every step.
    debug_max_range : float
        Upper limit (m) of range shown in debug snapshots.

    Returns
    -------
    xr.Dataset
        Fully preprocessed lidar dataset.
    """
    p = Path(file_or_path)
    if not p.exists():
        raise ValueError(f"{file_or_path} not found")

    # -------------------------------------------------------------
    # Read lidar info from filename and associated YAML
    # -------------------------------------------------------------
    lidar_nick, _, _, _, _, date = filename2info(p.name)
    global INFO
    INFO = read_yaml_from_info(lidar_nick, date)

    # open dataset (no lazy chunks here, we want to modify it later)
    # Load fully into memory to avoid keeping file handles open and
    # calling into netCDF/HDF5 libraries from multiple threads later.
    dataset = xr.open_dataset(p, chunks={}, engine="netcdf4").load()

    if debug:
        _debug_dump_signals(dataset, "0-open", channels, max_range=debug_max_range)

    # =============================================================
    # 1) REBIN (FIRST STEP)
    # =============================================================
    if apply_bin and bin_factor > 1:
        logger.info(
            f"Applying range bin rescale: factor={bin_factor} → "
            f"{bin_factor * BIN_RES_M:.2f} m"
        )
        # sum for count channels (*p channels), mean for the rest
        var_aggs = _build_var_aggs_for_binning(dataset)

        dataset = bin_rescale(
            dataset,
            bin_factor,
            default_agg="mean",
            var_aggs=var_aggs,
            bin_coord="center",  # or "left"/"right" if you prefer edge axes
        )
    else:
        # store the current resolution even when not rebinning
        dataset.attrs["rebin_factor"] = 1
        dataset.attrs["range_resolution_m"] = float(BIN_RES_M)

    if debug:
        _debug_dump_signals(dataset, "1-rebin", channels, max_range=debug_max_range)

    # =============================================================
    # 2) CHANNEL SELECTION / INFO UPDATE
    # =============================================================
    if channels is None:
        # process all channels found in the file
        channels = dataset.channel.values.tolist()
        raw_channels = channels
    else:
        # ensure we also keep the "required" channels defined in the info file
        channels = add_required_channels(lidar_nick, channels, date)
        # product channels must not be dropped, but raw signals might be
        raw_channels = [
            ch for ch in channels if ch not in INFO["product_channels"].keys()
        ]
        dataset = drop_unwanted_channels(dataset, keep_channels=raw_channels)

    # update variables/attrs from INFO->header, if present
    if "header" in INFO.keys():
        dataset = update_from_info(dataset, INFO["header"])

    if debug:
        _debug_dump_signals(dataset, "2-ch-selection", channels, max_range=debug_max_range)

    # =============================================================
    # 3) MAIN CORRECTIONS – in fixed order
    # =============================================================
    # 3.1 Dark current
    if apply_dc:
        dataset = apply_dark_current_correction(
            dataset,
            rs_path=p,
            save_dc=save_dc,
            force_dc_in_session=force_dc_in_session,
        )
    else:
        dataset.attrs["dc_corrected"] = str(False)

    if debug:
        _debug_dump_signals(dataset, "3.1-after-DC", channels, max_range=debug_max_range)

    # 3.2 Dead-time (photon-counting channels)
    if apply_dt:
        dataset = apply_dead_time_correction(dataset)
    else:
        dataset.attrs["dt_corrected"] = str(False)

    if debug:
        _debug_dump_signals(dataset, "3.2-after-DT", channels, max_range=debug_max_range)

    # 3.3 Background (far-range) subtraction
    if apply_bg:
        dataset = apply_background_correction(
            dataset,
            background_ranges=background_ranges,
            save_bg=save_bg,
        )
    else:
        dataset.attrs["bg_corrected"] = str(False)

    if debug:
        _debug_dump_signals(dataset, "3.3-after-BG", channels, max_range=debug_max_range)

    # 3.4 Bin-zero (range shift) correction
    if apply_bz:
        dataset = apply_bin_zero_correction(dataset, rs_path=p)
    else:
        dataset.attrs["bz_corrected"] = str(False)

    if debug:
        _debug_dump_signals(dataset, "3.4-after-BZ", channels, max_range=debug_max_range)

    # 3.5 Overlap correction (optional)
    if apply_ov:
        dataset = apply_overlap_correction(dataset, overlap_path)
        dataset.attrs["ov_corrected"] = "True"
    else:
        dataset.attrs["ov_corrected"] = "False"

    if debug:
        _debug_dump_signals(dataset, "3.5-after-OV", channels, max_range=debug_max_range)

    # =============================================================
    # 4) POST PROCESSING
    # =============================================================
    # 4.1 Crop to user-defined range
    dataset = apply_crop_ranges_correction(dataset, crop_ranges=crop_ranges)

    if debug:
        _debug_dump_signals(dataset, "4.1-after-crop", channels, max_range=debug_max_range)

    # 4.2 Optional: merge / glue detection modes (analog + pc → g)
    if gluing_products:
        dataset = apply_detection_mode_merge(dataset)
        if debug:
            # show also newly created glued channels
            _debug_dump_signals(dataset, "4.2-after-gluing", None, max_range=debug_max_range)

    # 4.3 Add height coordinate (range * cos(zenith))
    dataset = add_height(dataset)

    if debug:
        _debug_dump_signals(dataset, "4.3-after-height", channels, max_range=debug_max_range)

    # =============================================================
    # 5) SMOOTHING (LAST)
    # =============================================================
    if apply_sm:
        logger.info(f"Applying smoothing: mode={smooth_mode}")
        dataset = apply_smooth(
            dataset,
            smooth_mode=smooth_mode,
            debug=debug,
            debug_max_range=debug_max_range,
            **kwargs,
        )
    else:
        logger.info("No smoothing applied.")

    if debug:
        _debug_dump_signals(dataset, "5-after-smoothing", channels, max_range=debug_max_range)

    return dataset
```
Lidar preprocessing pipeline – ordered.
The idea of this pipeline is: do geometry/resolution changes first, then select the channels you really want to process, then apply all instrumental/physical corrections in a strict order, and only then do post-processing (height, cropping, gluing) and finally smoothing (because smoothing must see the final, corrected signals).
Processing order
(Optional) Rebin / range rescale (apply_bin, bin_factor)
- This changes the vertical resolution and MUST be done before any correction that depends on the range grid (DC, BG, overlap).
- If the main dataset is rebinned, every auxiliary dataset used later (DC, overlap) must be rebinned to the same resolution (a minimal rebin sketch follows this list).
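A minimal sketch of what such a rebin amounts to for an xarray dataset with a range dimension. The module's own bin_rescale and _build_var_aggs_for_binning are not reproduced here, and the "*p = photon counting" naming rule is taken from a source comment and may not hold for every system:

```python
import xarray as xr

def rebin_range(ds: xr.Dataset, bin_factor: int) -> xr.Dataset:
    """Coarsen the range axis: sum photon counts, average everything else."""
    counts = [v for v in ds.data_vars if str(v).endswith("p")]  # assumed PC channels
    others = [v for v in ds.data_vars if v not in counts]
    summed = ds[counts].coarsen(range=bin_factor, boundary="trim").sum()
    averaged = ds[others].coarsen(range=bin_factor, boundary="trim").mean()
    return xr.merge([summed, averaged])
```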
Channel selection + INFO update
- Decide which channels to keep.
- Add any required/product channels defined in the system INFO.
- Drop variables for channels you do not want to process.
- Update dataset from metadata/header, if present.
Main corrections (strict order!)
- DC (dark current) → removes dark offset, usually analog
- DT (dead-time correction) → for photon-counting channels
- BG (background subtraction) → removes far-range night-sky background
- BZ (bin-zero correction) → shifts profiles to correct laser–detector delay
- OV (overlap correction) → corrects geometric overlap losses
The order DC → DT → BG → BZ → OV is important because:
- DC must be removed before everything else.
- DT must act on the raw photon count (see the dead-time sketch after this list).
- BG must act on a physically meaningful profile.
- BZ shifts the profile in range, so it must be applied when the signal is already corrected.
- OV is usually the last purely range-dependent correction.
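The listing on this page does not show apply_dead_time_correction itself; assuming the standard non-paralyzable dead-time model, the DT step boils down to something like:

```python
import numpy as np

def dead_time_correct(rate_mhz: np.ndarray, tau_ns: float) -> np.ndarray:
    """Non-paralyzable dead-time correction: N_true = N / (1 - N * tau).

    ``rate_mhz`` is the measured count rate in MHz, ``tau_ns`` the system
    dead time in ns; ns are converted to µs so that MHz * µs is dimensionless.
    """
    tau_us = tau_ns * 1e-3
    return rate_mhz / (1.0 - rate_mhz * tau_us)
```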
Post-processing
- Crop to the desired range (for plots/products).
- Optionally glue analog & photon-counting channels.
- Add height coordinate from range, taking zenith angle into account (a one-line sketch follows this list).
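The height coordinate mentioned above is just the range projected onto the vertical; as a one-liner (the zenith-angle attribute name is a placeholder):

```python
import numpy as np

# height above the instrument for a tilted line of sight
height = dataset["range"] * np.cos(np.deg2rad(dataset.attrs["zenith_angle"]))
```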
(Optional) Smoothing
- Apply only at the end, on the fully corrected product.
- Several modes are available: gaussian, moving, sliding, ATLAS, adaptive_sliding.
- Gaussian smoothing uses scipy.ndimage.gaussian_filter1d along range (sketched after this list).
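Since the Gaussian mode wraps scipy.ndimage.gaussian_filter1d, the operation reduces to something like the following (the axis index and sigma are placeholders; sigma is in bins, not metres):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

signal_2d = np.random.rand(10, 4000)                      # stand-in for (time, range) signals
smoothed = gaussian_filter1d(signal_2d, sigma=3, axis=1)  # smooth along the range axis
```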
Debugging
If debug=True, the function will print a small snapshot of the signals
(time=0, range=0–debug_max_range) after every step above, in the
same order that is documented here. That way you can verify that each
correction is doing what you expect.
Parameters
file_or_path : Path | str
NetCDF file to be opened.
channels : list[str] | None
Channels to keep/process. If None, all channels in the file are used.
crop_ranges : (float, float) | None
Range interval to keep *after* corrections.
background_ranges : (float, float) | None
Altitude interval for background estimation; if None, the values
from the dataset attributes are used.
apply_* : bool
Switches for every step in the pipeline.
apply_sm : bool
If True, smoothing is applied as the last step.
smooth_mode : str | list[str] | None
One or multiple smoothing modes to apply, in order.
Passed to apply_smooth.
debug : bool
If True, show signal snapshots after every step.
debug_max_range : float
Upper limit (m) of range shown in debug snapshots.
Returns
xr.Dataset
    Fully preprocessed lidar dataset.
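As a closing illustration, the background subtraction controlled by background_ranges usually means estimating the mean signal over a far, signal-free interval and removing it from the whole profile. A hedged sketch (not the module's apply_background_correction):

```python
import numpy as np

def subtract_background(signal: np.ndarray, ranges_m: np.ndarray,
                        background_ranges: tuple[float, float]) -> np.ndarray:
    """Take the mean signal inside ``background_ranges`` (metres) as the
    background and subtract it from the whole profile. Sketch only."""
    lo, hi = background_ranges
    mask = (ranges_m >= lo) & (ranges_m <= hi)
    return signal - signal[..., mask].mean(axis=-1, keepdims=True)
```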