Why is the reward function so hard for reinforcement learning in spectroscopy?
#1
I'm trying to design a reinforcement learning agent to optimize experimental parameters in my lab's spectroscopy setup, but I'm hitting a wall with the reward function. It's difficult to quantify a "good" spectral reading into a single scalar reward that drives meaningful policy improvement, especially when the state space of instrument settings is so high-dimensional.
#2
I tried turning the spectrum into a single score by combining SNR, peak alignment, and goodness of line fit. It helped a bit, but the policy kept steering toward settings that looked good in the metric yet broke under small perturbations. We ended up dropping the single scalar and switched to a lightweight multi-objective reward instead, with separate signals for stability and accuracy.
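A rough sketch of what #2 describes, with separate reward components instead of one blended scalar. The metric definitions, the `noise_region` default, and the weights are hypothetical placeholders, not the poster's actual pipeline:

```python
import numpy as np

def spectral_rewards(spectrum, reference, noise_region=slice(0, 50)):
    """Multi-objective reward sketch: one signal per quality axis,
    returned separately so the policy can't game a single blended metric."""
    # SNR proxy: peak height over the noise-floor standard deviation
    noise = np.std(spectrum[noise_region])
    snr = np.max(spectrum) / (noise + 1e-12)

    # Peak alignment: penalize distance between observed and reference peak bins
    alignment = -abs(int(np.argmax(spectrum)) - int(np.argmax(reference)))

    # Fit quality: negative mean squared error against the reference line shape
    fit = -float(np.mean((spectrum - reference) ** 2))

    return {"snr": snr, "alignment": alignment, "fit": fit}

def scalarize(components, weights):
    """The brittle single-scalar variant the post moved away from."""
    return sum(weights[k] * components[k] for k in components)
```

Keeping the components separate also makes it easy to log which axis the policy is trading away when it finds a degenerate setting.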
#3
Maybe the real lever isn't the reward at all but how we encode the state. We used coarse grids over instrument voltages and filter settings, so the agent never saw the actual knobs that move the spectrum in the right directions.
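One way to act on #3 is to feed the agent the raw knobs, normalized, rather than bin indices on a coarse grid. A minimal sketch, assuming a known voltage range (the `v_range` default is made up for illustration):

```python
import numpy as np

def encode_state(voltages, filter_settings, v_range=(0.0, 10.0)):
    """Continuous state encoding: map raw instrument voltages to [-1, 1]
    instead of snapping them onto a coarse grid, so the agent sees the
    actual degrees of freedom that move the spectrum."""
    lo, hi = v_range
    v = 2.0 * (np.asarray(voltages, dtype=float) - lo) / (hi - lo) - 1.0
    f = np.asarray(filter_settings, dtype=float)
    return np.concatenate([v, f])
```

With a continuous encoding, small voltage adjustments produce small state changes, which discretized grids destroy.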
#4
At one point I drifted into evaluating the controller against a moving baseline, and the measurements started to wander because laser drift, not the policy, was driving the changes. It taught me to fix the hardware bias first. We added a baseline calibration step and have kept the reward simple ever since.
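The calibration step in #4 might look like the following sketch: average a few shutter-closed acquisitions into a fixed baseline, then score the drift-corrected spectrum rather than the raw reading. The function names and the peak-height reward are illustrative assumptions:

```python
import numpy as np

def calibrate_baseline(dark_frames):
    """Average several shutter-closed acquisitions into one fixed baseline."""
    return np.mean(np.asarray(dark_frames, dtype=float), axis=0)

def corrected_reward(spectrum, baseline, target_peak):
    """Reward computed on the baseline-corrected spectrum, so instrument
    drift is removed before the policy ever sees the signal."""
    corrected = np.asarray(spectrum, dtype=float) - baseline
    return float(corrected[target_peak])
```

Because the baseline is frozen between recalibrations, the reward no longer chases hardware drift the way a moving baseline does.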
#5
Have you tried a moving baseline for the reward, or a Pareto-style setup with several objectives? I ask because a single scalar felt too brittle for me, and I never got clean improvement.
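For the Pareto-style setup mentioned in #5, the core machinery is just a dominance check over reward vectors. A minimal sketch (maximization on every objective; integrating this with a policy update is left open):

```python
def dominates(a, b):
    """True if reward vector a is at least as good as b on every objective
    and strictly better on at least one (maximization convention)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the reward vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Instead of collapsing objectives into one number, the agent (or the experimenter) picks among the non-dominated settings, which sidesteps the brittle-weights problem.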

