One of a series of regressions…

Training on a few hundred rows this time - x & noisy-y

Uniform-X, Nyqist x 100.zip (28.0 KB)

The (quadratic) loss over all epochs looks very… constant. Should I believe this? And how can the validation loss be less than the training loss? If dropout were enabled that would be possible in theory, but both dropout and batch normalisation are off on all dense layers.
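One sanity check that might help decide whether to believe a flat loss: compare it with the MSE of always predicting the mean of y, which equals the variance of y. If the flat loss sits at that value, the network is probably outputting a constant. The sketch below uses made-up stand-in data (the sine function, the noise level of 0.3 and the name `constant_baseline_mse` are all assumptions, not taken from the attached notebook). It also shows that the baseline on noisy targets is higher than on noise-free targets, which is one way a validation loss could end up below a training loss, *if* the validation targets are the noise-free values.

```python
import numpy as np

def constant_baseline_mse(y):
    """MSE of always predicting mean(y); this equals the variance of y."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y - y.mean()) ** 2))

# Hypothetical stand-in data; substitute the real training/validation targets.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 300)
y_clean = np.sin(2 * np.pi * 3 * x)                      # assumed "true" function
y_noisy = y_clean + rng.normal(scale=0.3, size=x.shape)  # assumed noise level

baseline_noisy = constant_baseline_mse(y_noisy)
baseline_clean = constant_baseline_mse(y_clean)

print(f"constant-prediction MSE on noisy targets: {baseline_noisy:.4f}")
print(f"constant-prediction MSE on clean targets: {baseline_clean:.4f}")
```

A reported loss that stays close to the first number from the very first epoch would suggest the model is not learning anything beyond the mean.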

NB I *did* mis-specify the activation functions, but changing them ALL to ReLU made no difference.

For the curious, the function looks like this: red dots are the true values uniformly sampled on the domain, the green crosses are the true values at the sampled X values, and the blue dots are the y values with additional noise. The objective is to assess how well the model predicts between sample points as training progresses. The function is just arbitrarily wiggly, but with Neumann boundary conditions (derivative values fixed at each end) and a maximum component frequency, so that there is a well-defined Nyquist sampling frequency.
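For anyone who wants to play with something similar without opening the zip, here is a rough sketch of how data of this kind could be generated. It is not the code from the attached notebook: the sum-of-sinusoids form, the constants `F_MAX`, `N_COMPONENTS`, `OVERSAMPLE` and `NOISE_STD`, and the domain [0, 1] are all assumptions chosen for illustration, and the sketch does not impose any boundary conditions at the ends.

```python
import numpy as np

rng = np.random.default_rng(42)

F_MAX = 5.0        # assumed highest component frequency (cycles per unit x)
N_COMPONENTS = 8   # assumed number of sinusoidal components
OVERSAMPLE = 2.0   # assumed multiple of the Nyquist rate used for sampling
NOISE_STD = 0.1    # assumed standard deviation of the additive noise

# Random band-limited function: every component frequency is <= F_MAX.
freqs = rng.uniform(0.5, F_MAX, size=N_COMPONENTS)
amps = rng.normal(size=N_COMPONENTS)
phases = rng.uniform(0.0, 2.0 * np.pi, size=N_COMPONENTS)

def f(x):
    """Evaluate the wiggly function at x (scalar or array)."""
    x = np.asarray(x, dtype=float)[..., None]
    return np.sum(amps * np.sin(2.0 * np.pi * freqs * x + phases), axis=-1)

# The Nyquist rate is twice the highest frequency present in the signal.
nyquist_rate = 2.0 * F_MAX                         # samples per unit x
n_samples = int(np.ceil(OVERSAMPLE * nyquist_rate)) + 1

x_train = np.linspace(0.0, 1.0, n_samples)         # uniform x (green crosses)
y_true = f(x_train)                                # true values at sampled x
y_noisy = y_true + rng.normal(scale=NOISE_STD, size=x_train.shape)  # blue dots

x_dense = np.linspace(0.0, 1.0, 1000)              # dense grid (red dots)
y_dense = f(x_dense)
```

Evaluating a trained model on `x_dense` and comparing with `y_dense` would then measure how well it predicts between the sample points.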

(I have other versions in which x is also random)