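Because a probabilistic forecast assigns probabilities to a range of outcomes, no single forecast can be judged right or wrong against a single observation. Conventional deterministic measures such as the root mean square error are therefore not suitable on their own.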

Still, if under certain conditions a forecast streamflow is expected to occur only 10 percent of the time, then the observed data should show this: across many such forecasts, that streamflow should be observed in roughly 10 percent of cases. To check whether realistic values are being produced, other metrics are needed that measure bias, skill relative to historical values, reliability, and discrimination. This requires comparing a large set of forecasts against the corresponding observations, using tools such as the Ranked Probability Score, reliability diagrams, and the Brier Score.
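As a rough illustration of the verification tools named above, the following Python sketch computes a Brier Score, an unnormalized Ranked Probability Score for a single forecast, and the table of values behind a reliability diagram. The function names, the ten-bin default, and the idea of a binary event (for example, flow exceeding some threshold) are assumptions made for this example and are not taken from any particular verification package.

```python
import numpy as np

def brier_score(forecast_probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    p = np.asarray(forecast_probs, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return float(np.mean((p - o) ** 2))

def ranked_probability_score(category_probs, observed_category):
    """Unnormalized RPS for one forecast over ordered categories.

    category_probs: probabilities for each ordered category (summing to 1).
    observed_category: index of the category that actually occurred.
    Some references divide this sum by (number of categories - 1).
    """
    p = np.asarray(category_probs, dtype=float)
    o = np.zeros_like(p)
    o[observed_category] = 1.0
    # Compare cumulative forecast and cumulative observed distributions.
    return float(np.sum((np.cumsum(p) - np.cumsum(o)) ** 2))

def reliability_table(forecast_probs, outcomes, n_bins=10):
    """Rows of (mean forecast probability, observed frequency, count) per bin.

    Empty bins are skipped. Plotting observed frequency against mean
    forecast probability gives a reliability diagram.
    """
    p = np.asarray(forecast_probs, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Use interior edges only, so a probability of exactly 1.0 lands in the last bin.
    idx = np.digitize(p, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(p[mask].mean()), float(o[mask].mean()), int(mask.sum())))
    return rows
```

For a reliable forecast system, the observed frequency in each row of the table should sit close to the mean forecast probability for that bin. A lower Brier Score or Ranked Probability Score indicates a better forecast, with zero being a perfect score.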

If verification tests show that forecast probabilities do not match the frequencies seen in the observations, then model calibration issues should be explored.