Trajectory Normalized Scoring for Neutrality+
I am working with Elliott Thornley on an RL implementation of Neutrality+ (part of his POST-Agency Proposal). Neutrality+ agents are theoretically shutdownable because their preferences are represented by an average utility across trajectory lengths rather than an expected utility (i.e., with no weighting by the probability of each length). As a result, they are indifferent to shifting probability mass between trajectory lengths, and so they do not take costly actions to avoid shutdown. Our RL implementation uses the empirical frequencies of trajectory lengths within a batch as an estimator of the objective probabilities in order to compute the Neutrality+ objective function.
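To make the distinction concrete, here is a minimal sketch (not our actual implementation) of how a batch-based estimate of the Neutrality+ objective differs from the standard expected return. The function names are hypothetical; the sketch assumes each trajectory in the batch has a scalar return and an observed length, and that averaging within each length bucket and then taking an unweighted mean across buckets approximates the average-across-lengths objective.

```python
import numpy as np

def neutrality_plus_score(returns, lengths):
    """Unweighted average utility across trajectory lengths:
    each observed length gets equal weight, regardless of how
    often it occurs in the batch."""
    returns = np.asarray(returns, dtype=float)
    lengths = np.asarray(lengths)
    # Mean return within each trajectory-length bucket.
    per_length_means = [returns[lengths == L].mean() for L in np.unique(lengths)]
    # Unweighted mean across length buckets (no probability weighting).
    return float(np.mean(per_length_means))

def expected_score(returns):
    """Standard expected return: implicitly weights each length
    by its empirical frequency in the batch."""
    return float(np.mean(np.asarray(returns, dtype=float)))

# A toy batch: three short trajectories, one long one.
batch_returns = [1.0, 1.0, 1.0, 10.0]
batch_lengths = [2, 2, 2, 5]

print(neutrality_plus_score(batch_returns, batch_lengths))  # 5.5
print(expected_score(batch_returns))                        # 3.25
```

In this toy batch, shifting probability mass between lengths (e.g., adding more length-2 trajectories with return 1.0) changes the expected score but leaves the Neutrality+ score unchanged, which illustrates the indifference property described above.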