Batched reward model inference and Best-of-N sampling
raw.sh
Submitted by rawsh 8 hours ago