Comments Page -

« Back undefinedSubmitted by surprisetalk 3 days ago

semi-extrinsic 3 hours ago
Previous discussion, 354 comments:
https://news.ycombinator.com/item?id=42458752
accrual 2 hours ago
> We additionally observe other behaviors such as the model exfiltrating its weights when given an easy opportunity.
Is anyone familiar with how this occurs? Since the models can only output text, do they attempt to "connect" to some API and POST its weights?
- Gregaros an hour ago
  Models don’t know their own weights, so I’m not sure what this means.
- HDThoreaun an hour ago
  The example on their paper is an anthropic employee giving the model access to an aws instance that has its weights. Ideally the model would refuse to access or change the weights but they were able to get it to attempt to access them.