LLMs usually have a parameter called temperature, which controls how randomly the model picks from a broader set of candidate tokens. The usual simplified explanation is that it increases the "creativity" of the model. I am realizing that it is applied only to the output logits at sampling time (the final softmax), not to the rest of the network; how technically challenging would it be to instead add some slight noise to the weights of the whole model?
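
To make the contrast concrete, here is a minimal sketch of the two ideas, assuming a generic PyTorch model; the function names and the noise scale are just illustrative, not a proposal for how it should be done:

```python
import torch
import torch.nn.functional as F

def sample_with_temperature(logits: torch.Tensor, temperature: float = 1.0) -> int:
    """Standard temperature sampling: scale the output logits before the softmax,
    which only reshapes the probability distribution over the next token."""
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

@torch.no_grad()
def add_weight_noise(model: torch.nn.Module, std: float = 1e-3) -> None:
    """Hypothetical alternative: perturb every parameter with Gaussian noise.
    Mechanically trivial, but it alters the model itself rather than the
    sampling step, so the effect on output quality is hard to control."""
    for p in model.parameters():
        p.add_(torch.randn_like(p) * std)
```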
