Hi~ Why is a scale applied for each tensor computation, for example in the BERT model?