diff --git "a/my_dataframe.csv" "b/my_dataframe.csv" new file mode 100644--- /dev/null +++ "b/my_dataframe.csv" @@ -0,0 +1,673 @@ +ID,Title,Abstract,Keywords,Indroduction,Conclusion,References,PublicationDate,Unnamed: 8,abstract_vectors +1,Estimation of Semiparametric Multi–Index Models Using Deep Neural Networks,"In this paper, we consider estimation and inference for both the multi-index parameters and the link function involved in a class of semiparametric multi–index models via deep neural networks (DNNs). We contribute to the design of DNN by i) providing more transparency for practical implementation, ii) defining different types of sparsity, iii) showing the differentiability, iv) pointing out the set of effective parameters, and v) offering a new variant of rectified linear activation function (ReLU), etc. Asymptotic properties for the joint estimates of both the index parameters and the link functions are established, and a feasible procedure for the purpose of inference is also proposed. We conduct extensive numerical studies to examine the finite-sample performance of the estimation methods, and we also evaluate the empirical relevance and applicability of the proposed models and estimation methods to real data.",Asymptotic Theory; Multi-Index Model; ReLU; Semiparametric Regression,"In recent decades, there has been a notable emphasis on deep neural networks (DNNs). Initially applied in machine learning, DNNs have since expanded into various fields, such as economics, finance, social sciences, among others. Related to the applications of DNN, LeCun et al. (2015) offer a comprehensive overview of practical topics, while Athey (2019) discusses its capacity in social science. Additionally, Bartlett et al. (2021) and Fan et al. (2021) provide summaries of recent methodological advancements. As the most important part of DNN, a variety of activation functions have been proposed and investigated theoretically and numerically (Dubey et al., 2022). The rectified linear activation function (ReLU) sees its popularity due to its simplicity and partial linearity: σpxq “ x _ 0 with x P R. ReLU is a piecewise linear function that will output the input directly if it is positive, otherwise, it will output zero. Compared to Sigmoid functions, ReLU has a low computational cost, which makes it efficient for large-scale neural networks practically. Schmidt-Hieber (2020), Farrell et al. (2021) and Fan and Gu (2022) for example establish some fundamental results with respect to using ReLU. However, there are still properties related to ReLU that remain unknown. To be more specific, we now define a simple DNN using ReLU for activation function, and then briefly review the relevant literature. Definition 1.1 (Simple DNN). For @x, v P R n , define the shifted activation function σv : R n Ñ R n as σvpxq “ pσpx1 ´ v1q, . . . , σpxn ´ vnqqJ , where xj and vj stand for the j th elements of x and v respectively. A simple DNN with m hidden layers that realizes the mapping U pĎ R c1 q ÞÑ R is defined as follows: x P U w1 σv1 pxq ¨ ¨ ¨ wm σvmp¨q A scalar output Input layer Hidden layers Output layer Mathematically, it is written as N px |Wmq :“ wm σvm ¨ ¨ ¨ w1 σv1 pxq, where Wm :“ tv1, . . . , vm; w1, . . . , wmu, and the weighting matrices and shit vectors have 1 the following dimensions: wj is $ ’’’& ’’’% c1 ˆ c1 for j “ 1, cj ˆ cj´1 for 2 ď j ď m ´ 1, 1 ˆ cm´1 for j “ m, and vj is $ & % c1 ˆ 1 for j “ 1, cj´1 ˆ 1 for j ě 2. 
The current literature agrees that ReLU is designed to provide sparsity, which leads to computational efficiency (e.g., Glorot et al., 2011; Schmidt-Hieber, 2020). However, there have been few efforts to explain how sparsity should be defined and why it occurs. To the best of our understanding, there are at least two types of sparsity involved: (1) non-active neurons and (2) parameters that are not effective. Additionally, the literature implicitly agrees that $W_m$ can be estimated through a minimization process (e.g., eq. (2.4) of Farrell et al., 2021). It is noteworthy that ReLU is piecewise linear, and it is not yet clear how to handle the accumulated (non)differentiability through layers in both theory and practice. While the concern raised here actually exists in some well-known software packages, to the best of our knowledge, no satisfactory treatment has been offered. For example, the well-known neuralnet package in R does not even support the use of ReLU (Günther and Fritsch, 2010). PyTorch does include ReLU and some of its variants as activation functions, but the explanation of the optimization process is very vague (https://pytorch.org/docs/stable/optim.html). Keras includes the Adam algorithm and its variants (https://keras.io/api/optimizers/adam/), but Adam requires “a stochastic scalar function that is differentiable w.r.t. parameters...” (Kingma and Ba, 2015), which does not apply to ReLU directly in an obvious manner. A comprehensive survey of the alternatives to ReLU is provided by Dubey et al. (2022), who comment on the pros and cons of different activation functions from the perspective of implementation. We aim to settle some of these concerns in this paper. Moving on to our discussion of data, when it comes to practical analysis using DNN, the existing model-building literature primarily focuses on fully nonparametric models, with only a few mentions of semiparametric settings (e.g., Kohler and Krzyżak (2017); Bauer and Kohler (2019), and references therein). It is not clear how to estimate and recover the index parameters involved in such semiparametric DNN models, and there is a lack of investigation in this area of research. This issue is also related to the (non)differentiability of ReLU. As far as we know, these questions have not been thoroughly investigated. Meanwhile, the current literature heavily focuses on independent and identically distributed (i.i.d.) data, while largely neglecting the implications of asymptotic properties when dealing with dependent data. This is especially significant for applications in the fields of finance and economics, such as those studied by Kaastra and Boyd (1996) and Gu et al. (2020), where accounting for dependence can pose challenges in constructing inference. The literature on this topic dates at least back to Newey and West (1987), with a comprehensive review provided by Shao (2015). In this paper, we aim to address this gap by training DNN with time series data, establishing asymptotic properties, and providing valid inference. In what follows, in order to address the aforementioned concerns collectively, we consider a generalized hierarchical interaction model of the form: $y_t = f_\star(z_{1t}^\top \theta_{\star 1}, \ldots, z_{rt}^\top \theta_{\star r}) + \varepsilon_t$, (1.1) where $t = 1, \ldots, T$, $f_\star(\cdot)$ is an unknown link function, the $z_{jt}$'s are $d_j \times 1$ observable vectors with $d_j \ge 2$, and $\varepsilon_t$ is an idiosyncratic error term. Throughout the rest of this paper, we suppose that the $d_j$'s and $r$ are finite, although the $d_j$'s may be very large and much larger than $r$.
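On the (non)differentiability concern raised above: ReLU's derivative is undefined at the kink, so each framework must pick a convention. As a quick check of one popular framework (not a treatment from this paper), PyTorch's autograd assigns gradient 0 at $x = 0$, silently selecting one element of the subdifferential $[0, 1]$:

import torch

# d ReLU(x)/dx is 0 for x < 0, 1 for x > 0, and PyTorch chooses 0 at x = 0
x = torch.tensor([-1.0, 0.0, 1.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)   # tensor([0., 0., 1.])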
One of the main features of our models (1.1) and (1.2) is that the multi–index setting may significantly reduce the dimensionality from $d = \sum_{j=1}^{r} d_j$ to a finite $r$. We also assign the script $\star$ to the true parameters and the true function. For the purpose of identification, let $\|\theta_{\star j}\| = 1$ for all $j$'s, and let the first elements of the $\theta_{\star j}$'s be positive. While model (1.1) has been proposed for the estimation of the link function, $f_\star(\cdot)$, in the relevant literature (see, for example, Kohler and Krzyżak (2017); Bauer and Kohler (2019)), to the best of our knowledge, there have been no attempts to estimate $(\theta_{\star 1}, \ldots, \theta_{\star r})$ as a vector of the index parameters of interest. The main goals are to estimate and recover both $f_\star(\cdot)$ and the $\theta_{\star j}$'s jointly before we establish valid inference. When no misunderstanding arises, we write (1.1) as $y_t = f_\star(z_t \theta_\star) + \varepsilon_t$ (1.2) for notational simplicity, with $z_t = \mathrm{diag}\{z_{1t}^\top, \ldots, z_{rt}^\top\}$ returning a block-wise diagonal matrix, and $\theta_\star = (\theta_{\star 1}^\top, \ldots, \theta_{\star r}^\top)^\top$ being a $d \times 1$ vector with $d = \sum_{j=1}^{r} d_j$. Up to this point, it is worth mentioning that there is a vast literature on non– and semi–parametric index settings via unknown link functions, e.g., Xia et al. (1999), Hristache et al. (2001), Gao (2007), Horowitz and Mammen (2007), Xia (2008), Ma and Song (2015), Dong et al. (2016), Ma and He (2016), and Zhou et al. (2023). Our investigation of model (1.1) adds to the relevant literature by introducing a unified DNN approach to the estimation of both the index parameters and the link function. The main advantage of the proposed DNN-based estimation method is that we are probably among the first to be able to estimate both the index parameters and the link function jointly and consistently in comparison with the existing DNN-based estimation methods. Meanwhile, the proposed DNN-based estimation method offers a unified way to deal with the case where the dimensionality of $f_\star(\cdot)$, $r$, can be large (although fixed). By contrast, the existing nonparametric methods suffer from the so–called “curse of dimensionality” issue when $r \ge 4$, for example. As a consequence of our discussion, we are also able to offer insights on how to generalize the approach to a broader class of models, such as the factor augmented models studied by Bernanke et al. (2005) and Fu et al. (2023), which are of great interest. To see this, we delve into the following example. Example 1. Consider (1.1) and let $r = 2$: $y_t = f_\star(z_{1t}^\top \theta_{\star 1}, z_{2t}^\top \theta_{\star 2}) + \varepsilon_t$. In this example, we suppose that $z_{1t}$ and $z_{2t}$ are observable and unobservable vectors respectively, and $z_{2t}$ comes from the following low rank representation: $X_t = \Lambda z_{2t} + V_t$. Here, $X_t$ is an $n \times 1$ observable vector, and $n$ may diverge. Consequently, $\Lambda$ and $V_t$ are $n \times d_2$ and $n \times 1$ respectively. We let $d_2$ be known for simplicity. There is a rich literature discussing the estimation of $d_2$ when it is unknown (e.g., Bai and Ng, 2002; Lam and Yao, 2012; Ahn and Horenstein, 2013). This example extends the typical factor augmented model to a nonparametric framework, following the approach of Horowitz and Mammen (2007), Xia (2008), and Fan and Gu (2022) in terms of dimension reduction (a small simulation sketch of the multi–index setup is given after the remarks below). In summary, our study makes the following main contributions: 1. We enhance the design of DNN by i) providing more transparency for practical implementation, ii) defining different types of sparsity, iii) showing the differentiability, iv) pointing out the set of effective parameters, and v) offering a new variant of ReLU, etc. 2.
We investigate a class of semiparametric DNN models. A set of asymptotic properties for the joint estimation of both the index parameters and the link function is established, and is applicable to a wide class of non– and semi–parametric settings. 3. We allow our models and methods to be applicable to dependent time series data, and we establish a valid implementational procedure for the purpose of inference. 4. We conduct extensive numerical studies to validate the theoretical findings, and we also demonstrate the empirical relevance and applicability of the proposed model and estimation method to real data. The remainder of this paper is structured as follows. Section 2 presents the design of DNN, and establishes some basic results which can be applied to a wide class of nonparametric models. Section 3 considers the estimation of model (1.1), and derives the asymptotics accordingly. In Section 4, we point out a few possible extensions. Section 5 provides extensive numerical studies to examine the theoretical findings. We conclude in Section 6 with a few remarks. Due to space limits, we provide extra plots and the proofs in Appendix B1 and Appendix B2 respectively in the online supplementary file. Before proceeding further, we introduce some notation which will be used repeatedly throughout the paper. Symbols & basic operations — For $\forall w \in \mathbb{R}$, we let $\lfloor w \rfloor$ and $\lceil w \rceil$ be the largest and smallest integers satisfying $\lfloor w \rfloor \le w$ and $\lceil w \rceil \ge w$ respectively. For $\forall n \in \mathbb{N}$, we let $I_n$, $1_n$, and $[n]$ be an $n \times n$ identity matrix, an $n \times 1$ vector of ones, and the set $\{1, 2, \ldots, n\}$ respectively. For $\alpha \in \mathbb{N}_0^r$ with $\mathbb{N}_0 = 0 \cup \mathbb{N}$ and $x \in \mathbb{R}^r$, we let $\alpha! = \prod_{i=1}^{r} \alpha_i!$, $x^\alpha = \prod_{i=1}^{r} x_i^{\alpha_i}$, $\|x\|_1 = \sum_{i=1}^{r} |x_i|$, and $\ell_{x|\alpha} = (x_1^{\alpha_1}, \ldots, x_r^{\alpha_r}, 1_q^\top)^\top$ with $q = 2^{\lceil \log_2 r \rceil} - r$ for $r \ge 2$ and $q = 1$ for $r = 1$. For a matrix $A$, we let $\|A\|$ and $\|A\|_2$ denote its Frobenius norm and spectral norm respectively. Throughout, we write $z_t^\top 1_r := \bar{z}_t$ and $I_{a,t} = I(z_t \theta_\star \in [-a, a]^r)$ for $t \in [T]$. Function operations & monomials — Let $f(x)$ be a sufficiently smooth function defined on $U \subseteq \mathbb{R}^r$, and define $\|f\|_\infty^U = \sup_{x \in U} |f(x)|$, $f^{(\alpha)}(x) = \partial^{\|\alpha\|_1} f(x) / (\partial x_r^{\alpha_r} \cdots \partial x_1^{\alpha_1})$, and $f^{(1)}(x) = \mathrm{diag}\{ \partial f(x)/\partial x_1 \, I_{d_1}, \ldots, \partial f(x)/\partial x_r \, I_{d_r} \}$. We define a space of monomials: $\mathcal{P}_n = \{\text{linear span of } x^\alpha \text{ with } 0 \le |\alpha| \le n\}$, of which the dimension is $\dim \mathcal{P}_n = \binom{r+n}{r} := r_n$ by direct calculation. Denote the basis of $\mathcal{P}_n$ by $\{\psi_1(x), \ldots, \psi_{r_n}(x)\}$, and let $\psi_{r_n}(x) = (\psi_1(x), \ldots, \psi_{r_n}(x))^\top$. For $\forall x_0 \in U$, define the re-centred basis by $\{\psi_1(x \mid x_0), \ldots, \psi_{r_n}(x \mid x_0)\}$, and let $\psi_{r_n}(x \mid x_0) = (\psi_1(x \mid x_0), \ldots, \psi_{r_n}(x \mid x_0))^\top$. With this notation and these symbols in hand, we are now ready to start our investigation.","Before concluding, we provide a few useful remarks. On the Design of DNN — Note that we require $f_\star(x)$ to be defined on a compact set, but we do not impose restrictions on the range of $\{z_t\}$. In fact, for time series data, it may make more sense to assume that $a$ is diverging, which is indeed achievable. Suppose that $z_t$ follows a sub-Gaussian distribution. In this case, after some algebra we can relax the condition on $a$ so that $a$ diverges at a suitable logarithmic rate. Apparently, there is a price that we have to pay, which is a slower rate of convergence. A similar treatment has also been discussed in Li et al. (2016), for example, so we do not elaborate on it further here. The total number of layers that we require for Theorem 2.1 is $(m+3) \cdot \lceil \log_2 r \rceil$ for $r \ge 2$, and $m+3$ for $r = 1$. The width of most layers in the hierarchy is 6.
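Returning to model (1.1) and its identification constraints ($\|\theta_{\star j}\| = 1$ with a positive first element), the following is a minimal simulation sketch; the link function, dimensions, and noise level are illustrative choices, not the paper's:

import numpy as np

rng = np.random.default_rng(1)
T, dims = 500, (5, 7)                   # r = 2 indices with d_1 = 5, d_2 = 7

def normalize(theta):
    theta = theta / np.linalg.norm(theta)          # ||theta_j|| = 1
    return theta if theta[0] > 0 else -theta       # positive first element

theta1, theta2 = (normalize(rng.normal(size=k)) for k in dims)
z1 = rng.normal(size=(T, dims[0]))
z2 = rng.normal(size=(T, dims[1]))
f = lambda u1, u2: np.sin(u1) + u1 * u2            # an illustrative link f
y = f(z1 @ theta1, z2 @ theta2) + 0.1 * rng.normal(size=T)   # eps_t ~ noise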
The sparsity occurs naturally in view of Definition 2.1 and Lemma 2.1. Connection with Some Existing Studies — Recently, Du et al. (2021) and Farrell et al. (2021) apply DNN to study treatment effects using micro datasets, and Keane and Neal (2020) apply DNN to study climate data. Our research can be extended to related topics, and our analysis provides a numerical implementation perspective that complements existing studies. Specifically, we have a clear understanding of the minimization process. To conclude, we consider the estimation of both the multi-index parameters and the link function involved in a class of semiparametric DNN models. We contribute to the design of DNN by i) providing more transparency for practical implementation, ii) defining different types of sparsity, iii) showing the differentiability, iv) pointing out the set of effective parameters, and v) offering a new variant of the rectified linear activation function (ReLU), etc. The model setup also sheds light on how to generalize factor augmented models that are of practical significance. The asymptotic properties of the proposed estimates are derived accordingly, and they can be applied to a wide class of non– and semi–parametric models. Finally, we conduct extensive numerical studies to examine the theoretical findings.","Ahn, S. C. and Horenstein, A. R. (2013), ‘Eigenvalue ratio test for the number of factors’, Econometrica 81(3), 1203–1227. Ai, C. and Chen, X. (2003), ‘Efficient estimation of models with conditional moment restrictions containing unknown functions’, Econometrica 71(6), 1795–1843. Andreasen, M. M., Engsted, T., Møller, S. V. and Sander, M. (2020), ‘The Yield Spread and Bond Return Predictability in Expansions and Recessions’, The Review of Financial Studies 34(6), 2773–2812. Andrews, D. W. K. (1991), ‘Heteroskedasticity and autocorrelation consistent covariance matrix estimation’, Econometrica 59(3), 817–858. Athey, S. (2019), The impact of machine learning on economics, in A. Agrawal, J. Gans and A. Goldfarb, eds, ‘The Economics of Artificial Intelligence: An Agenda’, pp. 507–547. Bai, J. and Ng, S. (2002), ‘Determining the number of factors in approximate factor models’, Econometrica 70(1), 191–221. Bartlett, P. L., Montanari, A. and Rakhlin, A. (2021), ‘Deep learning: A statistical viewpoint’, Acta Numerica 30, 87–201. Bauer, B. and Kohler, M. (2019), ‘On Deep Learning as a Remedy for the Curse of Dimensionality in Nonparametric Regression’, The Annals of Statistics 47(4), 2261–2285. Bernanke, B. S., Boivin, J. and Eliasz, P. (2005), ‘Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach’, The Quarterly Journal of Economics 120(1), 387–422. Borup, D., Eriksen, J. N., Kjær, M. M. and Thyrsgaard, M. (2023), ‘Predicting bond return predictability’, Management Science, forthcoming. Chen, J., Gao, J. and Li, D. (2012), ‘A new diagnostic test for cross-section uncorrelatedness in nonparametric panel data models’, Econometric Theory 28(5), 1144–1163. Christoffersen, P. F. and Diebold, F. X. (2006), ‘Financial asset returns, direction-of-change forecasting, and volatility dynamics’, Management Science 52(8), 1273–1287. Dong, C., Gao, J. and Tjøstheim, D. (2016), ‘Estimation for single-index and partially linear single-index integrated models’, The Annals of Statistics 44(1), 425–453. Dong, C. and Linton, O. (2018), ‘Additive nonparametric models with time variable and both stationary and nonstationary regressors’, Journal of Econometrics 207(1), 212–236.
Du, X., Fan, Y., Lv, J., Sun, T. and Vossler, P. (2021), Dimension-free average treatment effect inference with deep neural networks. Available at https://doi.org/10.48550/arXiv.2112.01574. Dubey, S. R., Singh, S. K. and Chaudhuri, B. B. (2022), ‘Activation functions in deep learning: A comprehensive survey and benchmark’, Neurocomputing 503, 92–108. Fan, J. and Gu, Y. (2022), Factor augmented sparse throughput deep ReLU neural networks for high dimensional regression. Available at https://doi.org/10.48550/arXiv.2210.02002. Fan, J., Liao, Y. and Wang, W. (2016), ‘Projected Principal Component Analysis in Factor Models’, The Annals of Statistics 44(1), 219–254. Fan, J., Ma, C. and Zhong, Y. (2021), ‘A selective overview of deep learning’, Statistical Science 36(2), 264–290. Fan, J. and Yao, Q. (2003), Nonlinear Time Series: Nonparametric and Parametric Methods, Springer-Verlag. Farrell, M. H., Liang, T. and Misra, S. (2021), ‘Deep neural networks for estimation and inference’, Econometrica 89(1), 181–213. Fu, Z., Su, L. and Wang, X. (2023), ‘Estimation and inference on time-varying FAVAR models’, Journal of Business & Economic Statistics 0(0), 1–15. Gao, J. (2007), Nonlinear Time Series: Semiparametric and Nonparametric Methods, Vol. 108, Chapman & Hall/CRC Monographs on Statistics and Applied Probability, London. Glorot, X., Bordes, A. and Bengio, Y. (2011), Deep sparse rectifier neural networks, in G. Gordon, D. Dunson and M. Dudík, eds, ‘Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics’, Vol. 15 of Proceedings of Machine Learning Research, pp. 315–323. Gu, S., Kelly, B. and Xiu, D. (2020), ‘Empirical asset pricing via machine learning’, The Review of Financial Studies 33(5), 2223–2273. Günther, F. and Fritsch, S. (2010), ‘Neuralnet: Training of neural networks’, R Journal 2, 30–38. Hansen, B. E. (1991), ‘Strong laws for dependent heterogeneous processes’, Econometric Theory 7(2), 213–221. Hansen, B. E. (1992), ‘Consistent covariance matrix estimation for dependent heterogeneous processes’, Econometrica 60(4), 967–972. He, X., Pan, X., Tan, K. M. and Zhou, W.-X. (2023), ‘Smoothed quantile regression with large-scale inference’, Journal of Econometrics 232, 367–388. Hendrycks, D. and Gimpel, K. (2023), Gaussian error linear units (GELUs). Available at https://doi.org/10.48550/arXiv.1606.08415. Horowitz, J. L. and Mammen, E. (2007), ‘Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions’, The Annals of Statistics 35(6), 2589–2619. Hristache, M., Juditsky, A., Polzehl, J. and Spokoiny, V. (2001), ‘Structure adaptive approach for dimension reduction’, The Annals of Statistics 29(6), 1537–1566. Kaastra, I. and Boyd, M. (1996), ‘Designing a neural network for forecasting financial and economic time series’, Neurocomputing 10(3), 215–236. Keane, M. and Neal, T. (2020), ‘Comparing deep neural network and econometric approaches to predicting the impact of climate change on agricultural yield’, The Econometrics Journal 23(3), S59–S80. Kingma, D. and Ba, J. (2015), Adam: A method for stochastic optimization, in ‘International Conference on Learning Representations (ICLR)’, San Diego, CA, USA. Kohler, M. and Krzyżak, A. (2017), ‘Nonparametric regression based on hierarchical interaction models’, IEEE Transactions on Information Theory 63(3), 341–356. Lam, C. and Yao, Q.
(2012), ‘Factor modeling for high-dimensional time series: Inference for the number of factors’, The Annals of Statistics 40(2), 694–726. LeCun, Y., Bengio, Y. and Hinton, G. (2015), ‘Deep learning’, Nature 521, 436–444. Li, D., Tjøstheim, D. and Gao, J. (2016), ‘Estimation in nonlinear regression with Harris recurrent Markov chains’, The Annals of Statistics 44(5), 1957–1987. Ludvigson, S. C. and Ng, S. (2009), ‘Macro Factors in Bond Risk Premia’, The Review of Financial Studies 22(12), 5027–5067. Ma, S. and He, X. (2016), ‘Inference for single-index quantile regression models with profile optimization’, The Annals of Statistics 44(3), 1234–1268. Ma, S. and Song, P. X. K. (2015), ‘Varying index coefficient models’, Journal of the American Statistical Association 110(509), 341–356. Newey, W. K. and West, K. D. (1987), ‘A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix’, Econometrica 55(3), 703–708. Nyberg, H. (2011), ‘Forecasting the direction of the US stock market with dynamic binary probit models’, International Journal of Forecasting 27(2), 561–578. Palm, F., Smeekes, S. and Urbain, J.-P. (2011), ‘Cross-sectional dependence robust block bootstrap panel unit root tests’, Journal of Econometrics 163(1), 85–104. Rudin, W. (2004), Principles of Mathematical Analysis, McGraw-Hill Companies, Inc., New York. Schmidt-Hieber, J. (2020), ‘Nonparametric Regression Using Deep Neural Networks with ReLU Activation Function’, The Annals of Statistics 48(4), 1875–1897. Shao, X. (2010), ‘The dependent wild bootstrap’, Journal of the American Statistical Association 105(489), 218–235. Shao, X. (2015), ‘Self-normalization for time series: A review of recent developments’, Journal of the American Statistical Association 110(512), 1797–1817. Xia, Y. (2008), ‘A multiple-index model and dimension reduction’, Journal of the American Statistical Association 103(484), 1631–1640. Xia, Y., Tong, H. and Li, W. K. (1999), ‘On extended partially linear single–index models’, Biometrika 86(4), 831–842. Zhou, W., Gao, J., Harris, D. and Kew, H.
(2023), ‘Semiparametric single–index cointegration with nonstationary predictors’, Forthcoming in Journal of Econometrics 240(12), 1–20.",07.11.2023,https://arxiv.org/pdf/2311.02789.pdf,"[-1.28806960e-02 -9.73352138e-03 -1.76355056e-02 -5.91507694e-03 + 3.50748673e-02 6.23929240e-02 -1.38590913e-02 3.03640170e-03 + 6.90200850e-02 -4.20260318e-02 2.95352601e-02 -6.89366832e-02 + 5.15826046e-02 4.41497639e-02 2.91810166e-02 2.48014126e-02 + 2.21599806e-02 1.21741556e-02 -9.35822725e-03 -4.12322860e-03 + 7.23822638e-02 -4.24193144e-02 3.45318578e-02 -2.23071631e-02 + 6.93400502e-02 -2.68890969e-02 4.96317483e-02 -2.72496026e-02 + -5.05004413e-02 -2.45185390e-01 5.96076027e-02 1.56537071e-02 + 6.96888119e-02 -3.29464325e-04 -1.06681287e-02 4.88564372e-04 + -3.90107185e-02 -9.09018330e-03 9.55260079e-03 -1.60585120e-02 + -8.24611168e-03 -1.64831840e-02 -2.51896679e-02 -5.25562391e-02 + 2.50798818e-02 -1.63137373e-02 -1.93679091e-02 -1.84066370e-02 + -8.20399001e-02 1.56842731e-02 -4.36851047e-02 -7.73714716e-03 + 1.12762330e-02 -3.24167614e-03 2.85336003e-02 7.69549143e-03 + -5.07906312e-03 2.74870805e-02 3.15380134e-02 4.97653075e-02 + 5.01391776e-02 4.97317612e-02 -1.33876160e-01 -3.83503810e-02 + 2.60127871e-03 -5.07016387e-03 5.86734526e-03 -5.74426427e-02 + -7.50196027e-03 2.95453635e-03 2.17125583e-02 3.45503837e-02 + 1.89384597e-03 -7.37499539e-03 -2.21704524e-02 1.70453154e-02 + 9.75921005e-03 3.03051230e-02 5.79469092e-03 3.94558581e-03 + 7.93046653e-02 -2.04380117e-02 -2.75664534e-02 -6.57943115e-02 + 3.84523813e-03 2.31761988e-02 -5.09328116e-03 1.02788780e-03 + -1.93546247e-02 -4.82086539e-02 -2.53771413e-02 1.88263655e-02 + 8.63456260e-03 7.01467600e-03 -1.73302423e-02 3.90107818e-02 + 3.54465544e-02 4.46824692e-02 -5.01292832e-02 4.00910735e-01 + 1.72004948e-04 1.70156602e-02 1.93124823e-02 -7.01426528e-03 + 1.11174099e-02 -2.81313863e-02 -2.38020606e-02 -2.29879599e-02 + -2.96720751e-02 2.69393884e-02 -3.02358530e-02 -3.61771397e-02 + -1.74721552e-03 -7.74939451e-03 -6.19617151e-03 7.82593340e-03 + 2.06919722e-02 2.77336091e-02 4.58796360e-02 -2.48536337e-02 + -3.03071700e-02 1.76380444e-02 4.66749296e-02 -1.24419536e-02 + 7.78560713e-03 -2.75908541e-02 -4.25881520e-02 1.11350074e-01 + -7.29001593e-04 5.80980256e-02 -1.23496968e-02 -1.86703745e-02 + -2.84463298e-02 4.62245755e-03 1.11902915e-02 -1.09899603e-02 + -1.29717961e-02 -3.36563475e-02 -2.75998004e-02 6.07485138e-02 + -5.61724231e-02 -1.12952525e-02 7.18620867e-02 1.64723787e-02 + -5.69703840e-02 1.23917975e-01 -4.20799255e-02 -4.27670442e-02 + -6.61691800e-02 -7.84301907e-02 2.22217347e-02 2.41521392e-02 + 1.00458227e-03 -1.77660137e-02 2.72670798e-02 1.74394697e-02 + 3.49144638e-02 -5.20079732e-02 -4.20706458e-02 1.50307110e-02 + -8.44114870e-02 -1.95271857e-02 -3.19813862e-02 1.18040547e-01 + -1.19587285e-02 -9.93037969e-03 -2.27848124e-02 5.89467259e-03 + -3.31584997e-02 -7.22731324e-03 5.85691258e-02 -4.29146085e-03 + -2.04733498e-02 1.96334859e-03 1.67449471e-02 1.85180716e-02 + -7.82651007e-02 -3.53490636e-02 -5.00840284e-02 5.02128154e-02 + -2.01157629e-02 -3.43743898e-02 7.73356343e-03 4.09334786e-02 + -3.37764099e-02 1.46612516e-02 2.69293366e-03 -4.75489199e-02 + 1.10316984e-02 1.47378519e-02 -7.26701394e-02 -4.58753016e-03 + -4.78156395e-02 5.32479882e-02 6.60040649e-03 -5.70029132e-02 + -4.75513050e-03 2.47306954e-02 -4.90321219e-02 -3.20298113e-02 + -3.84856462e-02 -3.78264673e-03 -9.98889934e-03 -5.55461198e-02 + -1.68518201e-02 1.34716984e-02 -9.99757648e-03 5.34332320e-02 + -2.03368030e-02 1.00883597e-03 
-3.64181329e-03 -3.23902816e-02 + 2.02074461e-02 -1.63168795e-02 2.49791704e-02 1.64128952e-02 + 4.81191166e-02 2.00062580e-02 -4.28318679e-02 4.22218069e-02 + 1.37057966e-02 -5.61026409e-02 -2.09224522e-02 -2.94402331e-01 + -4.19719294e-02 -5.16519602e-03 1.34602599e-02 7.19851777e-02 + -9.67348218e-02 4.13470380e-02 3.12939696e-02 7.68617988e-02 + 6.06423542e-02 3.74002121e-02 5.79167344e-02 -1.62197743e-02 + 9.90873761e-03 4.68073934e-02 2.81290095e-02 6.60911798e-02 + 1.25955129e-02 -3.69596817e-02 -2.36961301e-02 -5.98784164e-02 + 5.24742119e-02 2.47995518e-02 -5.71982861e-02 8.91288444e-02 + 2.86154877e-02 1.20238729e-01 -5.60555495e-02 2.90236361e-02 + 5.63717354e-03 -9.56766866e-03 5.11426441e-02 1.20983776e-02 + -7.60983117e-03 7.53403381e-02 1.28188310e-02 6.48126975e-02 + -4.03593993e-03 -4.27411050e-02 -3.69240041e-03 -6.04302213e-02 + 2.43543498e-02 3.28347720e-02 -4.89299595e-02 -9.20265764e-02 + -2.99101090e-03 -3.77404764e-02 -1.67359393e-02 -3.26865464e-02 + 6.28208742e-02 6.67650718e-03 -3.35673578e-02 3.00099310e-02 + -7.02837706e-02 -2.28863992e-02 -6.55934121e-03 -8.33484903e-02 + -6.04255870e-03 -5.98538257e-02 -5.39607648e-03 9.81099624e-03 + -6.50918186e-02 -1.80318300e-02 -1.05859615e-01 4.30980772e-02 + -1.09529160e-02 -8.37355107e-03 -1.51439551e-02 1.29790790e-02 + 1.44259781e-02 -3.64152454e-02 5.71663529e-02 1.64974332e-02 + 1.09962359e-01 5.57692870e-02 3.97806242e-03 4.10543551e-04 + -2.07139626e-02 -3.11278906e-02 -2.32167337e-02 9.36980695e-02 + -8.20931420e-03 7.31467735e-03 1.47972833e-02 2.98215579e-02 + -2.08835602e-02 8.18059817e-02 -3.92255047e-03 3.82782258e-02 + 6.29541501e-02 1.96613907e-03 -7.19734281e-02 -3.60640995e-02 + -1.87040847e-02 4.73072082e-02 2.62470972e-02 -2.54584044e-01 + 2.97326292e-03 2.42868681e-02 5.02770804e-02 -3.16516422e-02 + 7.43851205e-03 5.80773391e-02 -4.16009128e-02 -1.26718236e-02 + -5.23490794e-02 -3.78572345e-02 3.85819972e-02 3.38535272e-02 + 7.52370397e-04 -2.72180978e-02 -2.97553167e-02 1.83748789e-02 + -5.87015636e-02 3.41283269e-02 -4.22328338e-02 1.31276511e-02 + 1.80461314e-02 1.68845281e-01 -4.54933122e-02 -2.35827807e-02 + 1.65580511e-02 -3.18370424e-02 -3.69022787e-02 1.48295965e-02 + -2.63806488e-02 8.30963347e-03 1.50828110e-02 6.36981577e-02 + -4.19297861e-03 1.67081077e-02 1.15448080e-01 -7.97791407e-03 + 5.55521473e-02 3.33393477e-02 -9.73690767e-03 7.31786117e-02 + -1.39994333e-02 -4.14346606e-02 1.19856163e-03 3.87710184e-02 + 2.97276452e-02 -2.95050233e-03 1.33800115e-02 3.61516327e-03 + -1.38993943e-02 5.90568874e-03 -1.09024011e-02 4.58794348e-02 + -8.87873862e-03 9.13937837e-02 -4.55648592e-03 -2.41280608e-02 + -1.93908606e-02 -9.87393875e-03 1.50635815e-03 9.78077948e-03 + -8.94483272e-03 2.58695018e-02 -3.96354161e-02 -6.30394146e-02]" +2,A Directional Monitoring Approach of Sequential Incomplete Wind Power Curves with Copula-based Variational Inference,"Wind turbines often work under complex conditions which result in performance degradation. Accurate performance degradation monitoring is essential to ensure the reliable operation of wind turbines and reduce the maintenance costs. Wind turbine power curve monitoring is an effective way to detect performance degradation. However, due to the intermittency and fluctuation of wind speed, the wind speed range varies at different time periods, making power curves difficult to compare. Motivated by this, we proposed copula-based variational inference framework and used it to establish a sequential incomplete wind power curve estimation algorithm. 
First, a monotone power curve is constructed based on copula-based variational inference and an integrated spline regression model. In addition, the prior distributions of the model parameters are sequentially updated. Then, a directional control chart based on a new statistic named the KL-divergence factor is constructed to monitor wind turbine performance degradation. Real data from a wind farm in the east of the United Kingdom show that the proposed method can both improve the accuracy of wind turbine power curve modeling and monitor wind turbine performance degradation more precisely and comprehensively than existing approaches.","copula, variational inference, spline regression model, wind power curve monitoring, performance degradation.","Wind energy is a renewable, pollution-free, and widely distributed energy source that has received increasing attention. Wind power plants generate electricity by installing a series of wind turbines sited according to wind conditions, surrounding terrain, transmission routes, and other site selection considerations. The electricity generated by each turbine is transmitted to the substation and then to the power grid. The cumulative installed capacity of wind turbines continues to grow, increasing from 198 GW in 2010 to 837 GW in 2021 [1]. However, due to fluctuations in wind and energy demand, wind turbines are subject to complex conditions under alternating loads. Harsh working conditions are more likely to result in performance degradation, reduced wind power output, and further increases in operation and maintenance costs. Accurate performance degradation monitoring is essential to ensure the reliable operation of wind turbines. Power curve monitoring focuses on detecting changes in the relationship between power output and explanatory variables. Anomalies may occur if the functional relationship changes. Power curve monitoring is a typical profile monitoring problem. Profiles can be categorized into parametric and non-parametric models. A parametric model assumes that the response variable is correlated with the explanatory variables, and regression models are often used to explain this relationship [2, 3, 4, 5], such as univariate linear regression, multivariate regression, polynomial regression, generalized linear regression, nonlinear regression, etc. Detecting changes in the regression coefficients is the main task. However, fitting parametric models to the in-control and out-of-control profile data is not always successful. Thus, some researchers proposed non-parametric profile models, including local kernel regression (LKR) [6, 7], functional data analysis (FDA) [8, 9], functional principal component analysis (FPCA) [10], Gaussian process regression (GPR) [11], etc. Qiu et al. [6] introduced local linear kernel smoothing into an exponentially weighted moving average (EWMA) control chart, and used a non-parametric mixed-effect model to describe within-profile correlations. Zou et al. [7] combined a multivariate exponentially weighted moving average (MEWMA) control chart and a generalized likelihood ratio test (GLRT) based on non-parametric regression to carry out online monitoring of changes in regression relations and profile variance. Functional data analysis, including the wavelet transform and B-spline transform, regards profile data as a continuous function and converts it into a feature space to obtain feature coefficients.
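As a minimal illustration of this FDA idea, independent of the present paper's method, one can turn a sampled profile into wavelet feature coefficients; PyWavelets is assumed available, and the profile below is synthetic:

import numpy as np
import pywt

t = np.linspace(0.0, 1.0, 128)
rng = np.random.default_rng(2)
profile = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=t.size)   # one profile
coeffs = pywt.wavedec(profile, 'haar', level=3)   # [cA3, cD3, cD2, cD1]
features = np.concatenate(coeffs)                 # coefficients to monitor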
Zhou [8] combined statistical process control with the Haar wavelet transformation to not only detect process changes but also estimate the amplitude of the mean shift. Chang et al. [9] used the discrete wavelet transform to separate the variance or noise from the profile, B-splines to define the shape of the profile, and combined these with the Hotelling T2 control chart to monitor mean shifts or shape changes. Colosimo et al. [10] investigated the application of principal component analysis in profile data analysis and explored for which types of profile features interpretable PCs are easy to obtain. Due to its flexibility and conciseness, GPR shows good modeling performance. For example, Zhang et al. [11] used GPR to characterize correlations within profiles, focusing on Phase II monitoring of linear trends and within-profile correlations, and established two multivariate Shewhart control charts. Drawing on these profile monitoring methods, existing power curve monitoring approaches compare the latest power curve with a reference power curve under normal conditions, either by transforming the power curve into a scalar metric [12, 13], by comparing deviations between two power curves [14], or by monitoring model parameters [15]. Annual energy production (AEP) [12] and the power coefficient [13] are common scalar metrics that measure the properties of power curves. If the AEP or power coefficient of one wind turbine is inferior to that of the reference power curve built under the normal working condition, then this wind turbine may suffer from performance degradation. Ding et al. [14] discussed wind turbine performance monitoring based on power curves in both the spatial and temporal domains. Long et al. [15] applied the Hotelling T2 and generalized variance charts to monitor power curve parameters. Kusiak et al. [16] investigated three reference curves, including the power curve, rotor curve and blade pitch curve, and applied the Hotelling T2 chart to monitor the skewness and kurtosis of these curves. However, due to the intermittency and fluctuation of wind speed, the wind speed range varies at different time periods (Fig. 1), which leads to significant differences in the shape of the power curves. The basic idea of the new sequential incomplete power curve directional monitoring approach consists of three main parts. Firstly, a new variational inference method named Copula-based Variational Inference (CVI) is proposed to obtain a more accurate posterior distribution of the latent variables without severely underestimating the posterior variance, as Mean Field Variational Inference (MFVI) does. Secondly, a monotone power curve model based on CVI and integrated splines (I-splines) is proposed to describe wind power generation efficiency with incomplete samples under sequential data segment updating. Finally, an online power curve monitoring method for directional changes of the mean vector is proposed to detect performance degradation of wind turbines. [Fig. 1: Power curves for the same wind turbine in different time periods] The proposed method enjoys the following innovations: 1) Unlike traditional profile construction methods that entail large amounts of data, the power curve of a wind turbine can be sequentially and accurately updated with limited data samples based on CVI. This solves the inaccuracy problem of MFVI and can improve the response speed of performance degradation detection. 2) The proposed power curve monitoring approach is online and directional.
It has a lower computational burden and is more sensitive to directional changes in power curve parameters. The rest of the paper is organized as follows. Section 2 describes the directional monitoring approach for sequential incomplete power curves based on CVI in detail. In Section 3, the performance analysis on a real-world dataset is presented. We finally conclude in Section 4.","This paper proposes a novel wind turbine performance degradation detection algorithm aimed at dealing with the incomplete data sample problem. Copula-based variational inference is proposed to estimate the parameters of sequential incomplete power curves, and a new statistic called the KL-divergence factor is proposed to detect performance degradation directionally. The prediction results of wind turbine power curve (WTPC) modeling and performance degradation monitoring are analyzed on two datasets. The main results are as follows: 1) Copula-based variational inference allows flexible modeling of the posterior distribution and achieves stable sequential updating of model coefficient distributions given incomplete data samples. 2) WTPC modeling based on CVI and I-splines fits data samples well and constructs reliable profiles compared with a multivariate Gaussian proposal, because the latter significantly underestimates the distribution variance. It also facilitates subsequent degradation monitoring because the values of its parameters indicate wind turbine (WT) performance. 3) The control chart generated by the KL-divergence factor achieves the best monitoring performance. It can accurately detect wind turbines undergoing performance degradation. Moreover, it rarely triggers false alarms, which can reduce the number of unnecessary maintenance actions. In the future, the proposed directional sequential incomplete power curve monitoring scheme can be expanded in the following directions. Firstly, this paper focuses on the incomplete power curve monitoring task for a single WT; the task of monitoring multiple power curves of different WTs is valuable to investigate. Secondly, the profile characteristics of different components of a single WT are valuable to investigate, which can facilitate fault diagnosis of the WT. It can also aid the understanding of WT operation.
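As a rough illustration of the two main ingredients, and not the paper's estimator: the first part below fits a monotone power curve using an isotonic-regression stand-in for the CVI + I-spline model, and the second computes the closed-form KL divergence between two Gaussian posteriors, the kind of quantity a KL-divergence-factor chart could compare against a control limit. scikit-learn is assumed available; all numbers are synthetic.

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Part 1: monotone power-curve fit (isotonic stand-in, synthetic data)
rng = np.random.default_rng(0)
wind = np.sort(rng.uniform(3.0, 25.0, size=400))              # wind speed, m/s
power = 2000.0 / (1.0 + np.exp(-(wind - 10.0))) + 50.0 * rng.normal(size=wind.size)
curve = IsotonicRegression(increasing=True).fit(wind, power)
print(curve.predict([8.0, 12.0]))                             # monotone estimates, kW

# Part 2: KL divergence between Gaussian posteriors N(mu0, S0) and N(mu1, S1);
# a chart would flag degradation when such a statistic exceeds a calibrated limit h
def gaussian_kl(mu0, S0, mu1, S1):
    k = mu0.size
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

print(gaussian_kl(np.zeros(3), np.eye(3), np.array([0.3, -0.1, 0.2]), 1.2 * np.eye(3)))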
[10] Colosimo, B.M. and Pacella, M. (2007) On the use of principal component analysis to identify systematic patterns in roundness profiles. Quality and Reliability Engineering International, 23, 707–725. [11] Zhang, Y., He, Z., Zhang, C. and Woodall, W.H. (2014) Control charts for monitoring linear profiles with within-profile correlation using Gaussian process models. Quality and Reliability Engineering International, 30(4), 487–501. [12] International Electrotechnical Commission (IEC), IEC TS 61400-12-1 Ed. 1, Wind Turbines – Part 12-1: Power Performance Measurements of Electricity Producing Wind Turbines, IEC, Geneva, Switzerland, 2005. [13] Kjellin J, Bülow F, Eriksson S, Deglaire P, Leijon M, Bernhoff H. Power coefficient measurement on a 12 kW straight bladed vertical axis wind turbine[J]. Renewable energy, 2011, 36(11): 3050-3053. [14] Ding Y, Kumar N, Prakash A, Kio A E, Liu X, Liu L, Li Q. A case study of space-time performance comparison of wind turbines on a wind farm[J]. Renewable Energy, 2021, 171: 735- 746. [15] Long H, Wang L, Zhang Z, Song Z, Xu J. Data-driven wind turbine power generation performance monitoring[J]. IEEE Transactions on Industrial Electronics, 2015, 62(10): 6627-6635. [16] Kusiak A, Verma A. Monitoring wind farms with performance curves[J]. IEEE transactions on sustainable energy, 2012, 4(1): 192-199. [17] Sklar, M., 1959. Fonctions de Répartition à n Dimensions et Leurs Marges. Université Paris 8. [18] Nelsen, R. B. (2006). An introduction to copulas. Springer. [19] J. Friedman, T. Hastie, R. Tibshirani, The Elements of Statistical Learning, vol. 1, Springer Series in Statistics, New York, 2001. [20] Liu, C., Rubin, D. B., and Wu, Y. (1998). Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika, 85(4), 755–770. [21] Liu, J. S. and Wu, Y. (1999). Parameter expansion for data augmentation. Journal of the American Statistical Association (JASA), 94(448), 1264–1274. [22] Han, S., Liao, X., Dunson, D., & Carin, L. (2016). Variational Gaussian copula inference. In Artificial Intelligence and Statistics (pp. 829-838). PMLR. [23] Guo, J., Yan, H., & Zhang, C. (2023). A Bayesian Partially Observable Online Change Detection Approach with Thompson Sampling. Technometrics, 65(2), 179-191. [24] Zhang, C., Chen, N., and Wu, J. (2020), “Spatial Rank-based High-Dimensional Monitoring through Random Projection,” Journal of Quality Technology, 52, 111–127. [25] Plumley, C. (2022). Penmanshiel Wind Farm Data (0.0.2) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.5946808 [26] Guo P, Gan Y, Infield D. Wind turbine performance degradation monitoring using DPGMM and Mahalanobis distance[J]. Renewable Energy, 2022, 200: 1-9. [27] Wang, Y., Li, Y. (2020). Sparse heteroscedastic multiple spline regression models for wind turbine power curve modeling. IEEE Transactions on Sustainable Energy, 12(1), 191-201. [28] Honkela, A., Raiko, T., Kuusela, M., Tornio, M., and Karhunen, J. (2010). Approximate Riemannian conjugate gradient learning for fixed-form variational Bayes. Journal of Machine Learning Research (JMLR), 11, 3235–3268. [29] Y. Wang, Q. Hu, D. Srinivasan, and Z. Wang, “Wind power curve modeling and wind power forecasting with inconsistent data,” IEEE Trans. Sustain. Energy, vol. 10, no. 1, pp. 16–25, Jan. 2019. [30] S. Shokrzadeh, M. J. Jozani, and E. Bibeau, “Wind turbine power curve modeling using advanced parametric and nonparametric methods,” IEEE Trans. Sustain. Energy, vol. 5, no. 4, pp. 1262–1269, Oct. 2014. [31] Y. Wang, Q. Hu, L. Li, A. M. 
Foley, and D. Srinivasan, “Approaches to wind power curve modeling: A review and discussion,” Renew. Sustain. Energy Rev., vol. 116, 2019, Art. no. 109422. [32] Stetco A, Dinmohammadi F, Zhao X, Robu V, Flynn D, Barnes M, Keane J, Nenadic G. Machine learning methods for wind turbine condition monitoring: A review[J]. Renewable energy, 2019, 133: 620-635. [33] Jahani, S., Kontar, R., Veeramani, D., & Zhou, S. (2018). Statistical monitoring of multiple profiles simultaneously using Gaussian processes. Quality and Reliability Engineering International, 34(8), 1510-1529. [34] Fan, J., and Huang, L.-S. (2001), “Goodness-of-Fit Tests for Parametric Regression Models,” Journal of the American Statistical Association, 96, 640–652",04.11.2023,https://arxiv.org/abs/2311.02411,"[-5.07008694e-02 5.81610389e-02 1.73777696e-02 2.19040345e-02 + 5.80639914e-02 4.35756780e-02 -3.40602919e-02 -1.06388293e-02 + 5.25594465e-02 5.00384010e-02 6.12727478e-02 -7.51961768e-02 + 7.78220175e-03 2.98410915e-02 -2.92003676e-02 2.56231315e-02 + -3.34906904e-03 8.47067237e-02 3.37002091e-02 -1.29112583e-02 + 3.65003906e-02 -5.01192063e-02 3.14880931e-03 3.38094719e-02 + 7.47070387e-02 1.28131099e-02 -2.72483565e-03 2.63809077e-02 + -8.64057429e-03 -2.27985129e-01 2.26829126e-02 -1.49146188e-02 + 8.19427241e-03 -3.69843878e-02 -1.71531807e-03 -3.25031549e-04 + -3.68182734e-02 3.48968781e-03 3.06805633e-02 1.93737317e-02 + -1.24713313e-02 3.61814949e-04 2.33497750e-02 8.72575119e-03 + -2.23792866e-02 -2.80700736e-02 -7.27788545e-03 2.68680626e-03 + -4.58868667e-02 -4.33875471e-02 1.41823934e-02 2.08684546e-03 + -5.92069793e-03 -1.62036084e-02 5.86250871e-02 6.04840107e-02 + -1.38214305e-02 7.62473717e-02 5.68659045e-02 1.49745168e-02 + -1.85115729e-02 2.38039922e-02 -1.86465442e-01 -7.71959138e-04 + 2.24037226e-02 3.36033106e-02 3.75421830e-02 -1.02889715e-02 + -1.64137650e-02 7.78393587e-03 -2.45507844e-02 1.66134927e-02 + -1.62262991e-02 1.88093353e-02 -3.02135516e-02 7.17114881e-02 + -4.58147656e-03 4.33083102e-02 -7.36196414e-02 1.27740921e-02 + 5.45314886e-02 -1.74121670e-02 -5.69509082e-02 -4.37415838e-02 + -3.91127914e-02 -5.41367475e-03 3.25974822e-02 -3.88430618e-02 + 7.91516379e-02 3.51805128e-02 1.18453251e-02 -1.75487958e-02 + 1.59942862e-02 2.49160659e-02 -2.52129436e-02 2.25018021e-02 + 2.69978959e-02 3.66889760e-02 -3.89153697e-02 3.29176277e-01 + -8.83134548e-03 3.54754785e-03 -1.57431588e-02 2.15016659e-02 + -1.63880419e-02 -4.30969894e-02 -3.95997763e-02 -6.59806058e-02 + 1.19841809e-03 -4.90806773e-02 9.96489637e-03 1.34699671e-02 + 3.65049280e-02 1.56564324e-03 2.39786003e-02 -8.74484479e-02 + -9.15206503e-03 1.05932290e-02 -7.07884878e-03 -3.06749996e-03 + -1.45854568e-02 3.40576507e-02 9.48757008e-02 3.75394784e-02 + 6.12031817e-02 -3.37606817e-02 1.49159404e-02 7.18670487e-02 + -1.57325026e-02 -1.05757448e-04 2.07252577e-02 1.92894768e-02 + -4.00865376e-02 -1.99496131e-02 9.29466169e-03 -1.42945657e-02 + 3.04461382e-02 1.12508843e-02 1.63287055e-02 5.28588407e-02 + -4.26975489e-02 1.17595466e-02 9.52087417e-02 -3.66248824e-02 + -5.66732436e-02 7.00730160e-02 1.26362881e-02 -2.65561510e-02 + -7.37971514e-02 -5.36749475e-02 5.00920881e-03 6.33059591e-02 + 5.96582377e-03 -2.01652776e-02 7.48769790e-02 4.01026644e-02 + 3.00695724e-03 3.55265513e-02 -7.03778444e-03 1.47785991e-02 + -7.71869719e-02 -1.39394207e-02 -4.94638085e-02 6.93907812e-02 + -2.97049433e-02 -8.31907094e-02 2.08210088e-02 2.79569179e-02 + -1.91737214e-04 -5.04083652e-03 3.29476185e-02 3.98996547e-02 + 1.21674426e-02 -4.01077643e-02 
1.03298938e-02 -4.44876626e-02 + -3.06389537e-02 1.36348987e-02 6.23442093e-03 -1.94008350e-02 + 4.39599752e-02 1.61820542e-04 -1.91619843e-02 1.72165670e-02 + 3.49541977e-02 1.16050364e-02 -5.06951064e-02 -4.49769869e-02 + -1.06017869e-02 -2.10093297e-02 -1.16024949e-01 -4.47168723e-02 + -7.80278910e-03 4.99680825e-02 3.26978937e-02 -6.65463880e-02 + -5.05091995e-02 1.93769000e-02 2.15972625e-02 -6.25452921e-02 + 3.71156856e-02 3.22311446e-02 1.76171982e-03 2.64367182e-02 + -4.70210016e-02 8.98062810e-02 -2.54353527e-02 5.35582379e-03 + -8.74823518e-03 8.46687704e-02 -1.84801351e-02 -8.10401067e-02 + 5.06468825e-02 4.79252040e-02 -3.38966474e-02 8.92086700e-02 + 1.20828412e-02 1.59300398e-02 -7.81230303e-03 4.23274785e-02 + 1.20655149e-02 -2.13547889e-02 -4.81110439e-02 -3.11329842e-01 + -7.53382891e-02 -2.48616431e-02 4.98701911e-03 1.07363731e-01 + -5.72552159e-02 5.28874062e-03 -1.67558081e-02 3.40256542e-02 + -1.19706448e-02 2.61174049e-02 4.18691561e-02 -3.05384230e-02 + -5.42357154e-02 -1.70091260e-02 5.55543974e-02 5.04300871e-04 + -5.45707531e-02 -1.39033735e-01 2.07294449e-02 -2.55039558e-02 + 5.48263500e-03 3.08149494e-03 -5.21923788e-02 1.47953648e-02 + 1.65966116e-02 8.81823227e-02 -1.38490438e-01 3.28096487e-02 + 2.85829883e-02 -2.87675317e-02 3.35313492e-02 -3.53088975e-02 + 5.97172230e-02 5.37869409e-02 3.37876864e-02 1.30160544e-02 + -4.31426689e-02 -6.78334534e-02 -4.10824129e-03 4.55444157e-02 + -1.98616413e-03 -2.02860738e-05 -6.52032718e-02 -7.21522495e-02 + -4.61986810e-02 -5.75274753e-04 3.40672880e-02 -7.10137710e-02 + -1.47894928e-02 5.37689589e-02 1.13861600e-03 1.35181975e-02 + -3.21044326e-02 7.64933899e-02 -4.73175524e-03 -5.47196344e-02 + 6.55509010e-02 -3.48368920e-02 -3.08078304e-02 1.30501688e-02 + 7.45640229e-03 -3.02530918e-03 -2.99626328e-02 5.71138896e-02 + 9.30658076e-03 1.30018163e-02 -4.40105759e-02 -3.57500911e-02 + 8.47349316e-03 -5.14316037e-02 9.20535922e-02 -2.14209594e-02 + -5.85243804e-03 3.62427607e-02 1.85348522e-02 9.84249637e-03 + -1.80192180e-02 -7.01916814e-02 -3.15802135e-02 7.28985146e-02 + -1.60368420e-02 4.35030311e-02 1.85956564e-02 -3.30072492e-02 + -2.92690471e-02 -2.67016748e-03 -7.82344490e-03 1.84168629e-02 + -9.84453876e-03 2.92800106e-02 8.26537050e-03 3.49380285e-03 + -4.79634618e-03 7.03386292e-02 6.40816689e-02 -2.54287183e-01 + -6.21242784e-02 6.18588470e-04 -1.84144806e-02 -4.51391116e-02 + 1.06313080e-03 5.14992476e-02 -3.11346464e-02 -9.05468222e-03 + 2.44479235e-02 -6.02033176e-02 5.46318069e-02 1.07822092e-02 + -4.17357795e-02 3.09279654e-02 -1.32879522e-02 7.82874704e-04 + -9.02104899e-02 5.65580539e-02 -4.03410979e-02 3.69427018e-02 + 6.51984848e-03 1.38485074e-01 -4.37076055e-02 5.54889292e-02 + -5.47988750e-02 -2.12935619e-02 2.24671401e-02 1.68544520e-02 + -4.54644114e-02 -1.51043404e-02 -2.04103179e-02 9.50659961e-02 + 2.41497182e-04 2.14187447e-02 5.86746633e-02 -5.22699319e-02 + 2.83531155e-02 2.57885400e-02 -5.75377122e-02 4.19856533e-02 + 4.98265103e-02 1.87953981e-03 -2.70119496e-02 8.03157687e-02 + 1.47714894e-02 -1.50903622e-02 -6.88570645e-03 -2.71644089e-02 + -5.63487597e-03 1.54341962e-02 1.19326403e-02 1.11243669e-02 + 3.01787741e-02 3.12122051e-02 7.03727640e-03 4.78533544e-02 + -2.48157587e-02 -1.49631798e-02 -6.08546622e-02 1.96878593e-02 + -6.43822253e-02 1.24725588e-02 -4.22150940e-02 2.22441219e-02]" +3,COMBINING DEEP LEARNING ON ORDER BOOKS WITH REINFORCEMENT LEARNING FOR PROFITABLE TRADING,"High-frequency trading is prevalent, where automated decisions must be made quickly to take advantage 
of price imbalances and patterns in price action that forecast near-future movements. While many algorithms have been explored and tested, analytical methods fail to harness the whole nature of the market environment by focusing on a limited domain. With the ever-growing machine learning field, many large-scale end-to-end studies on raw data have been successfully employed to increase the domain scope for profitable trading, but they are very difficult to replicate. Combining deep learning on the order books with reinforcement learning is one way of breaking down large-scale end-to-end learning into more manageable and lightweight components for reproducibility, suitable for retail trading. The following work focuses on forecasting returns across multiple horizons using order flow imbalance and training three temporal-difference learning models for five financial instruments to provide trading signals. The instruments used are two foreign exchange pairs (GBPUSD and EURUSD), two indices (DE40 and FTSE100), and one commodity (XAUUSD). The performances of these 15 agents are evaluated through backtesting simulation, and successful models proceed through to forward testing on a retail trading platform. The results show potential but require further, albeit minimal, modifications for consistently profitable trading that fully handles retail trading costs, slippage, and spread fluctuation.",High-frequency trading; Automated decision-making; Price imbalances; Forecasting price movements; Algorithmic trading; Analytical methods in trading; Market environment; Machine learning in finance,"In 1992, an internet revolution [1, Chapter 1, Page 3] disrupted the financial trading industry when the first online brokerage service provider, E*Trade, was launched. This quickly replaced traditional trading over the telephone due to its convenience and faster execution. Naturally, the skill ceiling rose as technology improved. Automated algorithmic trading at high speeds, High-Frequency Trading (HFT), was introduced and became popular in the mid-2000s. This trading method involves a bot constantly identifying tiny price imbalances and entering trades before valuations rapidly correct themselves, ultimately accumulating small profits over time. In 2020, HFT represented 50% of the trading volume in US equity markets and between 24% and 43% of the trading volume in European equity markets, while representing 58% to 76% of orders [2, Page 1]. Although the statistics reveal that HFT is very popular, they hide the fierce competition and immense difficulty: one cannot be perfect in this field. Someone is considered ahead if they find more promising opportunities sooner than their competitors, but these working strategies will not last forever. It is only temporary until a successor arrives, and soon many more will come to surpass it. To stay relevant, you must always be the successor. Hence, the emphasis on quantitative research in financial institutions is extensive and in high demand. A portion of such research aims to identify profitable trading strategies that spot opportunities, enter trades, and manage those trades in under a millisecond. Do these strategies exist? HFT started with hard-coded rules that overlooked the complex nature of financial markets. A justifiable urge to apply Deep Learning to HFT was later birthed, and as hardware improved along with an abundance of data, so did its potential. Lahmiri et al.
[3] showcase Deep Learning accurately forecasting Bitcoin's high-frequency price data. Kolm et al. [5] display remarkable results using Deep Learning to correctly predict high-frequency returns (alpha) at multiple horizons for 115 stocks traded on Nasdaq. However, only predicting returns is unlikely to help a desk trader because each trading signal's validity would elapse before the trader could input their order. Reinforcement learning can be the key to interpreting this forecasting to execute high-frequency trades because it can learn an optimal strategy when given return predictions at multiple horizons. Independently, Bertermann [4] has trained a few profitable deep reinforcement learning agents using long-, mid-, and short-term mean-reverting signals as features. The following work combines Bertermann's reinforcement learning with Kolm et al.'s alpha extraction and investigates its potential in a realistic retail trading setting. Section 1 describes the background and relevant literature to understand the problem while exploring different ideas. The fundamentals of supervised learning will be covered, including the typical pipeline, feed-forward networks, and several network architectures such as Recurrent Neural Networks, Convolutional Neural Networks, and Long Short-Term Memory. Next, reinforcement learning agents such as Q Learning, Deep Q Networks, and Double Deep Q Networks will be touched upon. Limit Order Markets will also be explained to provide more context for the problem. Furthermore, relevant literature will be reviewed which describes different order book feature extraction methods, recent works on using supervised learning and reinforcement learning on the order books, and finally a list of popular performance metrics used to evaluate trading agents. Concepts found in the related work that might not be used in the investigation are also covered in the background for completeness and to keep the reader up to speed. Section 2 discusses the design and implementation of the solutions; it first describes the data collection pipeline and evaluates the quality of the collected data. Next, the designs of the supervised learning and reinforcement learning components (including the three agents) are covered, with certain modifications made as recommended in the related work. Finally, the section ends with the testing methodology for the models using backtesting and forward testing. Section 3 covers the optimisation strategy for tuning the hyperparameters within the supervised learning and reinforcement learning components. All the parameters are explained here, and the reasons for setting certain parameters' values without tuning them are justified. Section 4 evaluates all 15 models after setting their best parameter values using the methodology proposed in Section 3. The performances of the supervised learning and reinforcement learning models are investigated independently through backtesting to support the claims and results observed by their original designers, although modifications were made to try to improve them. This section also covers the evaluation after combining both models and compares them to a random agent benchmark using statistical testing. The best models were taken through to forward testing and the results are presented. Finally, the agents are explained using heatmaps to investigate what values of input lead to buying and selling behaviours. Section 5 concludes the findings, showing potential, and highlights the limitations of this work.
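Two ingredients recur throughout this pipeline: order flow imbalance (OFI) features from the book, and a temporal-difference agent mapping forecasts to buy/sell signals. Below is a hedged Python sketch: the level-1 OFI follows the standard definition of Cont et al., which Kolm et al. extend to deeper book levels, while the Q-update is a generic tabular rule rather than the exact agents trained in this work; all data are hypothetical.

import numpy as np

def ofi(bid_p, bid_q, ask_p, ask_q):
    # e_n = 1{b_n >= b_{n-1}} q_b_n - 1{b_n <= b_{n-1}} q_b_{n-1}
    #     - 1{a_n <= a_{n-1}} q_a_n + 1{a_n >= a_{n-1}} q_a_{n-1}
    b_up = bid_p[1:] >= bid_p[:-1]
    b_dn = bid_p[1:] <= bid_p[:-1]
    a_dn = ask_p[1:] <= ask_p[:-1]
    a_up = ask_p[1:] >= ask_p[:-1]
    return (b_up * bid_q[1:] - b_dn * bid_q[:-1]
            - a_dn * ask_q[1:] + a_up * ask_q[:-1])

# Generic tabular Q-learning over discretised forecast states,
# with actions {0: sell, 1: buy}
Q, alpha, gamma = np.zeros((10, 2)), 0.1, 0.99
def q_update(s, a, reward, s_next):
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

bid_p = np.array([1.2650, 1.2651, 1.2651]); bid_q = np.array([300., 250., 400.])
ask_p = np.array([1.2652, 1.2652, 1.2653]); ask_q = np.array([200., 350., 150.])
print(ofi(bid_p, bid_q, ask_p, ask_q))   # positive values: net buying pressure
q_update(s=3, a=1, reward=0.5, s_next=4)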
Section 5 also proposes further improvements to help these algorithms overcome the difficulty of submitting high-frequency trades profitably at the retail level.","This research sets out to combine deep learning on the order books with reinforcement learning to break down large-scale end-to-end models into more manageable and lightweight components for reproducibility, suitable for retail trading. An alpha extraction model forecasts the return over the next six timesteps, and a temporal-difference agent uses these values to provide trading signals (buy or sell). One alpha extraction model and three different temporal-difference agents were trained for five financial instruments, giving a total of 20 models and 15 trading bots. The results for the different components align with the related literature. The alpha extraction outcome aligns with Kolm et al.'s [5] results using the MLP architecture, and the rankings amongst the agents align with Bertermann's [4] findings. Only ten weeks of order flow imbalance data were collected for each instrument and split 8:1:1 for training, validation, and testing. Grid search was used to find optimal parameters for each model, and in most cases the values suggested in the literature were found to be best. Overall, backtesting with retail costs produced promising results, with profitability achieved using Q Learning on GBPUSD and EURUSD, but forward testing failed to bring similar results. This is due to the long processing time of making predictions, which sometimes caused two timesteps to be skipped; it can be mitigated with an infrastructure that does not rely on web applications, as well as with more advanced hardware. The current setup uses an Intel i9-9900K CPU @ 3.60GHz with 16GB of memory and an Nvidia GeForce RTX 2080 Super 16GB. The agents, on top of learning expected patterns, also learned undesirable patterns because they were exposed to positive rewards beyond the horizon window despite observing contradictory forecast values. This can be addressed by extending the horizon window beyond six timesteps. Other improvements consist of collecting more data, using better hardware, using the rate of return instead to standardise across multiple instruments, using another trading platform with lower commissions, and expanding the action space to include not being in any position (i.e. three actions of buy, sell, and no position instead of just buy and sell). With regard to legal, social, ethical, and professional considerations, the only concern is scraping raw limit order book data from the CTrader platform and storing it locally, which raises ethical issues. Scraping is not a legal violation of their EULA [40] as this work is for research and not commercial gain. This issue was minimised by storing only the wanted order flow imbalance features inferred from the raw limit order book states instead of storing the actual raw states directly.","[1] Jennifer Wu, Michael Siegel and Joshua Manion, Online Trading: An Internet Revolution, https://web.mit.edu/smadnick/www/wp2/2000-02-SWP%234104.pdf, MIT, Cambridge, MA, June 1999. [2] Johannes Breckenfelder, How does competition among high-frequency traders affect market liquidity?
https://www.ecb.europa.eu/pub/economic-research/resbull/2020/html/ecb.rb201215~210477c6b0.en.pdf, European Central Bank, December 2020. [3] Salim Lahmiri and Stelios Bekiros, Deep Learning Forecasting in Cryptocurrency High-Frequency Trading, https://link.springer.com/article/10.1007/s12559-021-09841-w, Springer, February 2021. [4] Arvind Bertermann, Reinforcement Learning Trading Strategies with Limit Orders and High Frequency Signals, Imperial College London, September 2021. [5] Petter N. Kolm, Jeremy Turiel and Nicholas Westray, Deep Order Flow Imbalance: Extracting Alpha at Multiple Horizons from the Limit Order Book, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3900141, SSRN, August 2021. [6] Solveig Badillo, Balazs Banfai, Fabian Birzele, Iakov I. Davydov, Lucy Hutchinson, Tony Kam-Thong, Juliane Siebourg-Polster, Bernhard Steiert, and Jitao David Zhang, An Introduction to Machine Learning, https://www.researchgate.net/publication/339680577_An_Introduction_to_Machine_Learning, Clinical Pharmacology & Therapeutics, January 2020. [7] Rian Dolphin, LSTM Networks | A Detailed Explanation, https://towardsdatascience.com/lstm-networks-a-detailed-explanation-8fae6aefc7f9, Medium Blogs, October 2020. [8] Jonathan Johnson, Top Machine Learning Architectures Explained, https://www.bmc.com/blogs/machine-learning-architecture/, BMC Blogs, September 2020. [9] Simeon Kostadinov, Understanding Backpropagation Algorithm, https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd, Medium Blogs, August 2019. [10] Jonathon Johnson, What's a Deep Network? Deep Nets Explained, https://www.bmc.com/blogs/deep-neural-network, BMC Blogs, July 2020. [11] Aditi Mittal, Understanding RNN and LSTM, https://aditi-mittal.medium.com/understanding-rnn-and-lstm-f7cdf6dfc14e, Medium Blogs, October 2019. [12] Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, The MIT Press, Cambridge, MA, 2018. [13] Reza Jafari, M.M. Javidi, Marjan Kuchaki Rafsanjani, Using deep reinforcement learning approach for solving the multiple sequence alignment problem, Springer, June 2019. [14] Hado van Hasselt, Arthur Guez, and David Silver, Deep Reinforcement Learning with Double Q-Learning, Proceedings of the AAAI conference on artificial intelligence, https://ojs.aaai.org/index.php/AAAI/article/view/10295, 2016. [15] Ben Hambly, Introduction to Limit Order Book Markets, https://www.maths.ox.ac.uk/system/files/attachments/NUS-LOB19.pdf, Oxford, 2019. [16] Mario Köppen, The curse of dimensionality, In 5th online world conference on soft computing in industrial applications (Vol. 1, pp. 4-8), https://www.class-specific.com/csf/papers/hidim.pdf, Sep 2000. [17] CMC Markets, Mean Reversion Trading Strategies, https://www.cmcmarkets.com/en-gb/trading-guides/mean-reversion, 2023. [18] CMC Markets, Momentum Trading Strategies, https://www.cmcmarkets.com/en-gb/trading-guides/momentum-trading, 2023. [19] Jaroslav Kohout, Volume Delta Reversal Trade Strategy, https://axiafutures.com/blog/volume-delta-reversal-trade-strategy/, 2022. [20] Rama Cont, Arseniy Kukanov, and Sasha Stoikov, The Price Impact Of Order Book Events, Journal Of Financial Econometrics 12.1, 2014. [21] Ymir Mäkinen, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis, Forecasting Jump Arrivals In Stock Prices: New Attention-Based Network Architecture Using Limit Order Book Data, Quantitative Finance, 2019. [22] Nikolaos Passalis, Anastasios Tefas, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis, Temporal Logistic Neural Bag-Of-Features For Financial Time Series Forecasting Leveraging Limit Order Book Data, Pattern Recognition Letters, 2020. [23] Dat Thanh Tran, Alexandros Iosifidis, Juho Kanniainen, and Moncef Gabbouj, Temporal Attention-Augmented Bilinear Network For Financial Time-Series Data Analysis, IEEE Transactions On Neural Networks And Learning Systems, 2019. [24] Ruihong Huang and Tomas Polak, LOBSTER: Limit Order Book Reconstruction System, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1977207, 2011. [25] ByBitHelp, Introduction to TWAP Strategy, https://www.bybithelp.com/en-US/s/article/Introduction-to-TWAP-Strategys, August 2023. [26] Michaël Karpe, Jin Fang, Zhongyao Ma, and Chen Wang, Multi-agent reinforcement learning in a realistic limit order book market simulation, Proceedings of the First ACM International Conference on AI in Finance, https://dl.acm.org/doi/abs/10.1145/3383455.3422570, 2020. [27] Thomas Spooner, John Fearnley, Rahul Savani, and Andreas Koukorinis, Market making via reinforcement learning, https://arxiv.org/abs/1804.04216, Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018. [28] Mohammad Mani, Steve Phelps, and Simon Parsons, Applications of Reinforcement Learning in Automated Market-Making, https://nms.kcl.ac.uk/simon.parsons/publications/conferences/gaiw19.pdf, Proceedings of the GAIW: Games, Agents and Incentives Workshops, Montreal, Canada, 2019. [29] Svitlana Vyetrenko, Shaojie Xu, Risk-Sensitive Compact Decision Trees for Autonomous Execution in Presence of Simulated Market Response, https://arxiv.org/abs/1906.02312, Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, 2019. [30] Yuriy Nevmyvaka, Yi Feng, Michael Kearns, Reinforcement learning for optimized trade execution, https://dl.acm.org/doi/abs/10.1145/1143844.1143929, Proceedings of the 23rd international conference on Machine learning, 2006. [31] Auquan, Evaluating Trading Strategies, https://medium.com/auquan/evaluating-trading-strategies-fe986062a96b, Medium Blogs, January 2017. [32] Apoorva Singh and Rekhit Pachanekar, Sharpe Ratio: Calculation, Application, Limitations, https://blog.quantinsti.com/sharpe-ratio-applications-algorithmic-trading/#Sortino, QuantInsti Blogs, December 2019. [33] Ctrader FXPro, https://www.fxpro.com/, 2023. [34] TradingFX VPS, https://www.tradingfxvps.com/, 2023.
RL for Profitable Trading A PREPRINT [35] Pytorch, https://pytorch.org/, 2023 pages [36] Nvidia CUDA, https://developer.nvidia.com/cuda-zone, 2023 pages [37] OpenAI Gym, https://www.gymlibrary.dev/index.html, 2023 pages [38] Flask, https://flask.palletsprojects.com/en/2.3.x/, 2023 pages [39] Paul Billiet, The Mann-Whitney U-test – Analysis of 2-Between-Group Data with a Quantitative Response Variable, https://psych.unl.edu/psycrs/handcomp/hcmann.PDF, University of Nebraska-Lincoln, 2003 pages [40] CTrader, End-User License Agreement, https://ctrader.com/eula/, 2023 pages",13.9.2023,https://arxiv.org/pdf/2311.02088.pdf,"[-3.58898379e-02 -5.33776879e-02 -2.55429316e-02 2.84039285e-02 + 2.72241086e-02 6.58263732e-03 -3.11707724e-02 -1.82692762e-02 + 8.77434760e-02 -2.73383856e-02 1.06630484e-02 -1.48623344e-02 + -1.09670032e-02 1.20328143e-02 3.67650986e-02 1.25704035e-02 + -5.73254414e-02 -5.67750819e-02 1.07580703e-02 -2.02700794e-02 + 9.31313075e-03 -4.28961776e-02 -6.04833886e-02 -3.95108461e-02 + 5.00195622e-02 -3.33409533e-02 5.61240222e-03 -6.48115501e-02 + -5.86559288e-02 -2.33353674e-01 4.75808941e-02 1.76183432e-02 + 2.15035304e-02 -1.32141588e-02 -5.28522115e-03 1.66836660e-02 + -5.98896444e-02 6.90797996e-03 -3.26753780e-03 3.61657739e-02 + -2.10523661e-02 3.72767560e-02 1.41684283e-02 1.02322723e-03 + 2.36561447e-02 -6.75101057e-02 -3.23060229e-02 -3.33496556e-02 + -3.77060771e-02 2.66567059e-02 1.79438107e-02 -8.80343467e-03 + 2.34405380e-02 3.81574929e-02 -9.68913781e-04 2.11267453e-02 + 3.49385925e-02 1.98871978e-02 3.47893313e-02 6.33673444e-02 + 6.73003122e-02 4.39956784e-02 -1.49234295e-01 1.90848112e-02 + -8.70702788e-03 3.26061770e-02 -1.54862190e-02 1.69187356e-02 + -3.02720908e-02 6.83670565e-02 4.86727320e-02 1.62364133e-02 + -1.20569638e-03 7.59020168e-03 2.37923078e-02 2.23488454e-02 + 2.02239249e-02 -3.70624289e-02 -2.26665642e-02 -7.83751532e-03 + 4.43889350e-02 -3.83506966e-04 -2.32571028e-02 -3.99299935e-02 + -2.50125099e-02 -3.97803681e-03 3.87530737e-02 -2.30096038e-02 + 4.93481830e-02 -3.39540951e-02 1.37016522e-02 4.06775624e-02 + -3.25589329e-02 -5.57940360e-03 -2.61322763e-02 2.32754089e-02 + 1.60381403e-02 3.32678147e-02 -4.01084200e-02 3.58661711e-01 + 7.62746409e-02 1.32056437e-02 -6.80230260e-02 1.92432478e-02 + -1.91671192e-03 -1.82670224e-02 -3.30956168e-02 7.01977918e-03 + 3.20898904e-03 -3.19437794e-02 -6.06969781e-02 -1.92617085e-02 + 6.57041743e-02 -1.28241600e-02 -6.10359944e-02 1.19143305e-02 + 3.08900923e-02 -3.16864513e-02 7.10067153e-02 2.43582595e-02 + 4.46530059e-03 3.97156142e-02 3.15851904e-02 -9.03386739e-04 + -3.79530154e-02 -1.91280190e-02 -8.54462478e-03 1.11961991e-01 + -1.75846834e-02 8.04519281e-03 -1.20702581e-02 -9.54439957e-03 + -7.80741647e-02 2.41687000e-02 2.14980580e-02 4.92754467e-02 + -4.35003079e-02 5.70351025e-03 4.16194234e-04 4.07962427e-02 + -1.13951787e-01 5.40982895e-02 3.79552767e-02 -5.23680300e-02 + -6.66915774e-02 8.21245089e-02 7.94074386e-02 -3.10575422e-02 + -7.26141855e-02 -4.44884114e-02 -2.47712042e-02 -1.19814472e-02 + 2.52693482e-02 -9.54490751e-02 -1.65519677e-02 2.40258183e-02 + 4.06230018e-02 8.45823251e-03 -1.82637405e-02 -3.57611477e-02 + -9.44696367e-02 -2.25388724e-02 -5.78881092e-02 1.33347929e-01 + -4.35621142e-02 -1.00362629e-01 -5.47557211e-05 -2.12209672e-02 + -4.41458412e-02 -2.71400139e-02 -7.82177318e-04 8.15521181e-03 + -1.67750586e-02 -1.73472648e-03 2.67255716e-02 -1.80999120e-03 + -6.97337463e-02 -6.46117926e-02 -4.39504301e-03 2.93286480e-02 + 3.33259925e-02 -4.57981713e-02 
-1.40095698e-02 3.91147919e-02 + 1.15576815e-02 -1.43180392e-03 4.67553549e-03 5.25185559e-03 + -2.86468472e-02 -1.93691999e-02 -6.20811991e-03 -2.95866337e-02 + -5.86608909e-02 7.29514360e-02 5.67975268e-03 1.05342194e-02 + -2.74191890e-02 6.38206825e-02 -1.82265155e-02 2.24396922e-02 + -2.45025065e-02 1.23204086e-02 -1.68869440e-02 -3.63607667e-02 + -9.09494422e-03 2.92189755e-02 1.03949113e-02 -1.55582540e-02 + 6.06591851e-02 5.05605014e-03 3.41233164e-02 4.01317999e-02 + 2.25101635e-02 -3.52942050e-02 -2.34668422e-02 -1.82601325e-02 + 9.83905979e-03 6.36212900e-02 -3.65521684e-02 3.40570584e-02 + 2.28887610e-02 9.28531960e-03 1.93429796e-03 -3.30470681e-01 + -4.04177792e-02 -8.25882051e-03 3.73706943e-03 5.79107478e-02 + -5.12422249e-02 6.02266788e-02 -2.80034617e-02 4.99390401e-02 + 8.52908418e-02 1.34487683e-02 1.85232684e-02 -1.91916153e-02 + -1.03974557e-02 2.00157724e-02 -1.68027151e-02 -1.56246312e-02 + 5.33529604e-03 -2.27672942e-02 2.63348315e-02 -5.21637909e-02 + 2.07466930e-02 -1.55784637e-02 -1.15715951e-01 4.26853746e-02 + 4.35874946e-02 1.31072879e-01 -7.29633197e-02 -2.18337458e-02 + 1.67117314e-03 -2.17596497e-02 8.36110115e-03 8.92357668e-04 + 4.06081639e-02 5.70145808e-02 -5.32572437e-03 1.25175893e-01 + 3.78817250e-03 -1.56358033e-02 -4.18030024e-02 -6.04275949e-02 + -2.14056727e-02 9.39918857e-04 -4.48889807e-02 -1.74497478e-02 + 2.45583132e-02 -1.30007081e-02 3.27470787e-02 -5.69302682e-03 + 7.95902088e-02 4.32186201e-02 -3.45218182e-02 1.94673184e-02 + -2.35406961e-03 -1.98169202e-02 -4.06351201e-02 -5.86134754e-03 + 2.59226467e-02 -2.86197234e-02 -1.75581723e-02 4.08143504e-03 + -5.30067459e-02 2.78865583e-02 1.55959651e-02 2.09401697e-02 + -5.38686803e-03 6.00651801e-02 -4.17005196e-02 -8.99696536e-03 + 2.86269989e-02 -1.57700162e-02 7.55948052e-02 -2.44106911e-02 + 2.33221073e-02 3.50277461e-02 9.67894774e-03 4.69416678e-02 + -3.90223972e-03 -6.48764372e-02 6.05231449e-02 8.82425979e-02 + 2.25122217e-02 4.70780879e-02 9.28306952e-02 2.11008396e-02 + -5.96526719e-04 6.78121969e-02 -6.07702695e-02 4.91517521e-02 + 7.47089982e-02 -3.08466740e-02 -2.52588943e-04 -7.09590986e-02 + -1.82078332e-02 3.42916809e-02 4.34014574e-02 -2.33193904e-01 + 1.35724749e-02 -3.16917039e-02 9.50877890e-02 2.87774857e-02 + -7.34926993e-03 2.84588672e-02 -3.80876064e-02 -3.69575098e-02 + -4.67893388e-03 -2.49566343e-02 2.00499296e-02 7.55597949e-02 + -9.93568823e-03 -9.76644270e-03 -2.71699373e-02 6.77588359e-02 + -4.49092872e-02 2.79073268e-02 2.63407417e-02 1.03901969e-02 + 5.68092503e-02 1.86281681e-01 1.30533325e-02 -2.96779582e-03 + -8.59510526e-03 -2.77607441e-02 -7.00779706e-02 2.47703996e-02 + -2.91284900e-02 1.57590546e-02 -2.10324465e-03 7.04173967e-02 + 7.23826746e-03 2.55985446e-02 7.36157820e-02 -8.98544956e-03 + 3.44030336e-02 9.55338031e-03 1.21542696e-04 2.69470010e-02 + -3.47671856e-04 1.72700472e-02 -4.67677694e-03 2.88290028e-02 + 3.81487218e-04 4.98642493e-03 -4.20681909e-02 -5.49328104e-02 + 6.05684146e-03 -2.07430646e-02 -1.07949032e-02 -2.60938983e-02 + -1.17423162e-02 7.07990839e-04 -4.54951217e-03 -4.18332918e-03 + -1.44831575e-02 -8.96615721e-03 -3.38528305e-02 7.16599869e-03 + -8.35594982e-02 9.91846249e-03 -5.15970886e-02 -2.87643299e-02]" +4,Simulation schemes for the Heston model with Poisson conditioning,"Exact simulation schemes under the Heston stochastic volatility model (e.g., Broadie–Kaya and Glasserman–Kim) suffer from computationally expensive modified Bessel function evaluations. 
We propose a new exact simulation scheme without the modified Bessel function, based on the observation that the conditional integrated variance can be simplified when conditioned by the Poisson variate used for simulating the terminal variance. Our approach also enhances the low-bias and time discretization schemes, which are suitable for pricing derivatives with frequent monitoring. Extensive numerical tests reveal the good performance of the new simulation schemes in terms of accuracy, efficiency, and reliability when compared with existing methods.",Finance; Heston model; exact simulation; Poisson conditioning; gamma expansion,"The Heston (1993) stochastic volatility model relaxes the constant volatility assumption in the Black–Scholes (BS) model by taking the instantaneous variance to follow the square root diffusion process with mean reversion, commonly called the Cox–Ingersoll–Ross (CIR, Cox et al. (1985)) process. It is the most popular stochastic volatility model for market practitioners because of its analytic tractability in computing the prices of European options. Recently, the multifactor versions of the model have received attention as they can better explain the stylized facts observed in the derivative market (Da Fonseca et al., 2008; Christoffersen et al., 2009; Trolle and Schwartz, 2009; Jaber, 2019). Owing to the existence of a closed-form formula for the characteristic function of the log-asset price, model calibration to market-observable European option prices can be performed efficiently using the Fourier inversion algorithm (Lewis, 2000; Lord and Kahl, 2010). For pricing path-dependent options under the Heston model, Monte Carlo (MC) simulations are often used. However, the standard Euler and Milstein time discretization simulation schemes suffer from a high bias owing to the square root in the diffusion function of the variance process. Negative values of variance from the simulation must be heuristically set to zero before taking the square root of variance. In addition, the square root function violates the Lipschitz condition; therefore, the convergence properties of the discretization scheme may not be guaranteed. There have been numerous fixes to these issues to minimize the discretization biases. A comprehensive review of these discretization schemes using various fixes can be found in Lord et al. (2010). A major breakthrough was made by Broadie and Kaya (2006) in the simulation of the Heston model from its exact distribution. Their “exact” simulation procedures consist of three steps: (i) sampling of the terminal variance conditional on the initial variance, (ii) sampling the integrated variance conditional on the initial and terminal variance values, and (iii) sampling the asset price process conditional on the variance and integrated variance. As the exact simulation approach avoids simulation bias, simulation errors remain inversely proportional to the square root of the computational time budget.
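As background for step (i), the terminal variance of the CIR process follows a scaled noncentral chi-squared distribution, which can be drawn exactly as a gamma variate whose shape is mixed over a Poisson count; this Poisson variate is the one referred to in the abstract. A minimal Python/numpy sketch of this textbook representation under standard Heston notation (mean-reversion speed kappa, long-term variance theta, volatility-of-variance sigma); it illustrates the classical sampling step only, not the authors' full scheme:

import numpy as np

def sample_terminal_variance(v0, kappa, theta, sigma, dt, rng):
    # Exact draw of V_{t+dt} | V_t for the CIR variance process.
    # V_{t+dt} = c * chi2'_delta(lam), where chi2'_delta(lam) is a
    # noncentral chi-squared variate; it equals a central chi-squared
    # with delta + 2N degrees of freedom, N ~ Poisson(lam/2), and a
    # chi-squared with k degrees of freedom is Gamma(k/2, scale=2).
    exp_kt = np.exp(-kappa * dt)
    c = sigma**2 * (1.0 - exp_kt) / (4.0 * kappa)   # scale factor
    delta = 4.0 * kappa * theta / sigma**2          # degrees of freedom
    lam = v0 * exp_kt / c                           # noncentrality parameter
    n = rng.poisson(lam / 2.0)                      # the Poisson conditioning variate
    return c * rng.gamma(shape=delta / 2.0 + n, scale=2.0)

rng = np.random.default_rng(42)
v_next = sample_terminal_variance(v0=0.04, kappa=2.0, theta=0.04,
                                  sigma=0.3, dt=1.0, rng=rng)

The parameter values in the example call are arbitrary illustrative inputs. The key observation exploited in this paper is that, once the Poisson variate n above is kept rather than discarded, the law of the conditional integrated variance simplifies as well.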
However, the Broadie–Kaya exact simulation algorithm is not competitive in an accuracy-speed comparison since it requires extensive computational time to sample the conditional integrated variance via the numerical inversion of the Laplace transform in each simulation path. To improve computational efficiency, one may use a caching technique to sample the terminal variance and conditional integrated variance via precomputation and interpolation of the appropriate inverse distribution functions (Van Haastrecht and Pelsser, 2010; Zeng et al., 2023). Despite its limitations, Broadie and Kaya (2006)'s pioneering work triggered the construction of exact simulation schemes for other stochastic volatility models, such as the stochastic-alpha-beta-rho (SABR) (Cai et al., 2017; Choi et al., 2019), 3/2 (Baldeaux, 2012; Zeng et al., 2023), Wishart (Kang et al., 2017), and Ornstein–Uhlenbeck-driven stochastic volatility models (Li and Wu, 2019; Choi, 2023). Based on Pitman and Yor (1982)'s decomposition of Bessel bridges, Glasserman and Kim (2011) show that the conditional integrated variance can be expressed as a gamma expansion (GE). Significant computational speedup can be achieved by sampling the conditional integrated variance via sums of mixtures of gamma random variates (with an approximation of the truncated terms). In a related study, Malham et al. (2021) construct a new series expansion for the conditional integrated variance in terms of double infinite weighted sums of independent random variables through a measure change and a decomposition of squared Bessel bridges. There have been various efforts to improve the time discretization schemes. Tse and Wan (2013) propose a low-bias simulation algorithm that approximates the conditional integrated variance with an Inverse Gaussian (IG) variate by matching the mean and variance. With its lower computational cost per time step, the IG scheme can be used as a multiperiod scheme for pricing path-dependent options. Andersen (2008) constructs a time discretization scheme in which the variance at discrete time points is simulated via the quadratic-exponential (QE) approximation with martingale correction. However, he only uses the trapezoidal rule to approximate the conditional integrated variance; such an approximation is acceptable when the time step is small. There is no one-size-fits-all solution among the simulation approaches. Each simulation method has its own advantages depending on the monitoring frequency of the derivatives to price. Tse and Wan (2013) conclude that the decision among the GE, IG, and QE schemes depends on the compromise between computational cost and bias. The exact GE scheme is widely accepted as the best choice for European-style derivatives: despite the high computational cost per step, we simply need to simulate just one step up to expiry. For path-dependent derivatives with frequent monitoring, the time-discretized QE scheme may be a better choice owing to its low computational cost per step. When the number of monitoring instants is moderate, one may choose to use the low-bias IG scheme as a compromise between the exact and time discretization schemes. (The term “exact” simulation was coined by Broadie and Kaya (2006). Despite the name, the overall simulation algorithm still requires numerical approximations such as the discrete Fourier inversion algorithm. They used the term “exact” to distinguish their algorithm from the usual Euler discretized schemes that involve discretization bias. The term has gained wide acceptance in the literature as the norm in several dozen papers that deal with simulation methods; therefore, we also follow this common usage.) Despite the advances in the Heston simulation algorithms, the computational efficiency still has room for improvement. The simulation methods mentioned above, except for the QE scheme, involve computationally demanding evaluations of the modified Bessel function. This has been criticized as a bottleneck in computation. In particular, the GE scheme (Glasserman and Kim, 2011) involves the Bessel random variable, the sampling of which takes up a significant portion of the computation time for the same reason. We propose enhanced Heston simulation schemes across the full range of monitoring frequencies based on the key observation that the conditional integrated variance can be further simplified when conditioned by the Poisson variate involved in the terminal variance. Consequently, the computationally trivial Poisson variate replaces the Bessel variate in the GE scheme (Glasserman and Kim, 2011). In addition, by adopting the IG approximation (Tse and Wan, 2013) for the series truncation, the Poisson-conditioned GE scheme significantly enhances both speed and accuracy. The IG scheme (Tse and Wan, 2013) is a special case of the new method, but the enhanced IG scheme no longer requires the modified Bessel function for the mean and variance calculations. Broadie and Kaya (2006)'s Laplace inversion scheme can also benefit from our approach since Poisson conditioning removes the modified Bessel function from the Laplace transform. We also propose a Poisson-conditioned time discretization method with the corresponding martingale correction, which is highly efficient for pricing derivatives with frequent monitoring. Our new time discretization scheme compares favorably with Andersen (2008)'s QE scheme. The contributions of this study are summarized as follows. • We propose the Poisson-conditioned GE scheme, an enhanced exact simulation scheme that achieves significant computational speedup by simplifying the conditional integrated variance via Poisson conditioning, thus avoiding the simulation of the tedious Bessel random variable required in earlier GE schemes. • We improve the accuracy of the Poisson-conditioned GE scheme by using the IG approximation (Tse and Wan, 2013) in the series truncation procedure. • We propose a Poisson-conditioned time discretization scheme for pricing discretely monitored path-dependent derivatives. • Our enhanced schemes can flexibly be applied to the multifactor Heston model when the variance factors are independent of each other. The remainder of this paper is organized as follows. In Section 2, we introduce the Heston model and its analytical properties. We then review existing simulation algorithms and discuss the intrinsic computational challenges. In Section 3, we show how Poisson conditioning can enhance the existing algorithms. In Section 4, we present extensive numerical tests that compare the performance of the Poisson-conditioned simulation schemes with existing simulation schemes for pricing European options and discretely monitored variance swaps. In Section 5, we extend our schemes to a class of multifactor Heston models.
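To illustrate the moment-matching idea behind the IG scheme mentioned above: an Inverse Gaussian distribution IG(mu, lambda) has mean mu and variance mu^3/lambda, so choosing lambda = m^3/s^2 matches a target mean m and variance s^2. A minimal Python sketch, assuming the conditional mean m and variance s2 of the integrated variance have already been computed (the hard part that the schemes above differ on); numpy's Wald sampler draws the IG variate:

import numpy as np

def ig_approx_integrated_variance(m, s2, rng):
    # Match an Inverse Gaussian (Wald) distribution to a given mean m and
    # variance s2: IG(mu, lam) has mean mu and variance mu**3 / lam,
    # so lam = m**3 / s2 reproduces both moments.
    lam = m**3 / s2
    return rng.wald(mean=m, scale=lam)

rng = np.random.default_rng(0)
draw = ig_approx_integrated_variance(m=0.04, s2=1e-4, rng=rng)

The numbers in the example call are arbitrary placeholders; Michael et al. (1976), cited in the references, give the classical transformation-based IG sampler underlying such routines.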
Finally, we conclude this paper in Section 6.","Monte Carlo simulation under the Heston model has been a widely studied topic, and several approaches are available for different monitoring frequencies: exact (Broadie and Kaya, 2006; Glasserman and Kim, 2011), low-bias (Tse and Wan, 2013), and time discretization (Andersen, 2008) schemes. The existing simulation schemes, however, suffer from computationally expensive evaluations of the modified Bessel function and/or Bessel random variables arising from the square-root variance process. Based on the observation that the conditional integrated variance can be simplified when conditioned by the Poisson variate used for simulating the terminal variance, we propose new simulation methods that enhance the existing methods across the full spectrum. The Poisson-conditioned GE is an exact simulation scheme that enhances Glasserman and Kim (2011)'s GE scheme. It achieves significant speedup by expressing the conditional integrated variance without the Bessel random variable, which represents the computational bottleneck. The adoption of the IG approximation (Tse and Wan, 2013) for the truncation approximation improves numerical accuracy. A special case of the Poisson-conditioned GE scheme reduces naturally to Tse and Wan (2013)'s low-bias scheme but without the Bessel function. As the Laplace transform of the conditional integrated variance can also be expressed without the Bessel function, Broadie and Kaya (2006)'s scheme can be expedited. The Poisson-conditioned time discretization scheme is proposed as an alternative to Andersen (2008)'s QE scheme. Our comprehensive numerical tests illustrate the strong competitiveness of our schemes in a speed-accuracy comparison among existing schemes for pricing derivatives under the Heston model. In addition to numerical efficiency, our new Heston simulation schemes are simple and straightforward for practitioners to implement since they involve only elementary functions and random variables. We conclude this paper with potential applications of our findings beyond the stochastic volatility model. Glasserman and Kim (2011) already mentioned that the simulation schemes for the integrated variance can be used in the interest rate term-structure model (Cox et al., 1985) and the stochastic default intensity model. Our simulation methods with Poisson conditioning can further improve speed and accuracy in these applications as well. In particular, Dassios and Zhao (2017) criticize the GE method for its truncation error. As the IG approximation handles the truncation accurately, we expect that the Poisson-conditioned GE can also be useful in credit intensity simulation.","Andersen, L., 2008. Simple and efficient simulation of the Heston stochastic volatility model. Journal of Computational Finance 11, 1–42. doi:10.21314/JCF.2008.189. Baldeaux, J., 2012. Exact simulation of the 3/2 model. International Journal of Theoretical and Applied Finance 15, 1250032. doi:10.1142/S021902491250032X. Ball, C.A., Roma, A., 1994. Stochastic volatility option pricing. Journal of Financial and Quantitative Analysis 29, 589–607. doi:10.2307/2331111. Bernard, C., Cui, Z., 2014. Prices and asymptotics for discrete variance swaps. Applied Mathematical Finance 21, 140–173. doi:10.1080/1350486X.2013.820524. Broadie, M., Kaya, Ö., 2006. Exact simulation of stochastic volatility and other affine jump diffusion processes. Operations Research 54, 217–231. doi:10.1287/opre.1050.0247. Cai, N., Song, Y., Chen, N., 2017.
Exact simulation of the SABR model. Operations Research 65, 931–951. doi:10.1287/opre.2017.1617. Choi, J., 2023. A New Exact Simulation of the Ornstein–Uhlenbeck Driven Stochastic Volatility Model Using the Karhunen–Loève Expansions. Working Paper. Peking University HSBC Business School. Choi, J., Liu, C., Seo, B.K., 2019. Hyperbolic normal stochastic volatility model. Journal of Futures Markets 39, 186–204. doi:10.1002/fut.21967. Christoffersen, P., Heston, S., Jacobs, K., 2009. The shape and term structure of the index option smirk: Why multifactor stochastic volatility models work so well. Management Science 55, 1914–1932. doi:10.1287/mnsc.1090.1065. Cox, J.C., Ingersoll, Jr, J.E., Ross, S.A., 1985. A theory of the term structure of interest rates. Econometrica 53, 385–407. doi:10.2307/1911242. Da Fonseca, J., Grasselli, M., Tebaldi, C., 2008. A multifactor volatility Heston model. Quantitative Finance 8, 591–604. doi:10.1080/14697680701668418. Dassios, A., Zhao, H., 2017. Efficient simulation of clustering jumps with CIR intensity. Operations Research 65, 1494–1515. doi:10.1287/opre.2017.1640. Glasserman, P., Kim, K.K., 2011. Gamma expansion of the Heston stochastic volatility model. Finance and Stochastics 15, 267–296. doi:10.1007/s00780-009-0115-y. Heston, S.L., 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6, 327–343. doi:10.1093/rfs/6.2.327. Jaber, E.A., 2019. Lifting the Heston model. Quantitative Finance 19, 1995–2013. doi:10.1080/14697688.2019.1615113. Kahl, C., Jäckel, P., 2006. Fast strong approximation Monte Carlo schemes for stochastic volatility models. Quantitative Finance 6, 513–536. doi:10.1080/14697680600841108. Kang, C., Kang, W., Lee, J.M., 2017. Exact simulation of the Wishart multidimensional stochastic volatility model. Operations Research 65, 1190–1206. doi:10.1287/opre.2017.1636. Kwok, Y.K., Zheng, W., 2022. Pricing Models of Volatility Products and Exotic Variance Derivatives. Chapman & Hall/CRC Financial Mathematics Series, New York. Lewis, A.L., 2000. Option Valuation under Stochastic Volatility: With Mathematica Code. Newport Beach, CA. Lewis, A.L., 2019. Heston Model Reference Prices. URL: https://financepress.com/2019/02/15/heston-model-reference-prices/. Li, C., Wu, L., 2019. Exact simulation of the Ornstein–Uhlenbeck driven stochastic volatility model. European Journal of Operational Research 275, 768–779. doi:10.1016/j.ejor.2018.11.057. Lord, R., Kahl, C., 2010. Complex logarithms in Heston-like models. Mathematical Finance 20, 671–694. doi:10.1111/j.1467-9965.2010.00416.x. Lord, R., Koekkoek, R., Dijk, D.V., 2010. A comparison of biased simulation schemes for stochastic volatility models. Quantitative Finance 10, 177–194. doi:10.1080/14697680802392496. Malham, S.J.A., Shen, J., Wiese, A., 2021. Series expansions and direct inversion for the Heston model. SIAM Journal on Financial Mathematics 12, 487–549. doi:10.1137/19M126791X. Michael, J.R., Schucany, W.R., Haas, R.W., 1976. Generating random variates using transformations with multiple roots. The American Statistician 30, 88–90. doi:10.1080/00031305.1976.10479147. Pitman, J., Yor, M., 1982. A decomposition of Bessel bridges. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 59, 425–457. doi:10.1007/BF00532802. Recchioni, M.C., Iori, G., Tedeschi, G., Ouellette, M.S., 2021.
The complete Gaussian kernel in the multi-factor Heston model: Option pricing and implied volatility applications. European Journal of Operational Research 293, 336–360. doi:10.1016/j.ejor.2020.11.050. Trolle, A.B., Schwartz, E.S., 2009. A General Stochastic Volatility Model for the Pricing of Interest Rate Derivatives. Review of Financial Studies 22, 2007–2057. doi:10.1093/rfs/ hhn040. Tse, S.T., Wan, J.W.L., 2013. Low-bias simulation scheme for the Heston model by Inverse Gaussian approximation. Quantitative Finance 13, 919–937. doi:10.1080/14697688.2012. 696678. Van Haastrecht, A., Pelsser, A., 2010. Efficient, almost exact simulation of the Heston stochastic volatility model. International Journal of Theoretical and Applied Finance 13, 1–43. doi:10. 1142/S0219024910005668. Willard, G.A., 1997. Calculating prices and sensitivities for path-independent derivatives securities in multifactor models. Journal of Derivatives 5, 45–61. doi:10.3905/jod.1997.407982. Zeng, P., Xu, Z., Jiang, P., Kwok, Y.K., 2023. Analytical solvability and exact simulation in models with affine stochastic volatility and L´evy jumps. Mathematical Finance 33, 842–890. doi:10.1111/mafi.12387.",05.11.2023,https://arxiv.org/abs/2301.02800,"[-5.09821177e-02 -1.35054085e-02 -9.48712975e-03 3.36587466e-02 + 4.78569716e-02 -2.58619059e-03 -2.80675460e-02 -1.19766891e-02 + 7.04116970e-02 -5.26694756e-04 5.60693070e-02 -3.75298155e-03 + 4.70372736e-02 2.62934174e-02 -2.82275816e-03 1.85886007e-02 + -5.67416474e-02 -4.13769744e-02 -6.72282279e-02 3.47725525e-02 + 2.45894678e-02 -3.32852378e-02 -1.05523486e-02 -2.53779124e-02 + 7.82500654e-02 -4.51345881e-03 5.29795848e-02 1.53493704e-02 + -1.45872133e-02 -2.15389475e-01 3.36862877e-02 -3.12718153e-02 + 2.70005106e-03 1.94955543e-02 -1.63726183e-03 -2.46086391e-03 + -1.09890234e-02 -1.34042613e-02 2.37773042e-02 1.70112699e-02 + -4.05350849e-02 -2.43060496e-02 -1.77340731e-02 -2.05963533e-02 + -3.85381049e-04 -9.50596780e-02 -3.93889919e-02 1.25280209e-02 + -5.30202352e-02 1.58257112e-02 -1.18074166e-02 5.81320189e-03 + -1.84478574e-02 -2.96489615e-02 -1.67585742e-02 6.90595210e-02 + 5.31852199e-03 4.08384278e-02 2.18552519e-02 5.23266383e-03 + 4.28207312e-03 7.06838742e-02 -1.26817301e-01 5.49811274e-02 + -5.83766634e-03 9.85883269e-03 6.63108891e-03 -1.39008087e-04 + 6.61514187e-03 1.30514726e-02 -9.44222882e-03 -4.72740829e-03 + -1.66066568e-02 4.01592851e-02 3.86281237e-02 -5.57668274e-03 + 1.61904506e-02 -2.60893051e-02 -4.37821262e-02 5.56339659e-02 + -7.53402524e-03 1.18549606e-02 -1.72747076e-02 7.30427820e-03 + 2.86359456e-04 -3.41379307e-02 2.71060131e-02 -7.08563160e-03 + 1.57610290e-02 -1.89270582e-02 6.12405911e-02 4.34323847e-02 + 9.68521927e-03 -2.26089731e-02 -1.10863801e-02 5.65456860e-02 + 2.88303532e-02 -2.22222432e-02 2.32034009e-02 3.70165586e-01 + 2.51428708e-02 6.07229322e-02 -4.13294416e-03 -1.27307314e-04 + 1.55975670e-02 -3.64190340e-03 -6.99202418e-02 4.60453145e-03 + 2.73102149e-02 -2.46277526e-02 -2.45941076e-02 -4.80465479e-02 + 5.55078238e-02 -1.06259115e-01 -6.86814040e-02 -8.65607336e-03 + -1.57892238e-02 5.68595203e-03 5.05205914e-02 -1.25722773e-02 + 3.21505219e-03 3.88614535e-02 1.79961547e-02 1.73773319e-02 + -1.79500766e-02 -5.62294871e-02 -1.76817775e-02 3.99886481e-02 + -3.14408243e-02 -8.97713006e-03 4.50656004e-02 3.47086079e-02 + -3.58762629e-02 2.80226511e-03 9.49033070e-03 7.13094417e-03 + -1.06565624e-01 3.33590410e-03 -8.68218206e-03 5.44923916e-02 + -5.18327691e-02 -1.88694336e-02 -4.85838689e-02 -9.30495933e-02 + 
-1.18760057e-02 3.27511542e-02 5.00010997e-02 1.43725928e-02 + -3.25697474e-02 -4.54904214e-02 9.97947250e-03 3.85419130e-02 + 2.97865681e-02 -7.48015270e-02 7.82388151e-02 -4.76087183e-02 + 1.36151584e-02 4.02817987e-02 -2.82861143e-02 1.71882492e-02 + -8.73255357e-02 -5.30168414e-02 -3.68741676e-02 1.34256437e-01 + -6.73614070e-02 -5.15863970e-02 -4.52927826e-03 5.10839894e-02 + 1.42469276e-02 -3.35089117e-02 1.13493036e-02 3.91026139e-02 + -4.72971015e-02 -2.19236799e-02 8.21424648e-02 3.53006290e-05 + -1.59806088e-02 -5.08726649e-02 3.75302956e-02 -4.10324782e-02 + -1.14393816e-03 -4.05234359e-02 9.19119921e-04 1.54124405e-02 + -1.36863757e-02 3.94512638e-02 -6.40525063e-03 -1.78872235e-02 + -1.35356542e-02 -6.60224073e-03 -2.72424575e-02 -3.81252654e-02 + 9.55000613e-03 6.97109774e-02 2.31108610e-02 2.49935091e-02 + -5.41435778e-02 1.66165642e-02 -1.48169817e-02 -1.06731961e-02 + 1.72041189e-02 -1.12137981e-02 -7.69141912e-02 6.19674893e-03 + -3.89233581e-03 4.15810645e-02 3.23873274e-02 -1.30664613e-02 + 4.02484909e-02 3.79544087e-02 1.20459101e-03 -2.12378912e-02 + 7.00535551e-02 -4.65394463e-03 -1.17746890e-02 7.57728517e-02 + 8.03271029e-03 4.20809090e-02 -2.71033440e-02 1.80698857e-02 + -9.38429125e-03 2.26054881e-02 1.85461380e-02 -3.03061396e-01 + -1.81349646e-02 -3.61048244e-02 1.33224092e-02 3.87633666e-02 + -4.63302433e-02 1.52778411e-02 -4.01622355e-02 5.61201312e-02 + 2.23315228e-02 -5.85202081e-03 2.36607771e-02 -6.51714439e-03 + -6.70602173e-02 3.57782021e-02 -1.89255942e-02 -8.60975962e-03 + -2.45717764e-02 -6.36081323e-02 3.43333483e-02 -7.26573095e-02 + -1.75217562e-03 -6.57994226e-02 -3.33440043e-02 7.26910233e-02 + 6.01814389e-02 9.47887748e-02 -9.48764160e-02 7.32124150e-02 + -1.35032926e-02 -2.42340136e-02 -4.26346101e-02 6.22330643e-02 + 1.11680977e-01 4.82412167e-02 2.10307296e-02 8.99085701e-02 + 1.39133714e-03 6.53626071e-03 -4.29568179e-02 1.96808297e-02 + 3.47790532e-02 -1.49287246e-02 -5.83817624e-02 -6.72656447e-02 + 1.09089250e-02 4.09223251e-02 3.60010453e-02 -6.24052696e-02 + 1.22150537e-02 3.58239189e-02 -1.40053155e-02 9.93580669e-02 + -7.22170854e-03 7.76709095e-02 -5.14003746e-02 -2.80759181e-03 + 1.26237003e-02 -1.02069937e-02 -1.49901425e-02 -4.53095436e-02 + -3.69162746e-02 3.57985683e-02 -1.38430363e-02 -2.41767317e-02 + -4.21855859e-02 9.00066718e-02 -3.32900956e-02 -3.34271528e-02 + -3.89583781e-02 -1.24596795e-02 6.37752712e-02 -2.41468120e-02 + 7.97591433e-02 2.59732790e-02 7.39593757e-03 4.33608890e-02 + 8.07544566e-05 -6.87913671e-02 4.47142161e-02 1.97048858e-02 + -4.08712216e-03 7.67912045e-02 8.01500157e-02 1.67301446e-02 + 4.57499316e-03 3.59622613e-02 1.33602610e-02 -1.81026030e-02 + 3.18310037e-02 -7.33727589e-02 2.04380918e-02 -1.28789134e-02 + 2.60504261e-02 2.38739583e-03 -2.40309071e-02 -2.64239013e-01 + -1.05404248e-03 -2.69301087e-02 8.39932188e-02 -3.69562507e-02 + -3.72165255e-02 -6.22282084e-03 -7.34572038e-02 -7.52181634e-02 + 2.20241696e-02 -8.05668682e-02 7.20558241e-02 8.39806646e-02 + 1.12477858e-02 -1.96799207e-02 -2.99601648e-02 -1.97930336e-02 + -3.38383429e-02 4.85179126e-02 -3.09444349e-02 4.68052961e-02 + -1.93246249e-02 1.77003220e-01 -5.55171072e-02 -1.60734355e-02 + -6.50826935e-03 -5.97158738e-04 7.66664557e-03 1.57115748e-03 + -3.93588506e-02 -3.13056307e-03 1.08175771e-02 4.69519272e-02 + -3.29791084e-02 -1.30314007e-02 3.29509899e-02 -4.25277948e-02 + 6.55585527e-02 1.66604966e-02 -3.85643467e-02 3.18453535e-02 + 4.83646020e-02 -1.35940348e-03 2.44328659e-02 7.95961246e-02 + -2.94624968e-03 8.22839327e-03 
-2.40520053e-02 1.08903898e-02 + 6.36732355e-02 -1.34111708e-02 1.67142041e-02 3.72329243e-02 + -6.11356199e-02 4.72791158e-02 3.49972807e-02 -1.22014573e-02 + -2.00067256e-02 -9.08154715e-03 -5.23971915e-02 -4.38988283e-02 + -1.26172919e-02 -2.40509305e-02 -4.50139493e-02 3.62042598e-02]" +5,Robust Speech Recognition via Large-Scale Weak Supervision,"We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.",Speech processing systems; Predictive modeling; Internet audio transcripts; Multilingual supervision; Multitask learning; Zero-shot transfer,"Progress in speech recognition has been energized by the development of unsupervised pre-training techniques exemplified by Wav2Vec 2.0 (Baevski et al., 2020). Since these methods learn directly from raw audio without the need for human labels, they can productively use large datasets of unlabeled speech and have been quickly scaled up to 1,000,000 hours of training data (Zhang et al., 2021), far more than the 1,000 or so hours typical of an academic supervised dataset. When fine-tuned on standard benchmarks, this approach has improved the state of the art, especially in a low-data setting. These pre-trained audio encoders learn high-quality representations of speech, but because they are purely unsupervised they lack an equivalently performant decoder mapping those representations to usable outputs, necessitating a fine-tuning stage in order to actually perform a task such as speech recognition (Baevski et al. (2021) is an exciting exception, having developed a fully unsupervised speech recognition system). This unfortunately limits their usefulness and impact, as fine-tuning can still be a complex process requiring a skilled practitioner. There is an additional risk with requiring fine-tuning. Machine learning
This suggests that while unsupervised pre-training has improved the quality of audio encoders dramatically, the lack of an equivalently high-quality pre-trained decoder, combined with a recommended protocol of dataset-specific fine-tuning, is a crucial weakness which limits their usefulness and robustness. The goal of a speech recognition system should be to work reliably “out of the box” in a broad range of environments without requiring supervised fine-tuning of a decoder for every deployment distribution. As demonstrated by Narayanan et al. (2018), Likhomanenko et al. (2020), and Chan et al. (2021), speech recognition systems that are pre-trained in a supervised fashion across many datasets/domains exhibit higher robustness and generalize much more effectively to held-out datasets than models trained on a single source. These works achieve this by combining as many existing high-quality speech recognition datasets as possible. However, there is still only a moderate amount of this data easily available. SpeechStew (Chan et al., 2021) mixes together 7 pre-existing datasets totalling 5,140 hours of supervision. While not insignificant, this is still tiny compared to the previously mentioned 1,000,000 hours of unlabeled speech data utilized in Zhang et al. (2021). Recognizing the limiting size of existing high-quality supervised datasets, recent efforts have created larger datasets for speech recognition. By relaxing the requirement of gold-standard human-validated transcripts, Chen et al. (2021) and Galvez et al. (2021) make use of sophisticated automated pipelines to scale weakly supervised speech recognition to 10,000 and 30,000 hours of noisier training data. This trade-off between quality and quantity is often the right call. Although understudied so far for speech recognition, recent work in computer vision has demonstrated that moving beyond gold-standard crowdsourced datasets such as ImageNet (Russakovsky et al., 2015) to much larger but weakly supervised datasets significantly improves the robustness and generalization of models (Mahajan et al., 2018; Kolesnikov et al., 2020). Yet these new datasets are only a few times larger than the sum of existing high-quality datasets and still much smaller than prior unsupervised work. In this work we close that gap, scaling weakly supervised speech recognition the next order of magnitude to 680,000 hours of labeled audio data. We call our approach Whisper. We demonstrate that models trained at this scale transfer well to existing datasets zero-shot, removing the need for any dataset-specific fine-tuning to achieve high-quality results. In addition to scale, our work also focuses on broadening the scope of weakly supervised pre-training beyond English-only speech recognition to be both multilingual and multitask. Of those 680,000 hours of audio, 117,000 hours cover 96 other languages. The dataset also includes 125,000 hours of X→en translation data. We find that for sufficiently large models there is no drawback, and even benefits, to joint multilingual and multitask training. Our work suggests that simple scaling of weakly supervised pre-training has been underappreciated so far for speech recognition. We achieve these results without the need for the self-supervision or self-training techniques that have been a mainstay of recent large-scale speech recognition work.
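As a concrete illustration of the zero-shot usage described above, a minimal sketch using the openai/whisper Python package released with this work; the checkpoint name "base" and the input file audio.mp3 are arbitrary placeholders, and the package and ffmpeg are assumed to be installed:

import whisper

# Load one of the released multilingual checkpoints ("base" is a smaller one).
model = whisper.load_model("base")

# Zero-shot transcription: no dataset-specific fine-tuning is involved.
result = model.transcribe("audio.mp3")
print(result["text"])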
To serve as a foundation for further research on robust speech recognition, we release inference code and models at the following URL: https://github.com/openai/whisper.","Whisper suggests that scaling weakly supervised pre-training has been underappreciated so far in speech recognition research. We achieve our results without the need for the self-supervision and self-training techniques that have been a mainstay of recent large-scale speech recognition work and demonstrate how simply training on a large and diverse supervised dataset and focusing on zero-shot transfer can significantly improve the robustness of a speech recognition system.","Alcorn, M. A., Li, Q., Gong, Z., Wang, C., Mai, L., Ku, W.-S., and Nguyen, A. Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4845–4854, 2019. Amodei, D., Anubhai, R., Battenberg, E., Case, C., Casper, J., Catanzaro, B., Chen, J., Chrzanowski, M., Coates, A., Diamos, G., et al. Deep speech 2: end-to-end speech recognition in english and mandarin. arXiv preprint arXiv:1512.02595, 2015. Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F. M., and Weber, G. Common voice: A massively-multilingual speech corpus. arXiv preprint arXiv:1912.06670, 2019. Babu, A., Wang, C., Tjandra, A., Lakhotia, K., Xu, Q., Goyal, N., Singh, K., von Platen, P., Saraf, Y., Pino, J., et al. XLS-R: Self-supervised cross-lingual speech representation learning at scale. arXiv preprint arXiv:2111.09296, 2021. Baevski, A., Zhou, H., Mohamed, A., and Auli, M. wav2vec 2.0: A framework for self-supervised learning of speech representations. arXiv preprint arXiv:2006.11477, 2020. Baevski, A., Hsu, W.-N., Conneau, A., and Auli, M. Unsupervised speech recognition. Advances in Neural Information Processing Systems, 34:27826–27839, 2021. Bapna, A., Cherry, C., Zhang, Y., Jia, Y., Johnson, M., Cheng, Y., Khanuja, S., Riesa, J., and Conneau, A. mSLAM: Massively multilingual joint pre-training for speech and text. arXiv preprint arXiv:2202.01374, 2022. Barbu, A., Mayo, D., Alverio, J., Luo, W., Wang, C., Gutfreund, D., Tenenbaum, J., and Katz, B. Objectnet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. Advances in neural information processing systems, 32, 2019. Caruana, R. Multitask learning. Machine learning, 28(1):41–75, 1997. Chan, W., Park, D., Lee, C., Zhang, Y., Le, Q., and Norouzi, M. SpeechStew: Simply mix all available speech recognition data to train one large neural network. arXiv preprint arXiv:2104.02133, 2021. Chen, G., Chai, S., Wang, G., Du, J., Zhang, W.-Q., Weng, C., Su, D., Povey, D., Trmal, J., Zhang, J., et al. Gigaspeech: An evolving, multi-domain asr corpus with 10,000 hours of transcribed audio. arXiv preprint arXiv:2106.06909, 2021. Chen, S., Wu, Y., Wang, C., Chen, Z., Chen, Z., Liu, S., Wu, J., Qian, Y., Wei, F., Li, J., et al. Unispeech-sat: Universal speech representation learning with speaker aware pre-training. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6152–6156. IEEE, 2022a. Chen, T., Xu, B., Zhang, C., and Guestrin, C. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016. Chen, Z., Zhang, Y., Rosenberg, A., Ramabhadran, B., Moreno, P., Bapna, A., and Zen, H.
Maestro: Matched speech text representations through modality matching. arXiv preprint arXiv:2204.03409, 2022b. Child, R., Gray, S., Radford, A., and Sutskever, I. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509, 2019. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. Natural language processing (almost) from scratch. Journal of machine learning research, 12:2493–2537, 2011. Conneau, A., Ma, M., Khanuja, S., Zhang, Y., Axelrod, V., Dalmia, S., Riesa, J., Rivera, C., and Bapna, A. Fleurs: Few-shot learning evaluation of universal representations of speech. arXiv preprint arXiv:2205.12446, 2022. Del Rio, M., Delworth, N., Westerman, R., Huang, M., Bhandari, N., Palakapilly, J., McNamara, Q., Dong, J., Zelasko, P., and Jetté, M. Earnings-21: a practical benchmark for asr in the wild. arXiv preprint arXiv:2104.11348, 2021. Galvez, D., Diamos, G., Torres, J. M. C., Achorn, K., Gopi, A., Kanter, D., Lam, M., Mazumder, M., and Reddi, V. J. The people's speech: A large-scale diverse english speech recognition dataset for commercial usage. arXiv preprint arXiv:2111.09344, 2021. Geirhos, R., Jacobsen, J.-H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., and Wichmann, F. A. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11):665–673, 2020. Ghorbani, B., Firat, O., Freitag, M., Bapna, A., Krikun, M., Garcia, X., Chelba, C., and Cherry, C. Scaling laws for neural machine translation. arXiv preprint arXiv:2109.07740, 2021. Griewank, A. and Walther, A. Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation. ACM Transactions on Mathematical Software (TOMS), 26(1):19–45, 2000. Gunter, K., Vaughn, C., and Kendall, T. Contextualizing /s/ retraction: Sibilant variation and change in washington dc african american language. Language Variation and Change, 33(3):331–357, 2021. Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., Fernández del Río, J., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E. Array programming with NumPy. Nature, 585:357–362, 2020. doi: 10.1038/s41586-020-2649-2. Hendrycks, D. and Gimpel, K. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415, 2016. Hendrycks, D., Liu, X., Wallace, E., Dziedzic, A., Krishnan, R., and Song, D. Pretrained transformers improve out-of-distribution robustness. arXiv preprint arXiv:2004.06100, 2020. Hernandez, F., Nguyen, V., Ghannay, S., Tomashenko, N. A., and Estève, Y. Ted-lium 3: twice as much data and corpus repartition for experiments on speaker adaptation. In SPECOM, 2018. Hsu, W.-N., Bolte, B., Tsai, Y.-H. H., Lakhotia, K., Salakhutdinov, R., and Mohamed, A. Hubert: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:3451–3460, 2021a. Hsu, W.-N., Sriram, A., Baevski, A., Likhomanenko, T., Xu, Q., Pratap, V., Kahn, J., Lee, A., Collobert, R., Synnaeve, G., et al. Robust wav2vec 2.0: Analyzing domain shift in self-supervised pre-training. arXiv preprint arXiv:2104.01027, 2021b. Huang, G., Sun, Y., Liu, Z., Sedra, D., and Weinberger, K. Q.
Deep networks with stochastic depth. In European conference on computer vision, pp. 646–661. Springer, 2016. Jia, R. and Liang, P. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328, 2017. Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., et al. Google's multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351, 2017. Kendall, T. and Farrington, C. The corpus of regional african american language. Version 2021.07. Eugene, OR: The Online Resources for African American Language Project. http://oraal.uoregon.edu/coraal, 2021. Accessed: 2022-09-01. Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J. R., Jurafsky, D., and Goel, S. Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14):7684–7689, 2020. Kolesnikov, A., Beyer, L., Zhai, X., Puigcerver, J., Yung, J., Gelly, S., and Houlsby, N. Big transfer (bit): General visual representation learning. In European conference on computer vision, pp. 491–507. Springer, 2020. Kuchaiev, O., Li, J., Nguyen, H., Hrinchuk, O., Leary, R., Ginsburg, B., Kriman, S., Beliaev, S., Lavrukhin, V., Cook, J., et al. Nemo: a toolkit for building ai applications using neural modules. arXiv preprint arXiv:1909.09577, 2019. Lake, B. M., Ullman, T. D., Tenenbaum, J. B., and Gershman, S. J. Building machines that learn and think like people. Behavioral and brain sciences, 40, 2017. Liao, H., McDermott, E., and Senior, A. Large scale deep neural network acoustic modeling with semi-supervised training data for youtube video transcription. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 368–373. IEEE, 2013. Likhomanenko, T., Xu, Q., Pratap, V., Tomasello, P., Kahn, J., Avidov, G., Collobert, R., and Synnaeve, G. Rethinking evaluation in asr: Are our models robust enough? arXiv preprint arXiv:2010.11745, 2020. Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017. Luong, M.-T., Le, Q. V., Sutskever, I., Vinyals, O., and Kaiser, L. Multi-task sequence to sequence learning. arXiv preprint arXiv:1511.06114, 2015. Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri, M., Li, Y., Bharambe, A., and Van Der Maaten, L. Exploring the limits of weakly supervised pretraining. In Proceedings of the European conference on computer vision (ECCV), pp. 181–196, 2018. Mauch, M. and Ewert, S. The audio degradation toolbox and its application to robustness evaluation. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013), Curitiba, Brazil, 2013. McCann, B., Keskar, N. S., Xiong, C., and Socher, R. The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730, 2018. Meyer, J., Rauchenstein, L., Eisenberg, J. D., and Howell, N. Artie bias corpus: An open dataset for detecting demographic bias in speech applications. In Proceedings of the 12th Language Resources and Evaluation Conference, pp. 6462–6468, Marseille, France, May 2020. European Language Resources Association. ISBN 979-10-95546-34-4. URL https://aclanthology.org/2020.lrec-1.796. Miller, J., Krauth, K., Recht, B., and Schmidt, L. The effect of natural distribution shift on question answering models. In ICML, 2020.
Mohamed, A.-r., Dahl, G., Hinton, G., et al. Deep belief networks for phone recognition. In NIPS workshop on deep learning for speech recognition and related applications, volume 1, pp. 39, 2009. Narayanan, A., Misra, A., Sim, K. C., Pundak, G., Tripathi, A., Elfeky, M., Haghani, P., Strohman, T., and Bacchiani, M. Toward domain-invariant speech recognition via large scale training. In 2018 IEEE Spoken Language Technology Workshop (SLT), pp. 441–447. IEEE, 2018. Panayotov, V., Chen, G., Povey, D., and Khudanpur, S. Librispeech: an asr corpus based on public domain audio books. In 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 5206–5210. IEEE, 2015. The pandas development team. pandas-dev/pandas: Pandas, February 2020. URL https://doi.org/10.5281/zenodo.3509134. Park, D. S., Chan, W., Zhang, Y., Chiu, C.-C., Zoph, B., Cubuk, E. D., and Le, Q. V. SpecAugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779, 2019. Pascanu, R., Mikolov, T., and Bengio, Y. On the difficulty of training recurrent neural networks. In International conference on machine learning, pp. 1310–1318. PMLR, 2013. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035, 2019. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011. Polyak, B. T. and Juditsky, A. B. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855, 1992. Pratap, V., Sriram, A., Tomasello, P., Hannun, A. Y., Liptchinsky, V., Synnaeve, G., and Collobert, R. Massively multilingual asr: 50 languages, 1 model, 1 billion parameters. ArXiv, abs/2007.03001, 2020a. Pratap, V., Xu, Q., Sriram, A., Synnaeve, G., and Collobert, R. Mls: A large-scale multilingual dataset for speech research. arXiv preprint arXiv:2012.03411, 2020b. Press, O. and Wolf, L. Using the output embedding to improve language models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 157–163, Valencia, Spain, April 2017. Association for Computational Linguistics. URL https://aclanthology.org/E17-2025. Provilkov, I., Emelianenko, D., and Voita, E. Bpe-dropout: Simple and effective subword regularization. arXiv preprint arXiv:1910.13267, 2019. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners. 2019. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020, 2021. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P. J., et al.
Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140):1–67, 2020. Ravanelli, M., Parcollet, T., Plantinga, P., Rouhe, A., Cornell, S., Lugosch, L., Subakan, C., Dawalatabad, N., Heba, A., Zhong, J., Chou, J.-C., Yeh, S.-L., Fu, S.-W., Liao, C.-F., Rastorgueva, E., Grondin, F., Aris, W., Na, H., Gao, Y., Mori, R. D., and Bengio, Y. SpeechBrain: A general-purpose speech toolkit, 2021. arXiv:2106.04624. Recht, B., Roelofs, R., Schmidt, L., and Shankar, V. Do ImageNet classifiers generalize to ImageNet? In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 5389–5400. PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.press/v97/recht19a.html. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015. Schultz, T. and Kirchhoff, K. Multilingual speech processing. Elsevier, 2006. Seide, F., Li, G., Chen, X., and Yu, D. Feature engineering in context-dependent deep neural networks for conversational speech transcription. In 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 24–29. IEEE, 2011. Sennrich, R., Haddow, B., and Birch, A. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909, 2015. Speer, R. ftfy. Zenodo, 2019. URL https://doi.org/10.5281/zenodo.2591652. Version 5.5. Sutskever, I., Vinyals, O., and Le, Q. V. Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27, 2014. Taori, R., Dave, A., Shankar, V., Carlini, N., Recht, B., and Schmidt, L. Measuring robustness to natural distribution shifts in image classification. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 18583–18599. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/file/d8330f857a17c53d217014ee776bfd50-Paper.pdf. Torralba, A. and Efros, A. A. Unbiased look at dataset bias. CVPR 2011, pp. 1521–1528, 2011. Toshniwal, S., Sainath, T. N., Weiss, R. J., Li, B., Moreno, P. J., Weinstein, E., and Rao, K. Multilingual speech recognition with a single end-to-end model. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4904–4908, 2018. Valk, J. and Alumäe, T. Voxlingua107: a dataset for spoken language recognition. In 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 652–658. IEEE, 2021. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008, 2017. Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P., and SciPy 1.0 Contributors.
SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–272, 2020. doi: 10.1038/s41592-019-0686-2. Wang, C., Tang, Y., Ma, X., Wu, A., Okhonko, D., and Pino, J. fairseq s2t: Fast speech-to-text modeling with fairseq. arXiv preprint arXiv:2010.05171, 2020a. Wang, C., Wu, A., and Pino, J. Covost 2 and massively multilingual speech-to-text translation. arXiv preprint arXiv:2007.10310, 2020b. Wang, C., Rivière, M., Lee, A., Wu, A., Talnikar, C., Haziza, D., Williamson, M., Pino, J., and Dupoux, E. Voxpopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation. arXiv preprint arXiv:2101.00390, 2021. Wang, P., Sainath, T. N., and Weiss, R. J. Multitask training with text data for end-to-end speech recognition. arXiv preprint arXiv:2010.14318, 2020c. Watanabe, S., Mandel, M., Barker, J., Vincent, E., Arora, A., Chang, X., Khudanpur, S., Manohar, V., Povey, D., Raj, D., et al. Chime-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings. arXiv preprint arXiv:2004.09249, 2020. Xu, Q., Baevski, A., Likhomanenko, T., Tomasello, P., Conneau, A., Collobert, R., Synnaeve, G., and Auli, M. Self-training and pre-training are complementary for speech recognition. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3030–3034. IEEE, 2021. Zhang, Y., Qin, J., Park, D. S., Han, W., Chiu, C.-C., Pang, R., Le, Q. V., and Wu, Y. Pushing the limits of semi-supervised learning for automatic speech recognition. arXiv preprint arXiv:2010.10504, 2020. Zhang, Y., Park, D. S., Han, W., Qin, J., Gulati, A., Shor, J., Jansen, A., Xu, Y., Huang, Y., Wang, S., et al. BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition.
arXiv preprint arXiv:2109.13226, 2021.",06.12.2022,https://arxiv.org/abs/2212.04356,"[-4.11544144e-02 -4.06497940e-02 -4.86175530e-02 -3.43188867e-02 + -1.53247211e-02 1.50039447e-02 -6.00869954e-03 1.42020239e-02 + 3.37648839e-02 -1.96266063e-02 1.33258151e-02 -1.49526075e-02 + 3.67704183e-02 3.75086330e-02 5.69621697e-02 5.53406253e-02 + 1.07970349e-02 9.63043515e-03 2.16605272e-02 2.54943618e-03 + 5.08502759e-02 -1.05807884e-02 -3.21979425e-03 -7.02734478e-03 + -1.07477214e-02 -2.82975584e-02 -2.98009794e-02 -6.90024719e-02 + -7.62176374e-03 -1.90492868e-01 4.27021869e-02 -2.25104894e-02 + 6.10363372e-02 1.30791189e-02 9.21262708e-03 -3.23865889e-03 + -2.59880312e-02 -2.46469118e-02 -1.06809381e-02 1.73510481e-02 + -1.83381233e-03 4.88675851e-03 1.88970554e-03 -7.26697594e-02 + 1.13289170e-02 -4.92956936e-02 -3.35584842e-02 -2.76592281e-02 + -7.96950907e-02 -1.13808429e-02 -4.38641310e-02 -6.28889501e-02 + 1.93289332e-02 4.68207411e-02 1.96617581e-02 -5.58264786e-03 + 8.79119858e-02 6.68229684e-02 1.28965471e-02 1.71095468e-02 + -7.26848915e-02 2.51015201e-02 -1.29587337e-01 6.65853396e-02 + -2.41109962e-03 7.30705634e-02 -3.09644174e-02 1.51220504e-02 + -2.78927442e-02 1.39021948e-02 2.04701517e-02 1.31925158e-02 + 4.97440584e-02 1.69249717e-02 1.67512950e-02 4.13324200e-02 + 5.31679690e-02 1.87483206e-02 -4.50824108e-03 3.45450290e-03 + 6.13282137e-02 -4.86584641e-02 -1.97920371e-02 -2.24771202e-02 + -3.22497524e-02 -9.34826769e-03 -3.07956357e-02 -1.80216581e-02 + -4.68011312e-02 -2.79270746e-02 -9.08475146e-02 2.21486557e-02 + 1.84096675e-02 2.70227473e-02 9.54371132e-03 -8.79163016e-03 + 2.25003008e-02 -7.30540883e-03 6.79095928e-03 3.88468802e-01 + 3.92545247e-03 -9.04479437e-03 4.63694939e-03 -1.02968678e-01 + 2.81169042e-02 -2.94818748e-02 -3.55945006e-02 -2.44983286e-02 + -1.17075955e-02 1.68953033e-03 -1.73908267e-02 -3.48353609e-02 + 7.29606766e-03 1.00689754e-02 6.10038042e-02 5.67661971e-02 + 5.77330217e-02 -1.56432558e-02 4.38452847e-02 8.96099303e-03 + -3.25164683e-02 1.03442036e-02 1.61325969e-02 -8.10236856e-02 + 7.79783055e-02 -4.02737521e-02 4.19220217e-02 1.21823773e-01 + 3.93838994e-02 4.80642766e-02 1.52734648e-02 3.55124325e-02 + -1.98463164e-02 3.26504037e-02 7.03402609e-02 4.50124498e-03 + 3.32224704e-02 -3.12786810e-02 -2.20656469e-02 1.97831430e-02 + -3.05845086e-02 -4.65341210e-02 1.14026917e-02 1.99454129e-02 + -8.07292387e-02 1.45679027e-01 -3.40353251e-02 -7.97339063e-03 + -6.38915971e-02 -4.94622551e-02 9.69292421e-04 4.77327742e-02 + 4.77507189e-02 -7.79518485e-02 1.23608029e-02 1.42899472e-02 + 2.56241933e-02 -2.83671748e-02 -3.14955898e-02 -2.99279094e-02 + -9.84309521e-03 -4.22681160e-02 -6.26524398e-03 6.53242469e-02 + -6.75289100e-03 -5.99493682e-02 -2.52291262e-02 -3.33959125e-02 + 2.83789262e-02 -1.72519758e-02 2.81143282e-02 -6.39668852e-02 + 3.88533324e-02 1.41940825e-02 -6.35295883e-02 -4.90837079e-03 + -1.06626794e-01 -4.93823364e-03 -1.72004923e-02 3.64567898e-02 + 1.50375273e-02 -4.31482494e-02 1.77001934e-02 1.48599474e-02 + -3.29531766e-02 -5.46241179e-02 2.07856596e-02 -3.94051149e-02 + 2.22141668e-02 2.90451534e-02 -7.37899356e-03 -1.17964884e-02 + -2.01390181e-02 -1.32611429e-03 -3.44001083e-03 -1.66373551e-02 + 2.46858168e-02 2.34836210e-02 -3.41398194e-02 -1.42190047e-02 + -5.87832415e-03 5.43985283e-03 -1.86739992e-02 1.78223057e-03 + 3.52561623e-02 -1.86124388e-02 -3.75837609e-02 3.91806029e-02 + 2.80068591e-02 5.57841361e-02 5.40442625e-03 -7.82958716e-02 + 5.17432876e-02 -6.80813612e-03 -1.09946355e-02 -2.01755408e-02 + 
3.17049026e-02 6.50201961e-02 1.96922161e-02 3.37465443e-02 + 1.28786284e-02 8.99663847e-03 -4.62496839e-02 -2.65224814e-01 + -2.22007781e-02 3.51355821e-02 3.12280320e-02 4.99406122e-02 + -6.56843111e-02 5.33012971e-02 -1.60397850e-02 7.77194947e-02 + 4.33055460e-02 -1.62320044e-02 -3.02192643e-02 -1.24234036e-02 + 2.67369412e-02 -1.11611821e-02 2.39656009e-02 -5.54819999e-04 + 1.52486293e-02 -3.21975388e-02 4.61247899e-02 -1.39149753e-02 + 1.28442170e-02 -3.10466886e-02 -5.93064763e-02 5.27880266e-02 + 4.63636965e-03 1.50491387e-01 -4.15758751e-02 4.31916900e-02 + -3.61628504e-03 2.72392798e-02 -5.79615217e-03 -9.40870028e-03 + -5.72309867e-02 5.06072417e-02 4.69373837e-02 7.58906454e-02 + -1.70092247e-02 2.14648284e-02 1.36685912e-02 -5.04418500e-02 + -2.97630746e-02 2.90118456e-02 -8.31832737e-02 -1.13354221e-01 + 1.91451497e-02 -6.12246431e-02 -5.45984022e-02 -6.92971349e-02 + 4.82553663e-03 1.15239853e-02 2.26308797e-02 5.44622391e-02 + -2.89539732e-02 -6.55352101e-02 -1.57752950e-02 -4.87230830e-02 + 1.67639926e-02 -5.33474423e-02 3.86298858e-02 4.98998761e-02 + -3.69816907e-02 -2.16875728e-02 -2.03916263e-02 1.91513172e-04 + 1.85276996e-02 3.58277559e-02 -3.44928429e-02 -5.97385690e-03 + -1.32959033e-03 -5.22103831e-02 7.96051100e-02 -6.05453411e-03 + 4.12258543e-02 3.46500464e-02 -6.58358634e-03 -1.98653005e-02 + -1.08118258e-01 -5.68699874e-02 -8.80822074e-03 1.22651562e-01 + 1.91062335e-02 5.66964671e-02 3.45754251e-03 4.59696017e-02 + 1.60282496e-02 8.16072449e-02 6.70179352e-03 2.65265107e-02 + 5.35507128e-02 1.97581332e-02 -1.20537505e-02 -4.33716811e-02 + -2.45439876e-02 2.83988267e-02 2.09277030e-02 -2.56054670e-01 + -1.21122003e-02 1.86292676e-03 5.45703806e-02 2.36867159e-03 + 5.95008954e-03 1.27986912e-02 -3.22379097e-02 -3.89509499e-02 + 5.38296476e-02 -3.95833999e-02 4.56644334e-02 1.78078711e-02 + 9.45415907e-03 1.15826745e-02 3.87747660e-02 1.23295508e-01 + 1.03473412e-02 5.24660386e-02 -5.92058012e-03 1.61260308e-03 + 6.76905438e-02 1.32173955e-01 -2.42267046e-02 7.10469261e-02 + -3.85007448e-02 -3.19234282e-02 -6.98786974e-02 1.27168689e-02 + -8.36511850e-02 -5.47270253e-02 -7.32725312e-04 6.53312132e-02 + -6.50579063e-03 -8.72561783e-02 4.47847284e-02 -4.97743860e-02 + 4.66327276e-03 3.92714003e-03 1.03287911e-02 1.72548238e-02 + -8.85907654e-03 5.29526472e-02 -6.68108836e-02 1.87795423e-02 + 3.71216424e-02 3.15136909e-02 -3.48515846e-02 -4.08652462e-02 + -1.64221097e-02 2.58124471e-02 8.95543396e-03 6.52154982e-02 + -9.77384299e-03 3.11825250e-04 7.14752674e-02 6.84073707e-03 + -4.06912081e-02 -3.96710373e-02 -5.19163124e-02 4.13655043e-02 + -3.49972546e-02 -6.64386293e-03 3.64989825e-02 -5.43406196e-02]" +6,SLICED DENOISING: A PHYSICS-INFORMED MOLECULAR PRE-TRAINING METHOD,"While molecular pre-training has shown great potential in enhancing drug discovery, the lack of a solid physical interpretation in current methods raises concerns about whether the learned representation truly captures the underlying explanatory factors in observed data, ultimately resulting in limited generalization and robustness. Although denoising methods offer a physical interpretation, their accuracy is often compromised by ad-hoc noise design, leading to inaccurate learned force fields. To address this limitation, this paper proposes a new method for molecular pre-training, called sliced denoising (SliDe), which is based on the classical mechanical intramolecular potential theory. 
SliDe utilizes a novel noise strategy that perturbs bond lengths, angles, and torsion angles to achieve better sampling over conformations. Additionally, it introduces a random slicing approach that circumvents the computationally expensive calculation of the Jacobian matrix, which is otherwise essential for estimating the force field. By aligning with physical principles, SliDe shows a 42% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods, and thus outperforms traditional baselines on various molecular property prediction tasks.",Molecular pre-training; Drug discovery; Physical interpretation; Learned representation; Generalization and robustness; Denoising methods,"Let $f\colon X \to X$ be a surjective morphism on a projective variety $X$ defined over a number field $K$. For a closed subscheme $Y \subset X$, the iterated preimages $f^{-n}(Y)$ tend to become complicated as $n$ approaches $\infty$, and so it is reasonable to expect that the $K$-rational points on $f^{-n}(Y)$ might be sparse in a certain sense. Such a question is investigated in [2], motivated by a question proposed in [10]. The central problem is the following ([10, Question 8.4(1)], [2, Question 1.1]).

Question 1.1 (Preimages Question (PIQ)). Let $K$ be a number field. Let $X$ be a projective variety over $K$ (i.e. an integral projective scheme over $K$) and $f\colon X \to X$ a surjective morphism over $K$. Let $Y \subset X$ be an $f$-invariant closed subscheme (i.e. $f|_Y$ factors through $Y$). Then is there $s_0 \in \mathbb{Z}_{\geq 0}$ such that for all $s \geq s_0$, we have $\bigl(f^{-s-1}(Y) \smallsetminus f^{-s}(Y)\bigr)(K) = \emptyset$?

Let us give some comments on the assumptions. We cannot remove the projectivity of $X$, as there is a counterexample [2, Section 6.1]. The $f$-invariance of $Y$ is more mysterious. We can pose a similar question for $Y$ that is not necessarily $f$-invariant (see Question 4.1). While there are counterexamples (cf. Section 4), it seems plausible that certain additional conditions on $f$ or $Y$ might yield a positive answer. The $f$-invariance of $Y$ can be seen as one such condition, but we anticipate that other types of conditions exist. Currently, we do not understand what causes the set of rational points on $f^{-n}(Y)$ to be sparse. Let us also comment on other ground fields. Obviously, the question is nonsense if $K$ is algebraically closed. A slightly non-trivial fact is that it has a counterexample over $p$-adic fields in general (see [2, Section 6.2]). However, for a specific type of morphisms, PIQ is true over $p$-adic fields or finitely generated fields over $\mathbb{Q}$ (cf. [2, Proof of Theorem 3.1, Step 2], Theorem 9.1).

Remark 1.2. The statement of Question 1.1 makes sense for an arbitrary self-map $f\colon X \to X$ on a set $X$ and an $f$-invariant subset $Y \subset X$. We use the terminology Preimages Question, or PIQ, for such situations flexibly. For example, “PIQ is true for $(X, f, Y)$” means that there is $s_0 \in \mathbb{Z}_{\geq 0}$ such that for all $s \geq s_0$, we have $f^{-s-1}(Y) \smallsetminus f^{-s}(Y) = \emptyset$. In this terminology, Question 1.1 is equivalent to PIQ for $f\colon X(K) \to X(K)$ and the subset $Y(K) \subset X(K)$.

Remark 1.3. For a map $f\colon X \to X$ on a set $X$ and an $f$-invariant subset $Y \subset X$, the following are equivalent. (1) PIQ is true for $(X, f, Y)$. (2) PIQ is true for $(X, f^n, Y)$ for some $n \geq 1$. (3) There is $s_0 \geq 0$ with the following property: for every $x \in X$, if $f^s(x) \in Y$ for some $s \geq 0$, then $f^{s_0}(x) \in Y$.

Back to Question 1.1, the original [10, Question 8.4(1)] was the following slightly stronger question.

Question 1.4. Let $K$ be a number field. Let $X$ be a projective variety over $K$ and $f\colon X \to X$ a surjective morphism over $K$. Let $Y \subset X$ be an $f$-invariant closed subscheme. Let $d \in \mathbb{Z}_{\geq 0}$. Then is there an $s_0 \in \mathbb{Z}_{\geq 0}$ such that for all $s \geq s_0$, we have $\bigl(f^{-s-1}(Y) \smallsetminus f^{-s}(Y)\bigr)(L) = \emptyset$ for all field extensions $K \subset L$ with $[L : K] \leq d$?

It turns out:

Proposition A (Proposition 2.1). Question 1.1 for all $K$, $X$, $f$, and $Y$ is equivalent to Question 1.4 for all $K$, $X$, $f$, $Y$, and $d$.

We verify Question 1.4 in the following cases.

Theorem B (Theorem 5.1 and Proposition 3.1). The answer to Question 1.4 is affirmative if either $\dim Y = 0$ or $f$ is étale.

Note that Question 1.1 was proven for étale morphisms in [2]. In [2], the authors proved that Question 1.1 is true for self-morphisms on $\mathbb{P}^1 \times \mathbb{P}^1$ of the form $f \times f$ and the diagonal subvariety. We generalize this theorem to arbitrary self-morphisms on $\mathbb{P}^1 \times \mathbb{P}^1$ and any invariant closed subscheme (as well as to finitely generated fields).

Theorem C (Theorem 9.1). Let $K$ be a finitely generated field over $\mathbb{Q}$. Let $\varphi\colon \mathbb{P}^1_K \times_K \mathbb{P}^1_K \to \mathbb{P}^1_K \times_K \mathbb{P}^1_K$ be a surjective morphism over $K$. Let $Y \subset \mathbb{P}^1_K \times_K \mathbb{P}^1_K$ be a $\varphi$-invariant closed subscheme. Then there is $s_0 \geq 0$ such that for all $s \geq s_0$, we have $\bigl(\varphi^{-s-1}(Y) \smallsetminus \varphi^{-s}(Y)\bigr)(K) = \emptyset$.

We have mentioned that the invariance of $Y$ in Question 1.1 cannot be removed in general. But in the following particular case the answer is positive.

Theorem D (Theorem 4.5). Let $K$ be a number field. Let $f, g\colon \mathbb{P}^1_K \to \mathbb{P}^1_K$ be polynomial maps of the same degree $d \geq 2$. Consider the product morphism $f \times g\colon \mathbb{P}^1_K \times \mathbb{P}^1_K \to \mathbb{P}^1_K \times \mathbb{P}^1_K$ and the diagonal subvariety $\Delta \subset \mathbb{P}^1_K \times \mathbb{P}^1_K$. There is $s_0 \in \mathbb{Z}_{\geq 0}$ such that for all $s \geq s_0$, we have $\Bigl((f \times g)^{-s-1}(\Delta) \smallsetminus \bigcup_{0 \leq i \leq s} (f \times g)^{-i}(\Delta)\Bigr)(K) = \emptyset$.

We have seen that the answer to Question 1.1 is positive for some classes of morphisms and varieties. It is then natural to ask to what extent the (minimum of the) number $s_0$ in Question 1.1 depends on $f$, $Y$, and $X$. When $X$ is an abelian variety, we prove the following uniformity on $s_0$.

Theorem E (Theorem 6.4). Let $X$ be an abelian variety over a number field $K$. Then there exists $s_0 \in \mathbb{Z}_{\geq 0}$ depending only on $X$ with the following property. For any surjective morphism $f\colon X \to X$ and any reduced closed subscheme $Y \subset X$ with $f(Y) \subset Y$, we have $\bigl(f^{-s_0-1}(Y) \smallsetminus f^{-s_0}(Y)\bigr)(K) = \emptyset$.

Remark 1.5. In the course of the proof of Theorem E, we prove a uniform bound on the periods of periodic abelian subvarieties under isogenies (Lemma 6.2). While our proof heavily relies on several special properties of abelian varieties, it is worth noting the similarity between this and the problem raised in [10, Question 8.4(2)].

Remark 1.6. Under the assumption of the general uniform boundedness conjecture on the torsion points of abelian varieties, we obtain a stronger version of Theorem E, subject to the condition that $Y$ is geometrically irreducible. Refer to Proposition 6.6 for details.","This paper proposes a novel pre-training method, called sliced denoising, for molecular representation learning. Theoretically, it carries a solid physical interpretation: learning force fields on molecular samples.
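To make the random slicing idea from the abstract above concrete, here is a minimal Python sketch, not the authors' implementation: it assumes a hypothetical differentiable map cartesian_from_internal from internal coordinates (bond lengths, angles, torsions) to Cartesian coordinates, and it estimates Jacobian-vector products along random unit directions by central finite differences, so the full Jacobian matrix is never formed.

import numpy as np

def sliced_jvp(f, q, num_slices=8, eps=1e-5, seed=None):
    # Estimate J @ v for random unit directions v, where J = df/dq.
    # Forming J explicitly costs O(dim(out) * dim(q)); each random
    # slice costs only two function evaluations.
    rng = np.random.default_rng(seed)
    q = np.asarray(q, dtype=float)
    slices = []
    for _ in range(num_slices):
        v = rng.standard_normal(q.shape)
        v /= np.linalg.norm(v)                               # random "slice" direction
        jvp = (f(q + eps * v) - f(q - eps * v)) / (2 * eps)  # central difference of J @ v
        slices.append((v, jvp))
    return slices

# Hypothetical usage: q0 packs the bond lengths, angles, and torsion angles of a
# conformation, and cartesian_from_internal maps them to Cartesian coordinates.
# projections = sliced_jvp(cartesian_from_internal, q0, num_slices=4)

Averaging over such random projections recovers the projected quantities in expectation, which is the sense in which slicing can sidestep an explicit Jacobian computation.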
The sampling distribution and regression targets are derived from the classical mechanical molecular potential, ensuring more realistic input conformations and more precise force field estimation than other denoising methods. Empirically, SliDe has shown significant improvements in force field estimation accuracy and on various downstream tasks, including QM9 and MD17, as compared with previous supervised learning and pre-training methods.","Intra- and Intermolecular Potentials in Simulations, chapter 3, pp. 39–71. John Wiley & Sons, Ltd, 2020. ISBN 9783527699452. doi: https://doi.org/10.1002/9783527699452.ch3. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/9783527699452.ch3. Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=i80OPhOCVH2. Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013. doi: 10.1109/TPAMI.2013.50. Camille Bilodeau, Wengong Jin, Tommi Jaakkola, Regina Barzilay, and Klavs F. Jensen. Generative models for molecular discovery: Recent advances and challenges. WIREs Computational Molecular Science, 12(5):e1608, 2022. doi: https://doi.org/10.1002/wcms.1608. URL https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wcms.1608. Ludwig Boltzmann. Studien über das Gleichgewicht der lebendigen Kraft. Wissenschaftliche Abhandlungen, 1:49–96, 1868. Simon Boothroyd, Pavan Kumar Behara, Owen C. Madin, David F. Hahn, Hyesu Jang, Vytautas Gapsys, Jeffrey R. Wagner, Joshua T. Horton, David L. Dotson, Matthew W. Thompson, Jessica Maat, Trevor Gokey, Lee-Ping Wang, Daniel J. Cole, Michael K. Gilson, John D. Chodera, Christopher I. Bayly, Michael R. Shirts, and David L. Mobley. Development and benchmarking of open force field 2.0.0: The sage small molecule force field. Journal of Chemical Theory and Computation, 19(11):3251–3275, 2023. doi: 10.1021/acs.jctc.3c00039. URL https://doi.org/10.1021/acs.jctc.3c00039. PMID: 37167319. Stefan Chmiela, Alexandre Tkatchenko, Huziel E. Sauceda, Igor Poltavsky, Kristof T. Schütt, and Klaus-Robert Müller. Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3(5):e1603015, 2017. Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, and Alexandre Tkatchenko. Towards exact molecular dynamics simulations with machine-learned force fields. Nature Communications, 9(1), September 2018. doi: 10.1038/s41467-018-06169-2. URL https://doi.org/10.1038%2Fs41467-018-06169-2. Xiaomin Fang, Lihang Liu, Jieqiong Lei, Donglong He, Shanzhuo Zhang, Jingbo Zhou, Fan Wang, Hua Wu, and Haifeng Wang. Geometry-enhanced molecular representation learning for property prediction. Nature Machine Intelligence, 4:127–134, 2021. URL https://api.semanticscholar.org/CorpusID:235417265. Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, and Huajun Chen. Molecular contrastive learning with chemical element knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 3968–3976, 2022. Shikun Feng, Yuyan Ni, Yanyan Lan, Zhi-Ming Ma, and Wei-Ying Ma. Fractional denoising for 3D molecular pre-training. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp. 9938–9961.
PMLR, 23–29 Jul 2023. URL https://proceedings.mlr.press/v202/feng23c.html. Zhangyang Gao, Cheng Tan, Jun Xia, and Stan Z. Li. Co-supervised pre-training of pocket and ligand. In Danai Koutra, Claudia Plant, Manuel Gomez Rodriguez, Elena Baralis, and Francesco Bonchi (eds.), Machine Learning and Knowledge Discovery in Databases: Research Track, pp. 405–421, Cham, 2023. Springer Nature Switzerland. ISBN 978-3-031-43412-9. Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, and Stephan Günnemann. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. In Machine Learning for Molecules Workshop, NeurIPS, 2020a. Johannes Gasteiger, Janek Groß, and Stephan Günnemann. Directional message passing for molecular graphs. In International Conference on Learning Representations (ICLR), 2020b. Jonathan Godwin, Michael Schaarschmidt, Alex Gaunt, Alvaro Sanchez-Gonzalez, Yulia Rubanova, Petar Veličković, James Kirkpatrick, and Peter W. Battaglia. Simple gnn regularisation for 3d molecular property prediction and beyond. In International Conference on Learning Representations, 2021. URL https://api.semanticscholar.org/CorpusID:247450503. Tim Hsu, Tuan Anh Pham, Nathan Daniel Keilbart, Stephen E. Weitzner, James Chapman, Penghao Xiao, S. Roger Qiu, Xiao Chen, and Brandon C. Wood. Efficient and interpretable graph network representation for angle-dependent properties applied to optical spectroscopy. npj Computational Materials, 8:1–9, 2022. URL https://api.semanticscholar.org/CorpusID:250535082. Rui Jiao, Jiaqi Han, Wenbing Huang, Yu Rong, and Yang Liu. Energy-motivated equivariant pretraining for 3d molecular graphs. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7):8096–8104, Jun. 2023. doi: 10.1609/aaai.v37i7.25978. URL https://ojs.aaai.org/index.php/AAAI/article/view/25978. Bowen Jing, Gabriele Corso, Jeffrey Chang, Regina Barzilay, and Tommi Jaakkola. Torsional diffusion for molecular conformer generation. In Advances in Neural Information Processing Systems, volume 35, pp. 24240–24253. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/994545b2308bbbbc97e3e687ea9e464f-Paper-Conference.pdf. Johannes Klicpera, Florian Becker, and Stephan Günnemann. Gemnet: Universal directional graph neural networks for molecules. In Neural Information Processing Systems, 2021. URL https://api.semanticscholar.org/CorpusID:235446323. Greg Landrum et al. Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum, 2013. Jie Li, Oufan Zhang, Seokyoung Lee, Ashley Namini, Zi Hao Liu, João M. C. Teixeira, Julie D. Forman-Kay, and Teresa Head-Gordon. Learning correlations between internal coordinates to improve 3d cartesian coordinates for proteins. Journal of Chemical Theory and Computation, 19(14):4689–4700, 2023. doi: 10.1021/acs.jctc.2c01270. URL https://doi.org/10.1021/acs.jctc.2c01270. PMID: 36749957. Shuangli Li, Jingbo Zhou, Tong Xu, Dejing Dou, and Hui Xiong. Geomgcl: Geometric graph contrastive learning for molecular property prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 4541–4549, 2022. Shengchao Liu, Hongyu Guo, and Jian Tang. Molecular geometry pretraining with SE(3)-invariant denoising distance matching. In The Eleventh International Conference on Learning Representations, 2022a. Shengchao Liu, Weitao Du, Zhi-Ming Ma, Hongyu Guo, and Jian Tang.
A group symmetric stochastic differential equation model for molecule multi-modal pretraining. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp. 21497–21526. PMLR, 23–29 Jul 2023. URL https://proceedings.mlr.press/v202/liu23h.html. Yi Liu, Limei Wang, Meng Liu, Yu-Ching Lin, Xuan Zhang, Bora Oztekin, and Shuiwang Ji. Spherical message passing for 3D molecular graphs. In International Conference on Learning Representations, 2022b. URL https://api.semanticscholar.org/CorpusID:251649072. Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, and Di He. One transformer can understand both 2D & 3D molecular data. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=vZTp1oPV3PC. Maho Nakata and Tomomi Shimazaki. Pubchemqc project: a large-scale first-principles electronic structure database for data-driven chemistry. Journal of Chemical Information and Modeling, 57(6):1300–1308, 2017. Julien Rabin, Gabriel Peyré, Julie Delon, and Marc Bernot. Wasserstein barycenter and its application to texture mixing. In Scale Space and Variational Methods in Computer Vision, pp. 435–446, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg. ISBN 978-3-642-24785-9. Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole Von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1(1):1–7, 2014. Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information Processing Systems, 33:12559–12571, 2020. Antonio Saggion, Rossella Faraldo, and Matteo Pierno. The Fundamental Relation and the Thermodynamic Potentials, pp. 55–79. Springer International Publishing, Cham, 2019. ISBN 978-3-030-26976-0. doi: 10.1007/978-3-030-26976-0_4. URL https://doi.org/10.1007/978-3-030-26976-0_4. Víctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E(n) equivariant graph neural networks. In International conference on machine learning, pp. 9323–9332. PMLR, 2021. Kristof Schütt, Oliver Unke, and Michael Gastegger. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In International Conference on Machine Learning, pp. 9377–9388. PMLR, 2021. Kristof T. Schütt, Huziel E. Sauceda, P.-J. Kindermans, Alexandre Tkatchenko, and K.-R. Müller. Schnet: a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24):241722, 2018. Hannes Stärk, Dominique Beaini, Gabriele Corso, Prudencio Tossou, Christian Dallago, Stephan Günnemann, and Pietro Liò. 3D infomax improves gnns for molecular property prediction. In International Conference on Machine Learning, pp. 20479–20502. PMLR, 2022. Qiming Sun, Xing Zhang, Samragni Banerjee, Peng Bao, Marc Barbry, Nick S. Blunt, Nikolay A. Bogdanov, George H. Booth, Jia Chen, Zhi-Hao Cui, Janus J. Eriksen, Yang Gao, Sheng Guo, Jan Hermann, Matthew R. Hermes, Kevin Koh, Peter Koval, Susi Lehtola, Zhendong Li, Junzi Liu, Narbe Mardirossian, James D. McClain, Mario Motta, Bastien Mussard, Hung Q. Pham, Artem Pulkin, Wirawan Purwanto, Paul J. Robinson, Enrico Ronca, Elvira R. Sayfutyarova, Maximilian Scheurer, Henry F. Schurkus, James E. T. Smith, Chong Sun, Shi-Ning Sun, Shiv Upadhyay, Lucas K. Wagner, Xiao Wang, Alec White, James Daniel Whitfield, Mark J.
Williamson, Sebastian Wouters, Jun Yang, Jason M. Yu, Tianyu Zhu, Timothy C. Berkelbach, Sandeep Sharma, Alexander Yu. Sokolov, and Garnet Kin-Lic Chan. Recent developments in the PySCF program package. The Journal of Chemical Physics, 153(2):024109, July 2020. ISSN 0021-9606. doi: 10.1063/5.0006074. URL https://doi.org/10.1063/5.0006074. Philipp Thölke and Gianni De Fabritiis. Equivariant transformers for neural network based molecular potentials. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=zNHzqZ9wrRB. Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 2011. Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. International Conference on Machine Learning, 2008. Xu Wang, Huan Zhao, Weiwei Tu, and Quanming Yao. Automated 3D pre-training for molecular property prediction. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023. URL https://api.semanticscholar.org/CorpusID:259144785. Yuyang Wang, Jianren Wang, Zhonglin Cao, and Amir Barati Farimani. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence, 4(3):279–287, 2022. Sheheryar Zaidi, Michael Schaarschmidt, James Martens, Hyunjik Kim, Yee Whye Teh, Alvaro Sanchez-Gonzalez, Peter Battaglia, Razvan Pascanu, and Jonathan Godwin. Pre-training via denoising for molecular property prediction. 2022. Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurélie C. Lozano, Payel Das, and Jian Tang. Protein structure representation learning by geometric pretraining. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=to3qCB3tOh9. Liangzhen Zheng, Jingrong Fan, and Yuguang Mu. Onionnet: a multiple-layer intermolecular-contact-based convolutional neural network for protein–ligand binding affinity prediction. ACS Omega, 4(14):15956–15965, 2019. doi: 10.1021/acsomega.9b01997. URL https://doi.org/10.1021/acsomega.9b01997. PMID: 31592466. Gengmo Zhou, Zhifeng Gao, Qiankun Ding, Hang Zheng, Hongteng Xu, Zhewei Wei, Linfeng Zhang, and Guolin Ke. Uni-mol: A universal 3D molecular representation learning framework. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=6K2RM6wVqKu. Kun Zhou and Bo Liu. Chapter 2 - potential energy functions. In Molecular Dynamics Simulation, pp. 41–65. Elsevier, 2022. ISBN 978-0-12-816419-8. doi: https://doi.org/10.1016/B978-0-12-816419-8.00007-6. URL https://www.sciencedirect.com/science/article/pii/B9780128164198000076. Jinhua Zhu, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, and Tie-Yan Liu. Unified 2D and 3D pre-training of molecular representations. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2626–2636, 2022. Vladimir A. Zorich. The Differential Calculus of Functions of Several Variables, pp. 427–543. Springer Berlin Heidelberg, Berlin, Heidelberg, 2015. ISBN 978-3-662-48792-1. doi: 10.1007/978-3-662-48792-1_8. URL https://doi.org/10.1007/978-3-662-48792-1_8. Vladimir A. Zorich. Integration of Differential Forms on Manifolds, pp. 313–362. Springer Berlin Heidelberg, Berlin, Heidelberg, 2016. ISBN 978-3-662-48993-2. doi: 10.1007/978-3-662-48993-2_7.
URL https://doi.org/10.1007/978-3-662-48993-2_7.",03.11.2023,https://arxiv.org/pdf/2311.02124.pdf,"[-9.52281207e-02 -4.98076528e-02 3.48549746e-02 4.17407006e-02 + 9.21140239e-03 1.63870733e-02 -3.19624394e-02 -1.01759722e-02 + 2.81560104e-02 5.49204228e-03 3.82263288e-02 -3.70738804e-02 + 8.29000305e-03 1.79643072e-02 -1.92298740e-03 -3.05142272e-02 + 5.91807393e-03 2.14340817e-02 -2.26346701e-02 2.61514506e-04 + 2.76333708e-02 -3.76295485e-02 3.75365168e-02 -3.06496378e-02 + 2.91230474e-02 6.54757544e-02 6.46953844e-03 5.71706751e-03 + -3.41164730e-02 -2.50661612e-01 5.17949276e-02 5.14166728e-02 + 3.10213454e-02 -2.41783652e-02 2.21592169e-02 6.13938970e-03 + -8.48901644e-03 4.04225253e-02 -8.26809928e-02 2.94462591e-02 + 1.21666230e-02 -7.47290766e-03 -2.90482379e-02 -6.47658715e-04 + 4.50732335e-02 -6.19361438e-02 -3.71194705e-02 -6.84948713e-02 + 5.59680490e-03 -2.54589431e-02 -3.58987153e-02 -3.82991582e-02 + -2.23623496e-02 1.04309572e-02 3.06250658e-02 3.28180864e-02 + 3.85875162e-03 4.31953259e-02 -3.64923337e-03 4.20069844e-02 + 1.87707823e-02 6.23608828e-02 -9.15281549e-02 8.67154822e-02 + 4.73918617e-02 -1.14615178e-02 7.24555273e-03 -5.34740612e-02 + 1.26727829e-02 7.73918182e-02 1.29860789e-02 1.20016737e-02 + 2.68291198e-02 2.95965616e-02 2.46348716e-02 6.80324156e-04 + 1.26280496e-02 4.53515397e-03 3.62035595e-02 -4.02182117e-02 + -6.86580548e-03 -4.99270372e-02 2.26309001e-02 -4.22441103e-02 + -6.55017495e-02 2.57208887e-02 3.38367224e-02 -5.55799231e-02 + 4.70263697e-02 4.67288792e-02 9.47923679e-03 -3.70476358e-02 + -1.92536730e-02 -1.43551147e-02 -1.08158067e-01 5.05385697e-02 + 2.25676149e-02 3.81525010e-02 -1.39687909e-02 3.86824012e-01 + -2.09936760e-02 3.81328352e-02 -5.83517589e-02 -2.47276351e-02 + -1.59508660e-02 -7.56764561e-02 -2.89363563e-02 -1.73995027e-03 + 3.13835293e-02 -1.25204129e-02 -8.34105071e-04 -1.59381796e-02 + 1.24344807e-02 -7.06844628e-02 -1.11292722e-02 -1.57231453e-03 + 4.71962243e-02 2.57890932e-02 -2.09861379e-02 -8.37362080e-04 + -6.45161793e-02 -4.91284824e-04 3.15966345e-02 4.83674593e-02 + 4.28557545e-02 -4.87287380e-02 -9.86913741e-02 8.15754831e-02 + 2.01969165e-02 -5.55004319e-03 1.93368103e-02 2.92493217e-02 + -5.51189631e-02 -1.02734473e-02 3.63730788e-02 -1.23017551e-02 + -5.03377654e-02 -4.05721692e-03 3.38958241e-02 4.69068363e-02 + -3.44627202e-02 4.20580171e-02 -2.65479162e-02 -3.05245980e-03 + -1.06995575e-01 3.25955674e-02 -4.38606702e-02 5.18197007e-02 + -5.21055013e-02 -2.68781576e-02 3.41427512e-02 6.88022822e-02 + -2.69983839e-02 3.03843897e-02 8.46665055e-02 1.63013153e-02 + 3.68179311e-03 -3.61704417e-02 -5.73476069e-02 1.42513243e-02 + -1.01464577e-01 -5.56079671e-03 -2.27126796e-02 1.05342694e-01 + -3.07464302e-02 -3.10030654e-02 -2.03380696e-02 -8.41660611e-03 + 5.12310909e-03 8.95424373e-03 2.76480354e-02 -1.07134264e-02 + -1.77513268e-02 2.40047704e-02 -6.47999495e-02 3.63155641e-02 + -6.03997223e-02 -2.07913034e-02 2.49628443e-02 -3.39256451e-02 + 1.61770880e-02 -1.94498543e-02 -1.39473274e-03 4.05802615e-02 + 1.90066546e-02 -3.04980613e-02 -2.04264801e-02 2.70342082e-02 + 2.79276874e-02 2.69147269e-02 -5.11371484e-03 9.58012976e-03 + -3.57581265e-02 7.11485371e-02 -7.34073594e-02 4.62164031e-03 + -3.67279612e-02 1.32498443e-02 -1.04961917e-01 -1.79743823e-02 + 4.84220795e-02 1.42033184e-02 -7.32598128e-03 5.42382486e-02 + -1.37718981e-02 2.00588740e-02 2.01678686e-02 -2.06511319e-02 + 7.19877565e-03 2.71590427e-02 3.67465951e-02 -5.76814413e-02 + -1.40524404e-02 -6.78185597e-02 -4.13301736e-02 1.11710830e-02 + 
4.94883023e-02 5.94368465e-02 -7.45444093e-03 4.79591079e-03 + -1.33351691e-03 -2.32017748e-02 -4.44100201e-02 -3.04038256e-01 + 1.05709489e-02 5.15469443e-03 4.56899069e-02 9.20297056e-02 + -2.91675907e-02 7.70790130e-02 -4.35412042e-02 6.67924210e-02 + -1.39224473e-02 -3.02956775e-02 5.69235086e-02 -9.88806877e-03 + -7.51372352e-02 -4.99704219e-02 2.99847405e-02 6.60625398e-02 + -1.83018725e-02 -6.12504035e-02 7.06188604e-02 2.42670141e-02 + 3.56371514e-02 -2.29837596e-02 -1.81850828e-02 2.04540025e-02 + -7.01371208e-03 1.33006901e-01 -3.31991240e-02 1.04491995e-03 + 3.54081020e-02 -2.81659029e-02 1.33357467e-02 -4.13096994e-02 + -4.49993983e-02 6.38467446e-03 1.33791612e-02 9.54682380e-02 + -2.20580772e-02 -4.47825389e-03 -4.41653766e-02 7.89711997e-03 + -7.39230867e-03 2.88748927e-02 -5.64875714e-02 -4.86885533e-02 + -2.03269925e-02 -4.46875505e-02 2.61011962e-02 8.12101364e-03 + 1.72951724e-02 7.39856735e-02 -4.12592199e-03 7.54417181e-02 + -6.49908185e-02 4.57818620e-03 -5.40406164e-03 -3.65862623e-02 + 2.90300734e-02 5.04147261e-04 -9.18032695e-03 1.36782601e-02 + -7.53551424e-02 -8.83555412e-03 -1.54043688e-02 -3.96408029e-02 + 8.72071739e-03 -2.23419983e-02 -2.31976323e-02 3.02839540e-02 + -4.58535440e-02 -1.54118687e-02 5.79111353e-02 1.58019289e-02 + 6.68232217e-02 5.73910996e-02 2.59916782e-02 -8.56624171e-03 + -3.71750668e-02 -9.78488475e-02 2.60017607e-02 5.97496741e-02 + 6.56024739e-02 5.03829271e-02 -2.21172795e-02 -3.16432603e-02 + -2.31753401e-02 6.31773397e-02 3.25600766e-02 5.14553972e-02 + 1.48770856e-02 -5.56578301e-03 -1.56001206e-02 -4.41424251e-02 + -1.13166133e-02 4.05502506e-02 2.43563820e-02 -2.29095891e-01 + -9.29406844e-03 4.02206033e-02 1.01005808e-01 7.53273629e-03 + 2.84773745e-02 8.17001089e-02 -7.29492009e-02 -3.17935012e-02 + -4.58475202e-02 -5.87501302e-02 4.60582450e-02 2.83971373e-02 + 2.15396518e-03 -5.14683779e-03 -3.25418077e-02 4.43685129e-02 + -6.35187253e-02 3.05659305e-02 -4.93390784e-02 1.36983697e-03 + -5.97995985e-03 1.70133650e-01 -7.64667336e-03 4.72950237e-03 + 6.31688461e-02 -2.68553663e-02 -5.19319698e-02 -1.21370768e-02 + -3.40475887e-02 -9.73249692e-03 2.79264525e-02 4.01530741e-03 + -4.47924361e-02 3.22274044e-02 4.79562357e-02 -3.38048078e-02 + -2.66431011e-02 4.43824800e-03 -3.55763324e-02 4.46975492e-02 + -1.30795210e-03 3.71381221e-03 -1.15448255e-02 6.48285672e-02 + -6.19656295e-02 8.20548565e-04 1.65642202e-02 1.55596603e-02 + 1.66474620e-03 -2.35039573e-02 3.81922610e-02 4.59123924e-02 + -5.22176251e-02 1.89801082e-02 1.43344710e-02 3.96850072e-02 + -1.34801213e-02 3.43604386e-02 -4.40485869e-03 1.93561753e-03 + -9.30230133e-03 -1.77652184e-02 1.91452447e-02 1.32114831e-02]" +7,Communicative Agents for Software Development,"Software engineering is a domain characterized by intricate decision-making processes, often relying on nuanced intuition and consultation. Recent advancements in deep learning have started to revolutionize software engineering practices through elaborate designs implemented at various stages of software development. In this paper, we present an innovative paradigm that leverages large language models (LLMs) throughout the entire software development process, streamlining and unifying key processes through natural language communication, thereby eliminating the need for specialized models at each phase. 
At the core of this paradigm lies CHATDEV, a virtual chat-powered software development company that mirrors the established waterfall model, meticulously dividing the development process into four distinct chronological stages: designing, coding, testing, and documenting. Each stage engages a team of agents, such as programmers, code reviewers, and test engineers, fostering collaborative dialogue and facilitating a seamless workflow. The chat chain acts as a facilitator, breaking down each stage into atomic subtasks. This enables dual roles to propose and validate solutions through context-aware communication, leading to the efficient resolution of specific subtasks. Our instrumental analysis of CHATDEV highlights its remarkable efficacy in software generation, enabling the completion of the entire software development process in under seven minutes at a cost of less than one dollar. It not only identifies and alleviates potential vulnerabilities but also rectifies hallucinations while maintaining commendable efficiency and cost-effectiveness. The potential of CHATDEV unveils fresh possibilities for integrating LLMs into the realm of software development. Our code is available at https://github.com/OpenBMB/ChatDev.",Software engineering decision-making; Deep learning in software development; Large language models (LLMs); Waterfall model in software engineering,"“Collaboration allows us to know more than we are capable of knowing by ourselves. It empowers us to think differently, access information we wouldn’t have otherwise, and combine ideas as we work together towards a shared goal.” —Paul Solarz

Software engineering entails a methodical and disciplined approach to the development, operation, and maintenance of software systems [4]. However, the complexity of software intelligence often leads to decisions based on intuition and limited consultation with senior developers [14]. Recent advancements in deep learning techniques have prompted researchers to explore their application in software engineering, aiming to improve effectiveness and efficiency and to reduce costs. Prior studies in deep learning-based software engineering have addressed various tasks, categorized as software requirements, design, implementation, testing, and maintenance [34; 29]. The software development process involves multiple roles, including organizational coordination, task allocation, code writing, system testing, and documentation preparation. It is a highly complex and intricate activity that demands meticulous attention to detail due to its long development cycles [17; 4]. In recent years, large language models (LLMs) have achieved significant milestones in the field of natural language processing (NLP) [5] and computer vision (CV) [35]. After training on massive corpora using the “next word prediction” paradigm, LLMs have demonstrated impressive performance on a wide range of downstream tasks, such as context-aware question answering, machine translation, and code generation. In fact, the core elements involved in software development, namely code and documents, can both be regarded as “language” (i.e., sequences of characters) [7]. From this perspective, this paper explores an end-to-end software development framework driven by LLMs, encompassing requirements analysis, code development, system testing, and document generation, aiming to provide a unified, efficient, and cost-effective paradigm for software development.
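As a rough illustration of how such an LLM-driven pipeline can be organized, the following Python sketch decomposes the waterfall phases into two-role chats along a chain; the function call_llm, the role names, and the <SOLVED> consensus marker are hypothetical placeholders rather than CHATDEV's actual interface.

def chat_chain(task, phases, call_llm, max_turns=6):
    # Each phase is one node of the chain: an instructor and an assistant
    # hold a multi-turn, context-aware chat until they agree on a solution,
    # and the solution is appended to the shared context for the next phase.
    context = f"Task: {task}"
    for phase, (instructor, assistant) in phases.items():
        instruction = f"[{phase}] {context}"
        solution = ""
        for _ in range(max_turns):
            instruction = call_llm(role=instructor, prompt=instruction + "\n" + solution)
            solution = call_llm(role=assistant, prompt=instruction)
            if "<SOLVED>" in solution:  # hypothetical agreement marker
                break
        context += f"\n[{phase} output] {solution}"
    return context

# Hypothetical usage with the four waterfall phases:
# phases = {"designing": ("CEO", "CTO"), "coding": ("CTO", "Programmer"),
#           "testing": ("Reviewer", "Programmer"), "documenting": ("CEO", "Programmer")}
# software_artifacts = chat_chain("build a gomoku game", phases, call_llm=my_llm)

Keeping each chat focused on one atomic subtask, with one role proposing and the other validating, is what gives the chain its cross-examination structure.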
Directly generating an entire software system using LLMs can result in code hallucinations to a certain extent, similar to the phenomenon of hallucination in natural language knowledge querying [2]. These hallucinations include incomplete implementation of functions, missing dependencies, and potential undiscovered bugs. Code hallucinations arise primarily for two reasons. Firstly, the lack of task specificity confuses LLMs when generating a software system in one step. Granular tasks in software development, such as analyzing user/client requirements and selecting programming languages, provide guided thinking that is absent in the high-level nature of the task handled by LLMs. Secondly, the absence of cross-examination in decision-making poses significant risks [9]. Individual model instances propose a diverse range of answers, which need to be debated or examined against the responses from other model instances to converge on a single, more accurate common answer [12], for example through code peer review and suggestion feedback. To address the aforementioned challenges, we “establish” a virtual chat-powered software technology company, CHATDEV. It follows the classic waterfall model [3] and divides the process into four phases: designing, coding, testing, and documenting. At each phase, CHATDEV recruits multiple agents with different roles, such as programmers, reviewers, and testers. To facilitate effective communication and collaboration, CHATDEV utilizes a proposed chat chain that divides each phase into atomic subtasks. Within the chat chain, each node represents a specific subtask, and two roles engage in context-aware, multi-turn discussions to propose and validate solutions. This approach ensures that client requirements are analyzed, creative ideas are generated, prototype systems are designed and implemented, potential issues are identified and addressed, debug information is explained, appealing graphics are created, and user manuals are generated. By guiding the software development process along the chat chain, CHATDEV delivers the final software to the user, including source code, dependency environment specifications, and user manuals. The experiment analyzed all the software produced by CHATDEV in response to 70 user requirements. On average, CHATDEV generated 17.04 files per software system, alleviated potential code vulnerabilities caused by code hallucinations 13.23 times, had a software production time of 409.84 seconds, and incurred a manufacturing cost of $0.2967. Discussions between a reviewer and a programmer led to the identification and modification of nearly twenty types of code vulnerabilities, while discussions between a tester and a programmer resulted in the identification and resolution of more than ten types of potential bugs. In summary, our main contributions are as follows:

• We propose CHATDEV, a chat-based software development framework. By merely specifying a task, CHATDEV sequentially handles designing, coding, testing, and documenting. This new paradigm simplifies software development by unifying the main processes through language communication, eliminating the need for specialized models at each phase.

• We propose the chat chain to decompose the development process into sequential atomic subtasks. Each subtask requires collaborative interaction and cross-examination between two roles. This framework enables multi-agent collaboration, user inspection of intermediate outputs, error diagnoses, and reasoning intervention.
It ensures a granular focus on specific subtasks within each chat, facilitating effective collaboration and promoting the achievement of desired outputs. • To further alleviate potential challenges related to code hallucinations, we introduce the thought instruction mechanism in each independent chat process during code completion, reviewing, and testing. By performing a “role flip”, an instructor explicitly injects specific thoughts for code modifications into instructions, thereby guiding the assistant programmer more precisely. • The experiments demonstrate the efficiency and cost-effectiveness of CHATDEV’s automated software development process. Through effective communication, proposal, and mutual examination between roles in each chat, the framework enables effective decision-making.","In this study, we have presented CHATDEV, a chat-based end-to-end software development framework that leverages LLMs to facilitate effective communication and collaboration among multiple roles involved in the software development process. By decomposing the development process into sequential atomic subtasks through the use of the chat chain, CHATDEV enables granular focus and promotes desired outputs for each subtask. Additionally, the thought instruction mechanism alleviates challenges related to code hallucinations by guiding programmers through specific code modifications during code completion, reviewing, and testing. Our experimental results demonstrate the efficiency and cost-effectiveness of the automated software development process driven by CHATDEV. By employing multiple agents with different roles, we have proposed a new paradigm in generating software systems, alleviating code vulnerabilities, and identifying and resolving potential bugs. The collaborative interactions and mutual examination between roles within each chat have contributed to effective decision-making for each subtask. Moving forward, further research can focus on refining the communication protocols and optimizing the interaction dynamics within each chat to enhance the performance and effectiveness of CHATDEV. Additionally, exploring the integration of other emerging technologies, such as reinforcement learning and explainable AI, could provide valuable insights into addressing challenges and improving the overall software development process. Our research will persist in exploring enhancements and advancements in CHATDEV agents, workflow, and development environments. The overarching objective is to achieve even greater efficiency in software production by improving various characteristics, such as reducing the length of chat chains or optimizing subtask solving logic and strategies, ultimately leading to more streamlined and effective software production processes. We hope the potential of the proposed natural-language-to-software framework can illuminate fresh possibilities for integrating LLMs into software development and mark the dawn of a new frontier in the field of natural language processing, software engineering, and collective intelligence.","[1] Mohammad Alahmadi, Abdulkarim Khormi, Biswas Parajuli, Jonathan Hassel, Sonia Haiduc, and Piyush Kumar. Code localization in programming screencasts. Empir. Softw. Eng., 25(2):1536–1572, 2020. [2] Razvan Azamfirei, Sapna R Kudchadkar, and James Fackler. Large language models and the perils of their hallucinations. Critical Care, 27(1):1–2, 2023. [3] Youssef Bassil. A simulation model for the waterfall software development life cycle. 
arXiv preprint arXiv:1205.6904, 2012. [4] Jorge Biolchini, Paula Gomes Mian, Ana Candida Cruz Natali, and Guilherme Horta Travassos. Systematic review in software engineering. System engineering and computer science department COPPE/UFRJ, Technical Report ES, 679(05):45, 2005. [5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020. [6] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021. [7] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021. [8] Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. Teaching large language models to self-debug. arXiv preprint arXiv:2304.05128, 2023. [9] Roi Cohen, May Hamri, Mor Geva, and Amir Globerson. Lm vs lm: Detecting factual errors via cross examination. arXiv preprint arXiv:2305.13281, 2023. [10] Juan de Vicente Mohino, Javier Bermejo Higuera, Juan Ramón Bermejo Higuera, and Juan Antonio Sicilia Montalvo. The application of a new secure software development life cycle (s-sdlc) with agile methodologies. Electronics, 8(11):1218, 2019. [11] Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. Self-collaboration code generation via chatgpt, 2023. [12] Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. CoRR, abs/2305.14325, 2023. [13] Saad Ezzini, Sallam Abualhaija, Chetan Arora, and Mehrdad Sabetzadeh. Automated handling of anaphoric ambiguity in requirements: A multi-solution study. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022, pages 187–199. ACM, 2022. [14] Peter Freeman, Donald J. Bagert, Hossein Saiedian, Mary Shaw, Robert Dupuis, and J. Barrie Thompson. Software engineering body of knowledge (SWEBOK). In Proceedings of the 23rd International Conference on Software Engineering, ICSE 2001, 12-19 May 2001, Toronto, Ontario, Canada, pages 693–696. IEEE Computer Society, 2001. [15] Yao Fu, Hao Peng, Tushar Khot, and Mirella Lapata. Improving language model negotiation with self-play and in-context learning from AI feedback. CoRR, abs/2305.10142, 2023. [16] Sa Gao, Chunyang Chen, Zhenchang Xing, Yukun Ma, Wen Song, and Shang-Wei Lin.
A neural model for method name generation from functional description. In 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2019, Hangzhou, China, February 24-27, 2019, pages 411–421. IEEE, 2019. [17] Robert L. Glass, Iris Vessey, and Venkataraman Ramesh. Research in software engineering: an analysis of the literature. Information and Software Technology, 44(8):491–506, 2002. [18] Fred J Heemstra. Software cost estimation. Information and Software Technology, 34(10):627–639, 1992. [19] Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. Deep code comment generation. In Proceedings of the 26th Conference on Program Comprehension, ICPC 2018, Gothenburg, Sweden, May 27-28, 2018, pages 200–210. ACM, 2018. [20] Xing Hu, Xin Xia, David Lo, Zhiyuan Wan, Qiuyuan Chen, and Thomas Zimmermann. Practitioners' expectations on automated code comment generation. In 44th IEEE/ACM International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022, pages 1693–1705. ACM, 2022. [21] Magne Jorgensen and Martin Shepperd. A systematic review of software development cost estimation studies. IEEE Transactions on Software Engineering, 33(1):33–53, 2007. [22] Rafiq Ahmad Khan, Siffat Ullah Khan, Habib Ullah Khan, and Muhammad Ilyas. Systematic literature review on security risks and its practices in secure software development. IEEE Access, 10:5456–5481, 2022. [23] Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for ""mind"" exploration of large scale language model society. arXiv preprint arXiv:2303.17760, 2023. [24] Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for ""mind"" exploration of large scale language model society. CoRR, abs/2303.17760, 2023. [25] Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. Encouraging divergent thinking in large language models through multi-agent debate. CoRR, abs/2305.19118, 2023. [26] Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, and Soroush Vosoughi. Training socially aligned language models in simulated human society. CoRR, abs/2305.16960, 2023. [27] Cuauhtémoc López Martín and Alain Abran. Neural networks for predicting the duration of new software projects. J. Syst. Softw., 101:127–135, 2015. [28] Nadia Nahar, Shurui Zhou, Grace A. Lewis, and Christian Kästner. Collaboration challenges in building ML-enabled systems: Communication, documentation, engineering, and process. In 44th IEEE/ACM International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022, pages 413–425. ACM, 2022. [29] Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen: An open large language model for code with multi-turn program synthesis. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. [30] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022. [31] Mohd. Owais and R. Ramakishore. Effort, duration and cost estimation in agile software development.
In 2016 Ninth International Conference on Contemporary Computing (IC3), pages 1–5, 2016. [32] Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442, 2023. [33] Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. CoRR, abs/2304.03442, 2023. [34] Florian Pudlitz, Florian Brokhausen, and Andreas Vogelsang. Extraction of system states from natural language requirements. In 27th IEEE International Requirements Engineering Conference, RE 2019, Jeju Island, Korea (South), September 23-27, 2019, pages 211–222. IEEE, 2019. [35] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022. [36] Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, and Zeynep Akata. In-context impersonation reveals large language models' strengths and biases. CoRR, abs/2305.14930, 2023. [37] Yashar Talebirad and Amirhossein Nadiri. Multi-agent collaboration: Harnessing the power of intelligent LLM agents. CoRR, abs/2306.03314, 2023. [38] Hannes Thaller, Lukas Linsbauer, and Alexander Egyed. Feature maps: A comprehensible software representation for design pattern detection. In 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2019, Hangzhou, China, February 24-27, 2019, pages 207–217. IEEE, 2019. [39] Chengcheng Wan, Shicheng Liu, Sophie Xie, Yifan Liu, Henry Hoffmann, Michael Maire, and Shan Lu. Automated testing of software that uses machine learning APIs. In 44th IEEE/ACM International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022, pages 212–224. ACM, 2022. [40] Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, and Philip S. Yu. Improving automatic source code summarization via deep reinforcement learning. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, pages 397–407. ACM, 2018. [41] Lei Wang, Jingsen Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, and Ji-Rong Wen. RecAgent: A novel simulation paradigm for recommender systems. CoRR, abs/2306.02552, 2023. [42] Song Wang, Taiyue Liu, and Lin Tan. Automatically learning semantic features for defect prediction. In Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14-22, 2016, pages 297–308. ACM, 2016. [43] Song Wang, Nishtha Shrestha, Abarna Kucheri Subburaman, Junjie Wang, Moshi Wei, and Nachiappan Nagappan. Automatic unit test generation for machine learning libraries: How far are we? In 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021, pages 1548–1560. IEEE, 2021. [44] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022. [45] Jimmy Wei, Kurt Shuster, Arthur Szlam, Jason Weston, Jack Urbanek, and Mojtaba Komeili. Multi-party chat: Conversational agents in group settings with humans and models. CoRR, abs/2304.13835, 2023.
[46] Jonas Winkler, Jannis Grönberg, and Andreas Vogelsang. Predicting how to test requirements: An automated approach. In Software Engineering 2020, Fachtagung des GI-Fachbereichs Softwaretechnik, 24.-28. Februar 2020, Innsbruck, Austria, volume P-300 of LNI, pages 141– 142. Gesellschaft für Informatik e.V., 2020. [47] Tianming Zhao, Chunyang Chen, Yuanning Liu, and Xiaodong Zhu. GUIGAN: learning to generate GUI designs using generative adversarial networks. In 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021, pages 748–760. IEEE, 2021.",28.08.2023,https://arxiv.org/pdf/2307.07924v3.pdf,"[-5.26805744e-02 1.49254221e-02 -1.15404138e-02 -5.26406281e-02 + -1.08615530e-03 -3.67267653e-02 -1.61434244e-02 1.70401018e-02 + 2.55113877e-02 -3.59741598e-02 -2.24803407e-02 -1.99661707e-04 + 8.46015587e-02 -2.50165686e-02 5.50233312e-02 3.53655852e-02 + -3.62872519e-02 -1.33363036e-02 1.73239708e-02 -3.84067558e-02 + 6.12606220e-02 -2.24968735e-02 1.55269150e-02 -2.80079432e-02 + -1.83478240e-02 4.21741083e-02 -3.32810096e-02 -5.08184507e-02 + 1.55543480e-02 -1.92005053e-01 6.48277020e-03 6.28972724e-02 + 9.42717567e-02 -9.11735697e-04 -7.01434631e-03 7.28679672e-02 + 1.34950718e-02 -5.60396537e-02 -8.77747033e-03 2.29747724e-02 + -8.98475759e-03 3.47814374e-02 -2.61826012e-02 -3.92778926e-02 + 1.64464135e-02 -7.09874257e-02 -1.19637363e-02 -2.14705542e-02 + -8.94996971e-02 -1.71361696e-02 -3.69089767e-02 -6.62527755e-02 + 3.23762698e-03 1.36040039e-02 1.32820243e-02 5.39686717e-02 + 5.10339960e-02 5.47603108e-02 -1.98663455e-02 -6.96983375e-03 + 3.62400599e-02 -8.83111451e-03 -1.04018241e-01 2.93957721e-02 + -1.46957310e-02 3.85797396e-02 -5.47024310e-02 -7.31780892e-03 + 2.34583113e-02 1.85709912e-02 4.54106368e-02 -2.16473872e-03 + -5.93170384e-03 1.80308651e-02 3.98788452e-02 4.78425473e-02 + 9.29071754e-03 -3.31351720e-03 5.57226576e-02 -2.24985871e-02 + 1.11855557e-02 -9.35544632e-03 -2.34650113e-02 -2.12303698e-02 + -5.48440479e-02 -2.03182958e-02 4.88552824e-02 7.84067996e-03 + 3.17678377e-02 1.50555577e-02 -5.59551381e-02 2.55847326e-03 + 3.69739928e-03 -1.92596745e-02 -2.37553511e-02 2.06556767e-02 + -8.61715153e-03 -6.72819559e-03 -4.86845039e-02 4.46890742e-01 + -3.10641658e-02 -3.29004377e-02 -1.97194908e-02 -3.84422466e-02 + 7.08866213e-03 3.27229989e-03 -1.48651786e-02 -5.26397862e-02 + 2.11986024e-02 2.63531804e-02 -4.43633571e-02 -9.94402543e-03 + 2.28868425e-02 8.68307147e-03 -1.01249116e-02 -1.68537593e-03 + 3.82796824e-02 -7.22632883e-03 -3.97728896e-03 1.96205489e-02 + -9.10536852e-03 4.75231223e-02 1.93533543e-02 -5.49765863e-02 + 2.03051530e-02 -6.75693005e-02 -1.51470937e-02 6.73041195e-02 + -8.06114171e-03 8.66076257e-03 4.80806045e-02 6.09424338e-02 + -6.58174232e-02 1.19868992e-02 5.96882217e-02 -5.72815863e-03 + -6.30669221e-02 -2.43505519e-02 -4.66277711e-02 6.16223644e-03 + -3.75240259e-02 6.60690665e-02 4.19983640e-02 -2.15697084e-02 + -5.90530671e-02 7.32894242e-02 2.37922296e-02 -1.80932637e-02 + -5.43892421e-02 -3.84620316e-02 3.02893464e-02 3.63164209e-02 + 1.46869558e-03 -1.99306402e-02 5.42305969e-02 -1.27598355e-02 + 2.79508289e-02 2.08238941e-02 -1.03933863e-01 7.93333724e-03 + 5.86332195e-03 1.33194923e-02 -3.91569622e-02 1.12910748e-01 + -3.04032881e-02 -5.61766997e-02 -1.16569744e-02 -1.98651813e-02 + 2.71164812e-02 -1.45081133e-02 3.93055007e-03 3.00845653e-02 + 1.07321320e-02 -1.27499672e-02 -4.74868715e-02 -2.68921368e-02 + -1.05373375e-01 -1.37324017e-02 1.80980656e-02 1.22903911e-02 + 3.97977866e-02 
-1.40362522e-02 2.96582635e-02 1.18424240e-02 + -3.03074718e-02 -8.83822367e-02 4.78760153e-02 -3.79837379e-02 + -2.75352411e-02 -2.92525981e-02 7.00289500e-04 7.14385584e-02 + 1.78848580e-02 4.58659939e-02 2.26873476e-02 -7.15202605e-03 + -9.21560172e-03 9.14040953e-03 -4.65268865e-02 -2.56501138e-02 + -4.36573848e-02 5.96551076e-02 -3.15942317e-02 -1.12717180e-02 + -3.64686400e-02 -1.95322223e-02 2.78805173e-03 1.73114501e-02 + 2.57314630e-02 4.20704968e-02 -5.14619946e-02 -1.24821784e-02 + 4.60489467e-02 -1.33880973e-02 2.68288702e-02 1.95314232e-02 + 2.64697410e-02 6.49154931e-02 -1.94600075e-02 1.33621404e-02 + -3.72210182e-02 3.27991024e-02 -9.38648300e-04 -2.97288507e-01 + -1.78520810e-02 4.20501754e-02 -3.27874012e-02 1.35782536e-03 + -9.15264059e-03 1.81423072e-02 -5.73144928e-02 3.91423889e-02 + 2.89113000e-02 5.88155612e-02 1.43361166e-02 -4.46056537e-02 + -2.06542946e-03 5.33559825e-03 1.04976436e-02 8.45556962e-04 + 3.33869755e-02 -8.97726342e-02 -3.78289795e-03 2.87842825e-02 + 1.78141578e-03 6.12249635e-02 -1.16428569e-01 -3.28830853e-02 + 3.08200009e-02 1.08972013e-01 -4.64725681e-02 5.38690463e-02 + 3.66255571e-03 2.43740864e-02 4.06528078e-02 3.19700241e-02 + -1.00119866e-01 5.42924590e-02 1.74020138e-02 8.26845318e-02 + 1.78644527e-02 3.30204740e-02 1.05318362e-02 -2.28580018e-03 + -1.12585295e-02 -5.11589013e-02 -9.74582061e-02 -2.64504571e-02 + -1.72887109e-02 -4.47727703e-02 -8.14257264e-02 -1.52312294e-02 + -1.04554528e-02 -6.58942107e-03 1.37517601e-02 4.99892496e-02 + 4.44582440e-02 -5.53186648e-02 -4.92273234e-02 -4.01468314e-02 + 1.84463188e-02 1.53490494e-03 -5.69082014e-02 2.72274879e-03 + -3.59393447e-03 -3.59207466e-02 -1.39886187e-02 1.49683114e-02 + 3.41245253e-03 1.14044091e-02 -2.71399096e-02 2.42176354e-02 + -2.77264677e-02 -3.72100584e-02 9.43726823e-02 -5.35175465e-02 + -1.77638885e-02 -4.95821238e-03 2.93516647e-02 1.79560818e-02 + -2.00286154e-02 -4.31405753e-02 -6.75887452e-04 6.26909435e-02 + 1.40726147e-02 5.70426919e-02 3.16286311e-02 6.20095409e-04 + 2.27880245e-03 1.64637493e-03 -3.98822092e-02 3.69764604e-02 + 3.91020104e-02 -5.12478054e-02 -1.79421306e-02 -3.29025015e-02 + -7.73771629e-02 4.84086163e-02 -9.71570984e-03 -2.11354494e-01 + 6.70223637e-03 5.80657681e-04 3.08652781e-02 -5.78367971e-02 + 5.42705432e-02 4.26185094e-02 -7.91340098e-02 -1.47240432e-02 + 6.95841992e-03 1.93569027e-02 2.49280725e-02 2.68850196e-02 + -2.48350836e-02 7.77200609e-02 1.65928025e-02 9.08885524e-02 + -7.28870556e-02 4.53878902e-02 -1.24986172e-02 8.75123311e-03 + 3.19225490e-02 1.39846653e-01 -4.03174758e-02 7.70871788e-02 + 2.18198560e-02 -8.07158370e-03 -5.59662236e-03 4.67291996e-02 + 5.39216213e-02 9.25984234e-03 -3.80870025e-03 1.34920299e-01 + 2.80519910e-02 -4.44405787e-02 5.98387085e-02 4.04407494e-02 + -4.76785470e-03 3.99158895e-02 -5.53678768e-03 6.09988756e-02 + 7.70595670e-03 7.82703534e-02 -1.04530547e-02 2.74339430e-02 + 6.84990883e-02 1.03104636e-02 -7.35142231e-02 -1.10786417e-02 + -2.30045319e-02 -2.29249690e-02 -3.02304421e-02 6.07110746e-03 + 2.26619188e-02 6.69698641e-02 -4.39456813e-02 1.48348138e-02 + -1.63328834e-02 -3.42050642e-02 -6.33088425e-02 2.98220431e-03 + -4.86878268e-02 6.71002418e-02 2.18169075e-02 -3.06819230e-02]"