It has been shown that deep neural networks of sufficiently large width are universal approximators, whereas networks whose width is too small are not. Several works have attempted to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them have identified the exact value. In this work, we show that the minimum width for the $L^p$ approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb{R}^{d_y}$ is exactly $\max\{d_x, d_y, 2\}$ if the activation function is ReLU-Like (e.g., ReLU, GELU, Softplus). Compared with the known result for ReLU networks, $w_{\min} = \max\{d_x+1, d_y\}$ when the domain is $\mathbb{R}^{d_x}$, our result shows for the first time that approximation on a compact domain requires a smaller width than approximation on $\mathbb{R}^{d_x}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions, including ReLU: $w_{\min} \ge d_y + 1$ if $d_x < d_y \le 2d_x$. Together with our first result, this establishes a dichotomy between $L^p$ and uniform approximation for general activation functions and input/output dimensions.
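To make the stated bounds concrete, the following is a minimal sketch (not from the paper; the function names are hypothetical) that simply evaluates the width formulas quoted in the abstract for given input/output dimensions $d_x$ and $d_y$:

```python
def min_width_compact_lp(d_x: int, d_y: int) -> int:
    """Exact minimum width for L^p approximation of L^p functions
    [0,1]^{d_x} -> R^{d_y} with ReLU-Like activations, as stated in
    the abstract: max{d_x, d_y, 2}."""
    return max(d_x, d_y, 2)


def min_width_relu_on_Rdx(d_x: int, d_y: int) -> int:
    """Known exact minimum width for ReLU networks when the domain is
    the whole space R^{d_x}: max{d_x + 1, d_y}."""
    return max(d_x + 1, d_y)


def uniform_lower_bound(d_x: int, d_y: int):
    """Lower bound on the minimum width for uniform approximation with
    general activations, valid when d_x < d_y <= 2*d_x: w_min >= d_y + 1.
    Returns None outside that regime (the abstract makes no claim there)."""
    if d_x < d_y <= 2 * d_x:
        return d_y + 1
    return None


if __name__ == "__main__":
    d_x, d_y = 3, 4  # example dimensions with d_x < d_y <= 2*d_x
    print(min_width_compact_lp(d_x, d_y))   # 4 (L^p on the compact domain)
    print(min_width_relu_on_Rdx(d_x, d_y))  # 4 (L^p on R^{d_x}, ReLU)
    print(uniform_lower_bound(d_x, d_y))    # 5 (uniform approximation needs strictly more)
```

For these example dimensions, the $L^p$ minimum width on $[0,1]^{d_x}$ is 4 while uniform approximation requires width at least 5, illustrating the dichotomy described above.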
NK and SP were supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program, Korea University) and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2022R1F1A1076180).