

RIKEN

# Large-scale FPGA Simulator for Coherent Ising Machines

## Farad Khoyratee<sup>1,2</sup>, Timothee Leleu<sup>3,4</sup>, Satoshi Kako<sup>5</sup>, Franco Nori<sup>1,2</sup> and Yoshihisa Yamamoto<sup>5</sup>

<sup>1</sup>Theoretical Quantum Physics Laboratory, RIKEN Cluster for Pioneering Research, Wako-shi, Saitama 351-0198, Japan <sup>2</sup>Physics Department, University of Michigan, Ann Arbor, Michigan 48109-1040, USA <sup>3</sup>Institute of Industrial Science, University of Tokyo, Japan <sup>4</sup>International Research Center for Neurointelligence, University of Tokyo, Japan <sup>5</sup>Physics & Informatics Laboratories, NTT Research Inc., 1950 University Ave., #600, East Palo Alto, CA 94303, USA



Figure 5: a) The brain can process complex functions at low frequency and with low power consumption (~20 W); b) Various neuromorphic technologies that have been developed to provide efficient computation. An FPGA is a good compromise between an Application-Specific Integrated Circuit (ASIC) and digital computing.

Figure 3: Principle of an FPGA, from the National Instruments website (https://www.ni.com/)


$$\frac{dx_i}{dt} = f_i(x_i) + \beta(t) \sum_j w_{ij}\, g_j(x_j) + \sigma \eta_i$$

$$V_b = -\sum_i \int_0^{y_i} f_i\left(g_i^{-1}(y)\right) dy$$
$$H = -\sum_{i,j} w_{ij} y_i y_j$$
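As a rough illustration of the dynamics and Ising energy above, the following minimal sketch integrates the stochastic equation with an Euler-Maruyama step. The specific choices here are assumptions made only for the example and are not taken from the poster: the gain function `f` (a cubic saturation with hypothetical pump parameter `p`), the identity coupling function `g`, a constant `beta`, and the step size, noise level and seed.

```python
import numpy as np

def ising_energy(W, y):
    """H = -sum_{i,j} w_ij y_i y_j (no 1/2 prefactor, as in the formula above)."""
    return -float(y @ W @ y)

def simulate(W, f, g, beta, sigma=0.01, dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama integration of dx_i/dt = f(x_i) + beta(t) sum_j w_ij g(x_j) + sigma eta_i."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    x = 0.01 * rng.standard_normal(n)  # small random initial amplitudes (assumption)
    for k in range(steps):
        drift = f(x) + beta(k * dt) * (W @ g(x))
        noise = sigma * np.sqrt(dt) * rng.standard_normal(n)
        x = x + dt * drift + noise
    return x

# Hypothetical example: 8 ferromagnetically coupled spins, all-to-all.
p = 0.5                                   # assumed pump parameter
W = np.ones((8, 8)) - np.eye(8)
x = simulate(W,
             f=lambda x: -x**3 + (p - 1.0) * x,  # assumed gain/saturation term
             g=lambda x: x,                      # assumed identity coupling function
             beta=lambda t: 0.2)                 # assumed constant coupling strength
spins = np.sign(x)                        # read out Ising spins from amplitude signs
print(spins, ising_energy(W, spins))
```

With ferromagnetic couplings the amplitudes are expected to align, so the read-out spins should minimize $H$; harder instances would need the time-dependent $\beta(t)$ and the specific $f_i$, $g_i$ used by the actual design.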

an XCVU13P FPGA provided by Xilinx and integrated into a board provided by Bittware. The current implementation uses different clock frequencies to compute the terms shown in Figure 6a. We propose a neuromorphic architecture for solving Ising problems, with the following features:

- The use of several frequencies and time steps to reduce power consumption;
- Highly parallelized and pipelined neurons (for a total of 40K neurons);
- Hierarchical organization to optimize communication between neurons;
- All-to-all connection.
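The multi-frequency idea in the first feature can be sketched as a simple schedule. The example below uses the three clock frequencies quoted in the Figure 6 caption and derives, on a common base clock, which clock domains fire on each tick. Which equation term each clock drives is defined in the figure rather than the text, so the names are kept as plain labels; the `schedule` helper and the idea of a shared base clock are assumptions for illustration, not a description of the actual FPGA clocking.

```python
from math import lcm  # Python 3.9+

# Clock frequencies quoted in the Figure 6 caption (MHz).
CLOCKS_MHZ = {"f_n": 200, "f_x": 600, "f_e": 400}

def schedule(clocks_mhz, n_ticks):
    """Return the common base frequency and, per base tick, the clock domains that fire."""
    base = lcm(*clocks_mhz.values())
    dividers = {name: base // freq for name, freq in clocks_mhz.items()}
    ticks = [[name for name, d in dividers.items() if tick % d == 0]
             for tick in range(n_ticks)]
    return base, ticks

base, ticks = schedule(CLOCKS_MHZ, 6)  # 6 base ticks = one period of the slowest clock
for i, active in enumerate(ticks):
    print(i, active)
```

Over one period of the slowest clock, the fastest domain fires three times and the intermediate one twice, which is the sense in which cheaper terms can be refreshed more often than the expensive ones.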

Figure 6: a) Chaotic amplitude equations computed at different frequencies, where $f_n$ = 200 MHz, $f_x$ = 600 MHz and $f_e$ = 400 MHz; b) Proposed FPGA architecture implementation, based on the organization of the nervous system.

Here we show the results of the current implementation (XUPVPP design A) together with estimates for several alternative implementation scenarios. The estimates indicate that increasing the computation capacity of a single dot product improves performance.
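To see why dot-product capacity matters, a back-of-the-envelope sketch helps: an all-to-all coupling step needs $N^2$ multiply-accumulate operations, so the number of clock cycles per matrix-vector product falls in proportion to how many operations are processed per cycle. The function names and the example values of `macs_per_cycle` below are hypothetical and do not correspond to the measured designs.

```python
def matvec_cycles(n, macs_per_cycle):
    """Clock cycles for one N x N matrix-vector product (ceiling division)."""
    return -(-n * n // macs_per_cycle)

def matvec_time_us(n, macs_per_cycle, clock_mhz):
    """Time per matrix-vector product in microseconds (cycles / MHz = microseconds)."""
    return matvec_cycles(n, macs_per_cycle) / clock_mhz

# Hypothetical scenarios for N = 40,000 neurons at a 200 MHz coupling clock.
for macs in (1_000, 10_000, 100_000):
    print(macs, matvec_time_us(40_000, macs, 200))
```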



Figure 7: Benchmark of the state of the art and of different scenarios of the current design, for different sizes of the matrix computation per clock cycle. Two hypotheses are shown for the scaling of the Time-To-Solution (TTS) with the problem size N: a) the TTS scales as $e^N$; b) the TTS scales as $e^{\sqrt{N}}$.
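The two scaling hypotheses can be compared with a short sketch. It assumes the commonly used definition of TTS, namely the time needed to find the solution at least once with 99% probability given a per-run success probability, which may differ from the definition used for the benchmark. The `rate` prefactor and the reference point in the example are hypothetical placeholders, not fitted values.

```python
import math

def time_to_solution(run_time, p_success, target=0.99):
    """Time to reach the solution with probability `target`, given per-run success p_success."""
    if p_success <= 0.0:
        return math.inf
    if p_success >= target:
        return run_time
    return run_time * math.log(1.0 - target) / math.log(1.0 - p_success)

def extrapolate(tts_ref, n_ref, n, hypothesis, rate=1.0):
    """Scale a reference TTS from size n_ref to n under TTS ~ exp(rate*N) or exp(rate*sqrt(N))."""
    if hypothesis == "exp_n":
        h = float
    elif hypothesis == "exp_sqrt_n":
        h = math.sqrt
    else:
        raise ValueError(f"unknown hypothesis: {hypothesis}")
    return tts_ref * math.exp(rate * (h(n) - h(n_ref)))

# Hypothetical reference: TTS of 1 (arbitrary units) at N = 100, extrapolated to N = 400.
for hyp in ("exp_n", "exp_sqrt_n"):
    print(hyp, extrapolate(1.0, 100, 400, hyp, rate=0.1))
```

The gap between the two extrapolations widens quickly with N, which is why measurements at larger sizes are needed to discriminate between them.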

One of the advantages of this architecture is its adaptability. We plan to extend the implementation with truncated-Wigner and quantum Gaussian models, real-valued weights, and Zeeman terms. The presented neuromorphic architecture will also be implemented on a rack of 8 FPGAs to achieve higher speed and larger problem sizes ($N=10^5$ to $10^6$).
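A quick estimate shows why problem sizes of this order call for several devices: with all-to-all connections the coupling matrix has $N^2$ entries. The sketch below computes the storage this implies, in total and per FPGA. The number of bits per weight is a hypothetical parameter (the poster does not state the weight precision), and an even split across devices is assumed for simplicity.

```python
def weight_storage_bytes(n, bits_per_weight):
    """Bytes needed to store a dense N x N coupling matrix."""
    return n * n * bits_per_weight / 8

def per_device_gb(n, bits_per_weight, n_devices=8):
    """Gigabytes (10^9 bytes) per device, assuming an even split of the matrix."""
    return weight_storage_bytes(n, bits_per_weight) / n_devices / 1e9

# Hypothetical precisions: 1-bit weights versus 16-bit weights.
for n in (10**5, 10**6):
    for bits in (1, 16):
        print(n, bits, per_device_gb(n, bits))
```

Even at one bit per weight, $N=10^6$ corresponds to about 125 GB in total, so weight storage and bandwidth, rather than neuron count alone, are likely to drive the multi-FPGA design.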

Finally, most Ising machines do not scale when N > 2000, and CPU heuristics then become unreliable. Having the fastest machine would allow us to estimate the scaling for larger system sizes according to the Replica Symmetry Breaking (RSB) theory of Parisi (Boettcher, 2020).



## 4. Results

## 5. Perspectives