The future is now. Quantum Machine Learning in the Cloud!

## Backends

*This example is written for all Inspire backends, the emulator backend, Spin-2 backend and starmon-5 backend.*

## What does it do?

Classification is a form of machine learning in which labels are assigned to data, often with respect to other data. For instance, we can classify between animals having $2$ or $4$ legs or a different number. Often we classify new data based on already classified (learned) data. We classify this new data point by representing all data points by features or characteristics and compute the distance between the features of the new data point and the features of the learned data points. The new data point is assigned the label to which it has the shortest distance.

This is an example of $k$-nearest neighbor classification with $k=N$, which classically has a complexity of $\mathcal{O}(NM)$, with $N$ the number of data points and $M$ the number of features. A quantum version of this classifier was proposed by Schuld et al. , see ref.1, and has a constant complexity $\mathcal{O}(1)$.

## What is it used for?

Classification has wide-spread use in everyday life and we often classify objects without fully realizing we do it. Classically, naively we would have to compute the distance between the new data point and all other data points separately, after which we combine the results. This approach can be approved upon, but we are still left with a complexity of $\mathcal{O}(NM)$.

With a quantum-based approach, we do the same computations, but now only requiring one single-qubit operation and two measurements. Hence, the quantum-based approach is in theory better than the classical one. A caveat is however that the result is probabilistic, hence, multiple measurement rounds must be needed. Furthermore, a specific quantum state is assumed to begin with. Without something like a quantum RAM or a quantum process that produces the input state efficiently, we are left with the same complexity as the classical approach.

## How does classification work?

Let our data points be given by ${(x^i, y^i)}_{i=0}^{N-1}$, with $x^i$ the data point and $y^i$ the corresponding label and let $(x^*, y^*)$ be the unlabeled new data point. Furthermore, there are only two labels $y^i=\pm 1$. The new label is then given by evaluating

$\space\space\space\space\ (1) \space\space\space\space\begin{gathered} y^* = {\rm sgn}\left(\sum_{i=0}^{N-1} y^i \left[1-\frac{1}{4N}{\vert{x^*}-x^i\vert}^2\right]\right) \end{gathered}$

If both classes have the same number of data points, we can simplify the above function to

$\space\space\space\space\ (2) \space\space\space\space\begin{gathered} y^* = {\rm sgn}\left(\sum_{i=0}^{N-1} y^i {\vert{x^*} + x^i\vert}^2\right) \end{gathered}$

## Quantum implementation

### Preparation

If we normalize our data set, we can encode every data point into a quantum state

$\space\space\space\space\ (3) \space\space\space\space\begin{gathered} x^i = (x^i_0,...,x^i_{M-1})^T \Rightarrow \vert x^i \rangle = \sum_{j=0}^{M-1} x^i_j \vert j \rangle \end{gathered}$

where $\vert j \rangle$ is the $j$-th computational basis state. Suppose now that we start with the following quantum state

$\space\space\space\space\ (4) \space\space\space\space\begin{gathered} \frac{1}{\sqrt{2N}} \sum_{i=0}^{N-1} \vert i \rangle ( \vert 0 \rangle \vert x^* \rangle + \vert 1 \rangle \vert x^i \rangle) \vert y^i \rangle \end{gathered}$

The first register $\vert i \rangle$ is the $i$-th computational basis state to keep track of the $i$-th training point. The second register is an ancilla qubit entangled with the training set and the new test point. The third register is a combination of the new test point and the training set. The fourth register $\vert y^i \rangle$ encodes the labels of the training set. In the case of only two classes we need only one qubit to encode the labels and we identify $y^i=1$ with $\vert y^i \rangle = \vert 1 \rangle$ and $y^i=-1$ with $\vert y^i \rangle = \vert 0 \rangle$.

*NB this is in contrast with the convention in quantum mechanics, where the state $\it{ \vert 0 \rangle }$ corresponds with 1, and $\it{ \vert 1 \rangle }$ corresponds with -1.*

### The algorithm

Given this initial state, the quantum algorithm consists of only three operations. First, we apply a Hadamard on the second register. This gives

$\space\space\space\space\ (5) \space\space\space\space\begin{gathered} \frac{1}{2\sqrt{N}} \sum_{i=0}^{N-1} \vert i \rangle (\vert 0 \rangle (\vert x^* \rangle + \vert x^i \rangle) + \vert 1 \rangle (\vert x^* \rangle - \vert x^i \rangle )) \vert y^i \rangle \end{gathered}$

We then measure the second register and conditionally continue if we measured the $0$-state. This gives

$\space\space\space\space\ (6) \space\space\space\space\begin{gathered} \frac{1}{2\sqrt{Np_{0}}} \sum_{i=0}^{N-1} \vert i \rangle \vert 0 \rangle(\vert x^* \rangle + \vert x^i \rangle) \vert y^i \rangle = \ \frac{1}{2\sqrt{Np_{0}}} \sum_{i=0}^{N-1} \vert i \rangle \vert 0 \rangle (\sum_{j=0}^{M-1} x^*$

with $p_{0}$ the probability of obtaining $0$ after measuring the second register and $x^i_j$ the $j$-th feature of the $i$-th data point.

Now measuring the last register which encodes the label gives the probability of each class. For example

$\space\space\space\space\ (7) \space\space\space\space\begin{gathered} P(y^* = \alpha) = \frac{1}{4Np_{0}}\sum_{i \vert y^i=\alpha} \vert x^* + x^i \vert ^2 \end{gathered}$

where in our case $\alpha$ can be -1 or 1. It can be easily verified that this classifier will yield the same result as its classical counterpart formula 2. The difference is that formula 7 gives only the chance of a certain classification. The measurement should therefore be done numerous times to get an accurate result, the more the better. Also, when a test point lays close to the border of another class, the chance of a wrong classification becomes bigger.

Note that the two measurement rounds can be combined by classically post-processing the results.

## Example code

Let us consider a simple implementation of the above quantum distance-based classifier. As we have no quantum RAM, we will explicitly prepare the quantum state. Furthermore, we will consider only two data points, each with two features.

For normalized and standardized data, we can assume without loss of generality that

$\begin{matrix} \begin{aligned} &\vert x^0 \rangle = \cos(\theta/2)\vert 0 \rangle - \sin(\theta/2)\vert 1 \rangle \ &\vert x^1 \rangle = \cos(\phi/2)\vert 0 \rangle - \sin(\phi/2)\vert 1 \rangle \ &\vert x^* \rangle = \cos(\omega/2)\vert 0 \rangle - \sin(\omega/2)\vert 1 \rangle \end{aligned} &&&&& \begin{aligned} & y^0 = -1 \ & y^1 = 1 \ &\space \end{aligned} \end{matrix}$

where in this example we use

$\begin{matrix}
\begin{aligned}
&\space\space\space x_0 = (1,0)\
&\space\space\space x_1=(-0.9929,0.1191)\
&\space\space\space x^* = (0.9939, 0.1103)\
\end{aligned}
&&
\text{resulting in}
&&
\begin{aligned}
&\theta = 0 \
&\phi = −6.0445 \
&\omega = −0.2210 \
\end{aligned}
\end{matrix}$

The code is then given by

` ````
version 1.0
qubits 4
prep_z q[0:3]
H q[0:1]
.encode_new_data_point
#encode the new data point, this implements a cRy(omega)
CNOT q[1], q[2]
Ry q[2], 0.1105 # (Ry q[2], -omega/2)
CNOT q[1], q[2]
Ry q[2], -0.1105 # (Ry q[2], omega/2)
X q[1]
.encode_first_training_point
# encode the first data point, this implements a ccRy(theta)
toffoli q[0],q[1],q[2]
CNOT q[0],q[2]
Ry q[2], 0 # (Ry q[2], theta/4)
CNOT q[0],q[2]
Ry q[2], 0 # (Ry q[2], -theta/4)
toffoli q[0],q[1],q[2]
CNOT q[0],q[2]
Ry q[2], 0 # (Ry q[2], -theta/4)
CNOT q[0],q[2]
Ry q[2], 0 # (Ry q[2], theta/4)
X q[0]
.encode_second_training_point
# encode the second data point, this implements a ccRy(phi)
toffoli q[0],q[1],q[2]
CNOT q[0],q[2]
Ry q[2], -1.511125 # (Ry q[2], phi/4)
CNOT q[0],q[2]
Ry q[2], 1.511125 # (Ry q[2], -phi/4)
toffoli q[0],q[1],q[2]
CNOT q[0],q[2]
Ry q[2], 1.511125 # (Ry q[2], -phi/4)
CNOT q[0],q[2]
Ry q[2], -1.511125 # (Ry q[2], phi/4)
.labels
# encode the labels
CNOT q[0], q[3]
.algorithm
# The actual algorithm
H q[1]
measure_z q[1,3]
```

encode_new_data_point | encode_first_training_point | encode_second_training_point | labels | algorithm | ||||||||||||||||||||||||||||

q[0] | ||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

q[1] | ||||||||||||||||||||||||||||||||

q[2] | ||||||||||||||||||||||||||||||||

q[3] |

Note that we must discard the measurement results where the measurement of the second qubit gives $1$, so we should only look at results 1000 and 0000. Hence, we see that $P(y^*=-1) > P(y^*=1)$, as we would expect.

## Using the spin-2 quantum chip

Following N. Neumann in [3] we can implement the classifier in the spin-2 quantum chip. This is a non-trivial reduction to a two-qubit system, working only for the case of only two features and only two training data points.

The following variables are proposed

$t = \frac{ cos^2 ( \frac{\omega}{4}) }{ cos^2 (\frac{\omega - \phi}{4}) } \ \space\ \text{and}\ \space\ \omega ' = \text{arctan} \Bigg( \frac{1-\sqrt{t}}{1+\sqrt{t}} \Bigg)$

in combination with this algorithm

By measuring the lower (second) qubit and continuing only if the result is 0, we end up with the final state

$c * \space \Big( cos(\omega'/2) \vert 00 \rangle + \big( 1 + sin(\omega'/2) \big) \vert 01 \rangle \Big)$

where c is some normalizing constant.

Finally measuring the upper (first) qubit will give the class for the test data point, with the same probability as the original 4-qubit script.

Looking at our example we find

$\begin{matrix} \begin{aligned} & t &= \ & \omega'&=\ \end{aligned} & \begin{aligned} & 75.8202\ -&2.68422\ \end{aligned} \end{matrix}$

resulting in the following cqasm code

` ````
version 1.0
qubits 2
H q[0]
Ry q[1], 1.34211 # (Ry q[1], -theta/2
CNOT q[0], q[1]
Ry q[1], -1.34211 # (Ry q[1], theta/2
H q[0]
```

q[0] | ||||||
---|---|---|---|---|---|---|

q[1] |

The above result is obtained with a single shot optimized run in the emulator. If we discard the results where the first qubit gives 1, and only keep states 00 and 10, we see the same probability of a correct classification of 98.7% as for the 4 qubit version.

The above result is obtained by running the algorithm on the spin-2 quantum hardware chip with 1024 shots. We see the result is less accurate, but is still accurate enough to give the right classification.

## Want to know more?

For more information see the article by Schuld et al., the notebooks on this algorithm on our GitHub or this article [2], where the practicalities of the state preparation are given.

[1] Implementing a distance-based classifier with a quantum interference circuit, M. Schuld, M. Fingerhuth, and F. Petruccione, EPL (Europhysics Letters), Volume 119, Number 6, 2017

[2] R. Wezeman, N. Neumann, and F. Phillipson. “Distance-based classifier on the Quantum Inspire”. In: Digitale Welt4.1 (Jan. 2020), pp. 85–91

3 N. Neumann "Classification using a two-qubit quantum chip"