# Review: TensorFlow Quantum (TFQ) Whitepaper — Part 2

In this second part, we will look into the fundamentals of hybrid quantum-classical machine learning.

Before getting into the details of this article, I would suggest brushing up on the fundamentals of quantum vector operations. You can refer to this awesome series on quantum computing to get started.

# Overview

• In general, quantum neural networks can be expressed as a product of parameterized unitary matrices. Samples (i.e., data points or data distributions) and expectation values (which you can loosely think of as the softmax output) are defined by expressing the loss function as an inner product.
• Once we have the QNN defined and the expectation value generated, we can define the gradients. Finally, we combine quantum and classical neural networks and formalize hybrid quantum-classical backpropagation.

# Quantum Neural Networks

• Let’s first hypothesize a quantum neural network representation function.

Here the lth layer of the QNN consists of the product of V^l, a non-parametric unitary, and U(θ^l), a unitary with variational parameters. The multi-parameter unitary may itself comprise multiple unitaries applied in parallel.
If you don’t know what a unitary is, you should refer to the quantum series blogs I mentioned above.

Each of these unitaries can be expressed as the exponential of some generator, as shown in the formula above, which itself can be any Hermitian operator on n qubits. Here P_k denotes a Pauli operator on n qubits, and each coefficient β is a real number for all k, j, l.
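Since the whitepaper's equation images don't carry over here, the structure described above can be written out as follows (a reconstruction using the notation of the text, not a verbatim copy of equation (1)):

```latex
U(\theta) \;=\; \prod_{\ell=1}^{L} V^{\ell}\, U^{\ell}(\theta^{\ell}),
\qquad
U^{\ell}(\theta^{\ell}) \;=\; \prod_{j} e^{-i\,\theta_{j}^{\ell}\, g_{j}^{\ell}},
\qquad
g_{j}^{\ell} \;=\; \sum_{k} \beta_{k}^{j\ell}\, P_{k},
```

where each g is a Hermitian generator, the P_k are n-qubit Paulis, and the β coefficients are real.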

In diagram (a) we can clearly see that H is a Hadamard gate, a constant, non-parameterized gate, while a parameterized gate sits at the bottom, in the third qubit lane.
In (b), single parameterized gates make up one unit W(θ), which makes up the function V(θ); the product of multiple such composite functions V(θ) makes up the unitaries V^l that generate the quantum model U(θ).
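To make the layered product structure concrete, here is a minimal NumPy sketch (not TFQ code) of a one-qubit QNN: each layer is a fixed Hadamard (the non-parametric V) times a parameterized exponential of a Pauli generator, using the identity exp(-iθP) = cos(θ)I - i sin(θ)P, which holds because P² = I. The layer layout is a hypothetical example, not the circuit from the whitepaper's figure.

```python
import numpy as np

# Pauli Z and the Hadamard gate (Hadamard plays the non-parametric V role here).
I = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def exp_pauli(theta, P):
    """exp(-i*theta*P) for a Pauli generator P, via P @ P = I."""
    return np.cos(theta) * I - 1j * np.sin(theta) * P

def layer(theta):
    """One QNN layer: fixed unitary V (here H) times parameterized U(theta)."""
    return H @ exp_pauli(theta, Z)

def qnn(thetas):
    """Product of layers, mirroring U(theta) = prod_l V^l U^l(theta^l)."""
    U = I
    for t in thetas:
        U = layer(t) @ U
    return U

U = qnn([0.3, 0.7])
print(np.allclose(U @ U.conj().T, I))  # a product of unitaries stays unitary
```

The same pattern extends to n qubits by replacing the 2x2 matrices with tensor products of Paulis.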

# Sampling and Expectations

• To optimize the parameters of an ansatz from equation (1), we need a cost function to optimize. In standard variational quantum algorithms, this cost function is most often chosen to be the expectation value of a cost Hamiltonian, ⟨H⟩ = ⟨ψ₀|U†(θ) H U(θ)|ψ₀⟩, where |ψ₀⟩ is the input state to the parameterized circuit. In general, the cost Hamiltonian can be expressed as a linear combination of operators.

where we have defined a vector of coefficients α ∈ ℝ^N and a vector of N operators ĥ. Often this decomposition is chosen such that each of these sub-Hamiltonians is in the n-qubit Pauli group, ĥ_k ∈ P_n.
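The linear-combination form means the expectation of H is just the weighted sum of the expectations of its sub-Hamiltonians, each of which can be measured separately on hardware. A small NumPy sketch with a made-up one-qubit Hamiltonian (the α values and operators here are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical cost Hamiltonian H = 0.5*Z + 0.25*X,
# i.e. alpha = (0.5, 0.25) and h_hat = (Z, X) in the notation above.
Z = np.array([[1, 0], [0, -1]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
alpha = np.array([0.5, 0.25])
h_ops = [Z, X]

psi = np.array([1, 0], dtype=complex)  # input state |psi_0> = |0>

# <H> = sum_k alpha_k <psi|h_k|psi>: each term estimated independently.
expectation = sum(a * (psi.conj() @ h @ psi).real for a, h in zip(alpha, h_ops))
print(expectation)  # <0|Z|0> = 1 and <0|X|0> = 0, so 0.5
```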

• Even assuming perfect fidelity of the quantum computation, sampling measurement outcomes, i.e., eigenvalues of observables, from the output of the quantum computer to estimate an expectation will have non-negligible variance for any finite number of samples. To estimate an expectation value to within accuracy ε, one needs on the order of 1/ε² measurement samples.
• Estimating gradients of quantum neural networks on quantum computers involves estimating several expectation values of the cost function at various parameter values. One trick that was recently pointed out, and that has proven successful both theoretically and empirically, is the stochastic selection of terms in the quantum expectation estimation. This can greatly reduce the number of measurements needed per gradient update.
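The shot-noise point above is easy to simulate classically: each measurement of a Pauli observable returns an eigenvalue ±1, and averaging a finite number of such outcomes gives a noisy estimate whose error shrinks like 1/√(shots). A minimal sketch (the state probabilities are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_z(p0, shots):
    """Estimate <Z> from `shots` measurements: +1 w.p. p0, -1 w.p. 1 - p0."""
    outcomes = rng.choice([1.0, -1.0], size=shots, p=[p0, 1 - p0])
    return outcomes.mean()

p0 = 0.8            # probability of measuring |0> for some hypothetical state
exact = 2 * p0 - 1  # <Z> = p0 - (1 - p0) = 0.6

for shots in [100, 10_000]:
    est = estimate_z(p0, shots)
    print(shots, abs(est - exact))  # error shrinks roughly like 1/sqrt(shots)
```

Inverting the 1/√(shots) scaling is exactly why accuracy ε costs on the order of 1/ε² samples.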

# Gradients of Quantum Neural Networks

## Stochastic Parameter Shift Gradient Estimation

• For each component of the gradient vector, i.e., for the jth component of the lth layer, there are 2K_jl parameter-shifted expectation values to evaluate; thus in total there are on the order of 2K_jl·M_l parameterized expectation values of the cost Hamiltonian to evaluate.
• Accurately estimating all these terms one by one and then linearly combining the values to yield an estimate of the total gradient may be prohibitively expensive in terms of the number of runs. Instead, one can stochastically estimate this sum by randomly picking terms according to their weighting.
• In principle, one could go one step further and, per iteration of gradient descent, randomly sample indices representing subsets of parameters for which to estimate the gradient components, setting the gradient components of the non-sampled indices to 0 for that iteration.
• In TFQ, all three of these stochastic averaging methods can be turned on or off independently for stochastic parameter-shift gradients.
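The building block that all these stochastic variants sample over is the plain parameter-shift rule itself. For a single gate whose generator has eigenvalues ±1/2 (such as a standard rotation RY(θ) = exp(-iθY/2)), the exact gradient of an expectation value is obtained from just two shifted evaluations. A NumPy sketch of this base case (not the stochastic version, and not TFQ's internal implementation):

```python
import numpy as np

I = np.eye(2, dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    """RY(theta) = exp(-i*theta*Y/2)."""
    return np.cos(theta / 2) * I - 1j * np.sin(theta / 2) * Y

def cost(theta):
    """f(theta) = <0| RY(theta)^dag Z RY(theta) |0>, which equals cos(theta)."""
    psi = ry(theta) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ Z @ psi).real

theta = 0.4
# Parameter-shift rule for generator eigenvalues +-1/2:
# df/dtheta = [f(theta + pi/2) - f(theta - pi/2)] / 2
grad = 0.5 * (cost(theta + np.pi / 2) - cost(theta - np.pi / 2))
print(np.isclose(grad, -np.sin(theta)))  # matches the analytic derivative of cos
```

The stochastic methods in the text then replace the exhaustive sum of such two-point evaluations over all terms and parameters with a randomly sampled subset.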