corCTF 2023 - oil spill

2023-07-30 | ::ctf ::crypto ::mq

For corCTF this year, I wrote the challenge oil spill, which is based on forging a signature in the unbalanced oil and vinegar signature scheme. It turned out to be much harder than intended, getting 3 solves total during the competition.

Unbalanced Oil and Vinegar

Unbalanced oil and vinegar is a multivariate quadratic based signature scheme. It has now been submitted to the new NIST post quantum signatures competition, along with other schemes such as MAYO. The signature scheme Rainbow, which attacked and broken and was a finalist in the previous NIST competition, also relies on UOV.

Overview

So how do UOV and multivariate quadratics work? A quadratic form $p(\bm{x})$ is given by $p(\bm{x}): \mathbb{F}^n_q \to \mathbb{F}_q = \bm{x}^T \bm{A}\bm{x}$

Here, $\bm{A}$ is an upper triangular matrix. Each element $a_{ij}$ represents the coefficient of the term $x_ix_j$ . The polar form of the following quadratic is $p'(\bm{x}, \bm{y}) = p(\bm{x} + \bm{y}) - p(\bm{x}) - p(\bm{y})$

Notice that $p'(\bm{x}, \bm{y}) = \bm{x}^T(\bm{A} + \bm{A}^T)\bm{y}$ . This is true as $p(\bm{x} + \bm{y}) = \bm{x}^T \bm{A}\bm{x} + \bm{x}^T (\bm{A} + \bm{A}^T)\bm{y} + \bm{y}^T \bm{A}\bm{y}$

Therefore, the polar form is symmetric and linear in both $\bm{x}$ and $\bm{y}$ .

A multivariate quadratic map $\mathcal{P}: \mathbb{F}_q^{n} \to \mathbb{F}_q^{m}$ is then defined as

$\mathcal{P} = \left[p_1(\bm{x}), p_2(\bm{x}), \ldots, p_m(\bm{x})\right]$

The multivariate quadratic map is a vector of $m$ multivariate quadratic polynomials in $n$ variables.

Signatures

To sign some vector $\bm{t} \in \mathbb{F}^{m}_q$ we must find $s \in \mathbb{F}^{n}_q$ such that $\mathcal{P}(s) = t$ .

To find such an $\bm{s}$ , we need some secret knowledge (the secret key). If we know a subspace $O \subset \mathbb{F}_q^{n}$ and $\dim O = m$ such that $\mathcal{P}(\bm{o}) = \bm{0}\,\,\,\,\,\,\, \forall \,\bm{o} \in O$

we can find $\mathcal{P}(\bm{s}) = \bm{t}$ for any $\bm{t}$ quickly.

To do this, we pick a random vector $\bm{v}$ and instead solve for a vector $\bm{o} \in O$ such that $\mathcal{P}(\bm{v} + \bm{o}) = \bm{t}$ .

We can rewrite $\mathcal{P}(\bm{v} + \bm{o})$ as $\mathcal{P}(\bm{v} + \bm{o}) = \mathcal{P}(\bm{v}) + \mathcal{P'}(\bm{v}, \bm{o}) + \mathcal{P}(\bm{o}) = \bm{t}$

We know both $\mathcal{P}(\bm{v})$ and $\mathcal{P}(\bm{o})$ . Since $\mathcal{P}'$ is linear in $\bm{o}$ and $\dim O = m$ , we can use gaussian elimination to solve for our vector $\bm{o}$ if we have a basis of $O$ .

Therefore, to sign a message $\bm{t}$ , we must find $\bm{s}$ such that $\mathcal{P}(\bm{s}) = \bm{t}$ , which is easy given $O$ .

The challenge

In the challenge, we are provided a server which generates a multivariate quadtratic map $\mathcal{P}: \mathbb{F}^{106}_q \to \mathbb{F}^{44}_q$ (slightly smaller than NIST level 1 parameters) and a basis for $O$ .

In addition to this, we are provided two "oil spills" or hints.

We are given a vector $\bm{g} + \bm{o}$ , where $\bm{g}$ is a vector of $0$ and $1$ elements (hint hint lattice) and $\bm{o} \in O$ .
We are also given the vector $\mathcal{P}(\bm{g})$ .

The solution

First, let $\hat{\bm{o}}$ be the vector "close" to $\bm{o}$ . Then note that

$\mathcal{P}(\hat{\bm{o}} - \bm{g}) = \mathcal{P}(\hat{\bm{o}}) + \mathcal{P}(\bm{g}) - \mathcal{P}'(\hat{\bm{o}}, \bm{g})$

Notice that since we know $\hat{\bm{o}}$ , we also know its map $\mathcal{P}(\hat{\bm{o}})$ . Let this value be $k$ .

Since $\hat{\bm{o}} - \bm{g} = \bm{o}$ ,

$\bm{0} = \mathcal{P}(\hat{\bm{o}}) + \mathcal{P}(\bm{g}) - \mathcal{P}'(\hat{\bm{o}}, \bm{g})$ $\mathcal{P}'(\hat{\bm{o}}, \bm{g}) = k + \mathcal{P}(\bm{g})$

We know the vector on the right hand side, so what's left is the left. If we consider a singular equation $p_i'(x, y)$ , we see that

$\hat{\bm{o}}^T(\bm{P}_i + \bm{P}_i^T)\bm{g} = k_i + \bm{g}^T\bm{P}_i\bm{g}$

Since $\bm{g}$ is a $0/1$ vector, this is basically an instance of the subset sum problem!

Lattices, after all

Consider the lattice spanned by the rows of the following matrix:

$\bm{B} = \begin{bmatrix} 1 & 0 & 0 & \ldots & \hat{\bm{o}}(\bm{P}_0 + \bm{P}_0^T)[0] & \ldots & \hat{\bm{o}}(\bm{P}_m + \bm{P}_m^T)[0] & \\ 0 & 1 & 0 & \ldots & \hat{\bm{o}}(\bm{P}_0 + \bm{P}_0^T)[1] & \ldots & \hat{\bm{o}}(\bm{P}_m + \bm{P}_m^T)[1] & \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \\ 0 & 0 & 0 & \ldots & q & 0 & 0 & \\ 0 & 0 & 0 & \ldots & 0 & q & 0 \\ 0 & 0 & 0 & \ldots & 0 & 0 & \ddots \\ 0 & 0 & 0 & \ldots & k_0 + \bm{g}^T\bm{P}_0\bm{g} & \ldots & k_m + \bm{g}^T\bm{P}_m\bm{g} \\ \end{bmatrix}$

Consider the linear combination $\bm{t} = (\bm{g}\, ||\, w_0, w_1, w_2, \ldots || -1)$ (here $w_i$ is the coefficient of $q$ when doing the modular arithmetic under $q$ )

This linear combination gives us the short vector $\bm{t}\bm{B} = (\bm{g} || \bm{0})$

However, the density of the problem is too high for LLL, and typically LLL does not actually recover our vector $\bm{t}\bm{B}$ . We can just run BKZ instead, with progressively higher block sizes until it finds the vector. I was able to recover it with a block size of 15 consistently.

Section 2: Recovering the full basis

Now we have the vector $\bm{g}$ . Since we have $\bm{g}$ , we also have a single vector $\bm{o} \in O$ .

Once we have a single vector in the oil basis, it is much easier to recover the full basis. There are two methods:

Brute force Groebner Basis approach

We want to recover another vector $\bm{o}_1 \in O$ . What constraints do we have on our vector? We know that $\mathcal{P}(\bm{o}_1) = \bm{0}$ . This gives us $44$ polynomial equations in $106$ variables under $\mathbb{F}_q$ . We can use a Groebner basis to solve this system, but it will run forever.

However, we know that $\dim O = m$ . This means that we can fix $m$ variables in $\bm{o}_1$ as constants, and then solve the reduced $n - m$ variable system.This is still not very doable.

Since we know $\bm{o} \in O$ , we know that $\mathcal{P}'(\bm{o}, \bm{o}_1) = 0$ . Therefore, we get another $m$ linear equations. So with all the information, we have $n - m$ variables, $m$ quadratic equations, and $m$ linear equations. This can be solved relatively fast with a groebner basis algorithm such as F4.

Finally, once we have found $\bm{o}_1$ , instead of $m$ linear equations, we have $2m$ linear equations and $n - m$ variables. This gives us more linear equations than variables, so we can recover the rest of the basis relatively easily.

Kipnis-Shamir like intersection approach

If we know one vector $\bm{o}$ , consider the vector $(\bm{o}(P_1 + P_1^T)\bm{x}, \bm{o}(P_2 + P_2^T)\bm{x}, \ldots)$ . Note that $O$ is a subspace of the vector, as we know $P'(\bm{o}, \bm{x}) = 0$ . We can compute the basis $K$ of this kernel. From here, we can perform a Kipnis Shamir attack by finding the common eigenvectors of each $P_i + P_i^T$ , which will allow us to recover the full basis.

Bringing it all together

Once we have the full basis, we can just forge a signature to the random message sent, and we get the flag.

Some extra notes

I (was thinking of) but did not add a proof of work, so one participant solved by reconnecting until an LLL-able instance is found, instead of using BKZ. I didn't really expect this, though there might have been some load on the server due to this. In hindsight I probably should have added a POW.
This paper was released after I had written the challenge but before the CTF started. It details and provides an implementation of an algorithm to recover the full basis from a single vector. It is similar to the Kipnis Shamir attack, and was also used during the competition. I didn't really want my challenge to become an "implement the paper" crypto challenge, but unfortunately I got sniped. Both KS and Groebner are pretty well known attacks so I expected participants would use those. Hellman also pointed me to another paper after the CTF which also does a similar basis recovery and was published much earlier (thank you!).
Hopefully the challenge was fun, I really enjoyed writing it 🙂 (get ready for oil spill 2 🙃)