## Vector Encoding Schemes
In (key-policy) attribute-based encryption (ABE) and predicate encryption (PE) a ciphertext is associated with an attribute `$a$` and the ciphertext can only be decrypted if a party has a secret key for a predicate `$f$` such that `$f(a) = 1$`. In the former, the attribute is public whereas in the latter, the attribute is hidden. Inner product predicates are a powerful class of predicates that are supported by several schemes, starting with the PE scheme of Katz, Sahai and Waters !cite{conf/eurocrypt/KatzSW08}. With inner product predicates, an attribute `$a$` corresponds to a vector `$\mathbf{x}_a \in \mathbb{Z}_q^n$` and a predicate `$f$` corresponds to a vector `$\mathbf{y}_f \in \mathbb{Z}_q^n$` and it holds that `$f(a) = 1$` if and only if `$\langle\mathbf{x}_a, \mathbf{y}_f \rangle \equiv 0 \mod{q}$` where the notation `$\langle \cdot, \cdot \rangle$` denotes the inner product of the two vectors. Now this turns out to be quite an expressive notion as many families of predicates can be realized. For each class of predicates, an encoding must be devised to map an attribute `$a$` to a vector `$\mathbf{x}_a \in \mathbb{Z}_q^n$` for some dimension `$n$` and modulus `$q$`. In addition the encoding specifies how a policy `$f$` is encoded to a vector `$\mathbf{y}_f \in \mathbb{Z}_q^n$`. In this post, I examine such vector encodings. An interesting paper to read concerning vector encodings for various classes of predicates is !cite{journals/iacr/BauerVW18}. This paper studies the complexity of `$n$` and `$q$` for several classes of widely-used predicates. We now formalize the notion of a vector encoding for inner product predicates. From now on, we omit the mod `$q$` and assume that all arithmetic is performed in `$\mathbb{Z}_q$`.
!definition{def:ve}{
A *vector encoding* scheme `$\mathcal{E}_{\mathbb{F}, \mathbb{A}}$` for a class of predicates `$\mathbb{F}$` and a set of attributes `$\mathbb{A}$` is a tuple of probabilistic polynomial time (PPT) algorithms defined as folows:
- `$\mathbb{\mathsf{Setup}(1^\lambda)}$`: On input a security parameter `$\lambda$`, output parameters `$\mathsf{params}$` which includes `$n$`, the dimension of the vectors for the scheme, and `$q$`, the modulus. The parameters `$\mathsf{params}$` are passed as an implicit input to the remaining algorithms.
- `$\mathbb{\mathsf{EncodeA}(a)}$`: On input an attribute `$a \in \mathbb{A}$`, output an encoding `$\mathbb{x}_a \in X_a \subset X \subset \mathbb{Z}_q^{n}$` where `$X \subset \mathbb{Z}_q^n$` is the set of all attribute encodings and `$X_a \subset X$` is the subset of encodings for attribute `$a$`.
- `$\mathbb{\mathsf{EncodeP}(f)}$`: On input a predicate `$f \in \mathbb{F}$`, output an encoding `$\mathbb{y}_f \in Y_f \subset Y \subset \mathbb{Z}_q^{n}$` where `$Y\subset \mathbb{Z}_q^n$` is the set of all predicate encodings and `$Y_f \subset Y$` is the subset of encodings for predicate `$f$`.
}
!definition{def:ve_equiv}{*Encoding Equivalence*:
Two encodings `$\mathbb{x} \in X$` and `$\mathbb{x'} \in X$` are said to be *equivalent*, denoted by `$\mathbb{x} \equiv \mathbb{x'}$`, if and only if for all `$f \in \mathbb{F}$` and all `$\mathbb{y}_f \in Y_f$`, it holds that
`$$ \langle \mathbb{x}, \mathbb{y}_f \rangle = 0 \iff \langle \mathbb{x'}, \mathbb{y}_f \rangle = 0.$$`
}
### Properties
For any attribute `$a \in \mathbb{A}$` and encoding `$\mathbb{x}_a \in X_a$`, and any predicate `$f \in \mathbb{F}$` and encoding `$\mathbb{y}_f \in Y_f$`, the following properties hold:
1. `$f(a) \iff \langle \mathbb{x}_a, \mathbb{y}_f \rangle = 0$`
2. For any `$r \in \mathbb{Z}_q$`, we have `$r \cdot \mathbb{x}_a \in X_a$` and `$r \cdot \mathbb{y}_f \in Y_f$`.
3. `$(X, +)$` is an additive group which has `$(X_a, +)$` as a subgroup.
4. For any `$a' \in \mathbb{A}$` and `$\mathbb{x}_{a'} \in X_{a'}$`. it holds that
`$$ \mathbb{x}_a + \mathbb{x}_{a'} \equiv \mathbb{x}_{a \odot a'} $$`
where `$\mathbb{x}_{a \odot a'} \in X_{a \odot a'}$` is an encoding of `$a \odot a'$` where `$(\mathbb{A}, \odot)$` is a semilattice.
Note that property 4 is of relevance only to homomorphic schemes - if this is of interest, see !cite{ClearM17} and my [previous post](https://scss.tcd.ie/~mclear/blog?post=ghe).
## Matching Words over a "Large" Alphabet
I'm not sure if the encoding below has already been described somewhere else; I have personally not come across it. It is quite simple and natural, and may be useful.
Let `$\Sigma$` be an alphabet of size `$m = |\Sigma|$`. For the encoding we present below, we allow `$m$` to be superpolynomial in `$\lambda$`. Consider attributes that are words in `$\Sigma^\ast$` of length at most `$\ell$`. Therefore, we define the set of attributes as
`$$\mathbb{A} := \{w \in \Sigma^\ast : |w| \leq \ell\}.$$`
A predicate is associated with a *pattern* which consists of a mix of symbols from `$\Sigma$` and wildcards or "don't cares", signified with the special symbol `$\ast$`. At some position `$i$`, a symbol from `$\Sigma$` in the pattern at that position matches only that exact symbol in the attribute word occurring at that position whereas if the wildcard symbol appears at position `$i$` in the pattern, it matches any symbol in the attribute word occurring at that position. Formally, the set of predicates is thus defined
`$$ \mathbb{F} := \{\mathsf{match}_p : p \in (\Sigma \cup \{\ast\})^\ell\}$$` where the predicate `$\mathsf{match}_p$` is defined as follows
`$$\mathsf{match}_p(w) = \bigwedge_{i \in [\ell]} (p_i = \ast) \lor (i \leq |w| \land p_i = w_i).$$`
Now we give the vector encoding scheme for such a pair `$(\mathbb{F}, \mathbb{A})$`.
- `$\mathbb{\mathsf{Setup}(1^\lambda)}$`: On input a security parameter `$\lambda$`, set `$n \gets \ell + 1$` and choose the smallest prime `$q > \mathsf{max}(m, 2^\lambda)$`. Choose an efficiently computable injective map `$\psi : \Sigma \to \mathbb{Z}^\ast_q$`. Output `$\mathsf{params} := (n, q, \psi)$`.
- `$\mathbb{\mathsf{EncodeA}(w)}$`: On input an attribute `$w \in \mathbb{A}$`, compute `$t_i \gets \psi(w_i) \in \mathbb{Z}_q^\ast$` for all `$i \in [|w|]$`, then set the vector `$\mathbf{x} \gets (t_1, \ldots, t_{|w|}, r_1, \ldots, r_{\ell - |w|}, 1) \in \mathbb{Z}_q^n$` where `$r_j \stackrel{\$}{\gets} \mathbb{Z}_q$` for every `$j \in [\ell - |w|]$`. Output encoding `$\mathbf{x} \in \mathbb{Z}_q^n$`.
- `$\mathbb{\mathsf{EncodeP}(\mathsf{match}_p)}$`: On input a predicate `$\mathsf{match}_p \in \mathbb{F}$` where `$p \in (\Sigma \cup \{\ast\})^\ell$`. Set `$k \gets 0$`. For every `$i \in [\ell]$`, if `$p_i = \ast$` then set `$y_i \gets 0$`; else set `$y_i \gets \psi(p_i)^{-1} \in \mathbb{Z}_q^\ast$` and set `$k \gets k + 1$`. Output encoding `$\mathbf{y} := (y_1, \ldots, y_\ell, -k)$`.
Note that in the above encoding for predicates, the statement `$y_i \gets \psi(p_i)^{-1}$` means that we are setting `$y_i$` to the inverse element in `$\mathbb{Z}_q$` of the element `$\psi(p_i)$`; in particular, it does not refer to inverting the map `$\psi$` (which we would alternately write as `$\psi^{-1}$`). Furthermore, the variable `$k$` signifies the number of non-wildcard symbols in the pattern `$p$`.
So that's it for now. I dont' have much else to say about vector encodings for the moment.