simd8 classify(simd8 input, simd8 previous_input) {
    auto prev1 = input.prev<1>(previous_input);
    auto byte_1_high = prev1.shift_right<4>().lookup_16(table1);
    auto byte_1_low = (prev1 & 0x0F).lookup_16(table2);
    auto byte_2_high = input.shift_right<4>().lookup_16(table3);
    return (byte_1_high & byte_1_low & byte_2_high);
}
Is this coming from an existing library? If not, is it generally possible to write something like this rather than dealing with garbage like _mm_storeu_si128?
There's actually a comment in the paper about what prev<1> does:
> shift the input by 1 byte, shifting in the last byte of the previous input
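To make that sentence concrete, here is a minimal scalar sketch of how I read it. The `Block` alias and the free function `prev` are hypothetical names of my own, not the paper's API, and a plain loop stands in for the SIMD instruction. I believe the SSSE3 equivalent would be `_mm_alignr_epi8(input, previous_input, 16 - N)`, but treat that as my assumption rather than something the paper states.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for one 16-byte SIMD register.
using Block = std::array<std::uint8_t, 16>;

// Model of input.prev<N>(previous_input): every byte of `input` moves N
// positions toward the end, and the last N bytes of `previous_input`
// fill the N positions freed at the front.
template <std::size_t N>
Block prev(const Block& input, const Block& previous_input) {
    static_assert(N >= 1 && N <= 16, "shift must be between 1 and 16 bytes");
    Block out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = (i < N) ? previous_input[16 - N + i] : input[i - N];
    }
    return out;
}
```

So with `prev<1>`, lane i of the result holds the byte that comes immediately before lane i of `input` in the stream, which is why `classify` can look at "byte 1" and "byte 2" of each pair in the same lane.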
I'm not convinced the naming or parameter order is very good, but being able to lift the code into something more readable than raw intrinsics appealed to me.
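As a rough idea of what such a wrapper could look like, here is a minimal sketch. `Bytes16` is a hypothetical type of my own, not taken from the paper or from any library. The method names mirror the snippet above, and the scalar loops only mark where the real intrinsics would go.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper type: 16 byte lanes with named operations.
// A real version would hold a SIMD register and call intrinsics instead.
struct Bytes16 {
    std::array<std::uint8_t, 16> lanes{};

    // Shift every byte right by N bits.
    template <unsigned N>
    Bytes16 shift_right() const {
        static_assert(N < 8, "shift must be smaller than the lane width");
        Bytes16 out;
        for (std::size_t i = 0; i < lanes.size(); ++i) {
            out.lanes[i] = static_cast<std::uint8_t>(lanes[i] >> N);
        }
        return out;
    }

    // Use the low 4 bits of every byte as an index into a 16-entry table.
    // The snippet only ever looks up nibbles, so masking is enough here.
    Bytes16 lookup_16(const std::array<std::uint8_t, 16>& table) const {
        Bytes16 out;
        for (std::size_t i = 0; i < lanes.size(); ++i) {
            out.lanes[i] = table[lanes[i] & 0x0F];
        }
        return out;
    }

    // Lane-wise AND of two values.
    friend Bytes16 operator&(const Bytes16& a, const Bytes16& b) {
        Bytes16 out;
        for (std::size_t i = 0; i < out.lanes.size(); ++i) {
            out.lanes[i] = a.lanes[i] & b.lanes[i];
        }
        return out;
    }

    // AND every lane with the same constant mask.
    friend Bytes16 operator&(const Bytes16& a, std::uint8_t mask) {
        Bytes16 out;
        for (std::size_t i = 0; i < out.lanes.size(); ++i) {
            out.lanes[i] = a.lanes[i] & mask;
        }
        return out;
    }
};
```

With something like this, an expression such as `(v & 0x0F).lookup_16(table)` reads the same way as the snippet, and the platform-specific part stays confined to the method bodies.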