
C++ and Java implementations of fastpfor + varbyte unable to decode each other's output #42

@clintropolis

Description

I've been investigating using the FastPFor algorithm for integer columns in Druid, following up on the discussion and experimentation mentioned in this issue. Initially I was solely using your Java port of this library, and it's the basis of a PR I currently have open. However, curiosity got the better of me and I wanted to compare performance with the C++ implementation, so I wrote a JNI wrapper to allow me to call this library from Java, resulting in this branch, which is a sort of alternate implementation of the PR. That's the short of how I encountered the issue mentioned in the title: seeing what happened when I fed one implementation's output into the other. I have not had a chance to dig in and determine whether the issue is in this library or the other one, but it can be replicated by modifying the example program to write encoded data to a file, then loading that into a ByteBuffer in Java and attempting to decode it:

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <ctime>

#include "codecfactory.h"
using namespace std;
using namespace FastPForLib;

int main()
{
  IntegerCODEC &codec = *CODECFactory::getFromName("simdfastpfor256");
  size_t N = 10000;
  std::vector<uint32_t> mydata(N);

  srand (time(NULL));

  for (uint32_t i = 0; i < N; i++)
    mydata[i] = rand() % 100;

  // compressedsize is passed in as the capacity of compressed_output and is
  // updated by encodeArray to the number of uint32_t words actually used
  std::vector<uint32_t> compressed_output(N + 1024);
  size_t compressedsize = compressed_output.size();
  codec.encodeArray(mydata.data(), mydata.size(), compressed_output.data(),
                    compressedsize);
  ofstream out("numbers.bin", ios::out | ios::binary);
  if(!out) {
    cout << "Cannot open file.";
    return 1;
  }

  // write only the compressed words, not the full output buffer
  out.write((char *)compressed_output.data(), compressedsize * sizeof(uint32_t));

  out.close();

  cout << "length " << compressedsize << "\n";
  return 0;
}

and then adjust the encodedSize variable to match the length printed by the C++ program in the following Java snippet:

import com.google.common.io.Files;
import me.lemire.integercompression.FastPFOR;
import me.lemire.integercompression.IntWrapper;
import me.lemire.integercompression.SkippableComposition;
import me.lemire.integercompression.SkippableIntegerCODEC;
import me.lemire.integercompression.VariableByte;

import java.io.File;
import java.nio.ByteBuffer;

...
    int buffSize = 1 << 16;

    int numValues = 10000;
    int maxNumValues = buffSize >> 2;

    SkippableIntegerCODEC codec = new SkippableComposition(new FastPFOR(), new VariableByte());
    int encodedSize = 2214; // the "length" printed by the C++ program for this run

    ByteBuffer encodedValues = Files.map(new File("numbers.bin"));

    int[] valueArray = new int[numValues];
    int[] encodedValueArray = new int[encodedSize];

    // copy encoded buffer to int array
    for (int i = 0; i < encodedSize; i++) {
      encodedValueArray[i] = encodedValues.getInt();
    }

    // decode with java

    IntWrapper outPos = new IntWrapper(0);
    codec.headlessUncompress(encodedValueArray, new IntWrapper(0), encodedSize, valueArray, outPos, numValues);
    // explodes before we get here
    assert (numValues == outPos.get());

The exception in this case was:

java.lang.ArrayIndexOutOfBoundsException: 2555904
	at me.lemire.integercompression.FastPFOR.decodePage(FastPFOR.java:239)
	at me.lemire.integercompression.FastPFOR.headlessUncompress(FastPFOR.java:229)
	at me.lemire.integercompression.SkippableComposition.headlessUncompress(SkippableComposition.java:55)

but during experimentation I recall seeing exceptions coming from the VariableByte implementation as well, so the exact error may depend on the length and composition of the input. I don't have one of those exceptions handy, unfortunately; I will attempt to trigger it again and update the ticket. The JNI wrapper version can correctly read and decode the file data generated by the example program.
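
For reference, a minimal native round trip of what that working path looks like (this is a sketch rather than my actual JNI wrapper; the compressed length is assumed to be the value printed by the encoder, e.g. 2214 for the run above):

#include <fstream>
#include <iostream>
#include <vector>

#include "codecfactory.h"
using namespace std;
using namespace FastPForLib;

int main()
{
  IntegerCODEC &codec = *CODECFactory::getFromName("simdfastpfor256");
  // assumed: the "length" printed by the encoder above (2214 for that run)
  size_t compressedsize = 2214;
  size_t N = 10000;

  std::vector<uint32_t> compressed(compressedsize);
  ifstream in("numbers.bin", ios::in | ios::binary);
  if(!in) {
    cout << "Cannot open file.";
    return 1;
  }
  in.read((char *)compressed.data(), compressedsize * sizeof(uint32_t));
  in.close();

  std::vector<uint32_t> recovered(N);
  size_t recoveredsize = recovered.size();
  codec.decodeArray(compressed.data(), compressed.size(),
                    recovered.data(), recoveredsize);
  cout << "recovered " << recoveredsize << " values\n";
  return 0;
}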

Of potential interest: the size of the output varies slightly between the implementations, with the C++ output being a handful of int32-sized values larger. The full set of compatibility tests I threw together can be found here, with the C++ snippet above here. The tests that pass for me are java -> java, jni -> jni, and external native encoded file -> jni; the failing tests are jni -> java, java -> jni, and external native encoded file -> java.

Another thing, which might be related and indicative of a bug here (or somewhere in my code): I experience an assert failure at the end of the decodeArray function when using the SIMD versions of FastPFor, which only popped up when I attempted to decode from arbitrary offsets of a ByteBuffer (from a memory-mapped file). I did not experience this during initial testing of my JNI wrapper, which used newly allocated buffers populated with data. I explored briefly; commenting out the assert statement in the header and rebuilding the library resulted in the output decoding without exploding, but the final value would be incorrect when I went to validate the output. Since I only saw this when decoding from the mapped buffer, where an encoded chunk can sit at an arbitrary offset, I speculated it was an alignment issue, and indeed copying the values to a 16-byte-aligned chunk has made the assert failure disappear (sketched below). That said, I haven't had the opportunity to try to replicate this behavior purely in C++ yet, so it is possible this issue is mine rather than this library's, but I wanted to throw it out there in case it helps isolate why the Java and C++ outputs are incompatible.
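
The workaround looks roughly like this (a sketch only; mappedBase, offsetBytes, compressedWords, and numValues are hypothetical names standing in for my actual code):

#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

#include "codecfactory.h"
using namespace FastPForLib;

// copy the encoded chunk out of the mapped region into a 16-byte-aligned
// scratch buffer before handing it to the SIMD codec
std::vector<uint32_t> decodeChunk(IntegerCODEC &codec,
                                  const char *mappedBase,  // start of the mapped file (hypothetical)
                                  size_t offsetBytes,      // arbitrary, possibly unaligned offset
                                  size_t compressedWords,  // chunk length in uint32_t words
                                  size_t numValues)        // expected decoded count
{
  void *scratch = nullptr;
  if (posix_memalign(&scratch, 16, compressedWords * sizeof(uint32_t)) != 0)
    throw std::bad_alloc();
  std::memcpy(scratch, mappedBase + offsetBytes, compressedWords * sizeof(uint32_t));

  std::vector<uint32_t> out(numValues);
  size_t recovered = out.size();
  codec.decodeArray((const uint32_t *)scratch, compressedWords,
                    out.data(), recovered);
  out.resize(recovered);
  std::free(scratch);
  return out;
}

Decoding directly from the unaligned offset was what triggered the assert in my case.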

In general I've tested my branches on both my macOS laptop with

Apple LLVM version 10.0.0 (clang-1000.10.25.5)
g++-8 (Homebrew GCC 8.2.0) 8.2.0

and Ubuntu Linux with

g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609

but I will double-check that both issues happen on Linux as soon as possible, and will attempt to dig into this more deeply whenever I get the chance.

I'm quite excited to get this stuff worked into Druid, and using this version of the library is currently my preference since it benchmarks faster and makes the implementation of the Java part of my PR a bit more straightforward, but I think it's probably important to have the ability to fall back to a Java-only implementation to make the decision a little less permanent.
