I’ve been trying to port an mxnet yolo model that I’ve trained and have had little success.
It appears that darknet yolo orders the conv_pred output channels for each cell as: [x, y, w, h, obj-score, [cats]]
whereas mxnet yolo orders the channels as: [[cats], obj-score, x, y, w, h].
I’ve therefore reordered the weights and biases in the conv_pred layer to match the expected output format, but the output is still not what I expect.
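For reference, the reordering I mean looks roughly like the sketch below. It assumes the conv_pred output channels are grouped per anchor (all 5 + num_classes channels of anchor 0, then anchor 1, and so on) and that the weight tensor has its output channels on axis 0. Both are my assumptions, and the function names are made up for illustration. If the channels are actually interleaved differently, the permutation would be wrong.

```python
import numpy as np

def mxnet_to_darknet_perm(num_anchors, num_classes):
    """Index array that picks source channels in darknet order.

    Source order per anchor (as described above): [cats..., obj, x, y, w, h]
    Target order per anchor (as described above): [x, y, w, h, obj, cats...]
    Assumes channels are grouped per anchor (hypothetical layout).
    """
    per_anchor = num_classes + 5
    within = np.concatenate([
        np.arange(num_classes + 1, num_classes + 5),  # x, y, w, h
        [num_classes],                                # obj-score
        np.arange(num_classes),                       # class scores
    ])
    offsets = np.arange(num_anchors)[:, None] * per_anchor
    return (offsets + within[None, :]).ravel()

def reorder_conv_pred(weight, bias, num_anchors, num_classes):
    """Permute the output channels (axis 0) of the conv_pred weight and bias."""
    perm = mxnet_to_darknet_perm(num_anchors, num_classes)
    assert weight.shape[0] == perm.size == bias.shape[0]
    return weight[perm], bias[perm]
```

With 2 anchors and 3 classes, the first anchor's channels would be picked in the order 4, 5, 6, 7, 3, 0, 1, 2.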
From looking at the topi source code for yolo_region, I see that the layer only applies the activation functions for the region layer. To debug, I therefore look at what I expect to be the activated obj-score feature map. When I run my model in mxnet, I have an example image with a detection with obj-score 0.16. However, when I check the obj-score feature maps coming out of tvm, I find none with a similar obj-score.
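The check I'm doing on the obj-score maps is roughly the following sketch. It assumes a single-image feature map of shape (C, H, W) with channels grouped per anchor, and that the obj-score activation is a sigmoid. The helper names and the `obj_index` parameter are hypothetical; `obj_index` would be 4 for the darknet ordering above and num_classes for the mxnet ordering.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def objectness_map(feat, num_anchors, num_classes, obj_index, activated):
    """Return the obj-score map of shape (num_anchors, H, W).

    feat: array of shape (C, H, W), channels assumed grouped per anchor.
    activated: True if the region layer has already applied the sigmoid.
    """
    per_anchor = num_classes + 5
    c, h, w = feat.shape
    assert c == num_anchors * per_anchor
    grouped = feat.reshape(num_anchors, per_anchor, h, w)
    obj = grouped[:, obj_index]
    return obj if activated else sigmoid(obj)

def best_cell(obj):
    """Return ((anchor, row, col), score) for the highest obj-score."""
    idx = np.unravel_index(np.argmax(obj), obj.shape)
    return tuple(int(i) for i in idx), float(obj[idx])
```

Running `best_cell` on the maps from both frameworks should point at the same anchor and cell with a similar score if the channel ordering is right.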
Does my approach sound correct? Does anyone have ideas on where else to look?
Next, I plan to look at an unmodified conv_pred output to see if the results are the same as from mxnet.
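For that comparison I have something like this sketch in mind. It assumes both outputs have already been pulled out as NumPy arrays of the same shape with channels on axis 0. The function name, the optional `perm` argument, and the tolerance are my own choices for illustration.

```python
import numpy as np

def compare_feature_maps(reference, candidate, perm=None, atol=1e-4):
    """Compare two conv_pred outputs, optionally permuting reference channels first.

    reference, candidate: arrays with channels on axis 0.
    perm: optional channel index array applied to the reference.
    """
    if perm is not None:
        reference = reference[perm]
    assert reference.shape == candidate.shape
    diff = np.abs(reference - candidate)
    return {
        "max_abs_diff": float(diff.max()),
        "match": bool(np.allclose(reference, candidate, atol=atol)),
    }
```

If the unmodified outputs already differ, the problem is upstream of conv_pred rather than in the channel reordering.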