I was part of the team sent by the University of Cambridge to take part in RMRC 2018, focussing specifically on the computer vision tasks.

One of the more interesting tasks was hazard sign detection. A 2x2 grid of 4 different hazard signs would be provided (out of 12 possible signs), and the robot had to navigate to the grid and identify the 4 signs in their correct locations.

The first method I experimented with was simple template matching. While this gave excellent results in ideal scenarios, performance degraded sharply under any transformation of the input, and the method was especially sensitive to shearing. I therefore concluded that it was not robust enough, as there was no guarantee that the robot would be able to extract a perfectly aligned image.
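For reference, this is roughly the kind of OpenCV template matching I was experimenting with; the file names and the 0.8 threshold below are illustrative placeholders rather than values from the actual competition code:

import cv2

# Illustrative file names: a frame from the robot camera and one reference sign template.
scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("flammable_template.jpg", cv2.IMREAD_GRAYSCALE)

# Normalised cross-correlation; scores near 1.0 indicate a strong match.
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:  # arbitrary threshold, purely for illustration
    print("Matched at", max_loc, "with score", max_val)

Because the correlation is computed against a fixed template, even a modest shear or perspective change drags the score down, which is exactly the weakness described above.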

I then experimented with SIFT. I was pretty excited to try this out because we had covered SIFT in one of my second-year modules, and this seemed like a great chance to apply it. Unfortunately, the SIFT descriptors were not discriminative enough. This made sense, because many of the signs share similar features (like flames). SIFT descriptors also ignore the colours in the signs, which were actually among their most defining characteristics.
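As a rough sketch of that experiment (the file names are placeholders; recent OpenCV builds expose SIFT as cv2.SIFT_create(), while 2018-era builds needed the contrib package):

import cv2

# Placeholder file names: a reference sign image and a candidate crop from the camera feed.
reference = cv2.imread("flammable_reference.jpg", cv2.IMREAD_GRAYSCALE)
candidate = cv2.imread("candidate_crop.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute SIFT descriptors for both images.
sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(reference, None)
kp_cand, des_cand = sift.detectAndCompute(candidate, None)

# Brute-force matching with Lowe's ratio test to discard ambiguous matches.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_ref, des_cand, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), "good matches")

Since the descriptors are computed on greyscale intensities, the colour information that distinguishes many of the signs never enters the comparison at all.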

I also tried the popular You Only Look Once (YOLO) algorithm, along with its lighter tinyYOLO version. This is a powerful technique and would likely have given good results. However, at this point in the year I was approaching my final Tripos exams, and I did not really have the time for the necessary preparatory work (YOLO required manual labelling and localisation of all the images, which would have been time-consuming across 12 classes).

I finally settled on a simple 12-class neural network in TensorFlow, trained with roughly 170 images per class (obtained using a simple OpenCV webcam script I had written previously). The grid would be extracted, and each of the 4 cells would be sliced out. The network would then run inference on each cell, combining the output labels into a 2x2 matrix. Here is a small snippet from the final code that did the slicing and assembled the results:

import sys

import cv2
import numpy as np

if __name__ == "__main__":
    # Load the cropped image of the 2x2 hazard sign grid passed on the command line.
    query_path = sys.argv[1]
    img = cv2.imread(query_path)
    rows, cols, channels = img.shape
    half_rows = rows // 2
    half_cols = cols // 2
    label_matrix = []

    # Slice the grid into its 4 quadrants and classify each one.
    for i in range(2):
        for j in range(2):
            cell = img[i * half_rows:(i + 1) * half_rows, j * half_cols:(j + 1) * half_cols]
            cell_path = "single" + str(i) + str(j) + ".jpg"
            cv2.imwrite(cell_path, cell)
            label = predict(cell_path)  # predict() is the TensorFlow classifier's inference function
            label_matrix.append(label)

    # Arrange the 4 labels into a 2x2 matrix matching the physical grid layout.
    label_matrix = np.array(label_matrix).reshape((2, 2))
    print(label_matrix)
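The predict() call above belongs to the classifier code, which is not shown here. Purely as a sketch of what that side could look like with tf.keras (the model path, input size, and placeholder label list below are all assumptions, not the actual competition code):

import cv2
import numpy as np
import tensorflow as tf

# Assumed artefacts: a saved Keras model and placeholder names for the 12 sign classes.
MODEL = tf.keras.models.load_model("hazard_sign_classifier.h5")
CLASS_NAMES = ["sign_%02d" % i for i in range(12)]  # replace with the real hazard sign labels

def predict(image_path):
    # Preprocess the same way as during training: resize, scale to [0, 1], add a batch dimension.
    img = cv2.imread(image_path)
    img = cv2.resize(img, (64, 64)).astype(np.float32) / 255.0
    probs = MODEL.predict(img[np.newaxis])
    return CLASS_NAMES[int(np.argmax(probs))]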

Overall, the RMRC experience was really fun and enriching. Aside from the vision tasks, I also had the chance to help out with the mechanical and electrical components. I particularly enjoyed setting up the video link from the robot camera to the workstation using a nifty little tool I found called Motion. I look forward to working on more robotics projects in 2019. We have started work for the Rescue Robot League at RoboCup 2019, and I am looking into autonomous mapping, localisation and navigation (it would be great to build something lightweight, perhaps in Python, that avoids bulky middleware like ROS).