NLP and Computer Vision research are increasingly interrelated, sharing common computational approaches and a growing number of text + image/video benchmarks. Likewise, multi-modal domains such as social media and news stories are of growing interest for applications such as disinformation detection, attribution, and characterization. In addition, projects at Kitware require understanding written instructions, including natural language rules.