Veronika Solopova, Lucas Schmidt, Vera Schmitt, Dorothea Kolossa - In International Symposium on Intelligent Data Analysis
As the quality of deepfakes increasingly matches authentic video, reliable detection requires automated video classifiers. Yet these models often operate as black boxes, making it hard to assess their trustworthiness in high-stakes security and forensic settings. We introduce VIBA (Video Information Bottleneck Attribution), an explainable video classification method that extends Information Bottleneck Attribution (IBA) to jointly capture spatial and temporal dependencies. VIBA is post-hoc and model-agnostic, producing relevance and optical flow maps that reveal manipulated regions and motion inconsistencies across frames. We apply VIBA to deepfake detection with three architectures: Xception for spatial features, a VGG11-based optical flow model for motion dynamics, and a CViT vision transformer for long-range temporal reasoning. Across models, VIBA yields more temporally stable and spatially precise explanations than Grad-CAM, and aligns more closely with expert human annotations. By making deepfake detector outputs easier to analyse and interpret, VIBA supports secure, transparent deployment of video analysis systems in digital forensics and misinformation monitoring.
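To give a sense of the underlying IBA idea that VIBA builds on, the following is a minimal, hypothetical PyTorch sketch (not the paper's implementation): a per-location mask is optimized to inject noise into intermediate features while preserving the prediction, and the resulting per-location KL term serves as the relevance map. The names `iba_relevance` and `head`, and all hyperparameters, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def iba_relevance(features, head, target, beta=10.0, steps=30, lr=0.1):
    """Sketch of Information Bottleneck Attribution for a feature map.

    A mask lam in [0, 1] mixes the true features with Gaussian noise
    drawn from the feature statistics; optimizing the mask to keep the
    class prediction while minimizing transmitted information yields a
    per-location relevance score (the KL term).
    """
    mu = features.mean(dim=0, keepdim=True)           # noise prior mean
    std = features.std(dim=0, keepdim=True) + 1e-6    # noise prior std
    alpha = torch.zeros_like(features, requires_grad=True)  # pre-sigmoid mask
    opt = torch.optim.Adam([alpha], lr=lr)

    def kl_map(lam):
        # Elementwise KL of the bottleneck distribution from N(mu, std^2)
        m = lam * (features - mu) / std
        v = (1 - lam) ** 2
        return 0.5 * (m ** 2 + v - torch.log(v + 1e-8) - 1)

    for _ in range(steps):
        lam = torch.sigmoid(alpha)
        eps = torch.randn_like(features) * std + mu   # resampled noise
        z = lam * features + (1 - lam) * eps          # noisy bottleneck
        loss = F.cross_entropy(head(z), target) + beta * kl_map(lam).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        return kl_map(torch.sigmoid(alpha))           # relevance map
```

VIBA extends this spatial picture to the temporal axis, attributing relevance over optical flow as well as per-frame features; the sketch above covers only the single-feature-map case.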