Author: Luigi Cinque
ReViT: Enhancing vision transformers with residual attention
The Vision Transformer (ViT) self-attention mechanism is characterized by feature collapse in deeper layers, resulting in the vanishing of low-level visual features. However, such features can help to accurately represent and identify elements within an image and increase the accuracy and robustness of vision-based recognition systems. Following this rationale, we propose a novel residual attention…
S-GEAR: Semantically Guided Representation Learning for Action Anticipation (ECCV2024)
Action anticipation is forecasting future activity from a partially observed sequence of events. However, this task is exposed to intrinsic future uncertainty and the difficulty of reasoning upon interconnected actions. Unlike previous works that focus on extrapolating better visual and temporal information, we concentrate on learning action representations that are aware of their semantic interconnectivity…
Representation Learning and Multimodal Alignment
Representation learning and multimodal alignment are two pivotal concepts at the heart of advancing AI’s ability to understand and interact with the world in a more comprehensive and human-like manner. Representation learning focuses on developing techniques that allow AI models to automatically discover and learn meaningful representations of data, enabling them to capture the underlying…
Video Understanding
Video understanding stands as a critical pillar of AI research, encompassing the multifaceted challenge of interpreting the rich visual and temporal information embedded in videos. Within this field, two key branches emerge: egocentric video understanding and exocentric video understanding, each posing unique challenges and opportunities. Egocentric video understanding involves interpreting footage captured from a first-person…
Flagship 6 – Rome Technopole
Artificial intelligence, virtual reality and digital twin for advanced engineering and aerospace. Lead industry: Thales Alenia Space S.p.A. Universities and EPR: Sapienza Università di Roma, Università di Roma Tor Vergata, Università degli Studi Roma Tre, Università degli Studi di Cassino e del Lazio Meridionale, Università degli Studi della Tuscia, Università LUISS, Università Campus Bio-Medico di Roma, CNR…
DISEGNO + SMARTPHONE = MUSICA
DESCRIPTION AND OBJECTIVES OF THE PROJECT: The project was born from the idea of two deserving students of the Computer Vision course at the Department of Computer Science at Sapienza University of Rome. The initial development involved using a computer and a few different cameras (one color and one depth) to allow a user to…