Author: Luigi Cinque
Computational Efficiency
Among the many areas of expertise within our research laboratory, we also focus on the efficient implementation of neural networks and their training using advanced compiler techniques. Our work includes optimizing CPU and GPU kernels to improve performance, reduce training time, and increase the scalability of machine learning models. Leveraging principles of systems programming, we…
ReViT: Enhancing vision transformers with residual attention
The Vision Transformer (ViT) self-attention mechanism is characterized by feature collapse in deeper layers, resulting in the vanishing of low-level visual features. However, such features can help to accurately represent and identify elements within an image and increase the accuracy and robustness of vision-based recognition systems. Following this rationale, we propose a novel residual attention…
S-GEAR: Semantically Guided Representation Learning for Action Anticipation (ECCV 2024)
Action anticipation is forecasting future activity from a partially observed sequence of events. However, this task is exposed to intrinsic future uncertainty and the difficulty of reasoning upon interconnected actions. Unlike previous works that focus on extrapolating better visual and temporal information, we concentrate on learning action representations that are aware of their semantic interconnectivity…
Representation Learning and Multimodal Alignment
Representation learning and multimodal alignment are two pivotal concepts at the heart of advancing AI’s ability to understand and interact with the world in a more comprehensive and human-like manner. Representation learning focuses on developing techniques that allow AI models to automatically discover and learn meaningful representations of data, enabling them to capture the underlying…
Video Understanding
Video understanding stands as a critical pillar of AI research, encompassing the multifaceted challenge of interpreting the rich visual and temporal information embedded in videos. Within this field, two key branches emerge: egocentric video understanding and exocentric video understanding, each posing unique challenges and opportunities. Egocentric video understanding involves interpreting footage captured from a first-person…
Flagship 6 – Rome Technopole
Artificial intelligence, virtual reality and digital twin for advanced engineering and aerospace. Lead industry: Thales Alenia Space S.p.A. Universities and EPR: Sapienza Università di Roma, Università di Roma Tor Vergata, Università degli Studi Roma Tre, Università degli Studi di Cassino e del Lazio Meridionale, Università degli Studi della Tuscia, Università LUISS, Università Campus Bio-Medico di Roma, CNR…