# Computer Vision - science-database.com

> Visual AI and image understanding. Object detection, segmentation, 3D vision, video understanding, visual transformers, and multimodal vision-language models.

- Discipline: Computer Science / AI
- URL: https://science-database.com/technology/computer-vision
- API: https://science-database.com/api/v1/technology/computer-vision
- Last Updated: 2026-04-11T08:29:55.461Z
- Articles Indexed: 15

## Top Publications

### Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

- Authors: Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
- Journal: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- Date: 2021-10-01
- DOI: https://doi.org/10.1109/iccv48922.2021.00986
- Citations: 28719
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W3138516171/llms.txt
- Abstract: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ent...

### A ConvNet for the 2020s

- Authors: Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
- Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Date: 2022-06-01
- DOI: https://doi.org/10.1109/cvpr52688.2022.01167
- Citations: 6598
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W4312443924/llms.txt
- Abstract: The “Roaring 20s” of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object d...

### Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

- Authors: Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lü, Ping Luo, Ling Shao
- Journal: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- Date: 2021-10-01
- DOI: https://doi.org/10.1109/iccv48922.2021.00061
- Citations: 4540
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W3131500599/llms.txt
- Abstract: Although convolutional neural networks (CNNs) have achieved great success in computer vision, this work investigates a simpler, convolution-free backbone network useful for many dense prediction tasks. Unlike the recently-proposed Vision Transformer (ViT) that was designed for image classification ...

### PVT v2: Improved baselines with pyramid vision transformer

- Authors: Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lü, Ping Luo, Ling Shao
- Journal: Computational Visual Media
- Date: 2022-03-16
- DOI: https://doi.org/10.1007/s41095-022-0274-8
- Citations: 2057
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W3175515048/llms.txt
- Abstract: Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) by adding three designs: (i) a linear complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolut...

### Learning RoI Transformer for Oriented Object Detection in Aerial Images

- Authors: Jian Ding, Nan Xue, Yang Long, Gui-Song Xia, Qikai Lu
- Date: 2019-06-01
- DOI: https://doi.org/10.1109/cvpr.2019.00296
- Citations: 1464
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W2964979676/llms.txt
- Abstract: Object detection in aerial images is an active yet challenging task in computer vision because of the bird’s-eye view perspective, the highly complex backgrounds, and the variant appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal ...

### UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery

- Authors: Libo Wang, Rui Li, Ce Zhang, Shenghui Fang, Chenxi Duan, Xiaoliang Meng, Peter M. Atkinson
- Journal: ISPRS Journal of Photogrammetry and Remote Sensing
- Date: 2022-06-24
- DOI: https://doi.org/10.1016/j.isprsjprs.2022.06.008
- Citations: 1059
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W4283450732/llms.txt

### Transformers in medical imaging: A survey

- Authors: Fahad Shamshad, Salman Khan, Syed Waqas Zamir, Muhammad Haris Khan, Munawar Hayat, Fahad Shahbaz Khan, Huazhu Fu
- Journal: Medical Image Analysis
- Date: 2023-04-05
- DOI: https://doi.org/10.1016/j.media.2023.102802
- Citations: 1050
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W4362603432/llms.txt

### BiFormer: Vision Transformer with Bi-Level Routing Attention

- Authors: Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wei Zhang, Rynson W. H. Lau
- Date: 2023-06-01
- DOI: https://doi.org/10.1109/cvpr52729.2023.00995
- Citations: 985
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W4386075524/llms.txt
- Abstract: As the core building block of vision transformers, attention is a powerful tool to capture long-range dependency. However, such power comes at a cost: it incurs a huge computation burden and heavy memory footprint as pairwise token interaction across all spatial locations is computed. A series of wo...

### Transformers in Time Series: A Survey

- Authors: Qingsong Wen, Tian Zhou, Chaoli Zhang, Weiqi Chen, Ziqing Ma, Junchi Yan, Liang Sun
- Date: 2023-08-01
- DOI: https://doi.org/10.24963/ijcai.2023/759
- Citations: 942
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W4385763767/llms.txt
- Abstract: Transformers have achieved superior performances in many tasks in natural language processing and computer vision, which also triggered great interest in the time series community. Among multiple advantages of Transformers, the ability to capture long-range dependencies and interactions is especiall...

### Visual attention network

- Authors: Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming‐Ming Cheng, Shi‐Min Hu
- Journal: Computational Visual Media
- Date: 2023-07-28
- DOI: https://doi.org/10.1007/s41095-023-0364-2
- Citations: 921
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W4385346076/llms.txt
- Abstract: While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglec...

### 3D Human Pose Estimation with Spatial and Temporal Transformers

- Authors: Ce Zheng, Sijie Zhu, Matías Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding
- Journal: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- Date: 2021-10-01
- DOI: https://doi.org/10.1109/iccv48922.2021.01145
- Citations: 624
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W3136525061/llms.txt
- Abstract: Transformer architectures have become the model of choice in natural language processing and are now being introduced into computer vision tasks such as image classification, object detection, and semantic segmentation. However, in the field of human pose estimation, convolutional architectures stil...

### Transformers in medical image analysis

- Authors: Kelei He, Gan Chen, Zhuoyuan Li, Islem Rekik, Zihao Yin, Ji Wen, Yang Gao, Qian Wang, Junfeng Zhang, Dinggang Shen
- Journal: Intelligent Medicine
- Date: 2022-08-24
- DOI: https://doi.org/10.1016/j.imed.2022.07.002
- Citations: 422
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W4293163051/llms.txt
- Abstract: Transformers have dominated the field of natural language processing and have recently made an impact in the area of computer vision. In the field of medical image analysis, transformers have also been successfully used in full-stack clinical applications, including image synthesis/reconstruction...

### Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

- Authors: Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
- Journal: arXiv (Cornell University)
- Date: 2021-03-25
- DOI: https://doi.org/10.48550/arxiv.2103.14030
- Citations: 373
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W3202406646/llms.txt
- Abstract: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ent...

### Computer Vision Based Transfer Learning-Aided Transformer Model for Fall Detection and Prediction

- Authors: Sheldon Mccall, Shina Samuel Kolawole, Afreen Naz, Liyun Gong, Syed Waqar Ahmed, Pandey Shourya Prasad, Miao Yu, James Wingate, Saeid Pourroostaei Ardakani
- Journal: IEEE Access
- Date: 2024-01-01
- DOI: https://doi.org/10.1109/access.2024.3368065
- Citations: 22
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W4391953475/llms.txt
- Abstract: Falls bring about significant risks to individuals’ well-being and independence, prompting widespread public health concerns. Swift detection and even predicting the risk of falls are crucial for implementing effective measures to alleviate the adverse consequences associated with such incidents. Th...

### A Computer Vision Enabled damage detection model with improved YOLOv5 based on Transformer Prediction Head

- Authors: Arunabha M. Roy, Jayabrata Bhaduri
- Journal: arXiv (Cornell University)
- Date: 2023-03-07
- DOI: https://doi.org/10.48550/arxiv.2303.04275
- Citations: 22
- Source: OpenAlex
- llms.txt: https://science-database.com/technology/computer-vision/paper/oa-W4323706312/llms.txt
- Abstract: Objective: Computer vision-based up-to-date accurate damage classification and localization are of decisive importance for infrastructure monitoring, safety, and the serviceability of civil infrastructure. Current state-of-the-art deep learning (DL)-based damage detection models, however, often lack ...

---

Generated by science-database.com - The Knowledge Interface

Full data available at: https://science-database.com/api/v1/technology/computer-vision