---
title: "Large Language Models"
slug: "large-language-models"
discipline: "Computer Science / AI"
description: "LLM research and development. Transformer architectures, training methods, alignment, reasoning capabilities, multimodal models, and AI safety."
icon: "🤖"
url: "https://science-database.com/technology/large-language-models"
api: "https://science-database.com/api/v1/technology/large-language-models"
llms_txt: "https://science-database.com/technology/large-language-models/llms.txt"
articles_indexed: 15
last_updated: "2026-04-11T06:42:40.480Z"
search_terms:
  - "large language model transformer"
  - "LLM alignment safety RLHF"
  - "multimodal AI foundation model"
source: "science-database.com"
license: "metadata CC0, abstracts belong to respective publishers"
---

# Large Language Models

LLM research and development. Transformer architectures, training methods, alignment, reasoning capabilities, multimodal models, and AI safety.

**Discipline:** Computer Science / AI  
**Indexed Papers:** 15  
**Last Updated:** 2026-04-11

## Top Publications

Ranked by citation count across Semantic Scholar, OpenAlex, and arXiv.

### Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

- **Authors:** Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
- **Journal:** 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- **Published:** 2021-10-01
- **DOI:** [10.1109/iccv48922.2021.00986](https://doi.org/10.1109/iccv48922.2021.00986)
- **Citations:** 28,719
- **Source:** OpenAlex
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W3138516171/llms.txt)

> This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differ...
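
The shifted-window idea the abstract refers to can be sketched in a few lines: attention is computed within fixed windows, and alternating layers cyclically shift the feature map so information crosses window boundaries. A minimal illustration of the partitioning step (numpy only; the attention itself and Swin's attention masking are omitted):

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping (win, win, C) windows."""
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, C)

def shifted_windows(x, win):
    """Cyclically shift the map by win // 2 before partitioning, as in Swin's
    shifted-window step, so successive layers mix content across windows."""
    shifted = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(shifted, win)

x = np.arange(8 * 8 * 1, dtype=float).reshape(8, 8, 1)
wins = window_partition(x, 4)    # 4 windows, each (4, 4, 1)
swins = shifted_windows(x, 4)    # same shape, different window contents
```

Within each window, attention cost is quadratic only in the window size, which is what makes the architecture linear in image size.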

### Restormer: Efficient Transformer for High-Resolution Image Restoration

- **Authors:** Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
- **Journal:** 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- **Published:** 2022-06-01
- **DOI:** [10.1109/cvpr52688.2022.00564](https://doi.org/10.1109/cvpr52688.2022.00564)
- **Citations:** 3,239
- **Source:** OpenAlex
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4225672218/llms.txt)

> Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, have shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shor...

### PaLM: Scaling Language Modeling with Pathways

- **Authors:** Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek S. Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James T. Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier García, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, D. Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Érica Rodrigues Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Díaz, Orhan Fırat, Michele Catasta, Jason Lee, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel
- **Journal:** arXiv (Cornell University)
- **Published:** 2022-04-05
- **DOI:** [10.48550/arxiv.2204.02311](https://doi.org/10.48550/arxiv.2204.02311)
- **Citations:** 2,124
- **Source:** OpenAlex
- **Access:** Open Access
- **PDF:** [Download](https://arxiv.org/pdf/2204.02311)
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4224308101/llms.txt)

> Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran...

### Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

- **Authors:** Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J. Fleet, Mohammad Norouzi
- **Journal:** arXiv (Cornell University)
- **Published:** 2022-05-23
- **DOI:** [10.48550/arxiv.2205.11487](https://doi.org/10.48550/arxiv.2205.11487)
- **Citations:** 2,103
- **Source:** OpenAlex
- **Access:** Open Access
- **PDF:** [Download](https://arxiv.org/pdf/2205.11487)
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4281485151/llms.txt)

> We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only c...

### TinyBERT: Distilling BERT for Natural Language Understanding

- **Authors:** Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Dong Chen, Linlin Li, Fang Wang, Qun Liu
- **Published:** 2020-01-01
- **DOI:** [10.18653/v1/2020.findings-emnlp.372](https://doi.org/10.18653/v1/2020.findings-emnlp.372)
- **Citations:** 1,590
- **Source:** OpenAlex
- **Access:** Open Access
- **PDF:** [Download](https://www.aclweb.org/anthology/2020.findings-emnlp.372.pdf)
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W3105966348/llms.txt)

> Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently execute them on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer disti...
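
The prediction-layer part of knowledge distillation can be sketched as a cross-entropy between temperature-softened teacher and student distributions. This is only one ingredient of TinyBERT's recipe (the paper also matches embeddings, hidden states, and attention maps, not shown here):

```python
import numpy as np

def softened(logits, T):
    """Temperature-softened softmax over a logit vector."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label distillation: cross-entropy of the student's softened
    distribution against the teacher's, scaled by T^2 as is conventional."""
    p_t = softened(teacher_logits, T)
    p_s = softened(student_logits, T)
    return float(-(p_t * np.log(p_s + 1e-12)).sum() * T * T)

loss = distillation_loss([2.0, 0.5, -1.0], [1.8, 0.7, -0.9])
```

The temperature exposes the teacher's "dark knowledge": at T > 1, near-miss classes receive non-negligible probability mass for the student to learn from.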

### The great Transformer: Examining the role of large language models in the political economy of AI

- **Authors:** Dieuwertje Luitse, Wiebke Denkena
- **Journal:** Big Data & Society
- **Published:** 2021-07-01
- **DOI:** [10.1177/20539517211047734](https://doi.org/10.1177/20539517211047734)
- **Citations:** 164
- **Source:** OpenAlex
- **Access:** Open Access
- **PDF:** [Download](https://journals.sagepub.com/doi/pdf/10.1177/20539517211047734)
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W3202773593/llms.txt)

> In recent years, AI research has become more and more computationally demanding. In natural language processing (NLP), this tendency is reflected in the emergence of large language models (LLMs) like GPT-3. These powerful neural network-based models can be used for a range of NLP tasks and their language generation capacities have become so sophisticated that it can be very difficult to distinguis...

### Transformers and large language models in healthcare: A review

- **Authors:** Subhash Nerella, Sabyasachi Bandyopadhyay, Jiaqing Zhang, Miguel Á. Contreras, Scott Siegel, Aysegül Bumin, Brandon Silva, Jessica Sena, Benjamin Shickel, Azra Bihorac, Kia Khezeli, Parisa Rashidi
- **Journal:** Artificial Intelligence in Medicine
- **Published:** 2024-06-05
- **DOI:** [10.1016/j.artmed.2024.102900](https://doi.org/10.1016/j.artmed.2024.102900)
- **Citations:** 114
- **Source:** OpenAlex
- **Access:** Open Access
- **PDF:** [Download](https://www.ncbi.nlm.nih.gov/pmc/articles/11638972)
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4399367209/llms.txt)

### Retentive Network: A Successor to Transformer for Large Language Models

- **Authors:** Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
- **Journal:** arXiv (Cornell University)
- **Published:** 2023-07-17
- **DOI:** [10.48550/arxiv.2307.08621](https://doi.org/10.48550/arxiv.2307.08621)
- **Citations:** 107
- **Source:** OpenAlex
- **Access:** Open Access
- **PDF:** [Download](https://arxiv.org/pdf/2307.08621)
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4384648484/llms.txt)

> In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurre...
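
The dual parallel/recurrent computation the abstract describes can be checked in a toy single-head setting: retention is o_n = Σ_{m≤n} γ^{n−m} (q_n · k_m) v_m, computable either as one masked matrix product (training) or as a running state (inference). A minimal numpy sketch, omitting RetNet's rotation, normalization, and multi-scale decay:

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel form: O = (Q K^T ⊙ D) V with decay mask D[n, m] = γ^(n−m), m ≤ n."""
    n = Q.shape[0]
    idx = np.arange(n)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: state S_n = γ S_{n−1} + k_n v_n^T, output o_n = q_n S_n,
    giving O(1) per-token inference cost."""
    S = np.zeros((Q.shape[1], V.shape[1]))
    out = []
    for q, k, v in zip(Q, K, V):
        S = gamma * S + np.outer(k, v)
        out.append(q @ S)
    return np.stack(out)

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 5, 4))   # three (seq=5, dim=4) matrices
Op = retention_parallel(Q, K, V, 0.9)
Or = retention_recurrent(Q, K, V, 0.9)
```

Both forms produce identical outputs, which is the property that lets RetNet train like a Transformer but decode like an RNN.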

### Transformers and large language models for efficient intrusion detection systems: A comprehensive survey

- **Authors:** Hamza Kheddar
- **Journal:** Information Fusion
- **Published:** 2025-05-29
- **DOI:** [10.1016/j.inffus.2025.103347](https://doi.org/10.1016/j.inffus.2025.103347)
- **Citations:** 85
- **Source:** OpenAlex
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4410857897/llms.txt)

### Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?

- **Authors:** Byung-Doh Oh, William Schuler
- **Journal:** Transactions of the Association for Computational Linguistics
- **Published:** 2023-01-01
- **DOI:** [10.1162/tacl_a_00548](https://doi.org/10.1162/tacl_a_00548)
- **Citations:** 82
- **Source:** OpenAlex
- **Access:** Open Access
- **PDF:** [Download](https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00548/2075940/tacl_a_00548.pdf)
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4361766487/llms.txt)

> This work presents a linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly monotonic, positive log-linear relationship between perplexity and fit to reading times for the more recently releas...
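
The two quantities the paper relates are straightforward to compute: per-word surprisal is −log2 p(word | context), and perplexity is 2 raised to the mean surprisal. A toy illustration (the probabilities below are placeholders standing in for a real LM's outputs):

```python
import numpy as np

def surprisal(p):
    """Per-word surprisal in bits: −log2 p(word | context). Reading-time
    studies regress human reading times on these values."""
    return float(-np.log2(p))

# Placeholder next-word probabilities for three words of a sentence:
probs = [0.5, 0.25, 0.125]
bits = [surprisal(p) for p in probs]          # 1, 2, and 3 bits

# Perplexity is 2 to the mean surprisal, which is why a model's perplexity
# and the fit of its surprisal estimates can be studied together:
perplexity = 2.0 ** (sum(bits) / len(bits))
```

The paper's puzzle is that lower perplexity (better prediction) does not translate into surprisal values that better track human reading times.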

### BioM-Transformers: Building Large Biomedical Language Models with BERT, ALBERT and ELECTRA

- **Authors:** Sultan Alrowili, Vijay Shanker
- **Published:** 2021-01-01
- **DOI:** [10.18653/v1/2021.bionlp-1.24](https://doi.org/10.18653/v1/2021.bionlp-1.24)
- **Citations:** 68
- **Source:** OpenAlex
- **Access:** Open Access
- **PDF:** [Download](https://aclanthology.org/2021.bionlp-1.24.pdf)
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W3166204619/llms.txt)

> The impact of design choices on the performance of biomedical language models recently has been a subject for investigation. In this paper, we empirically study biomedical domain adaptation with large transformer models using different design choices. We evaluate the performance of our pretrained models against other existing biomedical language models in the literature. Our results show that we a...

### Prompt text classifications with transformer models! An exemplary introduction to prompt-based learning with large language models

- **Authors:** Christian Mayer, Sabrina Ludwig, Steffen Brandt
- **Journal:** Journal of Research on Technology in Education
- **Published:** 2022-11-22
- **DOI:** [10.1080/15391523.2022.2142872](https://doi.org/10.1080/15391523.2022.2142872)
- **Citations:** 51
- **Source:** OpenAlex
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4309685935/llms.txt)

> This study investigates the potential of automated classification using prompt-based learning approaches with transformer models (large language models trained in an unsupervised manner) for a domain-specific classification task. Prompt-based learning with zero or few shots has the potential to (1) make use of artificial intelligence without sophisticated programming skills and (2) make use of art...
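
The prompt-based setup the study describes reduces to: wrap the input in a cloze-style template, ask an LM to score each label's verbalizer as the fill-in, and keep the best-scoring label. A self-contained sketch with a toy scoring function standing in for a real masked LM (the template wording and `toy_score` are illustrative assumptions):

```python
def prompt_classify(text, labels, score_fn):
    """Prompt-based classification: embed the input in a cloze template and
    pick the label whose verbalizer the scoring function rates highest."""
    template = f"{text} Overall, it was [MASK]."
    scores = {label: score_fn(template, label) for label in labels}
    return max(scores, key=scores.get)

def toy_score(template, label):
    """Stand-in for a masked LM's fill-in probability: counts positive cue
    words. A real setup would query the LM's token distribution at [MASK]."""
    positive_cues = {"great", "good", "loved"}
    hits = sum(w.strip(".,!") in positive_cues for w in template.lower().split())
    return hits if label == "positive" else 1 - min(hits, 1)

label = prompt_classify("The lecture was great and I loved it.",
                        ["positive", "negative"], toy_score)
```

Because the classifier is just a template plus a verbalizer, no task-specific training loop or labeled dataset is required, which is the "zero or few shots" appeal the abstract highlights.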

### An LPDDR-based CXL-PNM Platform for TCO-efficient Inference of Transformer-based Large Language Models

- **Authors:** Sangsoo Park, Kyung-Soo Kim, Jinin So, Jin Chul Jung, Jong-Geon Lee, Kyoungwan Woo, Nayeon Kim, Younghyun Lee, Hyungyo Kim, Yongsuk Kwon, Jinhyun Kim, Jieun Lee, Yeongon Cho, Yong-Min Tai, Jeong‐Hyeon Cho, Hoyoung Song, Jung Ho Ahn, Nam Sung Kim
- **Published:** 2024-03-02
- **DOI:** [10.1109/hpca57654.2024.00078](https://doi.org/10.1109/hpca57654.2024.00078)
- **Citations:** 50
- **Source:** OpenAlex
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4393407316/llms.txt)

> Transformer-based large language models (LLMs) such as Generative Pre-trained Transformer (GPT) have become popular due to their remarkable performance across diverse applications, including text generation and translation. For LLM training and inference, the GPU has been the predominant accelerator with its pervasive software development ecosystem and powerful computing capability. However, as th...

### Self-Attention and Transformers: Driving the Evolution of Large Language Models

- **Authors:** Qing Luo, Wei Zeng, Manni Chen, Gang‐Ding Peng, Xiaofeng Yuan, Qiang Yin
- **Published:** 2023-07-21
- **DOI:** [10.1109/iceict57916.2023.10245906](https://doi.org/10.1109/iceict57916.2023.10245906)
- **Citations:** 36
- **Source:** OpenAlex
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4386921320/llms.txt)

> Transformers, originally introduced for machine translation, and built upon the Self-Attention mechanism, have undergone a remarkable evolution, establishing themselves as the bedrock of large language models (LLMs). Their unparalleled capacity to model intricate relationships and capture extensive dependencies within sequences has propelled their prominence. This article, presented in a popular s...
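
The self-attention mechanism this survey centres on fits in a few lines: project the same input into queries, keys, and values, then weight the values by softmax(QKᵀ / √d_k). A minimal single-head numpy sketch (no masking, no multi-head split):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V,
    with Q, K, V all projected from the same input X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))                          # 4 tokens, dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

Every output position attends over every input position, which is what gives the mechanism its ability to capture the long-range dependencies the abstract credits it with.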

### To Transformers and Beyond: Large Language Models for the Genome

- **Authors:** Micaela Elisa Consens, C Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan M Moses, Bo Wang
- **Journal:** arXiv (Cornell University)
- **Published:** 2023-11-13
- **DOI:** [10.48550/arxiv.2311.07621](https://doi.org/10.48550/arxiv.2311.07621)
- **Citations:** 31
- **Source:** OpenAlex
- **Access:** Open Access
- **PDF:** [Download](https://arxiv.org/pdf/2311.07621)
- **llms.txt:** [View](https://science-database.com/technology/large-language-models/paper/oa-W4388717695/llms.txt)

> In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based on the transformer architecture, in genomics. Building on the foundation of traditional convolutional neural networks and recurrent neural networks, we explore ...

---

*Generated by [science-database.com](https://science-database.com) — The Knowledge Interface*  
*Full data available via [JSON API](https://science-database.com/api/v1/technology/large-language-models)*