{"technology":{"slug":"protein-structure","name":"Protein Structure Prediction","description":"Computational protein biology. AlphaFold, protein design, structure prediction, drug discovery through molecular simulation, and de novo protein engineering.","discipline":"Biochemistry / AI","icon":"🔬"},"lastUpdated":"2026-04-11T06:58:06.186Z","articleCount":15,"articles":[{"id":"oa-W3177828909","title":"Highly accurate protein structure prediction with AlphaFold","authors":"John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon Köhl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera‐Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michał Zieliński, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian W. Bodenstein, David Silver, Oriol Vinyals, Andrew Senior, Koray Kavukcuoglu, Pushmeet Kohli, Demis Hassabis","journal":"Nature","pubDate":"2021-07-15","doi":"10.1038/s41586-021-03819-2","abstract":"Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort<sup>1-4</sup>, the structures of around 100,000 unique proteins have been determined<sup>5</sup>, but this represents a small fraction of the billions of known protein sequences<sup>6,7</sup>. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. 
Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the 'protein folding problem'<sup>8</sup>) has been an important open research problem for more than 50 years<sup>9</sup>. Despite recent progress<sup>10-14</sup>, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14)<sup>15</sup>, demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W3177828909","citationCount":43225,"isOpenAccess":true,"pdfUrl":"https://www.nature.com/articles/s41586-021-03819-2.pdf"},{"id":"oa-W4396721167","title":"Accurate structure prediction of biomolecular interactions with AlphaFold 3","authors":"Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey V. Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. 
Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecuła, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michał Zieliński, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis, John Jumper","journal":"Nature","pubDate":"2024-05-08","doi":"10.1038/s41586-024-07487-w","abstract":"The introduction of AlphaFold 2<sup>1</sup> has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design<sup>2-6</sup>. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein-ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein-nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody-antigen prediction accuracy compared with AlphaFold-Multimer v.2.3<sup>7,8</sup>. 
Together, these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W4396721167","citationCount":11938,"isOpenAccess":true,"pdfUrl":"https://www.nature.com/articles/s41586-024-07487-w_reference.pdf"},{"id":"oa-W3211795435","title":"AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models","authors":"Mihály Váradi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yu Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Ankur Vora, Mira Lutfi, Michael Figurnov, Andrew Cowie, Nicole Hobbs, Pushmeet Kohli, Gerard J. Kleywegt, Ewan Birney, Demis Hassabis, Sameer Velankar","journal":"Nucleic Acids Research","pubDate":"2021-10-19","doi":"10.1093/nar/gkab1061","abstract":"The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. 
The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W3211795435","citationCount":8008,"isOpenAccess":true,"pdfUrl":"https://doi.org/10.1093/nar/gkab1061"},{"id":"oa-W3202105508","title":"Protein complex prediction with AlphaFold-Multimer","authors":"Richard Evans, M. E. O’Neill, Alexander Pritzel, Н. В. Антропова, Andrew Senior, Tim Green, Augustin Žídek, Russ Bates, Sam Blackwell, Jason Yim, Olaf Ronneberger, Sebastian W. Bodenstein, Michał Zieliński, Alex Bridgland, Anna Potapenko, Andrew Cowie, Kathryn Tunyasuvunakool, Rishub Jain, Ellen Clancy, Pushmeet Kohli, John Jumper, Demis Hassabis","journal":"bioRxiv (Cold Spring Harbor Laboratory)","pubDate":"2021-10-04","doi":"10.1101/2021.10.04.463034","abstract":"While the vast majority of well-structured single protein chains can now be predicted to high accuracy due to the recent AlphaFold [1] model, the prediction of multi-chain protein complexes remains a challenge in many cases. In this work, we demonstrate that an AlphaFold model trained specifically for multimeric inputs of known stoichiometry, which we call AlphaFold-Multimer, significantly increases accuracy of predicted multimeric interfaces over input-adapted single-chain AlphaFold while maintaining high intra-chain accuracy. On a benchmark dataset of 17 heterodimer proteins without templates (introduced in [2]) we achieve at least medium accuracy (DockQ [3] ≥ 0.49) on 13 targets and high accuracy (DockQ ≥ 0.8) on 7 targets, compared to 9 targets of at least medium accuracy and 4 of high accuracy for the previous state of the art system (an AlphaFold-based system from [2]). 
We also predict structures for a large dataset of 4,446 recent protein complexes, from which we score all non-redundant interfaces with low template identity. For heteromeric interfaces we successfully predict the interface (DockQ ≥ 0.23) in 70% of cases, and produce high accuracy predictions (DockQ ≥ 0.8) in 26% of cases, an improvement of +27 and +14 percentage points over the flexible linker modification of AlphaFold [4] respectively. For homomeric interfaces we successfully predict the interface in 72% of cases, and produce high accuracy predictions in 36% of cases, an improvement of +8 and +7 percentage points respectively.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W3202105508","citationCount":3937,"isOpenAccess":false,"pdfUrl":""},{"id":"oa-W2999044305","title":"Improved protein structure prediction using potentials from deep learning","authors":"Andrew Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu, Demis Hassabis","journal":"Nature","pubDate":"2020-01-15","doi":"10.1038/s41586-019-1923-7","abstract":"","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W2999044305","citationCount":3451,"isOpenAccess":true,"pdfUrl":"https://discovery.ucl.ac.uk/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"},{"id":"oa-W3183475563","title":"Highly accurate protein structure prediction for the human proteome","authors":"Kathryn Tunyasuvunakool, Jonas Adler, Zachary Wu, Tim Green, Michał Zieliński, Augustin Žídek, Alex Bridgland, Andrew Cowie, Clemens Meyer, Agata Laydon, Sameer Velankar, Gerard J. Kleywegt, Alex Bateman, Richard Evans, Alexander Pritzel, Michael Figurnov, Olaf Ronneberger, Russ Bates, Simon Köhl, Anna Potapenko, Andrew J. 
Ballard, Bernardino Romera‐Paredes, Stanislav Nikolov, Rishub Jain, Ellen Clancy, David Reiman, Stig Petersen, Andrew Senior, Koray Kavukcuoglu, Ewan Birney, Pushmeet Kohli, John Jumper, Demis Hassabis","journal":"Nature","pubDate":"2021-07-22","doi":"10.1038/s41586-021-03828-1","abstract":"Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure<sup>1</sup>. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold<sup>2</sup>, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. 
We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W3183475563","citationCount":3143,"isOpenAccess":true,"pdfUrl":"https://www.nature.com/articles/s41586-021-03828-1.pdf"},{"id":"oa-W4388464011","title":"AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences","authors":"Mihály Váradi, Damian Bertoni, Paulyna Magaña, Urmila Paramval, Ivanna Pidruchna, Malarvizhi Radhakrishnan, Maxim Tsenkov, Sreenath Nair, Milot Mirdita, Jingi Yeo, Oleg Kovalevskiy, Kathryn Tunyasuvunakool, Agata Laydon, Augustin Žídek, Hamish Tomlinson, Dhavanthi Hariharan, Josh Abrahamson, Tim Green, John Jumper, Ewan Birney, Martin Steinegger, Demis Hassabis, Sameer Velankar","journal":"Nucleic Acids Research","pubDate":"2023-11-02","doi":"10.1093/nar/gkad1011","abstract":"The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. 
We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W4388464011","citationCount":1749,"isOpenAccess":true,"pdfUrl":"https://academic.oup.com/nar/advance-article-pdf/doi/10.1093/nar/gkad1011/52777135/gkad1011.pdf"},{"id":"oa-W4206563428","title":"Protein structure predictions to atomic accuracy with AlphaFold","authors":"John Jumper, Demis Hassabis","journal":"Nature Methods","pubDate":"2022-01-01","doi":"10.1038/s41592-021-01362-6","abstract":"","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W4206563428","citationCount":285,"isOpenAccess":false,"pdfUrl":""},{"id":"oa-W3208408872","title":"The case for post-predictional modifications in the AlphaFold Protein Structure Database","authors":"Haroldas Bagdonas, Carl A. Fogarty, Elisa Fadda, Jon Agirre","journal":"Nature Structural & Molecular Biology","pubDate":"2021-10-29","doi":"10.1038/s41594-021-00680-9","abstract":"","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W3208408872","citationCount":84,"isOpenAccess":false,"pdfUrl":""},{"id":"oa-W4376131109","title":"De novo protein design by inversion of the AlphaFold structure prediction network","authors":"Casper A. Goverde, Benedict Wolf, Hamed Khakzad, Stéphane Rosset, Bruno E. Correia","journal":"Protein Science","pubDate":"2023-05-11","doi":"10.1002/pro.4653","abstract":"De novo protein design enhances our understanding of the principles that govern protein folding and interactions, and has the potential to revolutionize biotechnology through the engineering of novel protein functionalities. Despite recent progress in computational design strategies, de novo design of protein structures remains challenging, given the vast size of the sequence-structure space. 
AlphaFold2 (AF2), a state-of-the-art neural network architecture, achieved remarkable accuracy in predicting protein structures from amino acid sequences. This raises the question of whether AF2 has learned the principles of protein folding sufficiently for de novo design. Here, we sought to answer this question by inverting the AF2 network, using the prediction weight set and a loss function to bias the generated sequences to adopt a target fold. Initial design trials resulted in de novo designs with an overrepresentation of hydrophobic residues on the protein surface compared to their natural protein family, requiring additional surface optimization. In silico validation of the designs showed protein structures with the correct fold, a hydrophilic surface and a densely packed hydrophobic core. In vitro validation showed that 7 out of 39 designs were folded and stable in solution with high melting temperatures. In summary, our design workflow solely based on AF2 does not seem to fully capture basic principles of de novo protein design, as observed in the protein surface's hydrophobic vs. hydrophilic patterning. However, with minimal post-design intervention, these pipelines generated viable sequences as assessed by experimental characterization. 
Thus, such pipelines show the potential to contribute to solving outstanding challenges in de novo protein design.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W4376131109","citationCount":84,"isOpenAccess":true,"pdfUrl":"https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/pro.4653"},{"id":"oa-W2967175367","title":"Protein structure prediction beyond AlphaFold","authors":"Guo‐Wei Wei","journal":"Nature Machine Intelligence","pubDate":"2019-08-09","doi":"10.1038/s42256-019-0086-4","abstract":"","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W2967175367","citationCount":83,"isOpenAccess":true,"pdfUrl":"https://pmc.ncbi.nlm.nih.gov/articles/PMC10956386/pdf/nihms-1972073.pdf"},{"id":"oa-W4388571183","title":"Enhancing alphafold-multimer-based protein complex structure prediction with MULTICOM in CASP15","authors":"Jian Liu, Zhiye Guo, Tianqi Wu, Raj S. Roy, Farhan Quadir, Chen Chen, Jianlin Cheng","journal":"Communications Biology","pubDate":"2023-11-10","doi":"10.1038/s42003-023-05525-3","abstract":"To enhance the AlphaFold-Multimer-based protein complex structure prediction, we developed a quaternary structure prediction system (MULTICOM) to improve the input fed to AlphaFold-Multimer and evaluate and refine its outputs. MULTICOM samples diverse multiple sequence alignments (MSAs) and templates for AlphaFold-Multimer to generate structural predictions by using both traditional sequence alignments and Foldseek-based structure alignments, ranks structural predictions through multiple complementary metrics, and refines the structural predictions via a Foldseek structure alignment-based refinement method. The MULTICOM system with different implementations was blindly tested in the assembly structure prediction in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 as both server and human predictors. 
MULTICOM_qa ranked 3rd among 26 CASP15 server predictors and MULTICOM_human ranked 7th among 87 CASP15 server and human predictors. The average TM-score of the first predictions submitted by MULTICOM_qa for CASP15 assembly targets is ~0.76, 5.3% higher than ~0.72 of the standard AlphaFold-Multimer. The average TM-score of the best of top 5 predictions submitted by MULTICOM_qa is ~0.80, about 8% higher than ~0.74 of the standard AlphaFold-Multimer. Moreover, the Foldseek Structure Alignment-based Multimer structure Generation (FSAMG) method outperforms the widely used sequence alignment-based multimer structure generation.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W4388571183","citationCount":72,"isOpenAccess":true,"pdfUrl":"https://www.nature.com/articles/s42003-023-05525-3.pdf"},{"id":"oa-W4406063185","title":"Proteins with alternative folds reveal blind spots in AlphaFold-based protein structure prediction","authors":"Devlina Chakravarty, Myeongsang Lee, Lauren L. Porter","journal":"Current Opinion in Structural Biology","pubDate":"2025-01-05","doi":"10.1016/j.sbi.2024.102973","abstract":"In recent years, advances in artificial intelligence (AI) have transformed structural biology, particularly protein structure prediction. Though AI-based methods, such as AlphaFold (AF), often predict single conformations of proteins with high accuracy and confidence, predictions of alternative folds are often inaccurate, low-confidence, or simply not predicted at all. Here, we review three blind spots that alternative conformations reveal about AF-based protein structure prediction. First, proteins that assume conformations distinct from their training-set homologs can be mispredicted. Second, AF overrelies on its training set to predict alternative conformations. Third, degeneracies in pairwise representations can lead to high-confidence predictions inconsistent with experiment. 
These weaknesses suggest approaches to predict alternative folds more reliably.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W4406063185","citationCount":60,"isOpenAccess":true,"pdfUrl":"https://doi.org/10.1016/j.sbi.2024.102973"},{"id":"oa-W4293046261","title":"Enhancing Protein Function Prediction Performance by Utilizing AlphaFold-Predicted Protein Structures","authors":"Wenjian Ma, Shugang Zhang, Zhen Li, Mingjian Jiang, Shuang Wang, Weigang Lu, Xiangpeng Bi, Huasen Jiang, Henggui Zhang, Zhiqiang Wei","journal":"Journal of Chemical Information and Modeling","pubDate":"2022-08-25","doi":"10.1021/acs.jcim.2c00885","abstract":"The structure of a protein is of great importance in determining its functionality, and this characteristic can be leveraged to train data-driven prediction models. However, the limited number of available protein structures severely limits the performance of these models. AlphaFold2 and its open-source data set of predicted protein structures have provided a promising solution to this problem, and these predicted structures are expected to benefit the model performance by increasing the number of training samples. In this work, we constructed a new data set that acted as a benchmark and implemented a state-of-the-art structure-based approach for determining whether the performance of the function prediction model can be improved by putting additional AlphaFold-predicted structures into the training set and further compared the performance differences between two models separately trained with real structures only and AlphaFold-predicted structures only. Experimental results indicated that structure-based protein function prediction models could benefit from virtual training data consisting of AlphaFold-predicted structures. First, model performances were improved in all three categories of Gene Ontology terms (GO terms) after adding predicted structures as training samples. 
Second, the model trained only on AlphaFold-predicted virtual samples achieved comparable performances to the model based on experimentally solved real structures, suggesting that predicted structures were almost equally effective in predicting protein functionality.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W4293046261","citationCount":59,"isOpenAccess":false,"pdfUrl":""},{"id":"oa-W4308930019","title":"Deep learning for protein secondary structure prediction: Pre and post-AlphaFold","authors":"Dewi Pramudi Ismi, Reza Pulungan, Afiahayati","journal":"Computational and Structural Biotechnology Journal","pubDate":"2022-01-01","doi":"10.1016/j.csbj.2022.11.012","abstract":"This paper aims to provide a comprehensive review of the trends and challenges of deep neural networks for protein secondary structure prediction (PSSP). In recent years, deep neural networks have become the primary method for protein secondary structure prediction. Previous studies showed that deep neural networks had uplifted the accuracy of three-state secondary structure prediction to more than 80%. Favored deep learning methods, such as convolutional neural networks, recurrent neural networks, inception networks, and graph neural networks, have been implemented in protein secondary structure prediction. Methods adapted from natural language processing (NLP) and computer vision are also employed, including attention mechanism, ResNet, and U-shape networks. In the post-AlphaFold era, PSSP studies focus on different objectives, such as enhancing the quality of evolutionary information and exploiting protein language models as the PSSP input. The recent trend to utilize pre-trained language models as input features for secondary structure prediction provides a new direction for PSSP studies. Moreover, the state-of-the-art accuracy achieved by previous PSSP models is still below its theoretical limit. 
There is still room for improvement in the field.","tldr":"","source":"OpenAlex","sourceUrl":"https://openalex.org/W4308930019","citationCount":44,"isOpenAccess":true,"pdfUrl":"https://doi.org/10.1016/j.csbj.2022.11.012"}],"links":{"web":"https://science-database.com/technology/protein-structure","llms_txt":"https://science-database.com/technology/protein-structure/llms.txt","api":"https://science-database.com/api/v1/technology/protein-structure"}}