The growing demand for biological products drives many efforts to maximize expression of heterologous proteins. Advances in high-throughput sequencing can produce data suitable for building sequence-to-expression models with machine learning. The most accurate models have been trained on one-hot encodings, a mechanism-agnostic representation of nucleotide sequences. Moreover, studies have consistently shown that training on mechanistic sequence features leads to much poorer predictions, even with features that are known to correlate with expression, such as DNA sequence motifs, codon usage, or properties of mRNA secondary structures. However, despite their excellent local accuracy, current sequence-to-expression models can fail to generalize predictions far away from the training data. Through a comparative study across datasets in Escherichia coli and Saccharomyces cerevisiae, here we show that mechanistic sequence features can provide gains on model generalization, and thus improve their utility for predictive sequence design. We explore several strategies to integrate one-hot encodings and mechanistic features into a single predictive model, including feature stacking, ensemble model stacking, and geometric stacking, a novel architecture based on graph convolutional neural networks. Our work casts new light on mechanistic sequence features, underscoring the importance of domain-knowledge and feature engineering for accurate prediction of protein expression levels.
Increase the total number of rows showing on this page using the pull-down located below the table, or use the page scroll at the table's top right to browse through the table's pages; use the arrows to the right of a column header to sort by that column; filter the table using the "Filter" box at the top of the table.
Evidence ID | Analyze ID | Gene/Complex | Systematic Name/Complex Accession | Qualifier | Gene Ontology Term ID | Gene Ontology Term | Aspect | Annotation Extension | Evidence | Method | Source | Assigned On | Reference |
---|
Increase the total number of rows showing on this page using the pull-down located below the table, or use the page scroll at the table's top right to browse through the table's pages; use the arrows to the right of a column header to sort by that column; filter the table using the "Filter" box at the top of the table; click on the small "i" buttons located within a cell for an annotation to view further details.
Evidence ID | Analyze ID | Gene | Gene Systematic Name | Phenotype | Experiment Type | Experiment Type Category | Mutant Information | Strain Background | Chemical | Details | Reference |
---|
Increase the total number of rows showing on this page using the pull-down located below the table, or use the page scroll at the table's top right to browse through the table's pages; use the arrows to the right of a column header to sort by that column; filter the table using the "Filter" box at the top of the table.
Evidence ID | Analyze ID | Gene | Gene Systematic Name | Disease Ontology Term | Disease Ontology Term ID | Qualifier | Evidence | Method | Source | Assigned On | Reference |
---|
Increase the total number of rows displayed on this page using the pull-down located below the table, or use the page scroll at the table's top right to browse through the table's pages; use the arrows to the right of a column header to sort by that column; to filter the table by a specific experiment type, type a keyword into the Filter box (for example, “microarray”); download this table as a .txt file using the Download button or click Analyze to further view and analyze the list of target genes using GO Term Finder, GO Slim Mapper, or SPELL.
Evidence ID | Analyze ID | Regulator | Regulator Systematic Name | Target | Target Systematic Name | Direction | Regulation of | Happens During | Regulator Type | Direction | Regulation Of | Happens During | Method | Evidence | Strain Background | Reference |
---|
Increase the total number of rows showing on this page by using the pull-down located below the table, or use the page scroll at the table's top right to browse through its pages; use the arrows to the right of a column header to sort by that column; filter the table using the "Filter" box at the top of the table.
Site | Modification | Modifier | Source | Reference |
---|
Increase the total number of rows showing on this page by using the pull-down located below the table, or use the page scroll at the table's top right to browse through the table's pages; use the arrows to the right of a column header to sort by that column; filter the table using the "Filter" box at the top of the table; click on the small "i" buttons located within a cell for an annotation to view further details about experiment type and any other genes involved in the interaction.
Evidence ID | Analyze ID | Interactor | Interactor Systematic Name | Interactor | Interactor Systematic Name | Allele | Assay | Annotation | Action | Phenotype | SGA score | P-value | Source | Reference | Note |
---|
Increase the total number of rows showing on this page by using the pull-down located below the table, or use the page scroll at the table's top right to browse through the table's pages; use the arrows to the right of a column header to sort by that column; filter the table using the "Filter" box at the top of the table; click on the small "i" buttons located within a cell for an annotation to view further details about experiment type and any other genes involved in the interaction.
Evidence ID | Analyze ID | Interactor | Interactor Systematic Name | Interactor | Interactor Systematic Name | Assay | Annotation | Action | Modification | Source | Reference | Note |
---|
Increase the total number of rows showing on this page by using the pull-down located below the table, or use the page scroll at the table's top right to browse through its pages; use the arrows to the right of a column header to sort by that column; filter the table using the "Filter" box at the top of the table.
Complement ID | Locus ID | Gene | Species | Gene ID | Strain background | Direction | Details | Source | Reference |
---|
Increase the total number of rows displayed on this page using the pull-down located below the table, or use the page scroll at the table's top right to browse through the table's pages; use the arrows to the right of a column header to sort by that column; filter the table using the "Filter" box at the top of the table; download this table as a .txt file using the Download button;
Evidence ID | Analyze ID | Dataset | Description | Keywords | Number of Conditions | Reference |
---|