We started by gathering and curating a comprehensive dataset of enzyme substrates, sequences, structures, and catalytic constants (kcat) from the SABIO-RK, UniProt, PubChem, and AlphaFold databases. Data download and filtering were automated so that the dataset can be easily updated at any time. In total, we collected ~10,000 entries to train, validate, and test a deep learning model.

Before tackling kcat prediction, we selected proxy tasks with larger datasets to identify suitable sequence representations and network architectures. Specifically, we investigated the effect of protein embeddings on the prediction of brightness in green fluorescent protein variants (GFP regression task, ~50,000 entries), and the effect of network architecture on the prediction of first-level Enzyme Commission numbers (EC classification task, ~70,000 entries). For the GFP regression task, we evaluated sequence representations obtained from pre-trained evolutionary models, including UniRep (an mLSTM model), SeqVec (a biLSTM model), and ProtBERT and ESM-1b (both transformer models). We found that a simple one-hot encoding of the sequence was competitive with, and more data-efficient than, all the tested representations. For the EC classification task, we tested architectures taking as input either sequence-only information (convolutional neural networks, CNNs) or structural information (graph neural networks, GNNs); both networks achieved high classification accuracies.

We then trained both the CNN and the GNN on our curated dataset to predict log10(kcat) values (regression task). Despite the limited size of the dataset and the complexity of the factors that determine kcat, we achieved satisfactory correlations between predicted and ground-truth values, although the models generalized poorly outside the space of representative substrates, sequences, and structures.
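The one-hot baseline mentioned above can be sketched as follows. This is an illustrative implementation, not the code used in the project; the fixed length of 238 residues (roughly the length of avGFP) and the handling of unknown residues as all-zero rows are assumptions.

```python
import numpy as np

# 20 canonical amino acids; ordering is an arbitrary illustrative choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence: str, max_len: int = 238) -> np.ndarray:
    """Encode a protein sequence as a (max_len, 20) binary matrix.

    Positions beyond the sequence length, and non-canonical residues,
    stay all-zero, which doubles as simple padding/masking.
    """
    x = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(sequence[:max_len]):
        if aa in AA_INDEX:
            x[pos, AA_INDEX[aa]] = 1.0
    return x

enc = one_hot_encode("MSKGEELFTG")  # first residues of avGFP
print(enc.shape)       # (238, 20)
print(enc[:10].sum())  # 10.0 — exactly one active bit per encoded residue
```

Flattened (or fed to a 1-D convolution), such a matrix is a direct input to a regression network, with no pre-trained model in the loop, which is what makes the baseline so data-efficient to compare against.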
Finally, we addressed model interpretability by adding an attention mechanism to the structural GNN and mapping the attention weights onto known binding and catalytic residues. These analyses revealed that the network learned meaningful structural patterns, yielding deep enzymatic representations that could be reused for other tasks.

As part of the project, we also aimed to design an enzyme able to catalyze an SNAr reaction involved in the synthesis of a relevant drug, with the goal of reducing the environmental impact of current synthetic routes. To that end, we computed the transition state (TS) of the reaction with the nudged elastic band method as implemented in ORCA, describing the system with density functional theory. We then retrieved thermostable structures from the Protein Data Bank (PDB) to serve as scaffolds for the TS, and devised a design protocol that optimizes a multi-objective function (PyRosetta) and checks the stability of the designs with molecular dynamics (OpenMM). Testing this protocol on the scaffold of a thermostable azoreductase, we found variants with improved in silico properties.

Additionally, during the project we established collaborations with experimental groups working on the mechanistic understanding of enzymes, including a human receptor related to SARS-CoV-2 viral infection and a novel glycosyltransferase able to synthesize drug-like molecules and natural products.
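The ranking step of a multi-objective design protocol like the one described above can be sketched as a weighted sum over per-design metrics. The terms, weights, and numbers below are entirely hypothetical and stand in for the PyRosetta objective actually used; the sketch only shows the shape of the selection logic.

```python
# Hypothetical objective terms (lower is better, arbitrary energy units):
# TS-binding affinity, fold stability, and a steric clash penalty.
WEIGHTS = {"ts_affinity": 1.0, "stability": 0.5, "clashes": 2.0}

def design_score(metrics: dict) -> float:
    """Combine the objective terms into a single scalar (lower is better)."""
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)

def select_best(variants: list) -> dict:
    """Rank candidate designs and keep the lowest-scoring one."""
    return min(variants, key=design_score)

# Toy candidates: a wild-type scaffold and two illustrative variants.
variants = [
    {"name": "wt",  "ts_affinity": -4.0, "stability": -10.0, "clashes": 0.0},
    {"name": "v12", "ts_affinity": -7.5, "stability": -9.0,  "clashes": 0.2},
    {"name": "v37", "ts_affinity": -6.0, "stability": -12.0, "clashes": 1.5},
]
best = select_best(variants)
print(best["name"])  # v12: best trade-off between affinity and penalties
```

In a real protocol the metrics would come from PyRosetta scoring and the surviving designs would then be passed to the OpenMM stability check; the weighted-sum form shown here is just one common way to collapse multiple objectives into a ranking.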