Thermally Activated Delayed Fluorescence (TADF) Material Property Prediction

Note

Before training or evaluation, make sure that the property data file (.dat) and the SMILES data file (smis.txt) exist, then set data_dir in the yaml configuration file to the path of the property data file and sim_dir to the path of the SMILES data file.
Also install rdkit and the other dependencies beforehand by running pip install -r requirements.txt.
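
As a quick sanity check before training, the two input files can be verified with a few lines of standard-library Python. This is a minimal sketch, assuming that the property file and the SMILES file are aligned line by line; the helper names count_lines and check_inputs are hypothetical and not part of PaddleScience, and the demo uses two small temporary files as stand-ins for angle.dat and smis.txt.

```python
import os
import tempfile


def count_lines(path):
    """Count the non-empty lines of a text file."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())


def check_inputs(data_path, smiles_path):
    """Return the sample count if both files exist and have matching lengths."""
    for path in (data_path, smiles_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Missing input file: {path}")
    n_props = count_lines(data_path)
    n_smiles = count_lines(smiles_path)
    if n_props != n_smiles:
        raise ValueError(
            f"{n_props} property values but {n_smiles} SMILES strings"
        )
    return n_props


# Demo on two temporary stand-in files (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    dat = os.path.join(tmp, "angle.dat")
    smi = os.path.join(tmp, "smis.txt")
    with open(dat, "w") as f:
        f.write("45.0\n60.5\n")
    with open(smi, "w") as f:
        f.write("c1ccccc1\nc1ccncc1\n")
    n_samples = check_inputs(dat, smi)

print(n_samples)
```

In practice the two arguments would be the same paths that data_dir and sim_dir point to in the yaml file.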

Pretrained Model	Metrics
Est.pdparams	loss(MAE): 0.045
f.pdparams	loss(MAE): 0.036
angle.pdparams	loss(MAE): 0.041

Model Training and Evaluation Commands

# Est Prediction:
cd TADF_Est
wget -c https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/Est/Est.dat https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/smis.txt
python Est.py mode=train

# f Prediction:
cd TADF_f
wget -c https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/f/f.dat https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/smis.txt
python f.py mode=train

# angle Prediction:
cd TADF_angle
wget -c https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/angle/angle.dat https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/smis.txt
python angle.py mode=train

# Est Evaluation:
cd TADF_Est
wget -c https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/Est/Est.dat https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/smis.txt
python Est.py mode=eval EVAL.pretrained_model_path=https://paddle-org.bj.bcebos.com/paddlescience/models/TADF/Est/Est_model.pdparams

# f Evaluation:
cd TADF_f
wget -c https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/f/f.dat https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/smis.txt
python f.py mode=eval EVAL.pretrained_model_path=https://paddle-org.bj.bcebos.com/paddlescience/models/TADF/f/f_model.pdparams

# angle Evaluation:
cd TADF_angle
wget -c https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/angle/angle.dat https://paddle-org.bj.bcebos.com/paddlescience/datasets/TADF/smis.txt
python angle.py mode=eval EVAL.pretrained_model_path=https://paddle-org.bj.bcebos.com/paddlescience/models/TADF/angle/angle_model.pdparams

1. Background

Organic light-emitting diodes (OLEDs) have attracted widespread attention in advanced display and lighting technologies because of their high efficiency, flexible structure, and low cost. In an OLED device, electrically injected carriers form singlet and triplet excitons in a 1:3 ratio, so the theoretical limit of the internal quantum efficiency (IQE) of an OLED built on purely fluorescent emitters is 25%. Organometallic complex emitters introduce rare metals (Ir, Pt, etc.) whose strong spin-orbit coupling (SOC) converts singlet excitons into triplet excitons through intersystem crossing, so the triplet excitons can emit phosphorescence and the IQE can reach 100%. However, the high price of rare metals hinders their widespread use.

Thermally activated delayed fluorescence (TADF) materials offer a way around these problems and have attracted widespread attention. In TADF, triplets are converted into singlets through reverse intersystem crossing (RISC) and then emit fluorescence, which also allows an IQE of 100%. The RISC process depends largely on the energy gap (\(\Delta E_{ST}\)) between the lowest singlet state (S1) and the lowest triplet state (T1). According to quantum mechanics, \(\Delta E_{ST}\) equals twice the exchange integral between the HOMO and the LUMO. A common design strategy for TADF molecules is therefore to connect an electron donor (D) and an electron acceptor (A) with a strongly twisted dihedral angle, which separates the HOMO and the LUMO in space. In contrast to a small \(\Delta E_{ST}\), a large oscillator strength (\(f\)) requires a large overlap between the HOMO and the LUMO, so the two requirements conflict and must be balanced.
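
The relation between the energy gap and the exchange integral can be written out explicitly. The following is a textbook-level sketch under the usual two-orbital (HOMO to LUMO) approximation, not a derivation taken from the cited work; the symbols \(E_0\), \(K\), \(J\), \(\phi_H\), \(\phi_L\), and \(k_{RISC}\) are notation introduced here for illustration.

```latex
\[
E_{S_1} = E_0 + K + J, \qquad
E_{T_1} = E_0 + K - J, \qquad
\Delta E_{ST} = E_{S_1} - E_{T_1} = 2J
\]
\[
J = \iint \phi_H(\mathbf{r}_1)\,\phi_L(\mathbf{r}_2)\,
    \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}\,
    \phi_H(\mathbf{r}_2)\,\phi_L(\mathbf{r}_1)\,
    d\mathbf{r}_1\, d\mathbf{r}_2
\]
\[
k_{RISC} \propto \exp\!\left(-\frac{\Delta E_{ST}}{k_B T}\right)
\]
```

Here \(E_0\) is the orbital energy difference, \(K\) the Coulomb term, and \(J\) the exchange integral. Because \(J\) shrinks as the spatial overlap of \(\phi_H\) and \(\phi_L\) decreases, a twisted donor-acceptor geometry lowers \(\Delta E_{ST}\) and speeds up thermally activated RISC, while the same reduced overlap tends to lower the oscillator strength.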

2. Model Principle

The dataset is built by high-throughput computation. Molecular features are obtained in three steps: molecular structure input, fingerprint feature extraction, and feature dimensionality reduction. A multi-layer nonlinear network then learns the mapping between these molecular structure features and the key TADF parameters, giving end-to-end property prediction.

3. Implementation of the TADF Property Prediction Model

This example predicts three properties of a molecule: the energy gap (\(\Delta E_{ST}\)), the oscillator strength (\(f\)), and the dihedral angle (\(angle\)) between the electron donor and the electron acceptor. Taking the dihedral angle \(angle\) as an example, the following sections explain how to build, train, test, and evaluate the TADF property prediction model with PaddleScience. The directory structure of the example is as follows:

tadf/
├── TADF_angle/
│   ├── config/
│   │   └── angle.yaml
│   ├── angle_model.py
│   ├── angle.dat
│   ├── angle.py
│   └── smis.txt
├── TADF_Est/
│   └── ...
├── TADF_f/
│   └── ...
└── requirements.txt

3.1 Dataset Preparation

We chose 49 common acceptors and 50 donors and connected them by a single bond. Enumerating all possible connection sites gave 44470 molecules, whose initial structures were obtained by MMFF94 force field optimization. From these, 5136 molecules were randomly selected; their ground-state structures were optimized at the B3LYP/6-31G(d) level, and their excited-state properties were calculated at the ground-state geometry using the TDDFT method.

The data used in this example consists of a property data file (such as angle.dat) and a SMILES data file (smis.txt). Each line of the property data file holds the property value of one molecule; each line of smis.txt holds the SMILES description of one molecule, that is, the molecular structure encoded as a linear string of characters. Taking the first entry as an example:

c1cc(-c2c3ccoc3cc3ccoc23)n[nH]1

Here, lowercase c denotes an aromatic carbon, n an aromatic nitrogen, o an aromatic oxygen, and [nH] an aromatic nitrogen carrying a hydrogen atom. The digits 1, 2, and 3 are ring-closure labels: the first occurrence of a digit opens a ring, and the next occurrence of the same digit closes it, after which the digit can be reused.
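
The ring-closure rule can be checked on the example string with a small standard-library scanner. This is an illustrative sketch only, not how RDKit parses SMILES; the function name ring_closures is hypothetical, and the scanner handles just single-digit and %nn labels while skipping the contents of bracket atoms such as [nH].

```python
def ring_closures(smiles):
    """Pair ring-closure labels and return (open_index, close_index) tuples."""
    open_rings = {}  # label -> index where the ring was opened
    pairs = []
    in_bracket = False
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and (ch.isdigit() or ch == "%"):
            pos = i
            if ch == "%":
                # Two-digit label written as %nn
                label = smiles[i + 1 : i + 3]
                i += 2
            else:
                label = ch
            if label in open_rings:
                # Second occurrence closes the ring and frees the label
                pairs.append((open_rings.pop(label), pos))
            else:
                open_rings[label] = pos
        i += 1
    if open_rings:
        raise ValueError(f"Unclosed ring labels: {sorted(open_rings)}")
    return pairs


pairs = ring_closures("c1cc(-c2c3ccoc3cc3ccoc23)n[nH]1")
print(len(pairs), pairs)
```

For this molecule the scanner finds four rings. The label 3 is used twice, because a label becomes available again once its ring has been closed.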

After the property data file and the SMILES data file are loaded line by line from the paths given in the configuration file, each SMILES string is converted into a Morgan fingerprint with rdkit.Chem.rdFingerprintGenerator. A Morgan fingerprint is a vectorized description of molecular structure: the local topology around each atom is hashed and mapped onto a 2048-bit vector. PCA then reduces the 2048 dimensions to the number of principal components needed to retain 99% of the variance. For the angle property, each label is also divided by 90 when it is loaded. In PaddleScience code:

examples/tadf/TADF_angle/angle_model.py
def load_data(cfg):
    data_dir = cfg.data_dir
    sim_dir = cfg.sim_dir
    angle_dat_path = os.path.join(data_dir)
    smis_txt_path = os.path.join(sim_dir)

    data = []
    with open(angle_dat_path) as f:
        for line in f:
            num = float(line.strip()) / 90
            data.append(num)

    smis = []
    with open(smis_txt_path) as f:
        for line in f:
            smis.append(line.strip())

    return data, smis


def featurize_molecules(smis):
    vectors = []
    del_mol = []
    for s in smis:
        # Convert SMILES to RDKit molecule object
        mol = Chem.MolFromSmiles(s)
        try:
            # Create Morgan fingerprint generator (radius 2, 2048-bit vector)
            generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

            # Generate fingerprint for current molecule
            fp = generator.GetFingerprint(mol)

            _input = np.array(list(map(int, fp.ToBitString())))
            vectors.append(_input)
        except Exception as e:
            print(f"Error processing {s}: {e}")
            del_mol.append(s)

    # Initialize PCA preserving 99% of variance
    pca = PCA(n_components=0.99)

    # Learn PCA transformation from fingerprint vectors
    pca.fit(vectors)
    X = pca.transform(vectors)
    return paddle.to_tensor(X, dtype="float32")
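
What a fractional n_components means can be illustrated without scikit-learn. The sketch below assumes a list of per-component explained-variance ratios (the numbers are made up for illustration) and picks the smallest number of components whose cumulative ratio exceeds the threshold; the function name components_for_variance is hypothetical.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, threshold=0.99):
    """Smallest number of leading components whose cumulative ratio exceeds threshold."""
    for k, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative > threshold:
            return k
    # Threshold never exceeded: keep every component
    return len(explained_variance_ratio)


# Hypothetical variance ratios, sorted from largest to smallest
ratios = [0.50, 0.30, 0.15, 0.045, 0.005]
k = components_for_variance(ratios, threshold=0.99)
print(k)
```

With a 2048-bit fingerprint, the resulting number of components fixes the input width of the network, so it depends on the dataset rather than on a value chosen in the configuration file.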

3.2 Constraint Construction

This example uses supervised learning. Following the PaddleScience API, the built-in SupervisedConstraint is used to build the supervised constraint. In PaddleScience code:

examples/tadf/TADF_angle/angle_model.py
sup = ppsci.constraint.SupervisedConstraint(
    dataloader_cfg={
        "dataset": {
            "name": "IterableNamedArrayDataset",
            "input": x,
            "label": {"u": y_train},
        },
        "batch_size": cfg.TRAIN.batch_size,
    },
    loss=ppsci.loss.MSELoss("mean"),
    name="sup",
)

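The first parameter of SupervisedConstraint, dataloader_cfg, describes the training data: the input dictionary x, the label y_train stored under the key "u", and the batch size. The second parameter selects mean squared error (MSELoss) as the loss function, and the third parameter gives the constraint a name so that it can be indexed later.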
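
To make the loss concrete, the following standard-library sketch computes a mean-reduced squared error on labels scaled by 90, as load_data does for the angle property. The sample angles and predictions are made up for illustration, and the mse helper is a plain re-implementation for explanation, not the ppsci.loss.MSELoss code.

```python
def mse(pred, target):
    """Mean squared error with mean reduction over all samples."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical dihedral angles in degrees and their normalized labels
angles_deg = [90.0, 45.0, 72.0]
targets = [a / 90 for a in angles_deg]  # 1.0, 0.5, 0.8
preds = [0.9, 0.6, 0.8]                 # hypothetical network outputs

loss = mse(preds, targets)

# The same error expressed in degrees squared: scaling by 90 multiplies MSE by 90**2
loss_deg = mse([p * 90 for p in preds], angles_deg)

print(loss, loss_deg)
```

Because the labels are divided by 90, a loss value on the normalized scale corresponds to a squared error 8100 times larger when expressed in degrees squared.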

3.3 Model Construction

The same deep neural network is used for all three prediction targets. It has two hidden layers, with 587 neurons in the first and 256 neurons in the second, and dropout between the hidden layers. Taking \(angle\) prediction as an example, in PaddleScience code:

examples/tadf/TADF_angle/angle_model.py
hidden_size = [587, 256]
num_layers = None
# Instantiate TADF model
model = ppsci.arch.TADF(
    input_keys=tuple(x.keys()),
    hidden_size=hidden_size,
    num_layers=num_layers,
    **cfg.MODEL,
)

3.4 Optimizer Construction

For the angle property prediction in this example, training uses the Adam optimizer with the learning rate set to 0.01 and weight_decay set to 1e-5; both values are read from the configuration file. In PaddleScience code:

examples/tadf/TADF_angle/angle_model.py
optimizer = ppsci.optimizer.Adam(
    learning_rate=cfg.TRAIN.learning_rate,
    beta1=0.9,
    beta2=0.99,
    weight_decay=cfg.TRAIN.weight_decay,
)(model)
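
For reference, the standard Adam update with the settings above can be written as follows. This is the textbook form of the algorithm, with \(\eta = 0.01\), \(\beta_1 = 0.9\), \(\beta_2 = 0.99\), and \(\lambda = 10^{-5}\); treating weight_decay as an L2 penalty added to the gradient is an assumption about how the framework applies it, and \(\epsilon\) is the usual small stabilizing constant.

```latex
\[
g_t = \nabla_\theta L(\theta_{t-1}) + \lambda\,\theta_{t-1}
\]
\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}
\]
\[
\hat{m}_t = \frac{m_t}{1-\beta_1^{\,t}}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^{\,t}}, \qquad
\theta_t = \theta_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}
\]
```

Compared with the common default \(\beta_2 = 0.999\), the value 0.99 makes the second-moment estimate \(v_t\) track recent gradients more quickly.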

3.5 Model Training

Once the above objects have been set up, pass them to ppsci.solver.Solver in order and start training. In PaddleScience code:

examples/tadf/TADF_angle/angle_model.py
solver = ppsci.solver.Solver(
    model,
    constraint={sup.name: sup},
    optimizer=optimizer,
    epochs=cfg.TRAIN.epochs,
    eval_during_train=False,
    iters_per_epoch=cfg.TRAIN.iters_per_epoch,
)
try:
    solver.train()
except Exception as ex:
    print(ex)

4. Complete Code

examples/tadf/TADF_angle/angle_model.py
import os

import matplotlib.pyplot as plt
import numpy as np
import paddle
import rdkit.Chem as Chem
from omegaconf import DictConfig
from rdkit.Chem import rdFingerprintGenerator
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

import ppsci

paddle.set_device("gpu:0")
os.environ["HYDRA_FULL_ERROR"] = "1"
os.environ["KMP_DUPLICATE_LIB_OK"] = "True"
plt.rcParams["axes.unicode_minus"] = False
plt.rcParams["font.sans-serif"] = ["DejaVu Sans"]
# Data preparation
def load_data(cfg):
    data_dir = cfg.data_dir
    sim_dir = cfg.sim_dir
    angle_dat_path = os.path.join(data_dir)
    smis_txt_path = os.path.join(sim_dir)

    data = []
    with open(angle_dat_path) as f:
        for line in f:
            num = float(line.strip()) / 90
            data.append(num)

    smis = []
    with open(smis_txt_path) as f:
        for line in f:
            smis.append(line.strip())

    return data, smis


def featurize_molecules(smis):
    vectors = []
    del_mol = []
    for s in smis:
        # Convert SMILES to RDKit molecule object
        mol = Chem.MolFromSmiles(s)
        try:
            # Create Morgan fingerprint generator (radius 2, 2048-bit vector)
            generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

            # Generate fingerprint for current molecule
            fp = generator.GetFingerprint(mol)

            _input = np.array(list(map(int, fp.ToBitString())))
            vectors.append(_input)
        except Exception as e:
            print(f"Error processing {s}: {e}")
            del_mol.append(s)

    # Initialize PCA preserving 99% of variance
    pca = PCA(n_components=0.99)

    # Learn PCA transformation from fingerprint vectors
    pca.fit(vectors)
    X = pca.transform(vectors)
    return paddle.to_tensor(X, dtype="float32")


def train(cfg: DictConfig, X, data):
    # k-fold cross validation splitter
    def k_fold(k, i, X, Y):
        fold_size = X.shape[0] // k
        val_start = i * fold_size
        if i != k - 1:
            val_end = (i + 1) * fold_size
            x_val, y_val = X[val_start:val_end], Y[val_start:val_end]
            x_train = paddle.concat((X[0:val_start], X[val_end:]), axis=0)
            y_train = paddle.concat((Y[0:val_start], Y[val_end:]), axis=0)
        else:
            x_val, y_val = X[val_start:], Y[val_start:]
            x_train = X[0:val_start]
            y_train = Y[0:val_start]
        return x_train, y_train, x_val, y_val

    Y = paddle.to_tensor(data, dtype="float32")
    x_train, y_train, x_test, y_test = k_fold(cfg.TRAIN.k, cfg.TRAIN.i, X, Y)
    # Prepare feature dictionary
    x = {
        f"key_{i}": paddle.unsqueeze(
            paddle.to_tensor(x_train[:, i], dtype="float32"), axis=1
        )
        for i in range(x_train.shape[1])
    }
    y_train = paddle.unsqueeze(paddle.to_tensor(y_train, dtype="float32"), axis=1)

    # Build supervised constraint
    sup = ppsci.constraint.SupervisedConstraint(
        dataloader_cfg={
            "dataset": {
                "name": "IterableNamedArrayDataset",
                "input": x,
                "label": {"u": y_train},
            },
            "batch_size": cfg.TRAIN.batch_size,
        },
        loss=ppsci.loss.MSELoss("mean"),
        name="sup",
    )

    # Set model architecture parameters
    hidden_size = [587, 256]
    num_layers = None
    # Instantiate TADF model
    model = ppsci.arch.TADF(
        input_keys=tuple(x.keys()),
        hidden_size=hidden_size,
        num_layers=num_layers,
        **cfg.MODEL,
    )
    optimizer = ppsci.optimizer.Adam(
        learning_rate=cfg.TRAIN.learning_rate,
        beta1=0.9,
        beta2=0.99,
        weight_decay=cfg.TRAIN.weight_decay,
    )(model)

    # Build solver for training
    solver = ppsci.solver.Solver(
        model,
        constraint={sup.name: sup},
        optimizer=optimizer,
        epochs=cfg.TRAIN.epochs,
        eval_during_train=False,
        iters_per_epoch=cfg.TRAIN.iters_per_epoch,
    )
    try:
        solver.train()
    except Exception as ex:
        print(ex)


def evaluate(cfg: DictConfig, X, data):
    y_full = paddle.to_tensor(data, dtype="float32")
    X_np = X.numpy()
    y_np = y_full.numpy()
    X_train_np, X_test_np, y_train_np, y_test_np = train_test_split(
        X_np,
        y_np,
        test_size=cfg.EVAL.test_size,
        random_state=cfg.EVAL.seed,
    )
    x_test = paddle.to_tensor(X_test_np, dtype="float32")
    y_test = paddle.to_tensor(y_test_np, dtype="float32")

    x_dict = {
        f"key_{i}": paddle.unsqueeze(x_test[:, i], axis=1)
        for i in range(x_test.shape[1])
    }

    test_validator = ppsci.validate.SupervisedValidator(
        dataloader_cfg={
            "dataset": {
                "name": "IterableNamedArrayDataset",
                "input": x_dict,
                "label": {"u": paddle.unsqueeze(y_test, axis=1)},
            },
            "batch_size": cfg.EVAL.batch_size,
            "shuffle": False,
        },
        loss=ppsci.loss.MSELoss("mean"),
        metric={
            "MAE": ppsci.metric.MAE(),
            "RMSE": ppsci.metric.RMSE(),
            "R2": ppsci.metric.R2Score(),
        },
        name="test_eval",
    )
    validators = {"test_eval": test_validator}

    model = ppsci.arch.TADF(
        input_keys=tuple(x_dict.keys()),
        hidden_size=[587, 256],
        num_layers=None,
        **cfg.MODEL,
    )

    solver = ppsci.solver.Solver(
        model,
        validator=validators,
        cfg=cfg,
    )

    _, metric_dict = solver.eval()

    ypred = model(x_dict)["u"].numpy()
    ytrue = paddle.unsqueeze(y_test, axis=1).numpy()

    mae = metric_dict["MAE"]["u"]
    rmse = metric_dict["RMSE"]["u"]
    r2 = metric_dict["R2"]["u"]

    print("Evaluation metrics:")
    print(f"MAE:  {mae:.4f}")
    print(f"RMSE: {rmse:.4f}")
    print(f"R2:   {r2:.4f}")

    plt.scatter(
        ytrue,
        ypred,
        s=15,
        color="royalblue",
        marker="s",
        linewidth=1,
    )
    plt.plot(
        [ytrue.min(), ytrue.max()],
        [ytrue.min(), ytrue.max()],
        "r-",
        lw=1,
    )
    plt.legend(title=f"R²={r2:.3f}\n\nMAE={mae:.3f}")
    plt.xlabel("Test θ(°)")
    plt.ylabel("Predicted θ(°)")
    save_path = "test_angle.png"
    plt.savefig(save_path)
    print(f"Figure saved to: {save_path}")
    plt.show()

examples/tadf/TADF_angle/angle.py
import hydra
from angle_model import evaluate
from angle_model import featurize_molecules
from angle_model import load_data
from angle_model import train
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path="./config", config_name="angle.yaml")
def main(cfg: DictConfig):
    data, smis = load_data(cfg)
    X = featurize_molecules(smis)
    if cfg.mode == "train":
        train(cfg, X, data)
    elif cfg.mode == "eval":
        evaluate(cfg, X, data)
    else:
        raise ValueError(f"cfg.mode should be 'train' or 'eval', but got '{cfg.mode}'")


if __name__ == "__main__":
    main()
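
The split performed by the k_fold helper inside train can be illustrated on plain indices. The sketch below mirrors its logic with standard-library lists instead of paddle tensors; the function name k_fold_indices is hypothetical, and the sample counts are chosen only for illustration.

```python
def k_fold_indices(n_samples, k, i):
    """Return (train_idx, val_idx) for fold i of k contiguous folds."""
    fold_size = n_samples // k
    val_start = i * fold_size
    # The last fold absorbs the remainder, as in the k_fold helper
    val_end = n_samples if i == k - 1 else (i + 1) * fold_size
    val_idx = list(range(val_start, val_end))
    train_idx = list(range(0, val_start)) + list(range(val_end, n_samples))
    return train_idx, val_idx


train_idx, val_idx = k_fold_indices(10, 3, 1)
last_train, last_val = k_fold_indices(10, 3, 2)
print(train_idx, val_idx)
print(last_train, last_val)
```

In the listing above, k and i come from cfg.TRAIN.k and cfg.TRAIN.i, so a single run trains on one fold configuration. Note also that evaluate draws its test subset with train_test_split rather than reusing the held-out fold.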

5. Result Display

The figures below show the model prediction results for the three properties: the energy gap (\(\Delta E_{ST}\)), the oscillator strength (\(f\)), and the dihedral angle (\(angle\)) between the electron donor and the electron acceptor.

Model prediction result of energy gap (\(\Delta E_{ST}\))

Model prediction result of oscillator strength (f)

Model prediction result of dihedral angle (angle) between electron donor and electron acceptor
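
The evaluation code reports MAE, RMSE, and R² for the test subset. As a reminder of what these numbers measure, here is a minimal standard-library sketch using their usual definitions; the sample values are made up, and the regression_metrics helper is written for explanation rather than copied from ppsci.metric.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and the coefficient of determination R2."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical reference values and predictions
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

An R² close to 1 means the points in a predicted-versus-reference scatter plot lie close to the diagonal.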

6. References

[1] Yufei Bu, Qian Peng*. Designing Promising Thermally Activated Delayed Fluorescence Emitters via Machine Learning-Assisted High-Throughput Virtual Screening. J. Phys. Chem. C, 2023. DOI: 10.1021/acs.jpcc.3c05337.