Miscellaneous Utilities

smi2mol_with_errors(smi)

Parse a SMILES string and return the molecule together with any errors or warnings reported during parsing

Parameters:

smi (str) – input SMILES

Return type:

Tuple[Mol, str]

Returns:

tuple of (RDKit molecule, warning or error message)
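A minimal sketch of the same idea, assuming a recent RDKit. How smi2mol_with_errors actually gathers its messages is not shown here; this sketch does not capture RDKit's log output at all. Instead it parses without sanitization and asks Chem.DetectChemistryProblems what is wrong. The name smi2mol_sketch and the "SMILES parse error" message are hypothetical.

```python
from typing import Optional, Tuple

from rdkit import Chem


def smi2mol_sketch(smi: str) -> Tuple[Optional[Chem.Mol], str]:
    """Return (mol, message); mol is None whenever a problem is found (sketch)."""
    # Parse without sanitization so chemically invalid but syntactically
    # valid SMILES still yield a molecule that can be inspected.
    mol = Chem.MolFromSmiles(smi, sanitize=False)
    if mol is None:
        # Syntax errors (e.g. an unclosed branch) produce no molecule at all.
        return None, "SMILES parse error"
    problems = Chem.DetectChemistryProblems(mol)
    if problems:
        # Report every problem as "ExceptionType: message".
        msg = "; ".join(f"{p.GetType()}: {p.Message()}" for p in problems)
        return None, msg
    Chem.SanitizeMol(mol)
    return mol, ""


ok_mol, ok_msg = smi2mol_sketch("CCO")
bad_mol, bad_msg = smi2mol_sketch("CN(C)(C)C")  # neutral four-coordinate N
```

Note that this only reports errors; RDKit warnings that do not stop sanitization are not collected by this approach.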

count_fragments(mol)

Count the number of fragments in a molecule

Parameters:

mol (Mol) – RDKit molecule

Return type:

int

Returns:

number of fragments
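Fragments here are the disconnected components of the molecular graph. A minimal sketch of one way to count them with RDKit, assuming Chem.GetMolFrags is an acceptable stand-in for whatever count_fragments uses internally; the name count_fragments_sketch is hypothetical.

```python
from rdkit import Chem


def count_fragments_sketch(mol: Chem.Mol) -> int:
    # GetMolFrags returns one tuple of atom indices per disconnected fragment.
    return len(Chem.GetMolFrags(mol))


# A salt written as two dot-separated components has two fragments.
n_salt = count_fragments_sketch(Chem.MolFromSmiles("CC(=O)[O-].[Na+]"))
n_single = count_fragments_sketch(Chem.MolFromSmiles("c1ccccc1O"))
print(n_salt, n_single)  # 2 1
```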

get_largest_fragment(mol)

Return the fragment with the largest number of atoms

Parameters:

mol (Mol) – RDKit molecule

Return type:

Mol

Returns:

RDKit molecule with the largest number of atoms
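A minimal sketch, assuming the fragments are obtained with Chem.GetMolFrags and compared by GetNumAtoms(). That counts only atoms present in the graph, so implicit hydrogens are ignored; whether get_largest_fragment counts hydrogens is not stated above. The name get_largest_fragment_sketch is hypothetical.

```python
from rdkit import Chem


def get_largest_fragment_sketch(mol: Chem.Mol) -> Chem.Mol:
    # Split into one molecule per disconnected fragment.
    frags = Chem.GetMolFrags(mol, asMols=True)
    # Keep the fragment with the most graph atoms; max() returns the
    # first such fragment on ties.
    return max(frags, key=lambda frag: frag.GetNumAtoms())


largest = get_largest_fragment_sketch(Chem.MolFromSmiles("CCCCO.Cl"))
print(Chem.MolToSmiles(largest))  # CCCCO
```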

taylor_butina_clustering(fp_list, cutoff=0.65)

Cluster a set of fingerprints using the RDKit Taylor-Butina implementation

Parameters:
  • fp_list (List[ExplicitBitVect]) – a list of fingerprints

  • cutoff (float) – distance cutoff (1 - Tanimoto similarity)

Return type:

List[int]

Returns:

a list of cluster ids
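A minimal sketch of Taylor-Butina clustering with RDKit's Butina module, assuming one cluster id per input fingerprint is wanted. It builds the lower-triangle distance list that Butina.ClusterData expects from Tanimoto similarities and converts the returned clusters into ids. The Morgan fingerprint settings and the name butina_cluster_ids are illustrative assumptions.

```python
from typing import List

from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from rdkit.ML.Cluster import Butina


def butina_cluster_ids(fp_list: List[DataStructs.ExplicitBitVect], cutoff: float = 0.65) -> List[int]:
    n = len(fp_list)
    # Lower triangle of the distance matrix, row by row:
    # d(1,0), d(2,0), d(2,1), d(3,0), ...
    dists = []
    for i in range(1, n):
        sims = DataStructs.BulkTanimotoSimilarity(fp_list[i], fp_list[:i])
        dists.extend(1.0 - s for s in sims)
    clusters = Butina.ClusterData(dists, n, cutoff, isDistData=True)
    # Each cluster is a tuple of indices (centroid first); map to per-item ids.
    ids = [-1] * n
    for cluster_id, members in enumerate(clusters):
        for idx in members:
            ids[idx] = cluster_id
    return ids


gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
smiles = ["CCO", "CCO", "c1ccccc1", "c1ccccc1"]
fps = [gen.GetFingerprint(Chem.MolFromSmiles(s)) for s in smiles]
ids = butina_cluster_ids(fps)
```

The identical pairs end up in the same cluster, and ethanol and benzene, which share essentially no fingerprint bits, end up in different ones.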

label_atoms(mol, labels)

Label atoms when depicting a molecule

Parameters:
  • mol (Mol) – input molecule

  • labels (List[str]) – labels, one for each atom

Return type:

Mol

Returns:

molecule with labels
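A minimal sketch of per-atom labelling, assuming the labels are stored in the "atomNote" atom property, which RDKit's 2D drawing code (rdMolDraw2D) renders as a small annotation beside each atom. Whether label_atoms uses this property is an assumption, and label_atoms_sketch is a hypothetical name. The sketch copies the molecule rather than modifying the caller's object.

```python
from typing import List

from rdkit import Chem


def label_atoms_sketch(mol: Chem.Mol, labels: List[str]) -> Chem.Mol:
    if len(labels) != mol.GetNumAtoms():
        raise ValueError("need exactly one label per atom")
    mol = Chem.Mol(mol)  # work on a copy
    for atom, label in zip(mol.GetAtoms(), labels):
        # "atomNote" text is drawn next to the atom by RDKit's drawers.
        atom.SetProp("atomNote", str(label))
    return mol


labelled = label_atoms_sketch(Chem.MolFromSmiles("CCO"), ["a", "b", "c"])
```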

tag_atoms(mol, atoms_to_tag, tag='x')

Tag atoms with a specified string

Parameters:
  • mol (Mol) – input molecule

  • atoms_to_tag (List[int]) – indices of atoms to tag

  • tag (str) – string to use for the tags

Return type:

Mol

Returns:

molecule with atoms tagged
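Unlike label_atoms, only a chosen subset of atoms receives the same tag string. A minimal sketch, again assuming the tag is written to the "atomNote" property so it appears in depictions; tag_atoms_sketch is a hypothetical name and the clearing of pre-existing notes is a design choice of the sketch.

```python
from typing import List

from rdkit import Chem


def tag_atoms_sketch(mol: Chem.Mol, atoms_to_tag: List[int], tag: str = "x") -> Chem.Mol:
    mol = Chem.Mol(mol)  # work on a copy
    # Remove any existing notes so only the requested atoms carry a tag.
    for atom in mol.GetAtoms():
        if atom.HasProp("atomNote"):
            atom.ClearProp("atomNote")
    for idx in atoms_to_tag:
        mol.GetAtomWithIdx(idx).SetProp("atomNote", tag)
    return mol


# Tag the oxygen (atom index 6) of phenol.
tagged = tag_atoms_sketch(Chem.MolFromSmiles("c1ccccc1O"), [6])
```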

rd_shut_the_hell_up()

Make RDKit a bit quieter

Return type:

None

Returns:

None
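A minimal sketch of a process-wide approach, assuming "quieter" means raising the RDKit logger's threshold with RDLogger so that only critical messages are printed. This is not confirmed to be what rd_shut_the_hell_up does, and quiet_rdkit is a hypothetical name.

```python
from rdkit import Chem, RDLogger


def quiet_rdkit() -> None:
    # Raise the logger threshold so warnings and errors (such as SMILES
    # parse errors) are no longer printed; this affects the whole process.
    RDLogger.logger().setLevel(RDLogger.CRITICAL)


quiet_rdkit()
mol = Chem.MolFromSmiles("CC(C")  # still returns None, just silently
```

RDLogger.DisableLog("rdApp.*") is another commonly used way to get a similar global effect.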

demo_block_logs()

Demonstrate an alternative way to turn off RDKit logging

Return type:

None

Returns:

None
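A minimal sketch of scoped log suppression, assuming the "other way" is rdkit.rdBase.BlockLogs (the source does not name it). Logging is suppressed only while the BlockLogs object is alive, which avoids silencing RDKit for the rest of the program. parse_quietly is a hypothetical helper.

```python
from rdkit import Chem
from rdkit.rdBase import BlockLogs


def parse_quietly(smi: str):
    # RDKit log messages are suppressed while `block` exists;
    # deleting it restores the previous logging state.
    block = BlockLogs()
    try:
        return Chem.MolFromSmiles(smi)
    finally:
        del block


bad = parse_quietly("CC(C")  # None, with no parse error printed
good = parse_quietly("CCO")
```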

boxplot_base64_image(dist, x_lim=[0, 10])

Plot a distribution as a seaborn boxplot and return the resulting image as a base64-encoded string.

Parameters:
  • dist (ndarray) – The distribution data to plot.

  • x_lim (list[int]) – The x-axis limits for the boxplot.

Return type:

str

Returns:

The base64 encoded image string.
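A minimal sketch of the render-to-base64 pattern, assuming PNG output and substituting plain matplotlib's boxplot for seaborn so the example has one fewer dependency. The figure size and the name boxplot_base64_sketch are illustrative assumptions. The sketch uses a tuple default for x_lim, which avoids the shared-mutable-default pitfall of a list.

```python
import base64
import io
from typing import Sequence

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np


def boxplot_base64_sketch(dist: np.ndarray, x_lim: Sequence[float] = (0, 10)) -> str:
    fig, ax = plt.subplots(figsize=(4, 1))
    # Horizontal box so the data values run along the x axis.
    ax.boxplot(dist, vert=False)
    ax.set_xlim(*x_lim)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)  # release the figure; pyplot keeps it alive otherwise
    return base64.b64encode(buf.getvalue()).decode("ascii")


encoded = boxplot_base64_sketch(np.random.default_rng(0).normal(5, 1, 100))
```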

mol_to_base64_image(mol, target='html', pattern_mol=None, width=300, height=150)

Convert an RDKit molecule to a base64 encoded image string, optionally highlighting atoms matching a SMARTS pattern.

Parameters:
  • mol (Mol) – RDKit molecule to convert.

  • target (str) – Target format for the image, either “html” or “altair”.

  • pattern_mol (Mol) – Optional query molecule, created with Chem.MolFromSmarts; atoms in mol that match it are highlighted.

  • width (int) – Width of the image in pixels.

  • height (int) – Height of the image in pixels.

Return type:

str

Returns:

Base64 encoded image string.
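A minimal sketch of drawing with SMARTS highlighting and base64 encoding. It renders SVG with rdMolDraw2D.MolDraw2DSVG, an assumption chosen because the SVG drawer is available in every RDKit build; the documented function may produce PNG instead, and the exact "html" and "altair" wrapping it applies is not described above. mol_to_svg_base64 is a hypothetical name.

```python
import base64
from typing import Optional

from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D


def mol_to_svg_base64(mol: Chem.Mol, pattern_mol: Optional[Chem.Mol] = None,
                      width: int = 300, height: int = 150) -> str:
    highlight = []
    if pattern_mol is not None:
        # Highlight every atom that takes part in any match of the query.
        matched = {idx for match in mol.GetSubstructMatches(pattern_mol) for idx in match}
        highlight = sorted(matched)
    drawer = rdMolDraw2D.MolDraw2DSVG(width, height)
    # Prepares the molecule (2D coordinates, kekulization) and draws it.
    rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol, highlightAtoms=highlight)
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText()
    return base64.b64encode(svg.encode("utf-8")).decode("ascii")


patt = Chem.MolFromSmarts("c1ccccc1")
encoded = mol_to_svg_base64(Chem.MolFromSmiles("c1ccccc1CCO"), patt)
data_uri = f"data:image/svg+xml;base64,{encoded}"
```

A data URI like data_uri can be used directly as an image source in HTML or in a plotting library that accepts image URLs.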

smi_to_base64_image(smiles, target='html', width=300, height=150)

Convert a SMILES string to a base64 encoded image string.

Parameters:
  • smiles (str) – SMILES string to convert.

  • target (str) – Target format for the image, either “html” or “altair”.

  • width (int) – Width of the image in pixels.

  • height (int) – Height of the image in pixels.

Return type:

str

Returns:

Base64 encoded image string.
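A typical use of an HTML-targeted image string is embedding structures inline in a table. A minimal sketch, assuming SVG output wrapped in an <img> tag with a data URI; the wrapping produced by smi_to_base64_image itself is not specified above, and smiles_to_img_tag is a hypothetical name. The sketch raises ValueError for unparsable SMILES rather than drawing an empty image.

```python
import base64

from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D


def smiles_to_img_tag(smiles: str, width: int = 300, height: int = 150) -> str:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    drawer = rdMolDraw2D.MolDraw2DSVG(width, height)
    rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
    drawer.FinishDrawing()
    payload = base64.b64encode(drawer.GetDrawingText().encode("utf-8")).decode("ascii")
    # The data URI keeps the image inline in the HTML; no separate file is needed.
    return f'<img src="data:image/svg+xml;base64,{payload}" width="{width}" height="{height}"/>'


rows = [f"<tr><td>{s}</td><td>{smiles_to_img_tag(s)}</td></tr>" for s in ["CCO", "c1ccccc1O"]]
html = "<table>" + "".join(rows) + "</table>"
```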

remove_dummy_atoms(mol)

Remove dummy atoms (atomic number 0) from an RDKit molecule.

Parameters:

mol (rdkit.Chem.Mol) – RDKit molecule from which to remove dummy atoms.

Returns:

RDKit molecule with dummy atoms removed.

Return type:

rdkit.Chem.Mol
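A minimal sketch, assuming dummy atoms are removed by matching the SMARTS [#0] (atomic number 0) and deleting the matches with Chem.DeleteSubstructs. The follow-up SanitizeMol call, which recomputes hydrogen counts on the former attachment points, is a choice of this sketch; remove_dummy_atoms_sketch is a hypothetical name.

```python
from rdkit import Chem

DUMMY = Chem.MolFromSmarts("[#0]")  # matches any atom with atomic number 0


def remove_dummy_atoms_sketch(mol: Chem.Mol) -> Chem.Mol:
    # DeleteSubstructs returns a new molecule with every matching atom removed.
    cleaned = Chem.DeleteSubstructs(mol, DUMMY)
    Chem.SanitizeMol(cleaned)
    return cleaned


core = remove_dummy_atoms_sketch(Chem.MolFromSmiles("*CCO"))
print(Chem.MolToSmiles(core))  # CCO
```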

align_mols_to_template(template_smiles, smiles_list)

Aligns a list of molecules to a template molecule defined by its SMILES string.

Removes dummy atoms (atomic number 0) from the template, generates 2D coordinates for the template, and aligns each molecule in the input list to the template structure.

Parameters:
  • template_smiles (str) – SMILES string of the template molecule.

  • smiles_list (list[str]) – List of SMILES strings to align.

Returns:

List of aligned RDKit molecule objects.
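The three steps described above map onto RDKit's depiction API roughly as follows. This is a minimal sketch assuming rdDepictor.Compute2DCoords for the template layout and rdDepictor.GenerateDepictionMatching2DStructure for the alignment; align_to_template_sketch is a hypothetical name, and the example template uses "*" as an R-group placeholder.

```python
from typing import List

from rdkit import Chem
from rdkit.Chem import rdDepictor


def align_to_template_sketch(template_smiles: str, smiles_list: List[str]) -> List[Chem.Mol]:
    template = Chem.MolFromSmiles(template_smiles)
    # Step 1: strip dummy atoms (e.g. attachment points written as "*").
    template = Chem.DeleteSubstructs(template, Chem.MolFromSmarts("[#0]"))
    Chem.SanitizeMol(template)
    # Step 2: give the template its own 2D layout.
    rdDepictor.Compute2DCoords(template)
    aligned = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        # Step 3: atoms matching the template reuse its 2D coordinates;
        # an exception is raised if mol does not contain the template.
        rdDepictor.GenerateDepictionMatching2DStructure(mol, template)
        aligned.append(mol)
    return aligned


mols = align_to_template_sketch("*c1ccccc1", ["Cc1ccccc1", "OCc1ccccc1"])
```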

mcs_align(smiles_list)

Aligns a list of molecules to their Maximum Common Substructure (MCS).

Converts SMILES strings to RDKit molecules, finds the MCS (restricted to complete rings), generates 2D coordinates for the MCS template, and aligns all molecules to this template.

Parameters:

smiles_list (list[str]) – List of SMILES strings to align.

Returns:

List of aligned RDKit molecule objects.

Return type:

list[rdkit.Chem.Mol]