Fingerprints and Descriptors

class Smi2Fp(radius=3, fpSize=2048)[source]

Calculate Morgan fingerprints from SMILES strings

get_np(smiles)[source]

Convert a SMILES string to a numpy array with Morgan fingerprint bits.

Parameters:

smiles – SMILES string

Returns:

numpy array with Morgan fingerprint bits

get_np_counts(smiles)[source]

Convert a SMILES string to a numpy array with Morgan fingerprint counts.

Parameters:

smiles – SMILES string

Returns:

numpy array with Morgan fingerprint counts

get_fp(smiles)[source]

Convert a SMILES string to a Morgan fingerprint.

Parameters:

smiles – SMILES string

Returns:

Morgan fingerprint

get_count_fp(smiles)[source]

Convert a SMILES string to a Morgan count fingerprint.

Parameters:

smiles – SMILES string

Returns:

Morgan count fingerprint
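The four methods above differ only in output form: bit vector versus count vector, and RDKit object versus numpy array. As a rough sketch, the same four outputs can be obtained directly from RDKit's Morgan fingerprint generator. Whether Smi2Fp is actually built on this generator is an assumption; only the radius and fpSize values are taken from the constructor signature.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Assumed stand-in for what Smi2Fp wraps; the arguments mirror its constructor defaults.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=3, fpSize=2048)

mol = Chem.MolFromSmiles("CCO")  # ethanol

bit_fp = generator.GetFingerprint(mol)              # RDKit bit vector
count_fp = generator.GetCountFingerprint(mol)       # RDKit count vector
bits = generator.GetFingerprintAsNumPy(mol)         # 0/1 array of length fpSize
counts = generator.GetCountFingerprintAsNumPy(mol)  # per-position occurrence counts
```

The bit array records only whether a substructure hashes to a position, while the count array records how many times it does, so the counts sum to at least the number of set bits.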

mol2morgan_fp(mol, radius=2, nBits=2048)[source]

Convert an RDKit molecule to a Morgan fingerprint.

To avoid the RDKit deprecation warning, wrap the call in rdBase.BlockLogs():

    from rdkit import rdBase
    with rdBase.BlockLogs():
        uru.smi2numpy_fp("CCC")

Parameters:
  • mol (Mol) – RDKit molecule

  • radius (int) – fingerprint radius

  • nBits (int) – number of fingerprint bits

Return type:

ExplicitBitVect

Returns:

RDKit Morgan fingerprint

smi2morgan_fp(smi, radius=2, nBits=2048)[source]

Convert a SMILES to a Morgan fingerprint.

To avoid the RDKit deprecation warning, wrap the call in rdBase.BlockLogs():

    from rdkit import rdBase
    with rdBase.BlockLogs():
        uru.smi2numpy_fp("CCC")

Parameters:
  • smi (str) – SMILES

  • radius (int) – fingerprint radius

  • nBits (int) – number of fingerprint bits

Return type:

Optional[ExplicitBitVect]

Returns:

RDKit Morgan fingerprint

mol2numpy_fp(mol, radius=2, n_bits=2048)[source]

Convert an RDKit molecule to a numpy array with Morgan fingerprint bits.

Borrowed from https://iwatobipen.wordpress.com/2019/02/08/convert-fingerprint-to-numpy-array-and-conver-numpy-array-to-fingerprint-rdkit-memorandum/

Parameters:
  • mol (Mol) – RDKit molecule

  • radius (int) – fingerprint radius

  • n_bits (int) – number of fingerprint bits

Return type:

ndarray

Returns:

numpy array with RDKit fingerprint bits

smi2numpy_fp(smi, radius=2, nBits=2048)[source]

Convert a SMILES to a numpy array with Morgan fingerprint bits

Parameters:
  • smi (str) – SMILES string

  • radius (int) – fingerprint radius

  • nBits (int) – number of fingerprint bits

Return type:

ndarray

Returns:

numpy array with RDKit fingerprint bits

class RDKitDescriptors(desc_names=None, hide_progress=False, skip_fragments=False)[source]

Calculate RDKit descriptors for molecules or SMILES.

Provide methods to compute descriptor vectors for a single molecule or SMILES, and to produce pandas DataFrames for lists of molecules or SMILES.

Attributes

desc_names

Sorted list of descriptor names that will be calculated.

hide_progress

Whether to hide progress bars when processing lists.

Initialize descriptor calculator.

Parameters:
  • desc_names – Optional list of descriptor names to use. If not provided, the full RDKit descriptor list is used.

  • hide_progress (bool) – If true, progress bars are disabled when processing lists.

  • skip_fragments – If true, descriptors whose names contain “fr_” are excluded.

Returns:

None

update_descriptors(index_list)[source]

Update the descriptor names to only include those at the specified indices.

Parameters:

index_list (List[int]) – List of indices to keep

Return type:

None

Returns:

None

calc_mol(mol)[source]

Calculate descriptors for an RDKit molecule.

Parameters:

mol (Mol) – RDKit molecule

Return type:

ndarray

Returns:

A numpy array with descriptor values

calc_smiles(smiles)[source]

Calculate descriptors for a SMILES string.

Parameters:

smiles (str) – SMILES string

Return type:

ndarray

Returns:

A numpy array with descriptor values

pandas_smiles(smiles_list)[source]

Calculate descriptors for a list of SMILES and return a DataFrame.

Parameters:

smiles_list (List[str]) – List of SMILES strings

Return type:

DataFrame

Returns:

DataFrame where each row corresponds to a SMILES and columns are descriptors

pandas_mols(mol_list)[source]

Calculate descriptors for a list of RDKit molecules and return a DataFrame.

Parameters:

mol_list (List[Mol]) – List of RDKit molecule objects

Return type:

DataFrame

Returns:

DataFrame where each row corresponds to a molecule and columns are descriptors

clean_descriptors(desc_in)[source]

Remove descriptor columns that contain any NaN or infinite values.

Parameters:

desc_in (ndarray) – Input descriptor array

Return type:

tuple[Union[ndarray, List[int]]]

Returns:

Tuple containing the cleaned descriptor array (only columns with all finite values) and a list of kept column indices
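The column filtering described above can be sketched with numpy alone. This is a minimal illustration, not the library's code; the function name drop_non_finite_columns and the sample values are hypothetical.

```python
import numpy as np

def drop_non_finite_columns(desc):
    """Return (array restricted to all-finite columns, indices of the kept columns)."""
    desc = np.asarray(desc, dtype=float)
    # A column is kept only if every entry is finite (neither NaN nor +/-inf).
    keep_mask = np.isfinite(desc).all(axis=0)
    keep_idx = np.flatnonzero(keep_mask).tolist()
    return desc[:, keep_mask], keep_idx

# Hypothetical descriptor matrix: column 1 contains NaN, column 3 contains inf.
desc = np.array([
    [1.0, np.nan, 3.0, 4.0],
    [5.0, 6.0, 7.0, np.inf],
    [9.0, 10.0, 11.0, 12.0],
])
cleaned, kept = drop_non_finite_columns(desc)
```

Returning the kept indices alongside the array makes it possible to apply the same column selection to other data later, for example a list of descriptor names.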

scale_descriptors(desc_in)[source]

Scale descriptor DataFrame using StandardScaler.

Parameters:

desc_in (DataFrame) – Input descriptor DataFrame

Return type:

tuple[Any, StandardScaler]

Returns:

Tuple with the scaled descriptor array and the fitted StandardScaler
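For readers unfamiliar with StandardScaler, its default transform subtracts each column's mean and divides by the column's population standard deviation. The numpy-only sketch below reproduces that arithmetic so it runs without scikit-learn; the function name standardize_columns and the sample values are hypothetical, and it is not the library's implementation.

```python
import numpy as np

def standardize_columns(desc):
    """Z-score each column: subtract the mean, divide by the population std."""
    desc = np.asarray(desc, dtype=float)
    mean = desc.mean(axis=0)
    std = desc.std(axis=0)  # ddof=0, the convention StandardScaler uses
    # Constant columns have zero std; use 1.0 so they map to 0 instead of dividing by zero.
    std[std == 0.0] = 1.0
    return (desc - mean) / std, mean, std

# Hypothetical descriptor matrix; the second column is constant.
desc = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
scaled, mean, std = standardize_columns(desc)
```

Keeping the fitted mean and standard deviation, as the returned StandardScaler does, allows new data to be scaled with the statistics of the original set.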

clean_and_scale_descriptors(desc_in)[source]

Clean and scale a descriptor DataFrame.

Parameters:

desc_in (DataFrame) – Input descriptor DataFrame

Return type:

tuple[Any, StandardScaler]

Returns:

Tuple containing the cleaned and scaled descriptor array and the fitted StandardScaler

class RDKitProperties[source]

Calculate RDKit properties

calc_mol(mol)[source]

Calculate properties for an RDKit molecule

Parameters:

mol (Mol) – RDKit molecule

Return type:

ndarray

Returns:

a numpy array with properties

calc_smiles(smi)[source]

Calculate properties for a SMILES string

Parameters:

smi (str) – SMILES string

Return type:

Optional[ndarray]

Returns:

a numpy array with properties

pandas_smiles(smi_list)[source]

Calculates properties for a list of SMILES strings and returns them as a pandas DataFrame.

Parameters:

smi_list (List[str]) – List of SMILES strings

Returns:

DataFrame with calculated properties. Each row corresponds to a SMILES string and each column to a property.

Return type:

pd.DataFrame

pandas_mols(mol_list)[source]

Calculates properties for a list of RDKit molecules and returns them as a pandas DataFrame.

Parameters:

mol_list (List[Mol]) – List of RDKit molecules

Returns:

DataFrame with calculated properties. Each row corresponds to a molecule and each column to a property.

Return type:

pd.DataFrame

class Ro5Calculator[source]

A class used to calculate Lipinski’s Rule of Five properties for a given molecule.

Attributes

names : List[str]

A list of names of the properties to be calculated.

functions : List[Callable[[Mol], float]]

A list of functions used to calculate the properties.

Methods

calc_mol(mol: Mol) -> np.ndarray

Calculates properties for an RDKit molecule.

calc_smiles(smi: str) -> Optional[np.ndarray]

Calculates properties for a SMILES string.

pandas_smiles(smiles_list: List[str]) -> pd.DataFrame

Calculates properties for a list of SMILES strings and returns them as a pandas DataFrame.

pandas_mols(mol_list: List[Mol]) -> pd.DataFrame

Calculates properties for a list of RDKit molecules and returns them as a pandas DataFrame.

Initialize the Ro5Calculator class.

calc_mol(mol)[source]

Calculate properties for an RDKit molecule

Parameters:

mol (Mol) – RDKit molecule

Returns:

a numpy array with properties

Return type:

np.ndarray
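The pairing of a names list with a functions list suggests a simple pattern: apply each function to the molecule and collect the results in order. The sketch below illustrates that pattern with four RDKit descriptor functions. The property set chosen here is hypothetical, as is the None return for unparsable SMILES (inferred from the Optional return type); the class's actual names and functions may differ.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors

# Hypothetical property set for illustration only.
names = ["MolWt", "LogP", "HBD", "HBA"]
functions = [
    Descriptors.MolWt,          # molecular weight
    Descriptors.MolLogP,        # Crippen logP estimate
    Descriptors.NumHDonors,     # hydrogen bond donors
    Descriptors.NumHAcceptors,  # hydrogen bond acceptors
]

def calc_mol(mol):
    """Apply every property function to the molecule, in the order given by names."""
    return np.array([fn(mol) for fn in functions], dtype=float)

def calc_smiles(smi):
    """Parse the SMILES first; return None when RDKit cannot parse it."""
    mol = Chem.MolFromSmiles(smi)
    return None if mol is None else calc_mol(mol)

props = calc_smiles("CCO")    # ethanol
invalid = calc_smiles("C(C")  # unclosed branch, not a valid SMILES
```

Because names and functions share an order, zipping names with the returned array recovers a labelled record for a single molecule.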

calc_smiles(smi)[source]

Calculate properties for a SMILES string

Parameters:

smi (str) – SMILES string

Returns:

a numpy array with properties

Return type:

Optional[np.ndarray]

pandas_smiles(smiles_list)[source]

Calculates properties for a list of SMILES strings and returns them as a pandas DataFrame.

Parameters:

smiles_list (List[str]) – List of SMILES strings

Returns:

DataFrame with calculated properties. Each row corresponds to a SMILES string and each column to a property.

Return type:

pd.DataFrame

pandas_mols(mol_list)[source]

Calculates properties for a list of RDKit molecules and returns them as a pandas DataFrame.

Parameters:

mol_list (List[Mol]) – List of RDKit molecules

Returns:

DataFrame with calculated properties. Each row corresponds to a molecule and each column to a property.

Return type:

pd.DataFrame

compare_datasets(train_fp, test_fp)[source]

Compare two datasets of fingerprints and return the maximum Tanimoto similarity for each test fingerprint.

Parameters:
  • train_fp (list) – Training set fingerprints.

  • test_fp (list) – Test set fingerprints.

Return type:

list

Returns:

List of maximum similarity values for each test fingerprint.
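The comparison can be illustrated in plain Python by representing each fingerprint as a set of on-bit indices, for which the Tanimoto similarity is the size of the intersection divided by the size of the union. This is a conceptual sketch only: the library function presumably operates on RDKit fingerprint objects, and the function names and sample sets below are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def max_similarity_to_train(train_fp, test_fp):
    """For each test fingerprint, return its highest similarity to any training fingerprint."""
    return [max(tanimoto(test, train) for train in train_fp) for test in test_fp]

# Hypothetical fingerprints expressed as sets of on-bit positions.
train = [{1, 2, 3, 4}, {10, 11}]
test = [{1, 2, 3}, {10, 11}, {99}]
result = max_similarity_to_train(train, test)
```

A value near 1.0 means the test compound has a close analogue in the training set, while a value near 0.0 means it has none, which is why this list is a common check for train/test leakage or for how far a test set lies from the training data.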