Scaffold Finder

cleanup_fragment(mol)

Replace atom map numbers with hydrogens.

Parameters:
  • mol (Mol) – Input molecule.

Return type:

Tuple[Mol, int]

Returns:

The modified molecule and the number of R-groups.
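As a rough illustration of the idea, the sketch below does the same kind of replacement at the string level. It is not the function's implementation, which works on RDKit Mol objects. It assumes that R-group attachment points are written as mapped dummy atoms such as [*:1], and cap_attachment_points is a hypothetical helper name.

```python
import re
from typing import Tuple

# Assumption: attachment points appear as mapped dummy atoms, e.g. [*:1] or [*:12].
ATTACHMENT_POINT = re.compile(r"\[\*:\d+\]")


def cap_attachment_points(smiles: str) -> Tuple[str, int]:
    """Replace each mapped dummy atom with an explicit hydrogen.

    Returns the rewritten SMILES string and the number of replacements,
    which stands in for the number of R-groups.
    """
    # re.subn returns (new_string, number_of_substitutions).
    return ATTACHMENT_POINT.subn("[H]", smiles)


capped, num_rgroups = cap_attachment_points("c1ccc([*:1])cc1[*:2]")
print(capped, num_rgroups)
```

Counting the substitutions while replacing them mirrors the documented return value, a modified molecule paired with its R-group count.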

generate_fragments(mol)

Generate fragments using RDKit.

Parameters:
  • mol (Mol) – RDKit molecule.

Return type:

DataFrame

Returns:

A pandas DataFrame with the scaffold SMILES, number of atoms, and number of R-groups.

find_scaffolds(df_in, smiles_col='SMILES', name_col='Name', disable_progress=False)

Generate scaffolds for a set of molecules.

Parameters:
  • df_in (DataFrame) – pandas DataFrame with [SMILES, Name, RDKit molecule] columns.

Return type:

Tuple[DataFrame, DataFrame]

Returns:

A DataFrame with molecules and scaffolds, and a DataFrame with unique scaffolds.

get_molecules_with_scaffold(scaffold, mol_df, activity_df, smiles_col='SMILES', name_col='Name', activity_col='pIC50', extra_cols=[])

Find molecules in a DataFrame that match a given scaffold and return their core structures and activity data.

Parameters:
  • scaffold (str) – SMILES string of the scaffold to match.

  • mol_df (DataFrame) – DataFrame containing molecule information and fragments.

  • activity_df (DataFrame) – DataFrame containing activity data for molecules.

  • smiles_col (str) – Name of the column containing SMILES strings.

  • name_col (str) – Name of the column containing molecule names.

  • activity_col (str) – Name of the column containing activity values.

  • extra_cols (List[str]) – Additional columns to include in the output DataFrame.

Return type:

Tuple[List[str], DataFrame]

Returns:

A tuple with a list of unique core SMILES and a DataFrame of matching molecules with activity data.
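The select-and-join part of this behaviour can be sketched without pandas or RDKit. The example is an illustration of the data flow only, not the library's implementation: lists of dicts stand in for the two DataFrames, the "Scaffold" key is an assumed column name, select_by_scaffold is a hypothetical helper, and the extraction of core SMILES is left out.

```python
from typing import Dict, List

Row = Dict[str, object]


def select_by_scaffold(
    scaffold: str,
    mol_rows: List[Row],
    activity_rows: List[Row],
    name_col: str = "Name",
    activity_col: str = "pIC50",
) -> List[Row]:
    """Keep rows whose scaffold matches, then attach activity by molecule name."""
    # Index activity values by molecule name for the join.
    activity_by_name = {row[name_col]: row[activity_col] for row in activity_rows}
    matches = []
    for row in mol_rows:
        # "Scaffold" is an assumed column name for the fragment table.
        if row["Scaffold"] == scaffold and row[name_col] in activity_by_name:
            matches.append({**row, activity_col: activity_by_name[row[name_col]]})
    return matches


mol_rows = [
    {"SMILES": "Cc1ccccc1", "Name": "mol_1", "Scaffold": "c1ccccc1"},
    {"SMILES": "CCO", "Name": "mol_2", "Scaffold": "CO"},
]
activity_rows = [{"Name": "mol_1", "pIC50": 6.5}, {"Name": "mol_2", "pIC50": 5.1}]
print(select_by_scaffold("c1ccccc1", mol_rows, activity_rows))
```

Molecules without an activity record are dropped here, which corresponds to an inner join; whether the documented function behaves the same way is not stated above.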

main()

Read all SMILES files in the current directory, generate scaffolds, and report statistics for each scaffold.

Returns:

None
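The file-reading step can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the code behind main(): it assumes SMILES files carry a .smi extension and hold one whitespace-separated "SMILES name" record per line, and read_smiles_file and read_all_smiles are hypothetical helper names. Scaffold generation and the per-scaffold statistics are omitted.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_smiles_file(path: Path) -> List[Tuple[str, str]]:
    """Parse one SMILES file into (smiles, name) pairs.

    Blank lines are skipped; a record without a name uses its SMILES as the name.
    """
    records = []
    for line in path.read_text().splitlines():
        fields = line.split()
        if not fields:
            continue
        smiles = fields[0]
        name = fields[1] if len(fields) > 1 else smiles
        records.append((smiles, name))
    return records


def read_all_smiles(
    directory: Path = Path("."), pattern: str = "*.smi"
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each matching file name in the directory to its parsed records."""
    # Assumption: the "*.smi" pattern identifies SMILES files.
    return {p.name: read_smiles_file(p) for p in sorted(directory.glob(pattern))}


if __name__ == "__main__":
    for filename, records in read_all_smiles().items():
        print(f"{filename}: {len(records)} molecules")
```

Keeping the parsing separate from the directory scan makes it straightforward to pass the records on to a scaffold-generation step one file at a time.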