Scaffold Finder
- cleanup_fragment(mol)
Replace atom map numbers with hydrogens.
- Parameters:
mol (Mol) – Input molecule.
- Return type:
Tuple[Mol, int]
- Returns:
The modified molecule and the number of R-groups.
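As a rough string-level illustration of what this description implies, the sketch below swaps mapped attachment points for explicit hydrogens and counts them. It assumes attachment points appear in a fragment SMILES as mapped dummy atoms such as `[*:1]`. The helper name `cleanup_fragment_smiles` is hypothetical, and the documented function operates on RDKit Mol objects rather than on strings.

```python
import re
from typing import Tuple

# Assumption: attachment points are written as mapped dummy atoms, e.g. [*:1].
ATTACHMENT_POINT = re.compile(r"\[\*:\d+\]")


def cleanup_fragment_smiles(smiles: str) -> Tuple[str, int]:
    """Hypothetical string-level analogue of cleanup_fragment."""
    # re.subn returns the rewritten string and the number of substitutions,
    # which here equals the number of R-group attachment points.
    cleaned, num_rgroups = ATTACHMENT_POINT.subn("[H]", smiles)
    return cleaned, num_rgroups


cleaned, num_rgroups = cleanup_fragment_smiles("[*:1]c1ccc([*:2])cc1")
print(cleaned, num_rgroups)
```

The returned pair mirrors the documented `Tuple[Mol, int]` shape: the cleaned structure first, the R-group count second.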
- generate_fragments(mol)
Generate fragments using RDKit.
- Parameters:
mol (Mol) – RDKit molecule.
- Return type:
DataFrame
- Returns:
A Pandas dataframe with Scaffold SMILES, Number of Atoms, and Number of R-Groups.
- find_scaffolds(df_in, smiles_col='SMILES', name_col='Name', disable_progress=False)
Generate scaffolds for a set of molecules.
- Parameters:
df_in (DataFrame) – Pandas dataframe with [SMILES, Name, RDKit molecule] columns.
- Return type:
Tuple[DataFrame, DataFrame]
- Returns:
A dataframe with molecules and scaffolds, and a dataframe with unique scaffolds.
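The second return value can be pictured as a de-duplicated view of the first. The following is a minimal sketch using plain Python records instead of DataFrames. The field names `Name` and `Scaffold`, the `Count` summary, and the sample rows are assumptions made for illustration, not the library's actual output columns.

```python
from collections import Counter

# Hypothetical molecule-to-scaffold records, one row per molecule/scaffold pair.
mol_scaffold_rows = [
    {"Name": "mol-1", "Scaffold": "c1ccccc1"},
    {"Name": "mol-2", "Scaffold": "c1ccccc1"},
    {"Name": "mol-2", "Scaffold": "c1ccncc1"},
    {"Name": "mol-3", "Scaffold": "c1ccncc1"},
    {"Name": "mol-4", "Scaffold": "c1ccccc1"},
]

# Count how many rows reference each scaffold, then list the unique
# scaffolds from most to least frequent.
scaffold_counts = Counter(row["Scaffold"] for row in mol_scaffold_rows)
unique_scaffolds = [
    {"Scaffold": scaffold, "Count": count}
    for scaffold, count in scaffold_counts.most_common()
]

for row in unique_scaffolds:
    print(row["Scaffold"], row["Count"])
```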
- get_molecules_with_scaffold(scaffold, mol_df, activity_df, smiles_col='SMILES', name_col='Name', activity_col='pIC50', extra_cols=[])
Find molecules in a DataFrame that match a given scaffold and return their core structures and activity data.
- Parameters:
scaffold (str) – SMILES string of the scaffold to match.
mol_df (DataFrame) – DataFrame containing molecule information and fragments.
activity_df (DataFrame) – DataFrame containing activity data for molecules.
smiles_col (str) – Name of the column containing SMILES strings.
name_col (str) – Name of the column containing molecule names.
activity_col (str) – Name of the column containing activity values.
extra_cols (List[str]) – Additional columns to include in the output DataFrame.
- Return type:
Tuple[List[str], DataFrame]
- Returns:
A tuple with a list of unique core SMILES and a DataFrame of matching molecules with activity data.
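The data flow described above (select the molecules carrying a scaffold, attach their activity values, and collect the unique cores) can be sketched with plain Python records. This is an illustration under stated assumptions, not the library's implementation: the function name `molecules_with_scaffold`, the `Scaffold` and `Core` fields, and the sample data are all hypothetical, and the cores are precomputed here rather than derived from the structures.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def molecules_with_scaffold(
    scaffold: str,
    mol_rows: List[Row],
    activity_rows: List[Row],
    name_col: str = "Name",
    activity_col: str = "pIC50",
) -> Tuple[List[str], List[Row]]:
    """Hypothetical record-based analogue of get_molecules_with_scaffold."""
    # Index activity values by molecule name for the join step.
    activity_by_name = {row[name_col]: row[activity_col] for row in activity_rows}
    unique_cores: List[str] = []
    matches: List[Row] = []
    for row in mol_rows:
        name = row[name_col]
        # Keep only rows with the requested scaffold and known activity.
        if row["Scaffold"] != scaffold or name not in activity_by_name:
            continue
        core = str(row["Core"])  # assumption: core SMILES is precomputed
        if core not in unique_cores:
            unique_cores.append(core)
        matches.append(
            {name_col: name, "Core": core, activity_col: activity_by_name[name]}
        )
    return unique_cores, matches


mol_rows = [
    {"Name": "mol-1", "Scaffold": "c1ccccc1", "Core": "[*:1]c1ccccc1"},
    {"Name": "mol-2", "Scaffold": "c1ccccc1", "Core": "[*:1]c1ccccc1"},
    {"Name": "mol-3", "Scaffold": "c1ccncc1", "Core": "[*:1]c1ccncc1"},
]
activity_rows = [
    {"Name": "mol-1", "pIC50": 6.2},
    {"Name": "mol-2", "pIC50": 7.1},
    {"Name": "mol-3", "pIC50": 5.4},
]

cores, matches = molecules_with_scaffold("c1ccccc1", mol_rows, activity_rows)
print(cores)
print(matches)
```

The return value follows the documented order: the list of unique core SMILES first, then the matching molecules with their activity data.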