Showing posts with label ChemConnector. Show all posts

Thursday, May 5, 2016

Two Billion Compounds?

I've been cracking my skull against a peculiar problem this week:
How many unique compounds have ever been made?*

I'm referring to those produced by humankind, over the past 250 years - give or take a decade - of formal chemistry effort. CAS claims 100 million molecules in its collection, and predicts, at the current rate of registration, another 650 million over the next 50 years.
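To put the CAS projection in perspective, here is a quick sketch of the arithmetic. It uses only the two figures quoted above; the variable names and the flat 365-day year are my own simplifications.

```python
# Figures quoted from CAS in the paragraph above.
CAS_CURRENT = 100_000_000        # molecules in the collection today
CAS_PROJECTED_NEW = 650_000_000  # predicted new registrations
YEARS = 50                       # over this many years

# Average registration rate implied by the projection
# (assumes a constant rate and a 365-day year).
per_year = CAS_PROJECTED_NEW / YEARS
per_day = per_year / 365

# Size of the collection at the end of the projection window.
projected_total = CAS_CURRENT + CAS_PROJECTED_NEW

print(f"{per_year:,.0f} new registrations per year")
print(f"{per_day:,.0f} new registrations per day")
print(f"{projected_total:,} registered after {YEARS} years")
```

That works out to about 13 million new registrations a year, for 750 million total after 50 years.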

Berries by the side of the road, 2016.
Not counted in billions.
Certainly other databases exist, a well-curated larger example being ChemSpider (34 million), but I'm sure the Venn diagram for that against CAS overlaps quite a bit. Ditto PubChem, which according to ChemConnector had over 37 million structures in 2009, but lots of errors, duplicates, and isotopomers, to hear him tell it. Outside the med-chem arena, there are exciting new collections such as the Aspuru-Guzik lab's Clean Energy Project, to identify photovoltaic materials. Surely the assembled collection of privately-held corporate data from all chemistry, pharma, biotech, and engineering firms must include another windfall; ~200 million compounds?

So, let's try a thought exercise - say we limit the set of what we call "made," or synthesized. We won't consider polymers, whether natural (DNA, polysaccharides) or artificial (Teflon, urethanes). As for screening collections, libraries, and combinatorial sets: unless someone produced >1 mg, I'm leaving them out. Metal complexes and salts are in, since most of the time inorganic and formulations colleagues still produce quantities you can hold and measure (and get a melting point on!).

Granted, by referring explicitly to the public and private chemistry databases, I'm not including dark reactions, those failed experiments or perhaps non-optimal yields that never make it to publication. Based on my lab career (and that of my hood-mates), I'd say there's a comfortable 5-10 molecules made for every 1 that gets reported somewhere. Of course, since many of those are literature preps or repeat reactions, I don't think it inflates the count that much; truly novel molecules tend to creep into papers and patents somehow.
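For anyone who wants to play with the numbers, here is a rough Fermi-style sketch. It is one way to combine the figures in this post, not a record of how the guess was reached. Two things are assumptions of the sketch: the other public databases are treated as fully overlapping with CAS, and `novel_fraction` is a hypothetical knob for how many of the unreported molecules are genuinely new rather than repeats.

```python
# Figures quoted in the post (approximate).
CAS_REGISTERED = 100_000_000   # CAS collection
PRIVATE_GUESS = 200_000_000    # the post's guess for privately held data

# Assumption (not from the post): ChemSpider, PubChem, etc. overlap
# completely with CAS, so they add nothing to the reported total.
reported = CAS_REGISTERED + PRIVATE_GUESS


def made_estimate(reported, made_per_reported, novel_fraction):
    """Estimate unique compounds made.

    made_per_reported: molecules made for every 1 reported (5-10 in the post).
    novel_fraction: hypothetical share of the unreported molecules that are
        truly new, rather than literature preps or repeat reactions.
    """
    unreported = reported * (made_per_reported - 1)
    return reported + unreported * novel_fraction


# Bounds: nothing unreported is novel vs. everything unreported is novel.
floor = made_estimate(reported, 10, 0.0)
low_ceiling = made_estimate(reported, 5, 1.0)
high_ceiling = made_estimate(reported, 10, 1.0)

print(f"floor:        {floor:,.0f}")
print(f"ceiling (5x):  {low_ceiling:,.0f}")
print(f"ceiling (10x): {high_ceiling:,.0f}")
```

Under these assumptions the answer lands somewhere between 300 million and 3 billion, depending almost entirely on how much of the unreported work you believe is novel.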

Chemical space gurus, I apologize - I only want to count things that have been bottled, columned, purified, and analyzed. Large computational data sets of billions - unless they've been made and characterized - aren't up for consideration. Neither are metabolites isolated from plants or microbes; no fair counting what we relied on other organisms to make. S'posing this means we also leave out decomposition products and geological materials.

So them's the rules: 1 mg produced and characterized, non-polymeric, must have been made or produced with human hands. Salts and metals are in, along with isotopomers and stereoisomers.

What do readers and commenters think? My guess is in the title of this post.

--

*On the Twitter, Peter Kenny points out that I should, in truth, be asking after compounds, not molecules. Fair enough.
** Another reader points out that ZINC15, the database of "stuff you can buy now," only includes ~10M at present.

Monday, May 28, 2012

Olympicene's "Top Secret" Final Step

Over in London, preparations for the 2012 Summer Olympic games continue apace. The torch winds its way through the countryside, the ticket printers hum along, and the British Army has mounted defensive missiles on local apartment roofs. But, for those who've been missing the synthetic chemistry connection, wait no longer: enter, Olympicene!

Olympicene
Source: IBM Zurich | BBC
Olympicene, a tight five-ringed structure, does indeed resemble the famous logo of the quadrennial international contest. IBM Zurich, who used specially-functionalized AFM tips to image pentacene in 2009, now brings us fantastic high-res images of this polycycle (see right). 

I won't go into the story behind the science, as that's been elegantly summarized in a number of places already. Instead, I want to highlight a perplexing 'teaser line' from yesterday's ChemConnector post: 
"You can see the Olympicene compound coming together step by step and yes, the final step is not yet reported!" 
OK. Let's see, we have the first few steps laid out for us, thanks to RSC's ChemSpider. Easiest way to make anything? Start with most of it intact! From commercial 1-pyrenecarboxaldehyde, a Wittig olefination, H2 reduction, basic ester hydrolysis, chlorination, Friedel-Crafts, and lithium aluminum hydride (LAH) reduction bring us to the 5-ringed alcohol (shown below). All the steps are greater than 89% yield, except the F/C (15%), which one imagines might make the "other" pentacyclic isomer preferentially.
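To see what that one low-yielding step does to the whole route, here is a quick sketch of the overall yield. The exact per-step yields aren't given here, so it brackets the answer: 89% is used as a floor for the five good steps and 100% as a ceiling. Both bounds are assumptions of the sketch, not reported values.

```python
from math import prod

# The six steps named above, in order.
STEPS = [
    "Wittig olefination",
    "H2 reduction",
    "ester hydrolysis",
    "chlorination",
    "Friedel-Crafts",
    "LAH reduction",
]

FC_YIELD = 0.15    # the Friedel-Crafts step, as quoted
GOOD_FLOOR = 0.89  # "greater than 89%", used here as a lower bound
GOOD_CEIL = 1.00   # idealized upper bound (assumption)


def overall_yield(step_yields):
    """Overall yield of a linear sequence is the product of the step yields."""
    return prod(step_yields)


n_good = len(STEPS) - 1  # every step except the Friedel-Crafts
lower = overall_yield([GOOD_FLOOR] * n_good + [FC_YIELD])
upper = overall_yield([GOOD_CEIL] * n_good + [FC_YIELD])

print(f"overall yield to the alcohol: {lower:.1%} to {upper:.1%}")
```

So the route to the alcohol comes in somewhere between roughly 8% and 15% overall, and nearly all of the loss is in the Friedel-Crafts step.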


I find the final "Top Secret" step amusing, because any organic chemist "familiar with the art" could think of at least five ways to do it! (Non-chemist readers: the molecule on the left needs a single C=C double bond, and standing in the way is just a molecule of water). That alcohol is fairly "activated" for elimination. My guess? A little strong acid, gentle heat, and some molecular sieves.

Pro Tip: Don't believe the hype declaring olympicene the "smallest 5-ringed structure," at just 1.2 nm across. Skeptics and cynics should check their bond lengths. Is olympicene smaller than cubane? (6 rings, ~0.6 nm). How about a ladderane? (5 rings, ~1 nm). Anyone know other molecules that might qualify?


Updates (04:18, 5/29/12) - ChemConnector mentions, via Twitter, that the step is less 'Top Secret,' and more not-yet-drawn-up for ChemSpider Synthetic Pages. Per Excimer's comment, fixed the position of the 'saturated' CH2 carbon. 
(21:10, 5/31/12) - Commenter (and U. Warwick Prof!) Peter Scott points out the new ChemSpider page, showing the major isomer and detailing conditions.