Vaccine and Immunotherapy Technologies
9-11 April 2008, Canberra
Zihe Rao
Professor Zihe Rao
President, Nankai University, Tianjin, China
Zihe Rao is a renowned molecular biophysicist and structural biologist in China. A graduate of the University of Science and Technology of China, he received his master's degree from the Graduate School of the Chinese Academy of Sciences and his PhD from The University of Melbourne in Australia. Following a long period of research at the University of Oxford in the UK, Zihe returned to China as a Professor of Structural Biology at Tsinghua University. From 2003 to 2007, he served as Director-General of the Institute of Biophysics, part of the Chinese Academy of Sciences, and was also Director of the National Laboratory of Biomacromolecules. He was awarded the Chen Jiageng Science Prize for his work on the structure of mitochondrial respiratory membrane protein complex II, and the Trieste Science Prize for his SARS research.
 
Structural biology studies of infectious diseases: From AIDS to SARS to bird flu

I am not talking about vaccines, but I can probably give you some information for your vaccine design. In the second part of my talk I would like to say something about a high throughput approach for structure-based ‘hits’ screening – not drug screening but hits – from a natural mixtures library.

The work I did during my Oxford time, with David Stuart, was on the crystal structure of an SIV matrix antigen. Nature commented, ‘These findings provide a model for the assembly of other lentiviruses, including HIV, and a new target for antiviral therapy’ (Nature (1995) 378: 743-47).



This slide shows most of the important proteins in HIV that have been solved. The matrix protein is very small, but you can see that it is responsible for the viral assembly.



This matrix protein at the top left was expressed using a baculovirus. We crystallised it and solved the structure. It is just a structure and we cannot tell you too much from it alone, but fortunately this protein forms a trimer. Shown here is a surface view of the trimer, and also a side view and a bottom view. You can see here the N-terminus, and here the C-terminus, of the trimer. The N-terminus, we understand, is attached to the membrane. So we suppose this is probably a ‘building block’ for the lentivirus, because it is flat and biological studies show that the N-terminus is attached to the membrane.

If this is a trimer, then of course trimers can always form an icosahedron. So one of the very important conditions is the diameter of this icosahedron. If it is similar to that of a retrovirus, that is probably fine. So it would be like a melon, but this piece is too exaggerated. If this trimer (in the centre at the top of the slide) could be a building block for the whole virus, and we made this structure at the top right, trimer-trimer, we would find that this is about 68 Å. And here we see the EM results showing a size of about 65 Å. So the diameter is right.

So then, on that basis, we have trimer-trimer assembly and then a whole virus. This is a very exciting result. At that time we did not know it, but probably it was the first result to show that retrovirus assembly could be trimeric.

After our publication, Science named it as a new model, ‘The New Face of AIDS’. My work was done in Oxford, and we have done some serious work on this.



Then my group started its SARS study during the SARS outbreak in May 2003. Our group still continues this work. Will SARS re-emerge, or will other new coronavirus-associated diseases emerge, in the future? Well, we don’t know if SARS will be back.



But we know there are 26 species of coronaviruses that have been identified. The SARS coronavirus was the 24th. That means that after the SARS outbreak in 2003 another two human coronaviruses have emerged. But they are not so serious, so not many people know that. One occurred in April 2004.



There was another one in January 2005. So coronaviruses still are dangerous, and we do not know about the future. We hope that something like SARS is the last and will never come back again. But we have to be prepared.



Coronaviruses fall into three groups, group I, group II and group III, and actually this is a bit of a problem. You see here the species that belong to group I. Group II includes MHV and SARS. And then group III is IBV and the bird coronaviruses. (I will talk about bird flu later.)

Our group mainly concentrates on the SARS coronavirus, but for systematic study we have also been working on a structural study of group I; and in group II, MHV; and in group III, IBV. So our group at the moment is working on the three groups and different protein structures, mainly focusing on the non-structural proteins.



SARS coronavirus has five major open reading frames and they encode 28 proteins, including four structural proteins, 16 non-structural proteins and eight that are probably accessory proteins. (These proteins are relatively small and people do not really know too much about their function.)



These are the structures that have been solved by my group since 2003, when we started working on this. At the top left here you see the first structure, the main protease, which was solved during the SARS outbreak itself. Extensive work has been done on inhibitor screening, and this is also an example I will tell you about later on, regarding natural mixtures screening. It is very exciting. Also at the top you see some other structures.

We have been working with some other groups like the Scripps group, David Stuart’s group and a French group, as well as Steve Harrison, at Harvard. Working together we have solved quite a few protein structures in the SARS coronavirus. But at the moment we still have some other work to do.

We have very good crystals of nsp2 (this is a very important protein interacting with quite a few other non-structural proteins and the host proteins) and of nsp4 and nsp6. But of course there are core proteins, such as the RdRp, which is nsp12. This is one we are working on very hard, but there has not been any breakthrough so far.



That was the SARS result, but now I will try to focus on giving you an example of the complexes with nsp7 and nsp8, the supercomplexes. This is a hexadecamer complex of nsp7 and nsp8.



nsp7 actually is small; it is a peptide and has a novel fold. It is a helical bundle.



Here is nsp8. There are two conformations, as you can see here. nsp8 conformation I is like a golf club – I call this the ‘golf club’ fold. The second one is bent – I call this the ‘bent golf club’ fold. Actually, these are superimposed as independent symmetry in the crystals. You have two regular ‘golf club’ folds and two ‘bent golf club’ folds. And then eight nsp8 and eight nsp7 assemble into a big particle, a supercomplex.



At the bottom left here you see a regular ‘golf club’ fold and to the right of it is an nsp8 complexed with nsp7. (This is like a golf ball.) It is a ‘bent golf club’ fold bound with an nsp7. Actually, the binding relationship is the same with this nsp8. You can see the relationship with nsp7 and nsp8, the interaction being mainly a hydrophobic interaction. That is how they assemble.



These are two regular ‘golf club’ folds and two nsp7.



They form tetramer 1.



This is two ‘bent golf club’ folds with two nsp7.



They form tetramer 2. There are two copies of tetramer 1 and two of tetramer 2, and together they form the supercomplex.
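As a quick check on the subunit count, the assembly can be tallied in a few lines of Python. This is only a sketch: it assumes that two copies of each tetramer make up the supercomplex, which is what the count of eight nsp8 and eight nsp7 given earlier implies, and the names `TETRAMER_1`, `TETRAMER_2` and `assemble` are invented for illustration.

```python
from collections import Counter

# Composition of each tetramer as described in the talk:
# two nsp8 in one conformation plus two nsp7.
TETRAMER_1 = Counter({"nsp8_golf_club": 2, "nsp7": 2})
TETRAMER_2 = Counter({"nsp8_bent_golf_club": 2, "nsp7": 2})


def assemble(copies_t1: int, copies_t2: int) -> Counter:
    """Sum the subunits contributed by the given number of each tetramer."""
    total = Counter()
    for _ in range(copies_t1):
        total += TETRAMER_1
    for _ in range(copies_t2):
        total += TETRAMER_2
    return total


# Assumption: two copies of each tetramer form the supercomplex.
supercomplex = assemble(2, 2)
n_nsp7 = supercomplex["nsp7"]
n_nsp8 = supercomplex["nsp8_golf_club"] + supercomplex["nsp8_bent_golf_club"]
total_subunits = sum(supercomplex.values())
print(n_nsp7, n_nsp8, total_subunits)
```

Under that assumption the tally gives eight nsp7, eight nsp8 and sixteen chains in total, which matches the description of the particle as a hexadecamer.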



This shows the shape, a little bit like a roll of toilet paper with two ‘handles’. (We have put it here in 3-D.) The diameter of this particle is 100 Å, or 10 nanometres. The diameter of the channel is about 30 Å, or 3 nm. Now, 3 nm is very meaningful, because it means a double helix can easily go through. You can see here the coordination of nsp7 and the coordination of nsp8, and that the eight nsp8 form a framework. This is like ‘bricks’ and ‘mortar’; it makes this a very solid particle.
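The size argument can be made concrete with a small unit-conversion sketch. The particle and channel diameters are the figures quoted in the talk; the duplex width `ASSUMED_DUPLEX_DIAMETER_NM` is not from the talk and is only a rough, commonly quoted value for an A-form RNA double helix, so treat the clearance as indicative.

```python
ANGSTROM_PER_NM = 10.0


def angstrom_to_nm(value_in_angstrom: float) -> float:
    """Convert a length in angstroms to nanometres."""
    return value_in_angstrom / ANGSTROM_PER_NM


# Figures quoted in the talk.
particle_diameter_nm = angstrom_to_nm(100.0)
channel_diameter_nm = angstrom_to_nm(30.0)

# Assumption (not from the talk): approximate width of an A-form RNA duplex.
ASSUMED_DUPLEX_DIAMETER_NM = 2.6

clearance_nm = channel_diameter_nm - ASSUMED_DUPLEX_DIAMETER_NM
duplex_fits = clearance_nm > 0
print(particle_diameter_nm, channel_diameter_nm, duplex_fits)
```

With these numbers the channel is wider than the assumed duplex, which is the point being made about a double helix passing through.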

So is that just crystal packing? From the solution, as represented by the image at the bottom left here, we tried then negative-staining EM, as represented by the image in the centre, and we found we had different sized particles. We picked some particles, and with a low resolution 3-D reconstruction, such as you see here at the right, we could see that this hexadecamer structure can be easily fitted into the EM reconstruction.



You see here the surface representation, with a positive charge around the channel. So the interaction with a double-helical RNA should be very strong. In other places you can see a negative charge.



Another thing we found was that this particle must be doing something related to replication. I have mentioned nsp7 and the regular ‘golf club’ and ‘bent golf club’ folds of nsp8. Recently my group found another condition in which we have just a simple complex. It cannot be assembled into a big particle; it is just a heterodimer of nsp7 and nsp8. You can see here the ‘bent’ position, Asp83. At the left, where it is broken, is also Asp83. Is this something functionally related? We do not know, and we will have to see what happens.



What happens when the ‘handle’ of the ‘golf club’ fold is broken? At the left here we have regular ‘golf club’ folds, 1, 2, 3, 4 – they just hold the particle together. If it is broken, as it is in three places here at the right, that means they will just be separated. This regular ‘golf club’ fold plays a very crucial role in holding the whole particle assembly together. If it is broken, it just falls apart. This result is something quite new. You see it as falling apart.



We also found that nsp8 has a C-terminal domain that is very much homologous to one of the human RNA-binding domains. So you can see that if you model this double-stranded DNA, as in the central image here, it looks fine. The binding is very similar.



Based on that, we think there is probably evidence that the replication process needs nsp7 and nsp8 as a big supercomplex, as a hexadecamer. We also need the nsp7-nsp8 heterodimer to play a role in coronavirus RNA synthesis. But this is very speculative and a very preliminary result. We are trying to find some more biological evidence for that. Anyway, we give some information there on the structure.



We have solved many structures in the coronavirus, but there is one thing we must emphasise. Most of these structures are new folds, and some of them must be very good drug targets.

I would like to take this opportunity to report our new breakthrough on bird flu, but I cannot say exactly which protein it is because I have to wait for publication. But I will tell you this is one of the very important proteins from bird flu.

This structure has been speculated about for decades, and now the structure is solved. It has been sent away and is under review, and we have to do further work. It is like neuraminidase: more than 20 years ago Peter Colman solved it and applied it to drug design to get some commercial outcome. It is the same thing, except that neuraminidase and haemagglutinin always have mutations. That is the pattern. But this protein, I would like to tell you, has a very conserved sequence, not only in bird flu but in human flu, and it is also conserved within influenza B, perhaps C. This is the most conserved sequence among the proteins of almost all flu, so this is one reason why we are excited.



As to the second reason, the structure we solved is like a dragon head with a very big, open mouth. And in the ‘mouth’ there is actually a peptide. This peptide activates other proteins, or other subunits. You can see here one subunit, and another, indicated by the green arrow. And then the results are shown, because this peptide bond is destroyed, and this is a whole complex assembly. Second, they also are inactive and they inhibit the whole activity, so this is actually an antiviral drug. Based on this, we have started extensive drug screening on this.

But there is another path that shows direct interaction with the host cell. People know there is a series of peptides and they bind to the host protein. Now, based on the structure, we have mapped these peptides. We know that this is a big peptide interactive region, and I hope I will have another chance, after this paper is published, to tell you the whole story.



I will try to use a little bit of time to talk about our ‘Chinese tea’ – this is what we call the Chinese natural mixtures library. So we have many, many pots of ‘Chinese tea’, and the resources have come from different herbs, different resource materials, different sorts of microbial secondary products and marine microbial secondary metabolism products. So, all sorts of things we just collect to make a library, and we never separate them before we want to.

Why use this novel approach? And what is this novel approach? I will give you some examples, but just for SARS, where there is less competition.

There are a lot of ways to do structure-based drug screening.



Structural biologists always do drug screening from a single compound library. But I said that in my laboratory we are not going to do that; we will leave it for the pharmaceutical companies to work on.



There is also computer-aided virtual drug discovery.



And there is fragment-based and scaffold-based drug screening.



These are not the methods for us. We just concentrate on screening from our ‘pot of tea’, like the compounds listed here. We are always trying to get chemical compounds separated from natural products. This is very important work, and it is a huge amount of work, probably not able to be completed by the next generation or even the generation after. This is a big job, especially for the low-abundance compounds that you can probably never separate out or that are very difficult to purify. Some companies have said, ‘Well, 50 or 90.’ I do not trust it. Probably if a compound is abundant they can get it. They have said more than 50, more than 30. I think there are probably some hundreds of these.



This is a profile for one of the natural products.



I am not going to emphasise the importance of natural products for medication.

Why is this approach novel?



I have already mentioned this big library. In this library the source material is single herbs, microorganism secondary products, marine microbial products and also some special TCM (traditional Chinese medicine); any kind of thing you can collect by organic extraction is okay. For every source sample we have three fractions, three samples to keep. Then we not only have this material, we also have a database with barcodes and all the information is put in.



Then this is the way we collect the samples.



In principle when we have these samples we like the molecular weight to be from 200 to a little bit over 600. We just leave the other things, to make our collections relatively simple. Even this covers most drug candidates and drugs, and these traditional compound hits, but not really the fragments. They are too small.
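The molecular weight window described here amounts to a simple filter. The sketch below is illustrative only: the `Component` class, the example entries and the exact upper cutoff of 650 Da (standing in for "a little bit over 600") are assumptions, not details of the group's actual library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    mol_weight_da: float


MW_MIN_DA = 200.0
MW_MAX_DA = 650.0  # assumption: stands in for "a little bit over 600"


def in_screening_window(component: Component,
                        low: float = MW_MIN_DA,
                        high: float = MW_MAX_DA) -> bool:
    """Return True if the component falls inside the molecular weight window."""
    return low <= component.mol_weight_da <= high


# Hypothetical entries, for illustration only.
components = [
    Component("fragment-like", 150.0),
    Component("hit-like", 372.0),
    Component("large glycoside", 900.0),
]

kept = [c.name for c in components if in_screening_window(c)]
print(kept)
```

The fragment-sized entry is excluded because it falls below 200 Da, which is consistent with the remark that fragments are too small for this collection.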

In mixtures screening we always have the problem of the quenching factor. We have a lot of false positives, and one source is the quenching factor.



So during the assay we try to eliminate the quenching factor.



And then we have more real positive results. You can see here some of the quenching factors – using quenching to eliminate false positives.

From this, then, we use another library, called a protein target library, and cell-based assays. We use this mixtures screening against the cell-based assay. If we have some positives, then we record them and we shift to the target protein. Then if there are some positive results, we shift to crystal screening and soaking.
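The order of steps just described (cell-based assay, then target protein, then crystal screening and soaking) can be written down as a small decision function. This is a hedged sketch of that triage logic only; `MixtureSample`, its fields and `next_step` are hypothetical names, and the real workflow will involve thresholds and records not mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MixtureSample:
    barcode: str
    cell_assay_positive: bool
    # None means the target-protein assay has not been run yet.
    target_assay_positive: Optional[bool] = None


def next_step(sample: MixtureSample) -> str:
    """Decide where a mixture sample goes next in the screening cascade."""
    if not sample.cell_assay_positive:
        return "stop"
    if sample.target_assay_positive is None:
        return "target-protein assay"
    if sample.target_assay_positive:
        return "crystal screening and soaking"
    return "stop"


# Hypothetical samples, for illustration only.
print(next_step(MixtureSample("A001", cell_assay_positive=False)))
print(next_step(MixtureSample("A002", cell_assay_positive=True)))
print(next_step(MixtureSample("A003", True, target_assay_positive=True)))
```

Keeping the barcode on each record mirrors the point that every result is logged against the sample in the database.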



There is something we can use to identify the density.

We have a PC-based computer system that helps us keep all the information and analysis in one place.


It includes the density identification, and the small molecules ‘Dali’.

I will show you our practical applications.



Our group at the moment is focusing mainly on pathogens like HIV, HCV, TB and SARS, and as soon as we have the structure of bird flu solved it is going to be another main target.



Here is a table where you can see our protein targets. Sometimes we are a bit fussy, because we cannot simply pick up anything we are interested in. We have preconditions: if these proteins can be expressed relatively easily, crystallise well and diffract well in-house, then we can do some more work in-house. And here you can see some exciting results.

So now I will quickly show you an example of SARS coronavirus nsp5.



This shows the main protease, with domain I, domain II and domain III. The substrate binding position is in between domain I and domain II. Before this we had already made a wide-spectrum inhibitor targeting the coronavirus main protease.



We started relatively easily, because during the SARS outbreak in China we had collected quite a bit of Chinese herbal medicine and identified that there are 317 herbs that can inhibit SARS viral activity. We used this as starter material. Then, in initial screening, 37 per cent of these gave inhibition larger than 50 per cent.

Shifting then to those 118 initial samples, when we took out the quenching factor we had 85 per cent left.
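The numbers in this screening funnel can be checked with a little arithmetic. The sketch uses only the figures quoted in the talk (317 herbs, 118 initial samples, 85 per cent remaining); the rounding to whole samples is an assumption about how the percentages were reported.

```python
herbs_screened = 317
initial_hits = 118                     # samples with more than 50 per cent inhibition
fraction_left_after_quench_check = 0.85

# 118 out of 317 is the "37 per cent" quoted for the initial screen.
hit_rate_percent = 100 * initial_hits / herbs_screened

# 85 per cent of the 118 remain once quenching artefacts are removed.
remaining_after_quench_check = round(initial_hits * fraction_left_after_quench_check)

print(round(hit_rate_percent), remaining_after_quench_check)
```

So roughly 100 of the 317 starting herbs survive both the initial inhibition cut and the quenching check, before the three samples are chosen for soaking.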

After that we had three samples soaked in the crystal ‘tea’, for binding.



These are the three complexes, and we can see the clear binding, in green. Today I am going to show you these results.



You see here that before soaking there is nothing there. After soaking there is some density, there is covalent binding and density there.



Here in the inhibition assay you can see the control in red, the water-soluble fraction in yellow and the crude extraction in blue. Then in green is the organic extraction. So you can see that in most cases organic extractions are the most useful.



Then we check the binding by the use of MALDI-TOF, and we see what kinds of things have been bound. We can find the binding of small molecules at about 340 to 400 daltons.



Then we use this to do the fraction collections. In fraction 1 you can find a little bit of density, but the activity is not high.



In the second fraction you can see the density is a little bit bigger but there is no real inhibition there.



In fraction 3 there is big density with very high, strong inhibition. So there are two things that we can find here.



Then in fraction 4 there is big density but there is no real inhibition. This is because of the quenching rate.


Then the density is smaller and inhibition is down.



Then there is almost nothing. This is the quenching factor, which should cause a false positive but there is no density so it doesn’t matter.



After this analysis we can show that there is something we want, mainly just at fraction 3. Probably, for the sake of safety, we can shift to fraction 4, where you can see the quenching factor.



After purification we find this target compound.



Then we do further NMR, ESI-MS and CD analysis of the target compound, and we find it is what we want.

And is that true?



We use this in pure molecules, and do further evaluation. We use pure molecules and soak and get exactly the same density, and we can see the binding interaction is fine.



This is just to tell us that the approach we are taking is right.



We can check the inhibition, using this compound, in group I, group II and group III. So this compound can inhibit all three groups of the coronavirus.



This is the cell-based assay.



Finally, I want to say that from the soaking we can always extract a very small percentage of compound, like this. This is the profile here. This is only 0.026 per cent of the total amount of the compound.



This is the key position. There is a cell-based assay and a target-based assay, then soaking and compound identification. Then it goes back through, and the quenching rate is checked. And everything we have gets put into the database.


These are the people who have been doing the work. I must emphasise that the flu work has been in collaboration with my colleagues in Yingfang Liu’s group. We have had a fantastic time and a fantastic result.

 

Discussion

Peter Colman (Chair): I have got one burning question. What are you going to do next? That looks like a very ugly compound to have to do medicinal chemistry with, to improve its binding.

Zihe Rao: Binding is no problem. But the further analysis is why I say this is a ‘hit’. But this is a very crucial stage and further work will have to be carried out.
