Leverhulme Centre for Biological Complexity

University of Cambridge










The Programme Grant awarded by the Leverhulme Trust in July 2002 enabled the Leverhulme Centre for Biological Complexity to be established in the University of Cambridge in April 2003, based within the Department of Chemistry. As envisaged in the original proposal, we have focused the activities of the Centre on the study of proteins, which we consider an exceptionally powerful system for understanding fundamental principles of biological complexity. As we discuss below, proteins have the critical advantage over other complex systems of being amenable both to an accurate theoretical description and to detailed experimental testing.


The specific research activities of the Centre include: (1) the experimental characterization of the multiple possible states of proteins, including partially folded conformations, misfolding intermediates, amyloid fibrils and complexes; (2) the computational determination of the ensembles of structures corresponding to these states and of the rates and pathways of interconversion; (3) the investigation of the effects of misfolded proteins on cellular viability and their connection with human disease.


We describe in this report the results obtained during the initial phase of research at the Centre and discuss how we plan to build on the progress that we have made to increase our understanding of the fundamental principles through which proteins interact and are regulated in the cell, and of the origin of their aberrant behaviour that causes a wide range of diseases, including Alzheimer’s and Parkinson’s diseases, and type II diabetes.






The Free Energy Landscape of Proteins


Characterizing the nature of the multitude of unfolded, partially folded and aggregated states of proteins is crucial for understanding the determinants of many aspects of their behaviour, including the stability of the native state, the kinetics of folding and the mechanisms and consequences of misfolding1. The increasing numbers of proteins that are found to be unfolded under native conditions, at least in the absence of their binding partners, are also extremely important in their own right; in the cell they are involved in a wide range of processes including signal transduction, translocation across membranes, transcriptional activation and the regulation of the cell cycle and the processes of growth and differentiation2. It is estimated that up to one third of the proteins in the genomes of higher organisms contain a significant fraction of their sequences as highly disordered regions, making the methods that we have developed to define the heterogeneous ensembles that represent their structures of major importance.


Most approaches used in structural biology, including X-ray crystallography, NMR spectroscopy and electron microscopy, involve three major steps: (I) a suitable technique is chosen and the results of the experimental measurements are interpreted in terms of parameters known to be related to molecular structure; (II) a model is chosen to represent the structure and the energy of the molecule; (III) an optimisation method is defined that minimises the deviations of the calculated conformations from the experimentally-derived structural information. This approach has proved extremely successful for defining the average structures of the native states of proteins and their complexes. Examples of recent spectacular achievements include the determination of the structures of F1-ATPase3, the protein involved in the generation and utilisation of chemical energy within a cell; the proteasome, the molecular machine that degrades proteins that are targeted for elimination from the cell4; and the ribosome, the complex of proteins and RNA molecules that is responsible for the synthesis of all proteins in living organisms5; 6. In principle, extensions of these methods will make it possible to determine the structures of progressively larger and more intricate and complex systems, provided that the structures themselves are compact and highly organised7. However, as we mentioned above, a number of important aspects of protein behaviour involve non-native states, or more generally states that are only partially or transiently structured1 (Figure 1).


The general problem posed by the determination of the structures of such species is that they must be described in terms of ensembles of conformations, rather than the single conformations that describe reasonably well the structures of native folded states. Sometimes these ensembles are made up of relatively compact and fairly native-like structures, but in other cases the polypeptide chains can have a very high variability in their global and local structures. In these latter cases, the conformational preferences may not differ greatly from those corresponding to a random coil state in which there are no specific non-covalent interactions between residues8. In addition, the dynamics within stable or metastable non-native states are usually extremely rich and may span timescales from picoseconds to seconds or longer. Furthermore, it is often extremely important to characterize the dynamics of transitions between different states, as well as within a particular state, in order to understand complex biological processes such as protein folding or signal transduction. In such cases kinetic measurements can provide structural information about the relevant transition states through which such transformations occur9.


One major goal for theoreticians is to calculate protein structures, and the pathways that connect them, ab initio from the sequence of the polypeptide chain. In approaches of this type only step II in the previous scheme is used, without assuming directly any experimental knowledge. The latter, however, may be used subsequently for the purpose of validating the structures emerging from the calculations. This approach has been used with some success in a number of applications10 but relies on the availability of reliable force fields, as well as of considerable amounts of computer time. The latter issue is of major significance in the case of non-native states where the ensembles may cover vast regions of conformational space.


Experimentalists, in contrast, strive to measure parameters that describe in the required detail the key structural features of a given state of a protein molecule. In the case of native states the procedures are relatively well developed, notably for X-ray diffraction and NMR spectroscopy11. During the first phase of the work of the Leverhulme Centre, a major priority has been to develop novel techniques to define the structures and dynamics of non-native states of proteins12. In particular, new approaches involving NMR spectroscopy have been shown to be very powerful in providing such information at the atomic level, e.g. through the detection of dipolar couplings between atomic nuclei and their relaxation rates13; 14. In determining the structures of non-native states, experimentalists have so far relied mainly on step I of the previous scheme. The results are sometimes interpreted simply by relating the characteristics of the partially folded states (and thus implicitly carrying out step III) to the structure of the native state in a qualitative manner (a procedure corresponding to step II). Although this strategy has provided valuable insights into the general organization of non-native states, it often lacks specific structural information and predictive power, and it becomes difficult to apply, and sometimes even misleading, when the chain topology departs significantly from that of the native state. As a result, a great deal of insight into the fundamental principles of biology has been inaccessible in molecular terms.


In order to exploit the potential of such experimental data to the full, the first requirement is to develop techniques for the quantitative determination of non-native structures. It is essential in this process to address the issue that there will often be quite large numbers of different conformations that correspond reasonably well with the experimental data that serve as restraints8; 15; 16. A series of questions arises about any approach: Is the structural interpretation of the restraints correct? Is the ensemble under-determined, i.e. is there too little information (in the form of restraints) to define the structure? How well does a calculated ensemble of structures match the true one? Moreover, as experimental techniques give time- and ensemble-averaged measurements, one needs to be able to reconstruct a representative ensemble of conformations knowing only some of its average properties8; 16. In the following sections we shall discuss methods that we have been developing through which solutions to these problems can be achieved.





Restrained Simulations: Applications


General approach

We have been developing an approach that combines computer-based simulations directly with experimental data, by using the latter as restraints to limit the sampling of conformational space to those regions that are consistent with specific experimental measurements. This approach aims at finding all the conformations that minimise a pseudo-energy function composed of two parts. The first part defines the physicochemical properties of the polypeptide chain and its environment, and the second part penalises deviations from the experimentally-derived information15; 17. The simultaneous optimisation of the two components of the pseudo-energy function ensures that the molecular models generated by the simulations have both the general properties of proteins and the specific qualities that are compatible with a given set of experimental measurements. A significant part of the work of the Leverhulme Centre has so far been directed toward this objective.
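
As a rough sketch of the two-part pseudo-energy function described above, one common choice adds a harmonic penalty on the experimental observables to a physical energy term. The report does not state the functional form, so the harmonic penalty is an assumption, and the symbols $E_{\mathrm{phys}}$, $E_{\mathrm{exp}}$, $\alpha$, $f_i^{\mathrm{calc}}$ and $f_i^{\mathrm{exp}}$ are illustrative names rather than notation from the original work:

```latex
E_{\mathrm{tot}}(\mathbf{x}) = E_{\mathrm{phys}}(\mathbf{x}) + E_{\mathrm{exp}}(\mathbf{x}),
\qquad
E_{\mathrm{exp}}(\mathbf{x}) = \frac{\alpha}{2} \sum_{i}
  \left( f_i^{\mathrm{calc}}(\mathbf{x}) - f_i^{\mathrm{exp}} \right)^{2}
```

Here $\mathbf{x}$ denotes the atomic coordinates, $f_i^{\mathrm{exp}}$ the i-th measured quantity, $f_i^{\mathrm{calc}}$ its value back-calculated from the model, and $\alpha$ a weight that sets how strongly the experimental term is enforced relative to the physical one.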


Native state fluctuations

Under native conditions, a globular protein exists for most of the time in a relatively narrow ensemble of conformations, centred about a well-defined average structure. Large thermal fluctuations are, however, possible and a wide variety of experimental data, ranging from proteolytic susceptibility to fluorescence quenching measurements, indicates that the protein does occasionally visit substantially more unfolded conformations that can be of great biological significance, for example in defining the susceptibility to degradation or to aggregation18; 19. Such states are, however, difficult to study as they are populated only during very rare and transient events. Hydrogen exchange experiments are uniquely suited to provide residue-specific information about such rare fluctuations9. These experiments monitor, through the use of NMR spectroscopy or mass spectrometry, the replacement of labile hydrogen atoms, notably those bonded to backbone amide nitrogen atoms, by deuterons from D2O molecules in the solvent. The results are usually given in terms of protection factors, which are defined as the ratios of the rates of exchange of labile hydrogen atoms in a reference state representing a fully unfolded random coil to the rates of exchange observed for the same type of hydrogen atoms in the state of the protein under consideration.


In order to define the transient structures of the protein from which hydrogen exchange takes place we have developed a method that uses protection factors to bias the sampling of conformational space in molecular dynamics simulations20. This approach is based on the assumption that the logarithms of the protection factors are proportional to the free energy differences between open (or exchange competent) and closed (or exchange incompetent) conformations9. In the calculations, these free energies can be approximated in terms of the numbers of inter-atomic contacts and hydrogen bonds formed by a given labile hydrogen atom relative to those in the native state; these terms are based on the well established assumption that protection arises either from burial inside the core of a protein or from the involvement of the labile group in hydrogen bonding9. An important technical aspect of this computational approach is that, as protection factors are measured as averages over a large number of molecules in solution (typically 10^18), no single molecule can be expected by itself to satisfy a given set of experimental protection factors. Since an exchange event requires a significant disruption of the structure, at least in the proximity of the amide group involved, the simultaneous enforcement of all the protection factors on a single molecule will always require global unfolding. Therefore, we impose the restraints as averages over a sufficient number of replicas of the molecules (typically 20, which we have found is enough to generate convergence of the calculations), so that a single molecule at any given instant may maintain its overall folded conformation, except for one (or a very few) local structural rearrangements.
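
The replica-averaging idea can be illustrated with a minimal Python sketch. It assumes a simple linear model in which the logarithm of the protection factor of each amide is a weighted sum of its contact and hydrogen-bond counts, and a harmonic penalty on the replica average. The coefficients `BETA_C` and `BETA_H`, the weight `alpha` and all function names are hypothetical placeholders, not values or code from the work described here.

```python
# Hypothetical weights, chosen only to make the arithmetic easy to follow.
BETA_C = 0.5  # assumed contribution per inter-atomic contact
BETA_H = 2.0  # assumed contribution per hydrogen bond


def ln_protection(n_contacts, n_hbonds):
    """Assumed linear model: ln P = BETA_C * N_c + BETA_H * N_h."""
    return BETA_C * n_contacts + BETA_H * n_hbonds


def replica_averaged_ln_p(replicas):
    """Average ln P per amide over all replicas.

    `replicas` is a list with one entry per replica; each entry is a list of
    (n_contacts, n_hbonds) tuples, one tuple per amide group.
    """
    n_replicas = len(replicas)
    n_amides = len(replicas[0])
    return [
        sum(ln_protection(*replica[i]) for replica in replicas) / n_replicas
        for i in range(n_amides)
    ]


def restraint_penalty(calculated, experimental, alpha=1.0):
    """Harmonic penalty on the deviation of the replica average from experiment."""
    return 0.5 * alpha * sum((c - e) ** 2 for c, e in zip(calculated, experimental))


# Two replicas, two amides: in replica B the first amide is locally open.
replica_a = [(10, 1), (4, 0)]
replica_b = [(0, 0), (4, 0)]
averaged = replica_averaged_ln_p([replica_a, replica_b])
penalty = restraint_penalty(averaged, [3.5, 2.0])
```

In this toy case neither replica alone reproduces the target value for the first amide, but their average does, so the penalty vanishes while each replica keeps most of its structure.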


We have initially applied this approach to determine the structures that give rise to hydrogen exchange from the native state of α-lactalbumin, a protein we have studied in great detail in the context of its folding behaviour20. The ensemble of resulting structures reveals that the large fluctuations that give rise to exchange may be more than five times larger than those measured by crystallographic B-factors, which are parameters that describe the average structural variability of the atoms in a crystal structure, and that some native α-helices may undergo sizeable reorientations. The use of this technique is not, however, limited to native states, and can be applied to any state, stable or metastable, for which some protection from hydrogen exchange is experimentally detectable. We anticipate that particularly interesting applications will involve the determination of the structures of compact denatured (molten globule) states21, of reaction intermediates22 and of amyloid fibrils23 and their precursors24. For all these systems we either have appropriate experimental data already or are in the process of collecting them.


Native states of proteins are also highly dynamic on much shorter timescales than those that give rise to hydrogen exchange, notably those from picoseconds to nanoseconds where fluctuations occur about the average structure. Motions of individual bond vectors on these timescales are accessible through NMR relaxation experiments, which are usually analysed by means of a so-called model-free approach25 where it is assumed that the large scale motions of the molecule take place on longer timescales than the small fluctuations of interest. In these experiments, the amplitude of the motion of a bond vector (usually a backbone amide group or a side-chain methyl group26) is quantified in terms of an NMR order parameter (S²). We have recently devised a method that uses S² values as restraints in computer simulations, hence generating a structural ensemble that is consistent with experimental measurements of the protein backbone and side-chain motions27; 28. The use of multiple replicas in the simulations is in this case essential, as order parameters are by definition always equal to unity for a single molecule at any given time. In this computational technique, order parameter restraints act to define the width of the distribution of bond vectors. They are therefore complementary to other restraints, such as NOEs or residual dipolar couplings, where average values are enforced in the simulations and both types of restraints can be incorporated simultaneously.
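
To make concrete why a single structure always gives an order parameter of unity, the following Python sketch evaluates S² for a set of bond-vector orientations using the standard expression S² = (3/2) Σ_ab ⟨u_a u_b⟩² − 1/2, where u is the unit bond vector and a, b run over x, y, z. It assumes that the vectors are already expressed in a common molecular frame (overall tumbling removed); the function name `order_parameter` is illustrative.

```python
import math


def order_parameter(vectors):
    """Return S^2 for one bond vector sampled over an ensemble of structures.

    `vectors` is a list of 3-component sequences, one per replica or snapshot,
    all given in the same molecular reference frame.
    """
    # Normalise every vector so that only its orientation matters.
    units = []
    for v in vectors:
        norm = math.sqrt(sum(c * c for c in v))
        units.append([c / norm for c in v])

    n = len(units)
    total = 0.0
    for a in range(3):
        for b in range(3):
            mean_ab = sum(u[a] * u[b] for u in units) / n
            total += mean_ab ** 2
    return 1.5 * total - 0.5


rigid = order_parameter([(0.0, 0.0, 1.0)])                      # one structure
two_state = order_parameter([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)])  # two orientations
isotropic = order_parameter([(1, 0, 0), (0, 1, 0), (0, 0, 1)])   # three axes
```

A single orientation gives S² = 1, whereas spreading the vector over several orientations lowers the value, which is why the restraint can only be applied to an average over replicas.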


Application of this approach to describe the native dynamics of the fibronectin type III domain from human tenascin has provided a uniquely detailed description of the relative populations of the rotameric states of the side-chains of the protein and has been validated by the calculation of the average values of a set of three-bond (3J) couplings that were compared with the corresponding experimental values27. The successful prediction of these quantities, which were not used as a bias in the simulations, suggests that the ensemble of conformations determined through the use of S² order parameters as a bias is a faithful representation of the true native ensemble. We have shown that the simultaneous use of restraints that define average values of molecular properties and their width provides a more accurate determination of the ensemble of structures representing the native states of proteins. We believe that the use of this approach will give new insight into the role of dynamic effects in the packing of the hydrophobic cores of proteins and the behaviour of surfaces that are involved in binding processes, as well as exploring the changes in the dynamics upon mutation or upon ligand binding. In this manner we can not only probe the nature of molecular recognition processes, but also estimate changes in entropy associated with such events29. We have recently used this approach to define the native state ensemble of the protein ubiquitin by making use of NOE data from NMR experiments as distance restraints in conjunction with S² values for both side chain and main chain atoms28 (Figure 2). The resulting ensemble reveals that the true conformational variability in protein molecules is considerably larger than that suggested by previous X-ray or NMR based ensembles. Indeed, it validates the ability of unrestrained MD simulations to generate the right magnitude of the fluctuations within proteins, although the limitations of current force fields do not permit the detailed properties of individual residues or regions of proteins to be defined in the absence of the types of restraints we have introduced. It is likely that this approach will be of very considerable value both to describe the factors that determine the stability of proteins and to improve our ability to rationalise and predict protein-protein and protein-small molecule interactions.



Unfolded states

The conformations present in highly unfolded states of proteins may be expected to resemble those present in a random coil ensemble, except for the effects of any residual preferential interactions between side-chain or main-chain atoms. The characterization of this residual structure is important since any weak conformational preferences may, for example, bias the system towards conformations involved in the initial events leading to folding and, in less benign cases, to misfolding. Furthermore, this type of structure is now recognised to be important in its own right, as a sizeable portion, perhaps up to one third, of the proteins encoded in the genome of an organism may be, at least in some regions, intrinsically disordered, i.e. lacking a well-defined three-dimensional structure under physiological conditions unless in complex with their binding partners30. In addition, unfolded states of even those proteins whose native states are close-packed globular structures may have considerable significance for membrane transport and degradation mechanisms.


NMR spectroscopy can be used, at least in favourable cases, to obtain structural information about such unfolded states, e.g. through NOEs, chemical shifts, or residual dipolar couplings26; 31; 32; 33; 34; 35. Relaxation experiments may also be used to identify regions of reduced mobility and hence of partial ordering; the existence of persistent and partially non-native hydrophobic clusters in the unfolded state of lysozyme has, for example, been demonstrated in this way36. In addition, since dipolar interactions between unpaired electrons and atomic nuclei even 20 Å apart can be detected experimentally (compared to about 5 Å for nuclear-nuclear interactions) the use of paramagnetic probes such as stable radicals (spin labels) can provide extremely valuable information about residual structure in such states. We have used this type of experiment to characterize the residual structure in several unfolded states, e.g. of the four-helix bundle protein ACBP under various types of denaturing conditions8; 37 and of α-synuclein38, the natively unfolded protein whose aggregation process is associated with a variety of neurodegenerative diseases, including Parkinson’s disease (Figure 3).


The studies of ACBP reveal that in the denatured state of this protein induced by guanidinium chloride, there is very weak preferential sampling of a native-like topology but no evidence for a persistent population of native-like structures. Indeed, most structures in the unfolded state of ACBP have characteristics similar to those of a random coil, but there is a small but significant increase in the probability of specific native-like interactions in the regions of the sequence corresponding to the second and the fourth α-helix in the native state. Interestingly, the latter regions have been shown to be important in the nucleation of the folding process. These results suggest that evolution may have resulted in the selection of polypeptide sequences that have a tendency in the unfolded state transiently to form clusters of residues with native-like characteristics; the bias in the stochastic search for interactions involving some residues rather than others would then increase the probability of the formation of the folding nucleus. A similar experimental approach has revealed that a region of the sequence of α-synuclein at the C-terminus of the protein provides a degree of shielding of the most aggregation-prone region of the sequence, hence reducing the propensity of the protein to aggregate38. Such studies promise to provide important insight into the biological control of the aggregation behaviour of proteins.




Transition states for folding

While most proteins fold through a complex process characterised by a multiplicity of intermediates formed on different timescales (multi-exponential kinetics), it has been shown that some small proteins can be observed essentially in only two states, the native and the unfolded ones9. During folding, the free-energy barrier between the unfolded and the folded states must be crossed, but it is extremely difficult, as for all reactions, to obtain any direct experimental information about the transition state for folding because of its transient nature. The major technique currently available for this purpose involves protein engineering9, in which conservative mutations of particular side-chains are carried out and the effects on the kinetics of folding and unfolding are measured. In order to map the transition state, conservative mutations (e.g. the replacement of a methyl group by a hydrogen atom) are most readily interpreted. Experimental results are often given in terms of Φ values; a Φ value of unity indicates that the stability of the interactions made by a particular side chain in the transition state is the same as in the native state. A Φ value of zero, by contrast, indicates that no stabilising interactions are formed in the transition state, and therefore that residue does not play an active role in determining the rate of folding.
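
For orientation, the textbook definition of a Φ value from protein engineering experiments can be written as a short Python sketch: the change in transition-state stability on mutation, estimated from the folding rate constants, divided by the change in native-state stability. The report does not spell out this formula, so its use here is an assumption; the function name `phi_value` and its parameter names are illustrative.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def phi_value(kf_wild_type, kf_mutant, ddg_native, temperature=298.15):
    """Return Phi = ddG(transition state - unfolded) / ddG(native - unfolded).

    kf_wild_type, kf_mutant: folding rate constants in the same units.
    ddg_native: destabilisation of the native state by the mutation, in kJ/mol
                (positive when the mutant is less stable than the wild type).
    """
    # Change in the free energy of the transition state relative to the
    # unfolded state, obtained from the ratio of folding rate constants.
    ddg_transition = GAS_CONSTANT * temperature * math.log(kf_wild_type / kf_mutant)
    return ddg_transition / ddg_native


# Folding rate unchanged by the mutation: the side chain makes no
# stabilising interactions in the transition state.
phi_zero = phi_value(kf_wild_type=50.0, kf_mutant=50.0, ddg_native=4.0)

# Folding slowed by exactly the amount expected from the loss of native
# stability: the interactions are as formed as in the native state.
full_loss = GAS_CONSTANT * 298.15 * math.log(10.0)
phi_one = phi_value(kf_wild_type=10.0, kf_mutant=1.0, ddg_native=full_loss)
```

The two limiting cases correspond to the interpretations given in the text: a value of zero when the folding rate is unaffected, and a value of unity when the transition state is destabilised as much as the native state.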


In some respects, Φ values are analogous to NOEs, as they provide structural information related to residue proximity15; 39. An important difference is, however, that fractional Φ values are difficult to interpret in structural terms. Nevertheless, armed with the assumption that Φ values can be approximated as the fraction of native interactions formed in the transition state, at least in the cases of conservative mutations, we have shown that it is possible to use Φ values as restraints in simulations to generate an ensemble of structures that represents the transition state of a protein15; 17; 40; 41; 42; 43; 44. The most important insight that these calculations have given so far is that the topology of the transition state can be established when the Φ values of just a small number of particular residues are specified. For example, using only the high Φ values for just three residues as restraints in simulations of AcP, the topology of the native state is present in the large majority of the members of the transition state ensemble15 (Figure 4). Moreover, all the remaining Φ values can be predicted with high precision from this ensemble. One of the extremely interesting features of this result is that the transition state ensemble can be highly heterogeneous and yet the topology is native-like in essentially all the contributing structures15; 41; 42. We have confirmed this conclusion quantitatively for an SH3 domain by showing that the overwhelming majority of the structures of the transition state ensemble matches the SH3 fold more closely than any of the other native folds in the PDB when a structural alignment procedure is carried out42.
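
The approximation of a Φ value as the fraction of native interactions formed can be sketched in Python as follows. Contacts are represented here as unordered residue pairs, and the restraint is taken to be a harmonic penalty; both choices, together with the names `phi_from_contacts` and `phi_restraint_energy` and the weight `alpha`, are illustrative assumptions rather than the implementation used in the work described.

```python
def phi_from_contacts(residue, native_contacts, conformation_contacts):
    """Fraction of the native contacts of `residue` present in a conformation."""
    native_for_residue = {pair for pair in native_contacts if residue in pair}
    if not native_for_residue:
        raise ValueError("residue makes no native contacts")
    formed = native_for_residue & conformation_contacts
    return len(formed) / len(native_for_residue)


def phi_restraint_energy(phi_exp, native_contacts, conformation_contacts, alpha=1.0):
    """Harmonic penalty over the residues with a measured Phi value.

    `phi_exp` maps residue index -> experimental Phi value.
    """
    return 0.5 * alpha * sum(
        (phi_from_contacts(res, native_contacts, conformation_contacts) - value) ** 2
        for res, value in phi_exp.items()
    )


def contact(i, j):
    """Unordered residue pair."""
    return frozenset((i, j))


native = {contact(1, 5), contact(1, 8), contact(2, 6), contact(5, 8)}
candidate = {contact(1, 5), contact(5, 8), contact(3, 9)}  # includes one non-native contact

phi_res1 = phi_from_contacts(1, native, candidate)
phi_res5 = phi_from_contacts(5, native, candidate)
energy = phi_restraint_energy({1: 0.5, 5: 1.0, 2: 0.0}, native, candidate)
```

Non-native contacts, such as the pair (3, 9) in the toy conformation, do not contribute to the calculated value, in line with the interpretation of Φ as a measure of native interactions only.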


These results imply that a careful Φ value analysis provides an over-determined set of restraints, as far as the definition of the overall topology of the molecule is concerned, in that a few specific Φ values are sufficient for its unambiguous definition. This surprising conclusion may be understood in the context of a network analysis of protein structures45. Given the severe restraints that result from the connectivity of the polypeptide chain, the overall fold of a protein is indeed established when the interactions between a few key residues are created. In turn, this result provides a physical description of the mechanism of the nucleation process that is thought to be principally responsible for efficient protein folding. It also provides a structural model of the critical folding nucleus that complements those obtained by techniques such as atomic force microscopy for other nucleation processes, such as the crystallisation of proteins46 and of colloidal systems47.


In terms of the local details of the structure of the transition state ensemble, the restraints obtained through Φ value analysis will generally represent an under-determined set, as they are likely to provide insufficient information to describe accurately the detailed structure at a local level. The most important question in this respect is whether the calculated ensembles can be validated by using them to make predictions about the results of additional experimental measurements. In the case of the two transition states of barnase, a highly studied protein that folds through a partially structured intermediate, the structural ensemble has been used to predict “double-mutant” Φ values that are in remarkably good agreement with experimental measurements48. This type of test is particularly stringent, as double-mutant experiments report on the interactions between specific residue pairs, particularly at the surface of the molecule, i.e. far from the relatively well organised folding nucleus, where structural information from the single-mutant Φ values is mostly missing.


Another important question is whether the structural interpretation that we have given of Φ values, i.e. in terms of the fraction of native contacts, is appropriate. This interpretation is strongly supported by the fact, as we mention above, that we can use a subset of Φ values as restraints to predict well all the remaining ones15; 17; 40; 41; 42; 43; 44. Since Φ values are derived from ensemble-averaged experimental measurements it is also important to examine how the use of replicas influences the results of the structure determination procedure. We have shown previously that, when parallel folding pathways are present, use of multiple replicas in simulations restrained by the Φ values is essential for making meaningful predictions49. However, when folding proceeds through a dominant pathway, i.e. through a single transition state, calculations using a single replica are a highly appropriate method. In most cases, proteins appear to fold through a single dominant pathway, perhaps as a consequence of the evolution of a quality-control mechanism encoded in the sequence that minimises the tendency to misfold49.




The free-energy landscapes of proteins appear to have evolved to minimise whenever possible the presence of long-lived non-native intermediate states, perhaps because these states enhance the possibility of misfolding and aggregation18. Intermediate states must, however, always be populated to some degree; this expectation is based on a view of protein folding as a condensation process, akin to liquid-solid transitions. It is known that processes of this type tend to progress through metastable states. According to this observation, the phase that appears first in the process need not be the most stable thermodynamically, but the one closest in free energy to the initial state50; 51 or, more precisely since nucleation is a dynamic process, the state that is separated by the lowest free energy barrier from the initial state.


The short lifetime, as well as the structural heterogeneity, of many intermediates in protein folding, however, makes it difficult to extract experimental information about their structures. These considerations suggest that the formation of early intermediates is driven by interactions in which extended regions of the sequence of the protein are selected through evolution to have certain properties (e.g. hydrophobicity or secondary structure propensity) in order to encode the native topology. For the completion of the folding process – the crossing of the free energy barrier between the intermediate and the native state – local interactions are probably responsible for the close packing of the side chains. More complex folding processes may involve the successive formation of several intermediates according to similar principles. In addition, large proteins are likely to fold in segments, or domains, with a final step in which such domains are closely packed together.


Structural information about intermediates can be obtained through a variety of experiments including measurement of Φ values, NOEs, NMR chemical shifts and protection against hydrogen exchange. In principle, all these types of measurements can be used as restraints in computer simulations (Table 1). As one example, we have recently identified and characterised intermediates present at very low population (about 1%) in equilibrium with the native and unfolded states of two mutants of an SH3 domain52; in this case the restraints were based on the measurement of the chemical shift differences between the native state and the intermediate. The latter were obtained by relaxation dispersion NMR experiments that extracted such data from the contribution of chemical exchange to the transverse relaxation rates of individual nuclei.


By repeating these experiments at different temperatures, it was possible to extract not only structural information but also thermodynamic and kinetic parameters characterising the folding process. In these experiments, therefore, contributions to the line widths of cross peaks in 1H-15N correlation spectra can be interpreted in terms of the rates of interconversion between states (kinetics), their chemical shift differences (structure) and their populations (thermodynamics)14; 53. The results of the restrained molecular dynamics simulations reveal that the SH3 intermediates share distinctive topological features with the corresponding transition state for folding (Figure 5). By taking this approach further, this study suggests a general strategy for an experimental determination of the free energy landscape of a protein by applying this methodology to a series of mutants whose intermediates have different degrees of structural similarity to the native state, i.e. represent structural ensembles at different positions along the progress variable for the attainment of the native fold.



“Molten globules”

The native states of some proteins are relatively unstable with respect to rather small perturbations, such as the removal of a substrate or metal ion, the reduction of one or more disulfide bridges, or the lowering of pH. In these cases, the tight native packing of side-chains may be lost so as to allow the protein to convert into a so-called molten globule state, a state nearly as compact as the native state, despite having large structural fluctuations54; 55; 56; 57. The dynamic nature of the molten globule state, however, makes it difficult to extract experimental information about its structure, as such species not only do not crystallise but their NMR spectra are usually severely broadened by chemical exchange effects of the type discussed in the previous section. We have recently shown that it is possible to use NMR 1H-15N HSQC spectra recorded at increasing concentrations of denaturant to monitor the relative stability of native-like interactions in the molten globule state of α-lactalbumin56. These NMR data can then be translated into restraints for simulations, by defining the sets of residues that become unstructured at different stages of the denaturation procedure. Analysis of the structural ensembles obtained in this manner reveals that, upon addition of denaturant, the molten globule state of α-lactalbumin undergoes a structural transition in which the interface between the two structural domains of the protein becomes disordered, while the two domains themselves retain a significant degree of native-like order16. The use of the restraints derived from experimental measurements at different concentrations of denaturant permits extensive sampling of conformational space to be achieved and, as a result, a coarse-grained experimental free energy landscape could be defined16. The result of this procedure suggests that evolution has created landscapes that are characterised by the presence of deep and extended basins that make them generally robust against perturbations such as addition of denaturant, changes in temperature, mutations or chemical modifications of the molecules16; 58.


The molten globule state of α-lactalbumin represents a particularly interesting case for structure determination approaches using restrained simulations, as a range of different sets of experimental measurements are available54. By repeating the determination of the ensemble of structures representing the molten globule state using different types of experimental data, it should be possible to validate and extend the structural interpretation of the experimental measurements. In addition, combining the various experimental sets of data as restraints in the structure calculation should allow increasingly well-defined ensembles to be determined. Studies of this type are presently in progress.





Protein Misfolding and Aggregation


It has recently been recognized that the phase diagram of proteins contains a new but generally accessible state in which proteins assemble into ordered aggregates, including quasi-linear assemblies known as amyloid fibrils59. The formation of this type of aggregate, and particularly of its precursors, can be extremely damaging, and often fatal, to the cells in which it forms or with which it is in contact, and a variety of cellular mechanisms have evolved to avoid its formation, or to counteract its effects if it does form60; 61. Such protective measures, however, are sometimes inadequate, and more than twenty disorders have so far been associated with the presence of amyloid aggregates, including Alzheimer’s disease, late-onset diabetes and the spongiform encephalopathies62; 63. One reason for this imperfect quality control is that the factors driving the folding process (e.g. the burial from solvent of hydrophobic residues) are similar to those driving the aggregation process, and it is therefore difficult to disrupt the latter without interfering with the former64.


We have recently put forward the hypothesis that many, if not all, proteins can convert into fibrils that are essentially indistinguishable in their core structure from those formed, at least in vitro, from disease-related peptides and proteins59, and that the amyloid structure is a generic form of assembly that results from the physico-chemical properties of the main chain that is common to all polypeptides19; 65. This idea has enabled us to investigate fundamental aspects of the process of amyloid formation by studying small proteins of the type that have already allowed many of the important principles of the folding (as opposed to misfolding) process of proteins to be rationalised.



Understanding the Principles of Amyloid Formation


Amyloid aggregates form in vitro on a time scale ranging from seconds to days or more; in vivo the process often takes many decades before its effects are seen. It is therefore possible to follow the process using a wide variety of biophysical techniques, without the need for the rapid reaction methods usually required to study protein folding12. The results of such studies indicate clearly that there are common features in the behaviour of different protein systems18; 19; 66. The kinetics of amyloid formation are frequently characterised by two phases - first a lag phase followed by a rapid growth phase. Further, EM images often reveal amorphous or spherical aggregates, followed by thin protofilaments, prior to the appearance of mature fibrils18. These results suggest that the formation of fibrils takes place by a nucleation and growth mechanism, a conclusion supported by the observation that fibril formation can be dramatically enhanced by seeding the solution by addition of pre-formed aggregates18.
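The lag phase, the rapid growth phase and the effect of seeding described above can be reproduced qualitatively by a very simple autocatalytic model, dM/dt = k M (1 - M), in which M is the fraction of protein in fibrillar form and M(0) is the seed fraction. The sketch below is a toy model chosen for illustration, not the kinetic scheme used in the cited studies; the function names, the rate constant and the seed fractions are assumptions.

```python
import math


def half_time(rate: float, seed_fraction: float) -> float:
    """Time at which half of the protein is fibrillar in the toy model.

    rate          : apparent growth rate constant k (1/time)
    seed_fraction : fraction of protein already aggregated at t = 0
    """
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    if not 0.0 < seed_fraction < 1.0:
        raise ValueError("seed_fraction must lie strictly between 0 and 1")
    return math.log((1.0 - seed_fraction) / seed_fraction) / rate


def fibril_fraction(t: float, rate: float, seed_fraction: float) -> float:
    """Analytical solution of dM/dt = k M (1 - M) with M(0) = seed_fraction."""
    ratio = (1.0 - seed_fraction) / seed_fraction
    return 1.0 / (1.0 + ratio * math.exp(-rate * t))


if __name__ == "__main__":
    k = 0.5  # hypothetical rate constant, per hour
    for seed in (1e-6, 1e-4, 1e-2):
        print(f"seed fraction {seed:.0e}: half-time = {half_time(k, seed):.1f} h")
```

In this model the curve is sigmoidal, and increasing the seed fraction shortens the apparent lag without changing the steepness of the growth phase, which is the qualitative signature of seeding referred to above.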


We have recently proposed a strategy based on site-directed mutagenesis to probe the role of individual residues in the conversion of soluble proteins into amyloid structures67. This procedure follows closely in concept the protein engineering procedure used in the investigation of the mechanism of folding9. In this approach mutations are made throughout the sequence of a protein and the relative propensities of the different variants to aggregate are monitored. Application of this approach to AcP has suggested that there are highly specific nucleation sites for aggregation and that these may be distinct from the nucleation sites for folding67. The results suggest further that the intrinsic physico-chemical properties of polypeptide chains can be used to rationalise the influence of specific mutations on aggregation rates, and to make quantitative predictions about the changes in these rates upon mutation64. The surprising simplicity of the rules governing protein aggregation, which we attribute to the essentially polymeric behaviour of proteins once they have lost their evolved structure, has led us to extend this approach and to formulate an algorithm for the prediction of the absolute rates of aggregation of peptides or proteins, once the experimental conditions are specified68.
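To make the idea of a quantitative prediction concrete, the sketch below assumes that the logarithm of the ratio of mutant to wild-type aggregation rates is a weighted sum of the changes in hydrophobicity, β-sheet propensity and magnitude of the net charge caused by the mutation. The linear form, the function name and in particular the weights are assumptions made for illustration; the weights are placeholders and are not the fitted values of the published analysis.

```python
import math

# Placeholder weights (hypothetical, NOT fitted values): a positive weight means
# that an increase in the property accelerates aggregation.
PLACEHOLDER_WEIGHTS = (1.0, 1.0, -1.0)


def log_rate_ratio(d_hydrophobicity: float,
                   d_beta_propensity: float,
                   d_abs_charge: float,
                   weights: tuple = PLACEHOLDER_WEIGHTS) -> float:
    """Predicted ln(rate_mutant / rate_wild_type) under the assumed linear model.

    d_hydrophobicity  : change in hydrophobicity upon mutation (arbitrary units)
    d_beta_propensity : change in beta-sheet propensity (arbitrary units)
    d_abs_charge      : change in the magnitude of the net charge
    """
    w_hydr, w_beta, w_charge = weights
    return (w_hydr * d_hydrophobicity
            + w_beta * d_beta_propensity
            + w_charge * d_abs_charge)


if __name__ == "__main__":
    # Hypothetical mutation: more hydrophobic, slightly higher beta propensity,
    # net charge unchanged.
    ln_ratio = log_rate_ratio(0.8, 0.3, 0.0)
    print(f"predicted fold change in rate: {math.exp(ln_ratio):.2f}")
```

With fitted rather than placeholder weights, a model of this kind would be calibrated against measured aggregation rates of a set of variants and then applied to new mutations.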


The ability to predict aggregation propensities should be a powerful tool to assist experimental studies of the behaviour of natural polypeptides, as well as to establish the principles by which the amino acid sequences of peptides and proteins have been selected through evolution to avoid misfolding and aggregation in vivo, at least under normal circumstances. A quantitative understanding of the factors influencing aggregation rates will increase our capability to predict the onset of amyloidoses and other protein deposition diseases, in addition to helping us explore effective therapeutic strategies. It will also help us to design or modify polypeptides and proteins rationally, to enhance their folding and controlled self-association behaviour for biotechnology, pharmaceutical development and structural biology.


Using an extension of this type of analysis we have developed a procedure to determine the intrinsic propensities for aggregation of individual amino acids and polypeptide sequences using properties that can be calculated directly from the amino acid sequence. We are presently using these results to identify the regions of sequences that are most important in promoting amyloid formation. The knowledge of these “sensitive regions” for aggregation will be important in the development of targeted strategies to combat diseases associated with amyloid formation69.
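A sequence-based profile of this kind can be sketched as a sliding-window average of a per-residue score, with regions above a threshold flagged as candidate “sensitive regions”. In the example below the Kyte-Doolittle hydropathy scale is used purely as a stand-in for the intrinsic aggregation propensity scale, which is not reproduced here; the function names, the window length and the threshold are likewise assumptions.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in score.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(sequence, window=5, scale=KYTE_DOOLITTLE):
    """Mean score over a centred window, for every position with a full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    values = [scale[aa] for aa in sequence.upper()]  # KeyError on unknown residues
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]


def sensitive_regions(sequence, window=5, threshold=1.5):
    """Return (start, end) 0-based inclusive ranges of window centres whose
    averaged score is at or above the threshold."""
    half = window // 2
    regions, start = [], None
    for offset, score in enumerate(window_profile(sequence, window)):
        pos = offset + half
        if score >= threshold:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, len(sequence) - half - 1))
    return regions


if __name__ == "__main__":
    toy_sequence = "DDDDDVVVVVVVDDDDD"  # hypothetical sequence, not a real protein
    print(sensitive_regions(toy_sequence))
```

Replacing the stand-in scale with a scale that combines several physico-chemical properties changes only the dictionary of per-residue scores; the windowing and thresholding steps stay the same.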



Structural Studies of Aggregates


As Figure 1 illustrates, it is important to characterise not only monomeric states, but also aggregated states, in order to provide a comprehensive description of the conformations accessible to any given protein molecule. Conventional techniques, such as X-ray crystallography and solution NMR spectroscopy, are not ideally suited for obtaining the detailed information needed for structural determination of amyloid fibrils and their precursors. It has been shown recently, however, that modern solid-state NMR techniques can allow interatomic distances within amyloid fibrils to be measured with great accuracy70; 71; 72; 73. These distances can then be used as restraints in computational approaches to structure determination. In this case, simulation techniques of the type that we have discussed earlier in this report for the determination of native states of proteins11 should be very suitable for defining the narrow ensembles of conformations that characterise the rather rigid structures of amyloid fibrils.


Of very great interest in addition to the structure of amyloid fibrils is the determination of the intermediate species that are present along the pathways that lead to their formation. A question of general interest is the extent to which folding and misfolding intermediates resemble each other and to what extent folding and misfolding pathways share common traits. Studies of the aggregation of lysozyme, for example, suggest that the amyloidogenic intermediates are closely similar to species populated during the normal folding process74. In addition, recent studies have identified amino acids that appear to play a crucial role in directing correct folding67; 75. One exciting possibility that we are at present exploring is to use NMR-derived restraints57 of the type described above to bias computer simulations in order to determine an ensemble of conformations representing these amyloidogenic intermediates. Such a strategy could provide an extremely powerful approach to defining the molecular basis of the different types of misfolding diseases.


Once fully developed, these techniques will enable us to investigate the structures of a range of different peptides and proteins assembled into amyloid fibrils. By exploiting the procedures that we have developed to generate highly organised fibrils from a wide variety of molecules, we will explore the similarities and differences between different systems. The PI3-SH3 domain will be the next system that we will study, and we have already incorporated isotope labels into the recombinant protein for this purpose. By using data from our cryo-EM studies of this protein76 we shall have a unique opportunity to combine the results of these two approaches to describe the detailed arrangement of protein molecules within the fibrils. As one technique defines the local molecular conformation and the other the global arrangement of molecules in the fibril, we are in an excellent position to define a high-resolution structure of these fibrils.


In addition to these studies we are also beginning to carry out hydrogen exchange experiments to monitor the protection of labile hydrogen atoms in the fibrillar forms of these proteins. We are using methods developed to solubilise the fibrils23, along with a combination of mass spectrometry and NMR spectroscopy, to obtain detailed information about both the extent and the mechanism of protection of individual amide hydrogen atoms from solvent. This method does not have the potential to provide structural information with the detail that can be provided by methods such as solid-state NMR, but our recently developed ability to use hydrogen exchange information for structural determination20 suggests that it will be extremely valuable for characterising, in particular, the more disordered species that are involved in the assembly of the fibrils.





Understanding the Origin of Aggregation Diseases


In the two preceding sections we reviewed the results that we have obtained about the structure and mechanism of protein folding and aggregation. The knowledge gained from such studies can be used to increase our understanding of both the normal and aberrant behaviour of cellular systems and their links with amyloid disease. In the conceptual framework that we are using, we consider the various states accessible to polypeptide chains, together with their possible interconversions1; 18 (Figure 1). The relative populations of the various states can be controlled in vitro by varying solution conditions such as pH and temperature. In vivo several factors act to control and regulate the interconversion between different states; these include molecular chaperones, folding catalysts, quality control mechanisms and degradatory pathways18; 66.  We are now beginning to explore the underlying principles through which such regulation can be achieved in a variety of circumstances.



Effects of Protein Aggregates on Cell Viability


Aggregates associated with at least some amyloid diseases are cytotoxic to cells in culture, even when added to the extracellular medium, an observation that appears to be linked to the known destruction of neurons associated with the onset and progression of neurodegenerative diseases63. Increasing evidence indicates that the species with greatest toxicity are those formed in the early stages of the aggregation process rather than the mature fibrils77. This conclusion is suggested by the results of experiments with species formed during the aggregation process of two proteins having no association with amyloid disease - the PI3-SH3 domain and the HypF domain77; the latter is a structural homologue of AcP, the protein discussed in some detail above. We have found that the early aggregates of both these proteins are as cytotoxic as the aggregates of species such as the Aβ peptide, whilst the mature fibrils, like the soluble precursors, showed no significant toxicity (Figure 6). The cytotoxic behaviour may result from the improper interaction of the exposed surfaces of misfolded species with cellular components including receptors and membranes77. Strong support for this proposal has been provided recently by the finding that antibodies raised against early aggregates of Aβ are able to bind to aggregates of unrelated proteins and to suppress their toxicity78.


These results prompt the hypothesis that there may be a limited number of structural types of aggregates that are generic to polypeptide sequences, in analogy with the observation that there is only a limited number of native protein folds79. Moreover, we have recently carried out more detailed studies of the mechanism of cellular toxicity and shown that both apoptotic and necrotic cell death can be associated with the interactions of aggregated species with cells in culture80. We are now extending these experiments to animal models and have already found clear parallels between the effects observed in cell culture and the conclusions from the in vivo systems. Such approaches promise to enable us to link our fundamental studies to the physiological basis of diseases.



Regulation and Suppression of the Pathological Effects of Aggregation


Our increasing understanding of how protein aggregates induce pathological effects is beginning to suggest rational strategies by which such effects can be inhibited or prevented, including some that could potentially lead to effective therapies63; 81. Particularly promising is the development of approaches that can prevent the deposition of proteinaceous aggregates (rather than deal with them once they have formed)62. One such approach is to identify compounds capable of perturbing the aggregation process by means of high throughput techniques able to screen large numbers of compounds in vitro for their efficacy in this context. Another approach is to find ways of boosting the intrinsic mechanisms that protect cells from the deleterious effects of aggregates. We are exploring a related strategy that is based on the use of antibodies that recognise peptides or proteins associated with disease and thereby prevent their conversion into fibrillar aggregates. By carrying out experiments on single-chain camelid antibodies specific for human lysozyme we have shown that complexation of their single-chain binding domains with amyloidogenic forms of lysozyme inhibits by a factor of at least 100 the rate of the aggregation process that ultimately leads to amyloid fibrils82 (Figure 7).


These initial experiments suggest strategies potentially applicable to a wider range of disease-associated systems including α-synuclein and the Aβ peptide. We plan to investigate the effect of both conventional and camelid antibodies against early aggregates of the type we have found to be cytotoxic. As it has been shown that conventional antibodies raised against such aggregates of a given protein bind to similar species generated for other proteins78, we are hopeful that the more stable and soluble binding domains of camelid antibodies raised against such species will have similar properties. Once produced, such antibodies could also in principle be used to stabilise small aggregates formed by a range of amyloidogenic proteins and facilitate the determination of their structures by conventional techniques, such as NMR or X-ray crystallography. We shall also investigate the effects of the antibodies in suppressing aggregation in vitro, and in reducing cytotoxicity in cell cultures. As conventional antibodies are showing promise as therapeutic agents, particularly for Alzheimer’s disease, we anticipate considerable interest in the possibilities offered by camelid antibodies, both in the development of diagnostic reagents and as the basis for the design of potential therapeutics83.






Very considerable progress has been made in the study of complexity in biological systems in the last two years84; 85; 86. Structural biology, one of our major areas of interest, has been at the forefront of such research. On the one hand, it is becoming possible to resolve at atomic resolution the structures of ever larger macromolecular complexes, such as the ribosome5; 6; indeed early steps have been taken to define the structures of entire cells, initially at low resolution but with the prospect of increasing the level of detail87. On the other hand, it is also becoming clear that biological macromolecules, and the complexes that they form, exist in highly dynamical states that are crucial to their function. An understanding of their behaviour requires the knowledge of all the conformations that are accessible to them, as well as their interactions, as we have discussed above1 (Figure 1).


This type of research requires a strategic and highly interdisciplinary effort, such as the one that we have initiated in Cambridge at the Leverhulme Centre for Biological Complexity. In this report, we have summarised the results of the first phase of our research programme, which is aimed at understanding the physical processes that determine the behaviour of proteins, either normal or pathological, and we have outlined the direction of our future efforts.




Research scientists supported


The interdisciplinary programme carried out at the Centre has already attracted outstanding young research scientists from disciplines ranging from applied mathematics to physics, chemistry and medicine. A substantial number of these researchers have obtained their own individual fellowships, a fact that indicates the success of our initiative. In all these cases, the Leverhulme Grant has played a crucial role in allowing us to attract top-class young scientists to Cambridge from all over the world, particularly by underwriting their positions while they awaited the outcome of their fellowship applications, or while they completed their research projects after the end of their individual fellowships. We have so far achieved a 100% success rate in applications for individual fellowships of this type, and seven postdoctoral researchers are currently working at the Centre with their own funding. These individual fellowships have therefore enabled us effectively to double the “added value” of the Leverhulme funding.



The list of the postdoctoral researchers that we have supported directly is:


Stefan Auer (01/04/03 – 30/06/04, now HFSP Long Term Fellow at the Centre)

Filip Meersman (01/04/03 – 30/10/04, now EU Marie Curie Fellow at the Centre)

Jesus Zurdo (01/04/03 – to date)

Robert B. Best (01/04/03 – 28/02/04, now Postdoctoral Fellow at NIH, USA)

Reto Bader (01/01/04 – to date)

Adam Squires (01/02/04 – to date)

Glyn Devlin (01/04/04 – 28/2/05, now CJ Martin Fellow at the Centre)

Sarah Meehan (01/04/04 – 31/1/05, now Postdoctoral Fellow at Adelaide, Australia)

John Christodoulou (01/06/04 – to date)

Peter Varnai (01/07/04 – to date)

Stephen Poon (01/07/04 – to date)

Janet Kumita (01/10/04 – to date)

Silva Giannini (01/12/04 – to date)



The list of other postdoctoral researchers that are working in the Centre is:


Xavier Salvatella (01/01/03 – to date, EU Marie Curie Postdoctoral Fellow)

Joerg Gsponer (01/10/03 – to date, Swiss NSF Postdoctoral Fellow)

Andrea Cavalli (01/09/04 – to date, Swiss NSF Postdoctoral Fellow)

Giorgio Favrin (01/10/04 – to date, EU Marie Curie Postdoctoral Fellow)

Mookyung Cheon (01/05/05 – to date, Korea Research Foundation Postdoctoral Fellow)








Collaborations within Cambridge and elsewhere


Professor Sir Alan R. Fersht FRS, University of Cambridge

Dr. Jane Clarke, University of Cambridge

Dr. David Klenerman, University of Cambridge

Dr. David J. Wales, University of Cambridge

Professor Athene M. Donald FRS, University of Cambridge

Professor Sheena E. Radford, University of Leeds

Professor Flemming M. Poulsen, University of Copenhagen, Denmark

Professor Harald Schwalbe, University of Frankfurt, Germany

Professor Lewis E. Kay, University of Toronto, Canada

Professor Julie Forman-Kay, University of Toronto, Canada

Professor Robert G. Griffin, MIT, USA

Professor Fabrizio Chiti, University of Florence, Italy

Professor Massimo Stefani, University of Florence, Italy

Professor Martin Karplus, Harvard University, USA



Other funding sources


We have recently obtained a STREP grant from the European Commission (“Understanding Protein Misfolding and Aggregation by NMR”, € 2,030,000, 01/04/05 – 30/03/07), in conjunction with many of the leading scientists in Europe associated with NMR spectroscopy:

Professor Kurt Wuthrich, ETH Zurich, Switzerland.

Professor Harald Schwalbe, University of Frankfurt, Germany.

Professor Robert Kaptein, University of Utrecht, The Netherlands.

Professor Lucia Banci, University of Florence, Italy.

Professor Ineke Braakman, University of Utrecht, The Netherlands.

Professor Astrid Graslund, University of Stockholm, Sweden.

Professor Flemming M. Poulsen, University of Copenhagen, Denmark.

Professor Ago Samoson, University of Tallinn, Estonia.

This funding will result in a greatly increased degree of activity in this area, as several of the participants were not previously working actively with us in the field of protein folding and disease. The major objective of this grant is to expand experimental data on the structures and dynamics of proteins in different states, and to use the simulation methodology we have been developing in the Centre to calculate structural ensembles.






Web Site


In order to disseminate the results of our research we maintain a web site (http://www-leverhulme-bc.ch.cam.ac.uk), where information can be found about the aims of the Centre, as well as its current activities and the research scientists involved.







Native states
    NMR order parameters27; 28
    Residual dipolar couplings28
    Nuclear Overhauser effects28
    J couplings27; 28
    Hydrogen exchange protection factors20

Transition states
    Φ values15; 17; 40; 41; 42; 43; 44
    Double mutant cycles48
    β Tanford values15; 17; 40; 41; 42; 43; 44

Intermediate states
    NMR chemical shifts52
    Hydrogen exchange protection factors
    Nuclear Overhauser effects

Molten globules
    Progressive unfolding detection by NMR techniques16
    Protection factors

Unfolded states
    Spin-labelling NMR techniques8; 37; 38
    Nuclear Overhauser effects
    NMR chemical shifts
    Residual dipolar couplings in NMR

Protein aggregates
    Solid-state NMR distance measurements70; 72
    NMR chemical shifts
    Spin-labelling NMR techniques
    Hydrogen exchange protection factors



Table 1


Examples of experimental techniques that we are using, or planning to use, in order to obtain structural information about native and non-native states of proteins.









Figure 1


After its biosynthesis in the ribosome a protein undergoes a complex series of conformational changes. One of the main goals of our research is to characterise the structure and the dynamics of these states, and to clarify their involvement in amyloid diseases.























Figure 2


Molecular dynamics simulations restrained with NMR order parameters were used to determine simultaneously the structure and the dynamics of proteins in their native states. The figure illustrates the case of ubiquitin28, by comparing the X-ray structure (left) with the NMR ensemble (right).












Figure 3


The use of distance information derived from NMR measurements enabled us to characterise the ensemble of structures that α-synuclein, a natively unfolded protein linked with Parkinson’s disease38, populates under physiological conditions. This study revealed that the C-terminal region of the protein (in red) has the tendency to fold back on to the NAC region of the protein (in cyan), a central segment of the sequence with high amyloidogenic propensity, thus revealing an evolutionary mechanism that reduces the tendency of this protein to aggregate.









Figure 4


Comparison between the native structures and the transition state ensembles of three small globular proteins: (a) TNfn3, (b) AcP and (c) α-spectrin SH341. These structures demonstrate that the transition states for folding have native-like topologies despite their high structural heterogeneity.


Figure 5


Ensembles of structures representing the intermediate states for folding of two mutational variants of Fyn SH3: G48M (left) and G48V (right)52. These ensembles were obtained by incorporating structural information derived from NMR chemical shift measurements as restraints in molecular dynamics simulations.











Figure 6

Comparison between the cytotoxic effects of fibrillar (a-b) and granular (c-f) aggregates of the PI3-SH3 domain on NIH-3T3 cells77. For increasing concentrations of the fibrillar species, no significant effect is observed on cell viability (as reported by the MTT assay77). By contrast, raising the concentration of the granular species leads to cell death.





Figure 7

X-ray structure of human lysozyme complexed with the VHH domain of the camel antibody CaB-HuL7. (a) Ribbon representation of the complex; (b) Enlarged view of the binding interface82. Residues forming the epitope are shown in violet, light blue and dark blue for the α-domain, C-helix and β-domain of lysozyme, respectively. Residues constituting the paratope are shown in yellow, pink and red for the CDR1, CDR2 and CDR3 loops of the VHH structure, respectively.





1.         Vendruscolo, M., Zurdo, J., MacPhee, C. E. & Dobson, C. M. (2003). Protein folding and misfolding: a paradigm of self-assembly and regulation in complex biological systems. Philos. Trans. R. Soc. Lond. Ser. A-Math. Phys. Eng. Sci. 361, 1205-1222.

2.         Uversky, V. N. (2003). Protein folding revisited. A polypeptide chain at the folding-misfolding-nonfolding cross-roads: which way to go? Cell. Mol. Life Sci. 60, 1852-1871.

3.         Abrahams, J. P., Leslie, A. G. W., Lutter, R. & Walker, J. E. (1994). Structure at 2.8-Angstrom Resolution of F1-ATPase from Bovine Heart-Mitochondria. Nature 370, 621-628.

4.         Groll, M., Ditzel, L., Lowe, J., Stock, D., Bochtler, M., Bartunik, H. D. & Huber, R. (1997). Structure of 20S proteasome from yeast at 2.4 angstrom resolution. Nature 386, 463-471.

5.         Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Morgan-Warren, R. J., Carter, A. P., Vonrhein, C., Hartsch, T. & Ramakrishnan, V. (2000). Structure of the 30S ribosomal subunit. Nature 407, 327-339.

6.         Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. (2000). The complete atomic structure of the large ribosomal subunit at 2.4 angstrom resolution. Science 289, 905-920.

7.         Aloy, P., Bottcher, B., Ceulemans, H., Leutwein, C., Mellwig, C., Fischer, S., Gavin, A. C., Bork, P., Superti-Furga, G., Serrano, L. & Russell, R. B. (2004). Structure-based assembly of protein complexes in yeast. Science 303, 2026-2029.

8.         Lindorff-Larsen, K., Kristjansdottir, S., Teilum, K., Fieber, W., Dobson, C. M., Poulsen, F. M. & Vendruscolo, M. (2004). Determination of an ensemble of structures representing the denatured state of the bovine acyl-coenzyme A binding protein. J. Am. Chem. Soc. 126, 3291-3299.

9.         Fersht, A. R. (1999). Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding, W. H. Freeman.

10.       Fersht, A. R. & Daggett, V. (2002). Protein folding and unfolding at atomic resolution. Cell 108, 573-582.

11.       Brunger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. Sect. D-Biol. Crystallogr. 54, 905-921.

12.       Ferguson, N. & Fersht, A. R. (2003). Early events in protein folding. Curr. Opin. Struct. Biol. 13, 75-81.

13.       Kay, L. E. (1998). Protein dynamics from NMR. Nat. Struct. Biol. 5, 513-517.

14.       Akke, M. (2002). NMR methods for characterizing microsecond to millisecond dynamics in recognition and catalysis. Curr. Opin. Struct. Biol. 12, 642-647.

15.       Vendruscolo, M., Paci, E., Dobson, C. M. & Karplus, M. (2001). Three key residues form a critical contact network in a protein folding transition state. Nature 409, 641-645.

16.       Vendruscolo, M., Paci, E., Karplus, M. & Dobson, C. M. (2003). Structures and relative free energies of partially folded states of proteins. Proc. Natl. Acad. Sci. U. S. A. 100, 14817-14821.

17.       Paci, E., Vendruscolo, M., Dobson, C. M. & Karplus, M. (2002). Determination of a transition state at atomic resolution from protein engineering data. J. Mol. Biol. 324, 151-163.

18.       Dobson, C. M. (2003). Protein folding and misfolding. Nature 426, 884-890.

19.       Dobson, C. M. (2004). In the footsteps of alchemists. Science 304, 1259.

20.       Vendruscolo, M., Paci, E., Dobson, C. M. & Karplus, M. (2003). Rare fluctuations of native proteins sampled by equilibrium hydrogen exchange. J. Am. Chem. Soc. 125, 15686-15687.

21.       Schulman, B. A., Redfield, C., Peng, Z. Y., Dobson, C. M. & Kim, P. S. (1995). Different Subdomains Are Most Protected From Hydrogen-Exchange In The Molten Globule And Native States Of Human Alpha-Lactalbumin. J. Mol. Biol. 253, 651-657.

22.       Gorski, S. A., Le Duff, C. S., Capaldi, A. P., Kalverda, A. P., Beddard, G. S., Moore, G. R. & Radford, S. E. (2004). Equilibrium hydrogen exchange reveals extensive hydrogen bonded secondary structure in the on-pathway intermediate of Im7. J. Mol. Biol. 337, 183-193.

23.       Hoshino, M., Katou, H., Hagihara, Y., Hasegawa, K., Naiki, H. & Goto, Y. (2002). Mapping the core of the beta(2)-microglobulin amyloid fibril by H/D exchange. Nat. Struct. Biol. 9, 332-336.

24.       Canet, D., Last, A. M., Tito, P., Sunde, M., Spencer, A., Archer, D. B., Redfield, C., Robinson, C. V. & Dobson, C. M. (2002). Local cooperativity in the unfolding of an amyloidogenic variant of human lysozyme. Nat. Struct. Biol. 9, 308-315.

25.       Lipari, G. & Szabo, A. (1982). Model-Free Approach to the Interpretation of Nuclear Magnetic-Resonance Relaxation in Macromolecules. 1. Theory and Range of Validity. J. Am. Chem. Soc. 104, 4546-4559.

26.       Choy, W. Y., Shortle, D. & Kay, L. E. (2003). Side chain dynamics in unfolded protein states: an NMR based H-2 spin relaxation study of Delta 131 Delta. J. Am. Chem. Soc. 125, 1748-1758.

27.       Best, R. B. & Vendruscolo, M. (2004). Determination of protein structures consistent with NMR order parameters. J. Am. Chem. Soc. 126, 8090-8091.

28.       Lindorff-Larsen, K., Best, R. B., DePristo, M. A., Dobson, C. M. & Vendruscolo, M. (2005). Simultaneous determination of protein structure and dynamics. Nature 433, 128-132.

29.       Mittermaier, A. & Kay, L. E. (2004). The response of internal dynamics to hydrophobic core mutations in the SH3 domain from the Fyn tyrosine kinase. Protein Sci. 13, 1088-1099.

30.       Dunker, A. K., Lawson, J. D., Brown, C. J., Williams, R. M., Romero, P., Oh, J. S., Oldfield, C. J., Campen, A. M., Ratliff, C. R., Hipps, K. W., Ausio, J., Nissen, M. S., Reeves, R., Kang, C. H., Kissinger, C. R., Bailey, R. W., Griswold, M. D., Chiu, M., Garner, E. C. & Obradovic, Z. (2001). Intrinsically disordered protein. J. Mol. Graph. 19, 26-59.

31.       Crowhurst, K. A. & Forman-Kay, J. D. (2003). Aromatic and methyl NOEs highlight hydrophobic clustering in the unfolded state of an SH3 domain. Biochemistry 42, 8687-8695.

32.       Dyson, H. J. & Wright, P. E. (2002). Insights into the structure and dynamics of unfolded proteins from nuclear magnetic resonance. In Unfolded Proteins, Vol. 62, pp. 311-340.

33.       Choy, W. Y. & Forman-Kay, J. D. (2001). Calculation of ensembles of structures representing the unfolded state of an SH3 domain. J. Mol. Biol. 308, 1011-1032.

34.       Kazmirski, S. L., Wong, K. B., Freund, S. M. V., Tan, Y. J., Fersht, A. R. & Daggett, V. (2001). Protein folding from a highly disordered denatured state: The folding pathway of chymotrypsin inhibitor 2 at atomic resolution. Proc. Natl. Acad. Sci. U. S. A. 98, 4349-4354.

35.       Wong, K. B., Clarke, J., Bond, C. J., Neira, J. L., Freund, S. M. V., Fersht, A. R. & Daggett, V. (2000). Towards a complete description of the structural and dynamic properties of the denatured state of barnase and the role of residual structure in folding. J. Mol. Biol. 296, 1257-1282.

36.       Klein-Seetharaman, J., Oikawa, M., Grimshaw, S. B., Wirmer, J., Duchardt, E., Ueda, T., Imoto, T., Smith, L. J., Dobson, C. M. & Schwalbe, H. (2002). Long-range interactions within a nonnative protein. Science 295, 1719-1722.

37.       Kristjansdottir, S., Lindorff-Larsen, K., Fieber, W., Dobson, C. M., Vendruscolo, M. & Poulsen, F. M. (2005). Formation of native and non-native interactions in ensembles of denatured ACBP molecules from paramagnetic relaxation enhancement studies. J. Mol. Biol. 347, 1053-1062.

38.       Dedmon, M. M., Lindorff-Larsen, K., Christodoulou, J., Vendruscolo, M. & Dobson, C. M. (2005). Mapping long-range interactions in alpha-synuclein using spin-label NMR and ensemble molecular dynamics simulations. J. Am. Chem. Soc. 127, 476-477.

39.       Daggett, V., Li, A. J., Itzhaki, L. S., Otzen, D. E. & Fersht, A. R. (1996). Structure of the transition state for folding of a protein derived from experiment and simulation. J. Mol. Biol. 257, 430-440.

40.       Lindorff-Larsen, K., Paci, E., Serrano, L., Dobson, C. M. & Vendruscolo, M. (2003). Calculation of mutational free energy changes in transition states for protein folding. Biophys. J. 85, 1207-1214.

41.       Lindorff-Larsen, K., Rogen, P., Paci, E., Vendruscolo, M. & Dobson, C. M. (2005). Protein folding and the organization of the protein topology universe. Trends Biochem. Sci. 30, 13-19.

42.       Lindorff-Larsen, K., Vendruscolo, M., Paci, E. & Dobson, C. M. (2004). Transition states for protein folding have native topologies despite high structural variability. Nat. Struct. Mol. Biol. 11, 443-449.

43.       Paci, E., Clarke, J., Steward, A., Vendruscolo, M. & Karplus, M. (2003). Self-consistent determination of the transition state for protein folding: Application to a fibronectin type III domain. Proc. Natl. Acad. Sci. U. S. A. 100, 394-399.

44.       Paci, E., Friel, C. T., Lindorff-Larsen, K., Radford, S. E., Karplus, M. & Vendruscolo, M. (2004). Comparison of the transition state ensembles for folding of Im7 and Im9 determined using all-atom molecular dynamics simulations with phi value restraints. Proteins 54, 513-525.

45.       Vendruscolo, M., Dokholyan, N. V., Paci, E. & Karplus, M. (2002). Small-world view of the amino acids that play a key role in protein folding. Phys. Rev. E 65, 061910.

46.       Yau, S. T. & Vekilov, P. G. (2000). Quasi-planar nucleus structure in apoferritin crystallization. Nature 406, 494-497.

47.       Russel, W. B. (2003). Condensed-matter physics: Tunable colloidal crystals. Nature 421, 490-491.

48.       Salvatella, X., Dobson, C. M., Fersht, A. R. & Vendruscolo, M. (2004). Determination of the successive transition states on the folding of barnase: Validation by double-mutant cycle experiments. Submitted.

49.       Davis, R., Dobson, C. M. & Vendruscolo, M. (2002). Determination of the structures of distinct transition state ensembles for a beta-sheet peptide with parallel folding pathways. J. Chem. Phys. 117, 9510-9517.

50.       Ostwald, W. (1897). Studien über die Bildung und Umwandlung fester Körper. Z. Phys. Chem. 22, 289-302.

51.       Auer, S. & Frenkel, D. (2001). Suppression of crystal nucleation in polydisperse colloids due to increase of the surface free energy. Nature 413, 711-713.

52.       Korzhnev, D. M., Salvatella, X., Vendruscolo, M., Di Nardo, A. A., Davidson, A. R., Dobson, C. M. & Kay, L. E. (2004). Low-populated folding intermediates of Fyn SH3 characterized by relaxation dispersion NMR. Nature 430, 586-590.

53.       Mulder, F. A. A., Mittermaier, A., Hon, B., Dahlquist, F. W. & Kay, L. E. (2001). Studying excited states of proteins by NMR spectroscopy. Nat. Struct. Biol. 8, 932-935.

54.       Kuwajima, K. (1996). The molten globule state of alpha-lactalbumin. FASEB J. 10, 102-109.

55.       Pande, V. S. & Rokhsar, D. S. (1998). Is the molten globule a third phase of proteins? Proc. Natl. Acad. Sci. U. S. A. 95, 1490-1494.

56.       Schulman, B. A., Kim, P. S., Dobson, C. M. & Redfield, C. (1997). A residue-specific NMR view of the non-cooperative unfolding of a molten globule. Nat. Struct. Biol. 4, 630-634.

57.       McParland, V. J., Kalverda, A. P., Homans, S. W. & Radford, S. E. (2002). Structural properties of an amyloid precursor of beta(2)-microglobulin. Nat. Struct. Biol. 9, 326-331.

58.       Vendruscolo, M. & Paci, E. (2003). Protein folding: bringing theory and experiment closer together. Curr. Opin. Struct. Biol. 13, 82-87.

59.       Dobson, C. M. (1999). Protein misfolding, evolution and disease. Trends Biochem. Sci. 24, 329-332.

60.       Cohen, F. E. & Kelly, J. W. (2003). Therapeutic approaches to protein-misfolding diseases. Nature 426, 905-909.

61.       Selkoe, D. J. (2003). Folding proteins in fatal ways. Nature 426, 900-904.

62.       Soto, C. (2003). Unfolding the role of protein misfolding in neurodegenerative diseases. Nat. Rev. Neurosci. 4, 49-60.

63.       Stefani, M. & Dobson, C. M. (2003). Protein aggregation and aggregate toxicity: new insights into protein folding, misfolding diseases and biological evolution. J. Mol. Med. 81, 678-699.

64.       Chiti, F., Stefani, M., Taddei, N., Ramponi, G. & Dobson, C. M. (2003). Rationalization of the effects of mutations on peptide and protein aggregation rates. Nature 424, 805-808.

65.       Dobson, C. M. (2004). Principles of protein folding, misfolding and aggregation. Semin. Cell Dev. Biol. 15, 3-16.

66.       Dobson, C. M. (2003). Protein folding and disease: a view from the first Horizon Symposium. Nat. Rev. Drug Discov. 2, 154-160.

67.       Chiti, F., Taddei, N., Baroni, F., Capanni, C., Stefani, M., Ramponi, G. & Dobson, C. M. (2002). Kinetic partitioning of protein folding and aggregation. Nat. Struct. Biol. 9, 137-143.

68.       Dubay, K. F., Pawar, A. P., Chiti, F., Zurdo, J., Dobson, C. M. & Vendruscolo, M. (2004). Prediction of the absolute aggregation rates of amyloidogenic polypeptide chains. J. Mol. Biol. 341, 1317-1326.

69.       Pawar, A. P., Dubay, K. F., Zurdo, J., Chiti, F., Vendruscolo, M. & Dobson, C. M. (2005). Prediction of "aggregation-prone" and "aggregation-susceptible" regions in proteins associated with neurodegenerative diseases. J. Mol. Biol. 000, 0000.

70.       Jaroniec, C. P., MacPhee, C. E., Astrof, N. S., Dobson, C. M. & Griffin, R. G. (2002). Molecular conformation of a peptide fragment of transthyretin in an amyloid fibril. Proc. Natl. Acad. Sci. U. S. A. 99, 16748-16753.

71.       Petkova, A. T., Ishii, Y., Balbach, J. J., Antzutkin, O. N., Leapman, R. D., Delaglio, F. & Tycko, R. (2002). A structural model for Alzheimer's beta-amyloid fibrils based on experimental constraints from solid state NMR. Proc. Natl. Acad. Sci. U. S. A. 99, 16742-16747.

72.       Jaroniec, C. P., MacPhee, C. E., Bajaj, V. S., McMahon, M. T., Dobson, C. M. & Griffin, R. G. (2004). High-resolution molecular structure of a peptide in an amyloid fibril determined by magic angle spinning NMR spectroscopy. Proc. Natl. Acad. Sci. U. S. A. 101, 711-716.

73.       Tycko, R. (2004). Progress towards a molecular-level structural understanding of amyloid fibrils. Curr. Opin. Struct. Biol. 14, 96-103.

74.       Dobson, C. M. (2001). The structural basis of protein folding and its links with human disease. Philos. Trans. R. Soc. Lond. Ser. B-Biol. Sci. 356, 133-145.

75.       Otzen, D. E. & Oliveberg, M. (1999). Salt-induced detour through compact regions of the protein folding landscape. Proc. Natl. Acad. Sci. U. S. A. 96, 11746-11751.

76.       Jimenez, J. L., Guijarro, J. L., Orlova, E., Zurdo, J., Dobson, C. M., Sunde, M. & Saibil, H. R. (1999). Cryo-electron microscopy structure of an SH3 amyloid fibril and model of the molecular packing. EMBO J. 18, 815-821.

77.       Bucciantini, M., Giannoni, E., Chiti, F., Baroni, F., Formigli, L., Zurdo, J. S., Taddei, N., Ramponi, G., Dobson, C. M. & Stefani, M. (2002). Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases. Nature 416, 507-511.

78.       Kayed, R., Head, E., Thompson, J. L., McIntire, T. M., Milton, S. C., Cotman, C. W. & Glabe, C. G. (2003). Common structure of soluble amyloid oligomers implies common mechanism of pathogenesis. Science 300, 486-489.

79.       Chothia, C. (1992). Proteins - 1000 Families for the Molecular Biologist. Nature 357, 543-544.

80.       Dedmon, M. M., Christodoulou, J., Wilson, M. R. & Dobson, C. M. (2005). Heat shock protein 70 inhibits alpha-synuclein fibril formation via preferential binding to prefibrillar species. J. Biol. Chem. 280, 14733-14740.

81.       Hammarstrom, P., Wiseman, R. L., Powers, E. T. & Kelly, J. W. (2003). Prevention of transthyretin amyloid disease by changing protein misfolding energetics. Science 299, 713-716.

82.       Dumoulin, M., Last, A. M., Desmyter, A., Decanniere, K., Canet, D., Larsson, G. R., Spencer, A., Archer, D. B., Sasse, J., Muyldermans, S., Wyns, L., Redfield, C., Matagne, A., Robinson, C. V. & Dobson, C. M. (2003). A camelid antibody fragment inhibits the formation of amyloid fibrils by human lysozyme. Nature 424, 783-788.

83.       Dumoulin, M. & Dobson, C. M. (2004). Probing the origins, diagnosis and treatment of amyloid diseases using antibodies. Biochimie 86, 589-600.

84.       Alon, U. (2003). Biological networks: The tinkerer as an engineer. Science 301, 1866-1867.

85.       Oltvai, Z. N. & Barabasi, A. L. (2002). Life's complexity pyramid. Science 298, 763-764.

86.       Hartwell, L. H., Hopfield, J. J., Leibler, S. & Murray, A. W. (1999). From molecular to modular cell biology. Nature 402, C47-C52.

87.       Medalia, O., Weber, I., Frangakis, A. S., Nicastro, D., Gerisch, G. & Baumeister, W. (2002). Macromolecular architecture in eukaryotic cells visualized by cryoelectron tomography. Science 298, 1209-1213.