Thursday, January 27, 2011

Excel


Microsoft Excel is a spreadsheet program which allows one to enter numerical values or data into the rows or columns of a spreadsheet, and to use these numerical entries for such things as calculations, graphs, and statistical analysis.

Microsoft Excel has the basic features of all spreadsheets, using a grid of cells arranged in numbered rows and letter-named columns to organize data manipulations such as arithmetic operations. It has a battery of supplied functions for statistical, engineering and financial needs. In addition, it can display data as line graphs, histograms and charts, with a very limited three-dimensional graphical display. It allows data to be sectioned so that its dependencies on various factors can be viewed from different perspectives (using pivot tables and the scenario manager). It also has a programming aspect, Visual Basic for Applications (VBA), which allows the user to employ a wide variety of numerical methods (for example, for solving differential equations of mathematical physics) and then report the results back to the spreadsheet. Finally, it has a variety of interactive features for building user interfaces that can completely hide the spreadsheet from the user, so that the spreadsheet presents itself as a so-called application, or decision support system (DSS), via a custom-designed user interface: for example, a stock analyzer or, more generally, a design tool that asks the user questions and provides answers and reports. In a more elaborate realization, an Excel application can automatically poll external databases and measuring instruments on an update schedule, analyze the results, produce a Word report or PowerPoint slide show, and e-mail these presentations on a regular basis to a list of participants.

Microsoft provides a number of optional command-line switches that control how Excel starts.

Linear Regression

Linear regression analyzes the relationship between two variables, X and Y. For each subject (or experimental unit), you know both X and Y and you want to find the best straight line through the data. In some situations, the slope and/or intercept have a scientific meaning. In other cases, you use the linear regression line as a standard curve to find new values of X from Y, or Y from X.
The term "regression", like many statistical terms, is used in statistics quite differently from how it is used in other contexts. The method was first used to examine the relationship between the heights of fathers and sons. The two were related, of course, but the slope was less than 1.0: a tall father tended to have sons shorter than himself, and a short father tended to have sons taller than himself. The heights of the sons regressed to the mean. The term "regression" is now used for many sorts of curve fitting.
The statistics program GraphPad Prism determines and graphs the best-fit linear regression line, optionally including 95% confidence bands or 95% prediction bands. You can also force the line through a particular point (usually the origin), calculate residuals, perform a runs test, or compare the slopes and intercepts of two or more regression lines.
In general, the goal of linear regression is to find the line that best predicts Y from X. Linear regression does this by finding the line that minimizes the sum of the squares of the vertical distances of the points from the line.
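The least-squares idea can be made concrete with a short Python sketch (my own illustration, not Prism's or Excel's code). It computes the slope as Sxy/Sxx and the intercept as mean(y) - slope * mean(x), which are the closed-form values that minimize the sum of squared vertical distances; Excel's SLOPE and INTERCEPT worksheet functions return the same least-squares quantities. The names fit_line and x_from_y are hypothetical, and x_from_y shows the "standard curve" use of reading X back from a measured Y.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared
    vertical distances between the points and the line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centred sum of squares of x and centred cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def x_from_y(y, slope, intercept):
    """Use the fitted line as a standard curve: solve y = intercept + slope*x for x."""
    return (y - intercept) / slope


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    slope, intercept = fit_line(xs, ys)
    # Residuals are the vertical distances that least squares minimizes.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    print(f"slope={slope:.3f} intercept={intercept:.3f}")
    print("residuals:", [round(r, 3) for r in residuals])
    print("x for y=5.0:", round(x_from_y(5.0, slope, intercept), 3))
```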
Note that linear regression does not test whether your data are linear (except via the runs test). It assumes that your data are linear, and finds the slope and intercept that make a straight line best fit your data.
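To show what a runs test looks at, here is a minimal Python sketch of the classic Wald-Wolfowitz runs test applied to the signs of the residuals. If the data really follow a straight line, positive and negative residuals should be interleaved at random; a curve instead produces long runs of same-signed residuals, so too few runs (a strongly negative z) hints at non-linearity. The function name runs_test is hypothetical, and the normal approximation used here is my assumption; Prism may compute its P value differently.

```python
import math


def runs_test(residuals):
    """Wald-Wolfowitz runs test on residual signs (normal approximation).

    Returns (runs, expected_runs, z). Residuals exactly equal to zero are
    skipped. A strongly negative z means fewer runs than expected by chance.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative residuals")
    # A new run starts every time the sign changes.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    expected = 1 + 2 * n_pos * n_neg / n
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n * n * (n - 1))
    z = (runs - expected) / math.sqrt(variance) if variance > 0 else 0.0
    return runs, expected, z


if __name__ == "__main__":
    # Residuals that are positive, then negative, then positive again:
    # the pattern a straight line leaves when the data actually curve.
    curved = [1, 2, 1, -1, -2, -1, 1, 2, 1]
    print(runs_test(curved))
```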





Quadratic Regression

Quadratic regression is a process for finding the equation of the parabola that best fits a given set of data. Quadratic regression models are often used in economics, for example for utility functions, forecasting and cost-benefit analysis.

In Excel, the usual approach is to construct a scatter diagram of the data first and then fit the quadratic regression to the resulting chart.

The goal of regression analysis is to model the expected value of a dependent variable y in terms of the value of an independent variable (or vector of independent variables) x. In simple linear regression, the model
$$y = a_0 + a_1 x + \varepsilon,$$
is used, where ε is an unobserved random error with mean zero conditioned on a scalar variable x. In this model, for each unit increase in the value of x, the conditional expectation of y increases by $a_1$ units.
In many settings, such a linear relationship may not hold. For example, if we are modeling the yield of a chemical synthesis in terms of the temperature at which the synthesis takes place, we may find that the yield improves by increasing amounts for each unit increase in temperature. In this case, we might propose a quadratic model of the form
$$y = a_0 + a_1 x + a_2 x^2 + \varepsilon.$$
In this model, when the temperature is increased from x to x + 1 units, the expected yield changes by $a_1[(x+1) - x] + a_2[(x+1)^2 - x^2] = a_1 + a_2 + 2a_2 x$. The fact that the change in yield depends on x is what makes the relationship nonlinear. This must not be confused with nonlinear regression: on the contrary, because the model is still linear in the unknown coefficients $a_0$, $a_1$ and $a_2$, this is still a case of linear regression.
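Because the quadratic model is linear in a_0, a_1 and a_2, it can be fitted by ordinary least squares exactly like a straight line. The minimal Python sketch below (hypothetical function names, standard library only) builds the 3×3 normal equations from power sums of x, solves them by Gaussian elimination, and evaluates the a_1 + a_2 + 2a_2x increment described above.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a0 + a1*x + a2*x**2; returns [a0, a1, a2].

    The model is linear in the coefficients, so we solve the normal
    equations (X^T X) a = X^T y, where X has columns 1, x, x^2.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired (x, y) points")
    # Power sums: s[k] = sum(x^k) for k = 0..4, t[k] = sum(x^k * y) for k = 0..2.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("x values do not determine a unique parabola")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    a = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        a[i] = (m[i][3] - sum(m[i][j] * a[j] for j in range(i + 1, 3))) / m[i][i]
    return a


def expected_change(a, x):
    """Change in the expected y when x increases by one unit: a1 + a2 + 2*a2*x."""
    return a[1] + a[2] + 2 * a[2] * x


if __name__ == "__main__":
    temps = [20, 30, 40, 50, 60]            # made-up temperatures
    yields = [12.0, 15.5, 21.0, 29.0, 39.5]  # made-up yields
    coeffs = fit_quadratic(temps, yields)
    print("a0, a1, a2 =", [round(c, 4) for c in coeffs])
    print("yield gain from 40 to 41:", round(expected_change(coeffs, 40), 4))
```

In practice a library routine such as NumPy's numpy.polyfit(x, y, 2) does the same job more robustly; the normal equations are spelled out here only because they make the "still linear regression" point visible.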



For more information, you can go to these websites:

Tuesday, January 11, 2011

SMILES

It's not just an ordinary smile.
It is SMILES™.

SMILES™ (Simplified Molecular Input Line Entry System) is a simple yet comprehensive chemical language in which molecules and reactions can be specified using ASCII characters representing atom and bond symbols. SMILES™ contains the same information as an extended connection table, but with several advantages. A SMILES™ string is human-readable and very compact, and, if canonicalized, it is a unique string that can be used as a universal identifier for a specific chemical structure. In addition, a chemically correct and comprehensible depiction can be made from any SMILES™ string representing either a molecule or a reaction.

SMILES™ development was initiated by David Weininger in the late 1980s, using the concept of a graph, with nodes as atoms and edges as bonds, to represent a molecule. Parentheses indicate branching points, and numeric labels designate ring connection points. The basic SMILES™ grammar also includes isotopic information, configuration about double bonds, and chirality, leading to what is known as isomeric SMILES™.
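These grammar rules (atoms as nodes, bond symbols as edges, parentheses for branches, digits for ring closures) can be illustrated with a small Python tokenizer. This is a hypothetical sketch, not the Daylight parser: it covers only part of the SMILES™ grammar (bracket atoms, organic-subset atoms, bond symbols, branches and ring-closure labels), counts atoms, branches and ring closures, and checks that parentheses and ring labels pair up, but it does not check valence or aromaticity.

```python
import re

# Alternatives are tried left to right, so "Br" and "Cl" win over "B" and "C".
TOKEN_RE = re.compile(
    r"\[[^\]]+\]"           # bracket atom, e.g. [n+], [C@H], [13C]
    r"|Br|Cl"               # two-letter organic-subset atoms
    r"|[BCNOSPFI]|[bcnosp]" # one-letter atoms; lower case means aromatic
    r"|[-=#$:/\\]"          # bond symbols, including / and \ for double-bond geometry
    r"|[()]"                # branch open / close
    r"|%\d\d|\d"            # ring-closure labels
    r"|\."                  # separator between disconnected components
)


def tokenize(smiles):
    """Split a SMILES string into tokens, failing on unsupported characters."""
    tokens, pos = [], 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(f"unrecognised character {smiles[pos]!r} at {pos}")
        tokens.append(match.group(0))
        pos = match.end()
    return tokens


def summarize(smiles):
    """Count atoms, branches and ring closures; check parentheses and ring labels."""
    atoms = branches = depth = ring_bonds = 0
    open_rings = set()
    for tok in tokenize(smiles):
        if tok[0] == "[" or tok[0].isalpha():
            atoms += 1
        elif tok == "(":
            depth += 1
            branches += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("')' without matching '('")
        elif tok[0].isdigit() or tok[0] == "%":
            # The first use of a label opens a ring bond; the second closes it.
            if tok in open_rings:
                open_rings.remove(tok)
                ring_bonds += 1
            else:
                open_rings.add(tok)
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    if open_rings:
        raise ValueError(f"unclosed ring label(s): {sorted(open_rings)}")
    return {"atoms": atoms, "branches": branches, "ring_bonds": ring_bonds}


if __name__ == "__main__":
    for s in ["CC(C)CC(=O)O", "C1CCCCC1", "CC1=CC(Br)CCC1", "O[n+]1ccccc1"]:
        print(s, tokenize(s), summarize(s))
```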

Some simple SMILES™ examples:
C=CC\C=C\O
CC(C)CC(=O)O
CCN(CC)CC
CCCC(C#C)C(CCC)C(C)C
C1CCCCC1
CC1=CC(Br)CCC1
C1CN(CCC1)C2CCCCO2
c1ccco1
Oc1ccncn1
ON1CCCCC1
O[n+]1ccccc1
c1ccccn1
Oc1ccccn1
Cn1cccc1
C[C@H]=C\C=C\F
c1cccn1
C\C=C/C=C/F
Oc1ccccn1

Here are the links you should visit:
DAYLIGHT (Cheminformatics)
DAYLIGHT Theory : SMILES

Tuesday, January 4, 2011

Protein Data Bank (PDB)

What Is the Protein Data Bank?

The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, are freely accessible on the Internet via the websites of its member organisations (PDBe, PDBj, and RCSB). The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.


The PDB is a key resource in areas of structural biology, such as structural genomics. Most major scientific journals, and some funding agencies such as the NIH in the USA, now require scientists to submit their structure data to the PDB. If the contents of the PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary) databases that categorize the data differently. For example, both SCOP and CATH categorize structures according to type of structure and assumed evolutionary relations, while GO categorizes them based on genes. For each molecular structure, the PDB (originally the Brookhaven Protein Data Bank) stores information including atomic coordinates, atomic connectivity, and references.
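PDB entries are distributed as fixed-column text files, and each atom's coordinates sit on an ATOM (or HETATM) line. The minimal Python sketch below reads those lines using the column positions of the PDB format (record name in columns 1-6, x, y and z in columns 31-54, temperature factor in columns 61-66). The names Atom, parse_atoms and SAMPLE_LINE are hypothetical, and the sample line is made up for illustration rather than copied from a real entry. The temperature (B) factor it reads is the value that RasMol's temperature colour scheme uses.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    b_factor: float


def parse_atoms(lines):
    """Yield an Atom for every ATOM/HETATM line; other records are skipped.

    Slices are the 1-based PDB column ranges converted to 0-based Python slices.
    """
    for line in lines:
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue
        yield Atom(
            serial=int(line[6:11]),         # columns 7-11
            name=line[12:16].strip(),       # columns 13-16
            res_name=line[17:20].strip(),   # columns 18-20
            chain=line[21],                 # column 22
            res_seq=int(line[22:26]),       # columns 23-26
            x=float(line[30:38]),           # columns 31-38
            y=float(line[38:46]),           # columns 39-46
            z=float(line[46:54]),           # columns 47-54
            b_factor=float(line[60:66]),    # columns 61-66 (temperature factor)
        )


# A made-up ATOM record, assembled field by field so the columns line up.
SAMPLE_LINE = (
    "ATOM  "     # cols 1-6   record name
    "    1"      # cols 7-11  atom serial number
    " "          # col 12
    " N  "       # cols 13-16 atom name
    " "          # col 17     alternate location
    "MET"        # cols 18-20 residue name
    " "          # col 21
    "A"          # col 22     chain identifier
    "   1"       # cols 23-26 residue sequence number
    " "          # col 27     insertion code
    "   "        # cols 28-30
    "  38.198"   # cols 31-38 x
    "  19.582"   # cols 39-46 y
    "  28.998"   # cols 47-54 z
    "  1.00"     # cols 55-60 occupancy
    " 54.51"     # cols 61-66 temperature factor
)

if __name__ == "__main__":
    for atom in parse_atoms(["HEADER    HYDROLASE", SAMPLE_LINE, "END"]):
        print(atom)
```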

Using RasWin (RasMol version 2.7.4.2), I will show three amazing protein structures that are stored in the RCSB Protein Data Bank (PDB).

1. Subtilisin (3LPA)
Image: Display-Strands & Colour-Temperature

Experiment Methods: X-RAY DIFFRACTION with resolution of 2.00 Å
Compounds Involved: 1 Polymer and 1 Ligand
Authors: Porter, C.J., Wong, W., Whisstock, J.C., Rood, J.I., Kennan, R.M.
Classification: Hydrolase
Caption: The Subtilisin-Like Protease AprV2 Is Required for Virulence and Uses a Novel Disulphide-Tethered Exosite to Bind Substrates
PubMed Abstract: Many bacterial pathogens produce extracellular proteases that degrade the extracellular matrix of the host and therefore are involved in disease pathogenesis. Dichelobacter nodosus is the causative agent of ovine footrot, a highly contagious disease that is characterized by the separation of the hoof from the underlying tissue. D. nodosus secretes three subtilisin-like proteases whose analysis forms the basis of diagnostic tests that differentiate between virulent and benign strains and have been postulated to play a role in virulence. We have constructed protease mutants of D. nodosus; their analysis in a sheep virulence model revealed that one of these enzymes, AprV2, was required for virulence. These studies challenge the previous hypothesis that the elastase activity of AprV2 is important for disease progression, since aprV2 mutants were virulent when complemented with aprB2, which encodes a variant that has impaired elastase activity. We have determined the crystal structures of both AprV2 and AprB2 and characterized the biological activity of these enzymes. These data reveal that an unusual extended disulphide-tethered loop functions as an exosite, mediating effective enzyme-substrate interactions. The disulphide bond and Tyr92, which was located at the exposed end of the loop, were functionally important. Bioinformatic analyses suggested that other pathogenic bacteria may have proteases that utilize a similar mechanism. In conclusion, we have used an integrated multidisciplinary combination of bacterial genetics, whole animal virulence trials in the original host, biochemical studies, and comprehensive analysis of crystal structures to provide the first definitive evidence that the extracellular secreted proteases produced by D. nodosus are required for virulence and to elucidate the molecular mechanism by which these proteases bind to their natural substrates. 
We postulate that this exosite mechanism may be used by proteases produced by other bacterial pathogens of both humans and animals.
 
2. Prolyl Aminopeptidase (1X2B)
Image: Display-Backbone & Colour-Group

Experiment Methods: X-RAY DIFFRACTION with resolution of 2.40 Å
Compounds Involved: 1 Polymer and 1 Ligand
Authors: Nakajima, Y., Ito, K., Sakata, M., Xu, Y., Matsubara, F., Hatakeyama, S., Yoshimoto, T.
Classification: Hydrolase
Caption: Unusual extra space at the active site and high activity for acetylated hydroxyproline of prolyl aminopeptidase from Serratia marcescens
PubMed Abstract: The prolyl aminopeptidase complexes of Ala-TBODA [2-alanyl-5-tert-butyl-(1, 3, 4)-oxadiazole] and Sar-TBODA [2-sarcosyl-5-tert-butyl-(1, 3, 4)-oxadiazole] were analyzed by X-ray crystallography at 2.4 angstroms resolution. Frames of alanine and sarcosine residues were well superimposed on each other in the pyrrolidine ring of proline residue, suggesting that Ala and Sar are recognized as parts of this ring of proline residue by the presence of a hydrophobic proline pocket at the active site. Interestingly, there was an unusual extra space at the bottom of the hydrophobic pocket where proline residue is fixed in the prolyl aminopeptidase. Moreover, 4-acetyloxyproline-betaNA (4-acetyloxyproline beta-naphthylamide) was a better substrate than Pro-betaNA. Computer docking simulation well supports the idea that the 4-acetyloxyl group of the substrate fitted into that space. Alanine scanning mutagenesis of Phe139, Tyr149, Tyr150, Phe236, and Cys271, consisting of the hydrophobic pocket, revealed that all of these five residues are involved significantly in the formation of the hydrophobic proline pocket for the substrate. Tyr149 and Cys271 may be important for the extra space and may orient the acetyl derivative of hydroxyproline to a preferable position for hydrolysis. These findings imply that the efficient degradation of collagen fragment may be achieved through an acetylation process by the bacteria.

3. LexA repressor (1LEB)
Image: Display-Molecular Surface & Colour-Group

Experiment Methods: SOLUTION NMR
Compounds Involved: 1 Polymer
Authors: Fogh, R.H., Ottleben, G., Rueterjans, H., Schnarr, M., Boelens, R., Kaptein, R.
Classification: Transcription Regulation
Caption: Solution structure of the LexA repressor DNA binding domain determined by 1H NMR spectroscopy
PubMed Abstract: The structure of the 84 residue DNA binding domain of the Escherichia coli LexA repressor has been determined from NMR data using distance geometry and restrained molecular dynamics. The assignment of the 1H NMR spectrum of the molecule, derived from 2- and 3-D homonuclear experiments, is also reported. A total of 613 non-redundant distance restraints were used to give a final family of 28 structures. The structured region of the molecule consisted of residues 4-69 and yielded a r.m.s. deviation from an average of 0.9 Å for backbone and 1.6 Å for all heavy atoms. The structure contains three regular alpha-helices at residues 6-21 (I), 28-35 (II) and 41-52 (III), and an antiparallel beta-sheet at residues 56-58 and 66-68. Helices II and III form a variant helix-turn-helix DNA binding motif, with an unusual one residue insert at residue 38. The topology of the LexA DNA binding domain is found to be the same as for the DNA binding domains of the catabolic activator protein, human histone 5, the HNF-3/fork head protein and the Kluyveromyces lactis heat shock transcription factor.
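The "r.m.s. deviation from an average" quoted in the abstract can be illustrated with a short Python sketch: average the coordinates of the ensemble atom by atom, then take the root-mean-square distance of each model from that average. This sketch assumes the models are already superimposed and list the same atoms in the same order; real NMR analyses first superimpose the models (for example with the Kabsch algorithm) and select backbone or heavy atoms, which is left out here. The function names are hypothetical, and averaging the per-model values is my assumption about how such a summary figure is formed.

```python
import math


def average_structure(models):
    """Atom-by-atom mean coordinates of an ensemble.

    models: list of models, each a list of (x, y, z) tuples for the same
    atoms in the same order (assumed to be already superimposed).
    """
    n_models = len(models)
    n_atoms = len(models[0])
    if any(len(m) != n_atoms for m in models):
        raise ValueError("all models must contain the same atoms")
    return [
        tuple(sum(m[i][k] for m in models) / n_models for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (a[k] - b[k]) ** 2 for a, b in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def mean_rmsd_to_average(models):
    """Mean over the ensemble of each model's RMSD from the average structure."""
    average = average_structure(models)
    return sum(rmsd(m, average) for m in models) / len(models)


if __name__ == "__main__":
    # Two toy "models" of a two-atom fragment (coordinates in ångström).
    ensemble = [
        [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
        [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)],
    ]
    print(f"{mean_rmsd_to_average(ensemble):.2f} Å")
```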


Follow these links for a better understanding of the Protein Data Bank:
Wikipedia (Protein Data Bank)
Wikipedia (Worldwide Protein Data Bank)
RCSB Protein Data Bank
Protein Data Bank Japan (PDBj)
Protein Data Bank Europe (PDBe)
Biological Magnetic Resonance Data Bank (BMRB)

Tuesday, December 28, 2010

ChemSketch

ACD/ChemSketch Freeware is a drawing package that allows you to draw chemical structures including organics, organometallics, polymers, and Markush structures. It also includes features such as calculation of molecular properties (e.g., molecular weight, density, molar refractivity etc.), 2D and 3D structure cleaning and viewing, functionality for naming structures (fewer than 50 atoms and 3 rings), and prediction of logP. The freeware version of ChemSketch does not include all of the functionality of the commercial version.
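Molecular weight, one of the properties ChemSketch calculates, is the sum of the atomic weights of all atoms in the formula. A minimal Python sketch, assuming a flat formula with no parentheses or charges and a small hand-entered table of IUPAC conventional atomic weights; ChemSketch's own data and rounding may differ, and molecular_weight is a hypothetical name.

```python
import re

# Assumed IUPAC conventional atomic weights (g/mol) for a few common elements.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}


def molecular_weight(formula):
    """Sum atomic weights for a flat formula such as 'C2H6O' (no parentheses)."""
    # Each part is (element symbol, optional count).
    parts = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the pattern skipped over, e.g. parentheses or charges.
    if "".join(symbol + count for symbol, count in parts) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in parts:
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no weight for element {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    for formula in ("H2O", "C2H6O", "C6H12O6"):
        print(f"{formula:8s} {molecular_weight(formula):.3f} g/mol")
```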

The differences between ACD/ChemSketch and ACD/ChemSketch Freeware:
Option | Freeware | Commercial
ACD/Dictionary | No | Yes
Search for Structure in Different Computer File | No | Yes
Restricted Version of ACD/ChemFolder (SDfile Viewer) | No | Yes
Export to Adobe® PDF | Yes | Yes
ACD/Tautomers | Yes | Yes
ACD/Name Freeware | Yes | Yes
ACD/3D Viewer | Yes | Yes
ACD/I-Lab add-on for ACD/Labs Online | Yes (must be installed separately) | Yes
Instructions for Authors | Yes (must be installed separately) | Yes
Advanced drawing tools (polymers, Markush, reactions, and other features made available in versions 6–10) | Yes | Yes
ACD/Labs Extension for ChemDraw | No | Yes
ACD/ChemCoder 2D barcode capability | No | Yes
Technical Support | No | Yes
Source: http://www2.acdlabs.com/download/chemsketch/chemsk_tech.html

Here is a preview of the software:


For more information, you can click on the links provided below:
ACD/Labs.com
ACD/Labs.com (Freeware)
The Chemical Database Service
Chemsketch - An Introductory Guide

Tuesday, December 21, 2010

HyperText Markup Language (HTML)

Hi there, and welcome to my new blog. This week's assignment is HTML. I hope you enjoy it!


HTML is a computer language devised to allow website creation. These websites can then be viewed by anyone else connected to the Internet. It is relatively easy to learn, with the basics accessible to most people in one sitting, and it is quite powerful in what it allows you to create. It is constantly undergoing revision and evolution to meet the demands and requirements of the growing Internet audience, under the direction of the W3C, the organisation charged with designing and maintaining the language.
HTML stands for HyperText Markup Language.
  • HyperText is the method by which you move around on the web — by clicking on special text called hyperlinks which bring you to the next page. The fact that it is hyper just means it is not linear — i.e. you can go to any place on the Internet whenever you want by clicking on links — there is no set order to do things in.
  • Markup is what HTML tags do to the text inside them. They mark it as a certain type of text (italicised text, for example).
  • HTML is a Language, as it has code-words and syntax like any other language.
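The idea that tags "mark up" the text inside them can be seen by parsing a little HTML with Python's standard html.parser module. The MarkupReporter class below is a hypothetical sketch of my own: for every piece of text it records the chain of tags enclosing it. It is not how browsers render pages, and it only handles simple, well-nested markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class MarkupReporter(HTMLParser):
    """Record, for each piece of text, the path of tags that encloses it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.marked = []     # list of (tag path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; ignore stray end tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.marked.append(("/".join(self.open_tags), text))


if __name__ == "__main__":
    page = ('<p>Plain text, <i>italicised text</i> and a '
            '<a href="http://www.w3schools.com/">hyperlink</a>.</p>')
    reporter = MarkupReporter()
    reporter.feed(page)
    reporter.close()
    for path, text in reporter.marked:
        print(f"{path:6s} {text}")
```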

Many websites can show you how to use HTML. One that my instructor recommended is W3Schools; here is the link: http://www.w3schools.com/. You can even practise on your own with the Tryit Editor provided by the website. You will be amazed once you try it.
Example of HTML

One important aspect of HTML is colour codes. Here I have attached pictures showing each colour with its HTML code.


There are 16 basic colour names, each with its colour code, that can be used as HTML attribute values:

Aqua #00FFFF
Black #000000
Blue #0000FF
Fuchsia #FF00FF
Gray #808080
Green #008000
Lime #00FF00
Maroon #800000
Navy #000080
Olive #808000
Purple #800080
Red #FF0000
Silver #C0C0C0
Teal #008080
White #FFFFFF
Yellow #FFFF00
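Each colour code is a hexadecimal triplet: after the #, pairs of hex digits give the red, green and blue intensities on a 0-255 scale, so #808080 is (128, 128, 128). A minimal Python sketch of the conversion in both directions, with hypothetical function names, using the 16 basic colours listed above.

```python
# The 16 basic HTML colour names and codes from the list above.
BASIC_COLOURS = {
    "Aqua": "#00FFFF", "Black": "#000000", "Blue": "#0000FF", "Fuchsia": "#FF00FF",
    "Gray": "#808080", "Green": "#008000", "Lime": "#00FF00", "Maroon": "#800000",
    "Navy": "#000080", "Olive": "#808000", "Purple": "#800080", "Red": "#FF0000",
    "Silver": "#C0C0C0", "Teal": "#008080", "White": "#FFFFFF", "Yellow": "#FFFF00",
}


def hex_to_rgb(code):
    """Convert '#RRGGBB' to an (R, G, B) tuple of integers in 0-255."""
    if not (code.startswith("#") and len(code) == 7):
        raise ValueError(f"expected '#RRGGBB', got {code!r}")
    # Each pair of hex digits is one channel: RR, GG, BB.
    return tuple(int(code[i:i + 2], 16) for i in (1, 3, 5))


def rgb_to_hex(r, g, b):
    """Inverse of hex_to_rgb, using upper-case hex digits as in the list above."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("channel values must be in 0-255")
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


if __name__ == "__main__":
    for name, code in BASIC_COLOURS.items():
        print(f"{name:8s} {code}  {hex_to_rgb(code)}")
```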

A few sources that you can check: