Sunday, June 24, 2007

Illumina data analysis

I am working on the first dataset from Charlie to pilot the analysis of Illumina data.

lumi is the package to use. Initial problems I had:

1. Different versions of BeadStudio produce different output formats. Charlie is using BeadStudio 3.0.14. One interesting thing is that the current version gives STD_ERR, while earlier versions gave STD_DEV.
2. The annotation package should be lumiMouseV1. This is for mouse chip v1.1.
3. The output file generated directly by BeadStudio does not pad rows with \t when the fields at the end of the row are empty, so rows have different numbers of columns. For example, some rows have 50 columns (including DEFINITION and SYNONYM), but rows without data in the last several columns may have only 48 columns or fewer. The way to solve this problem is to call read.table with fill = TRUE, like the following:
read.table("pmt1_all.txt", sep = "\t", fill = TRUE, header = TRUE, skip = 8)

4. The pmt1_all.txt file is too big. On my laptop with 1.5 GB of memory, R could only read in 3817 lines. I had to trim most of the columns to read in all the data.
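
Problems 3 and 4 can be sketched together in R. This is only an illustration under assumptions: the file below is a tiny made-up stand-in for a BeadStudio export (two preamble lines instead of eight, five columns instead of fifty), and the column names AVG_Signal and keep are hypothetical. count.fields() confirms that the rows are ragged, and colClasses = "NULL" makes read.table skip unwanted columns so that less memory is needed.

```r
# Hypothetical stand-in for a BeadStudio export: 2 preamble lines, a header,
# and data rows whose trailing empty fields were dropped.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "preamble line 1",
  "preamble line 2",
  "TargetID\tAVG_Signal\tSTD_ERR\tSYNONYM\tDEFINITION",
  "GeneA\t100.5\t2.1\tsynA\tdefA",
  "GeneB\t88.0\t1.7",
  "GeneC\t42.3\t0.9\tsynC"
), path)

# 1. Count tab-separated fields per line to confirm the rows are ragged.
n_fields <- count.fields(path, sep = "\t", skip = 2, quote = "")
print(table(n_fields))

# 2. Read only the columns of interest. A "NULL" entry in colClasses skips
#    that column entirely; NA lets read.table guess the type as usual.
header <- strsplit(readLines(path, n = 3)[3], "\t", fixed = TRUE)[[1]]
keep <- c("TargetID", "AVG_Signal")  # assumed columns of interest
classes <- ifelse(header %in% keep, NA, "NULL")
small <- read.table(path, sep = "\t", header = TRUE, skip = 2,
                    fill = TRUE, quote = "", comment.char = "",
                    colClasses = classes, stringsAsFactors = FALSE)
print(small)
```

For the real file, skip would be 8 as in the read.table call above, and keep would list whichever columns the analysis actually needs.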



R Script:

1. Generate a summary table

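# Look up each TargetID in pmt1 and copy the value from its second column.
for (i in seq_along(upTargetIDAB)) {  # 542 IDs in this dataset
  tid <- upTargetIDAB[[i]]
  # match() gives the first matching row, or NA if the ID is missing
  ABresults[i, 1] <- pmt1[match(tid, pmt1$TargetID), 2]
}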
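
The same lookup can probably be done without a loop. A minimal sketch, assuming upTargetIDAB is a list of IDs and the value of interest sits in the second column of pmt1; the small data frame, the AVG_Signal column name, and the IDs here are made up for illustration:

```r
# Hypothetical stand-ins for the objects used in the loop above.
pmt1 <- data.frame(
  TargetID = c("GeneA", "GeneB", "GeneC", "GeneD"),
  AVG_Signal = c(100.5, 88.0, 42.3, 17.9),
  stringsAsFactors = FALSE
)
upTargetIDAB <- list("GeneC", "GeneA", "GeneX")  # "GeneX" is not in pmt1

# match() returns the row of the first occurrence of each ID, or NA if absent,
# so one call replaces the whole loop.
ids <- unlist(upTargetIDAB)
rows <- match(ids, pmt1$TargetID)
ABresults <- data.frame(
  TargetID = ids,
  value = pmt1[rows, 2],
  stringsAsFactors = FALSE
)
print(ABresults)
```

IDs that are missing from pmt1 end up as NA in the summary table instead of stopping the script, which makes them easy to spot afterwards.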

Wednesday, April 4, 2007

Plone as a website template

I am thinking about using Plone for website management. For the UC Genome website, it might be a better choice, because some non-programmers will be editing the content of the website.

One advantage of Plone over a wiki is that it does not require entering any HTML or wiki code.

I need to think more about this.

Monday, March 26, 2007

Fink is not as easy as it sounds

I did not realize that on Mac OS there are still a lot of things that you cannot do easily.

Fink is one example. If you cannot find a package in Fink's stable list (I am trying to install one from the unstable list), it is not trivial to install it, because a package often has dependencies and it is tedious to install them manually one by one.

I am trying to find a way to look for unstable packages (unstable does not mean bad; in Fink it sometimes just means "not tested enough").

On my first attempt, "fink list" only showed mysql4, but after I ran fink selfupdate and selected rsync, the list was updated to include mysql5. I still cannot install biopython, though, which is in the unstable package list.

Things to be careful about:

1. After a default Fink install, even from the latest version, you still need to run fink selfupdate so that the package list can update itself.

Sunday, March 25, 2007

Bioinformatics Infrastructure

1. LIMS
I have already covered this in an earlier post.

2. Web Interface for Customized Program Release
We need a web system that allows biologists to access the programs we develop through a Web interface. This saves programmers from having to install them on different platforms. The advantage is that the programs can be used easily by other people.
Most of the programs in this category will need an input file from the user and will return a result.
Things that need to be done:
a. Web interface
b. Backend data processing
c. Privacy protection.

PISE package is a good starting point: http://www.pasteur.fr/recherche/unites/sis/Pise/

3. Computation Pipelines
a. EST Clustering and analysis
b. Sequence calling and assembly (Phred/Phrap/Consed)
c. Association Studies
d. Data Integration (iProtein)
e. Network and Pathway analysis
f. Structure prediction and modelling
g. Data submission to public databases (NCBI, PDB, GEO)
h. SNP analysis
i. Workflow (Kepler)

4. Local databases
a. NCBI
b. SRS
c. Human Genome Browser

5. Software tools
a. GeneSpring (microarray analysis)
b. Spotfire (multi-dimensional data analysis and visualization)
c. Matlab
d. Mathematica

6. Project management tools, time management and tracking, CRM

7. Cluster Computing Infrastructure
a. Open source software in a cluster environment (some of it from the Rocks distribution)
b. local databases

8. Computing infrastructure

a. Servers
b. Development environment (test, development, and production environments)

9. IT infrastructure
a. Network (DNS, DHCP)
b. Security (Kerberos, Firewall)

Configuring My MacBook Pro

I was trying to install biopython-related modules on my Mac. I am still new to Mac OS X.

I found Fink for installing things. On the first try, I failed to install bioperl, so it is not as easy as I expected.
I also need to install the Apple developer package, which requires a free registration.

My experience with Mac OS so far:

1. The wireless connection is good: reliable, but somehow the connection speed feels slow.
2. The mouse is hard to click; it is very stiff.
3. Multimedia is good. Once I put a CD in, it is recognized automatically. Same for a DV camera. I tried this on Windows, and even after installing the required software it was still hard to do.
4. The screen is bright, especially good for working outside.
5. No developer programs are installed by default. Xcode and Fink are needed.
6. Dashboard is handy, especially the dictionary.
7. I cannot use Kingword, an English-Chinese dictionary that uses the mouse to capture a word from the screen and automatically translates it into Chinese.
8. Sound is good.
9. Not every program is available on Mac OS, for example Camtasia.
10. The power button is too exposed. That is not good with little kids around. My 1-year-old son constantly causes me trouble.
11. The email program Entourage is slow.

Wednesday, March 21, 2007

Template LIMS thoughts

It is apparent that various forms of Laboratory Information Management System are needed by different people on the UCD campus. They have to be Web-based. So I am thinking of establishing some template systems for different purposes.

Wikipedia defines LIMS as:

A Laboratory Information Management System (LIMS) is computer software that is used in the laboratory for the management of samples, laboratory users, instruments, standards and other laboratory functions such as invoicing, plate management, and work flow automation. A LIMS and a Laboratory Information System (LIS) perform similar functions. The primary difference is that LIMS are generally targeted toward environmental, research or commercial analysis, such as pharmaceutical or petrochemical, and LIS are targeted toward the clinical market (hospitals and other clinical labs).

I found one nice blog post about several Web-based systems at

http://labsoftnews.typepad.com/lab_soft_news/2006/03/the_first_brows.html

There is also a website that has a lot of discussion (http://www.limsfinder.com/).

Most existing systems, especially commercial ones, are built for pharmaceutical or biotech companies to track chemicals and data from high-throughput instruments. They are not only expensive, but also not what most genomics-based biologists need.

There are some open source solutions too, like:

Bika: From a South African company. A system developed on Plone and Python.
Website: www.bikalabs.com/

Halx: A system developed for Structural Genomics activities. I like their design, which separates presentation from the data model.
Website: http://halx.genomics.eu.org/

BASE: It is for microarray data.
Website: http://base.thep.lu.se/

My idea is to see if we can develop different modules that simplify the process of building a customized system. For example:

User administration: User authentication and authorization modules
Data and File management: Data upload and download
Billing
Communications
Report
Visualization
Backend bulk upload