.. This document is written in reStructuredText.
.. Build command:
   rst2html.py --date --time README README.html

=============================================
 easysvm - A front end to the shogun toolbox
=============================================

.. contents::

Introduction
============

This is a demo corresponding to the PLoS tutorial
"Support Vector Machines for Sequence Analysis". It is also meant as a
user "quick start" to using shogun (http://www.shogun-toolbox.org).


Installation
============

Install
-------

For a global install, for which you need root permissions

    python setup.py install

For a local install

    python setup.py install --prefix=$HOME

See distutils-help.txt for more details.

Dependencies
------------

- `numpy`_ (>=1.0.1)
- `pylab`_ (>=0.87.7) [optional]
- `shogun`_ (>=0.7.3)
- `arff`_ [optional]

.. _numpy: http://numpy.scipy.org/
.. _pylab: http://matplotlib.sourceforge.net/
.. _shogun: http://www.shogun-toolbox.org/
.. _arff: http://www.mit.edu/~sav/arff/


Usage
=====

The results in the paper were produced by tutorial_example.py. Execute
it in the data directory::

    cd data
    python ../splicesites/tutorial_example.py

Galaxy interface
----------------

The following command line arguments are what is behind the galaxy
interface, which is available as a web service from
http://galaxy.fml.tuebingen.mpg.de/

There are three types of data creation methods::

    datagen.py motif arff gattaca 10 50 10-15 0.1 tttt 100 50 15 0.1 testmotif1.arff
    datagen.py cloud 100 3 0.6 1.3 testcloud1.arff
    datagen.py motif arff gattaca 100 50 10-15 0.1 tttt 1000 50 15 0.1 testmotif2.arff
    datagen.py cloud 1000 3 0.6 1.3 testcloud2.arff

    datagen.py motif fasta gattaca 10 50 10-15 0.1 testmotifpos.fasta
    datagen.py motif fasta tttt 100 50 15 0.1 testmotifneg.fasta
    datagen.py motif fasta gattaca 100 50 10-15 0.1 tm1.fasta
    datagen.py motif fasta tttt 1000 50 15 0.1 tm2.fasta

Clean up::

    cat tm1.fasta tm2.fasta > testmotiftest.fasta
    rm tm1.fasta tm2.fasta


Cross validation and evaluation on a independent validation set::

    easysvm.py cv 5 10 gauss 0.6 arff testcloud1.arff cv_cloud.txt
    easysvm.py eval cv_cloud.txt arff testcloud1.arff cv_cloud_eval.txt roc roc_cloud_cv.png
    easysvm.py cv 5 10 wd 10 2 arff testmotif1.arff cv_motif.txt
    easysvm.py eval cv_motif.txt arff testmotif1.arff cv_motif_eval.txt roc roc_motif_cv.png

Predict on a test set::

    easysvm.py pred 10 gauss 0.6 arff testcloud1.arff testcloud2.arff pred_cloud.txt
    easysvm.py pred 10 linear arff testcloud1.arff testcloud2.arff pred_cloud.txt
    easysvm.py pred 10 poly 3 true true arff testcloud1.arff testcloud2.arff pred_cloud.txt

    easysvm.py pred 10 wd 10 2 arff testmotif1.arff testmotif2.arff pred_motif.txt
    easysvm.py pred 10 localalign arff testmotif1.arff testmotif2.arff pred_motif.txt
    easysvm.py pred 10 localimprove 10 1 1 arff testmotif1.arff testmotif2.arff pred_motif.txt

For some kernels, investigate the importance of different motives::

    easysvm.py poim 10 6 wd 10 2 arff testmotif1.arff poims.png

We also support the fasta format::

    easysvm.py cv 5 10 wd 10 2 fasta testmotifpos.fasta testmotifneg.fasta cv_motif.txt
    easysvm.py eval cv_motif.txt fasta testmotifpos.fasta testmotifneg.fasta cv_motif_eval.txt roc roc_motif_cv.png
    easysvm.py pred 10 wd 10 2 fasta testmotifpos.fasta testmotifneg.fasta testmotiftest.fasta pred_motif.txt
    easysvm.py poim 10 6 wd 10 2 fasta testmotifpos.fasta testmotifneg.fasta poims.png

Clean up::

    rm testmotif1.arff testmotif2.arff testcloud1.arff testcloud2.arff
    rm cv_cloud.txt roc_cloud_cv.png cv_motif.txt roc_motif_cv.png
    rm pred_cloud.txt pred_motif.txt poims.png
    rm testmotifpos.fasta testmotifneg.fasta testmotiftest.fasta
    rm cv_cloud_eval.txt cv_motif_eval.txt


License
=======

GPLv3_

.. _GPLv3: http://gplv3.fsf.org/

All programs in this collection are free software:
you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

Copyright 2008 Cheng Soon Ong and Gunnar Raetsch

