.. -*- mode: rst; fill-column: 78; indent-tabs-mode: nil -*-
.. vi: set ft=rst sts=4 ts=4 sw=4 et tw=79:
  ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
  #
  #   See COPYING file distributed along with the PyMVPA package for the
  #   copyright and license terms.
  #
  ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###


.. index:: measure, sensitivity
.. _chap_measures:

********
Measures
********

.. automodule:: mvpa2.measures


.. only:: html

   Related API documentation
   =========================

   .. currentmodule:: mvpa
   .. autosummary::
      :toctree: generated

      measures.base
      measures.anova
      measures.corrcoef
      measures.corrstability
      measures.ds
      measures.irelief
      measures.noiseperturbation
      measures.searchlight
      measures.statsmodels_adaptor



PyMVPA provides a number of useful measures. The vast majority of
them are dedicated to feature selection. To increase analysis
flexibility, PyMVPA distinguishes two parts of a feature selection
procedure.

First, the impact of each individual feature on a classification has to be
determined.  The resulting map reflects the sensitivities of all features with
respect to a certain decision and, therefore, algorithms generating these maps
are summarized as :class:`~mvpa2.measures.base.Sensitivity` in PyMVPA.

.. index:: feature selection

Second, once the feature sensitivities are known, they can be used as
criteria for feature selection. However, possible selection strategies
range from very simple *Go with the 10% best features* to more
complicated algorithms like :ref:`recursive_feature_elimination`.
Because :ref:`sensitivity_measures` and selections strategies can be
arbitrarily combined, PyMVPA offers a quite flexible framework for feature
selection.

.. index:: processing object

Similar to dataset splitters, all PyMVPA algorithms are implemented and behave
like :term:`processing object`\ s. To recap, this means that they are
instantiated by passing all relevant arguments to the constructor. Once
created, they can be used multiple times by calling them with different
datasets.

.. Again general overview first. What is a `SensitivityAnalyzer`, what is the
   difference between a `FeatureSelection` and an `ElementSelector`.
   Finally more detailed note and references for each larger algorithm.


.. index:: sensitivity
.. _sensitivity_measures:

Sensitivity Measures
====================

It was already mentioned that a :class:`~mvpa2.measures.base.Sensitivity`
computes a featurewise score that indicates how much interesting signal each
feature contains -- hoping that this score somehow correlates with the impact
of the features on a classifier's decision for a certain problem.

Every sensitivity analyzer object computes a one-dimensional array with the
respective score for every feature, when called with a
:class:`~mvpa2.datasets.base.Dataset`. Due to this common behavior all
:class:`~mvpa2.measures.base.Sensitivity` types are interchangeable and can be
combined with any other algorithm requiring a sensitivity analyzer.

By convention higher sensitivity values indicate more interesting features.

There are two types of sensitivity analyzers in PyMVPA. Basic sensitivity
analyzers directly compute a score from a Dataset. Meta sensitivity analyzers
on the other hand utilize another sensitivity analyzer to compute their
sensitivity maps.


Basic Sensitivity (and related Measures)
----------------------------------------

.. index:: anova, F-score, univariate, measure
.. _anova:

ANOVA
^^^^^

The :class:`~mvpa2.measures.anova.OneWayAnova` class provides a simple (and fast)
univariate measure, that can be used for feature selection, although it is not
a proper sensitivity measure. For each feature an individual F-score is
computed as the fraction of between and within group variances. Groups are
defined by samples with unique targets.

Higher F-scores indicate higher sensitivities, as with all other sensitivity
analyzers.



.. index:: classifier weights, weights, SVM, measure

Linear SVM Weights
^^^^^^^^^^^^^^^^^^

The featurewise weights of a trained support vector machine are another
possible sensitivity measure.  The
:class:`mvpa2.clfs.libsvmc.sens.LinearSVMWeights` and
:class:`mvpa2.clfs.sg.sens.LinearSVMWeights` classes can internally train all
types of *linear* support vector machines and report those weights.

In contrast to the F-scores computed by an ANOVA, the weights can be positive
or negative, with both extremes indicating higher sensitivities. To deal with
this property all subclasses of :class:`~mvpa2.measures.base.Measure`
support a `mapper` arguments in the constructor.  A mapper is just some
:class:`~mvpa2.mappers.base.Mapper`, `forward_dataset()` method of which is
called with the resultant sensitivity map as the argument.  In most of the
cases you would like just to apply some simple transformation function (taking absolution values
etc) or a bit more advanced where you would like to combine some sensitivities
(e.g. per class) into a single measure.  For that purpose
:class:`mvpa2.mappers.fx.FxMapper` was created and some convenience factory
methods where provided for most common operations.  So in the example below we
will use `maxofabs_sample()` which will return a mapper giving maximal absolute value among
multiple sensitivities (in our case it is just a single one for binary
classification).

 >>> from mvpa2.suite import *
 >>>
 >>> ds = normal_feature_dataset()
 >>> print ds
 <Dataset: 100x4@float64, <sa: chunks,targets>>
 >>>
 >>> clf = LinearCSVMC()
 >>> sensana = clf.get_sensitivity_analyzer()
 >>> sens = sensana(ds)
 >>> sens.shape
 (1, 4)
 >>> (sens.samples < 0).any()
 True
 >>> sensana_abs = clf.get_sensitivity_analyzer(postproc=absolute_features())
 >>> (sensana_abs(ds).samples < 0).any()
 False

Above example shows how to use an existing classifier instance to report
sensitivity values (a linear SVM in this case). The computed sensitivity vector
contains one element for each feature in the dataset.
:mod:`~mvpa2.misc.transformers` can be used to post-process the sensitivity
scores, e.g. reporting absolute values for feature selection purposes, instead
of raw sensitivities.

.. note::

  The `SVMWeights` classes *cannot* extract reasonable weights from non-linear
  SVMs (e.g. with RBF kernels).



Other linear Classifier Weights
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Any linear classifier in PyMVPA can report its weights. The procedure is
identical for all of them. As outlined in the example using linear SVM weights,
simply call :meth:`~mvpa2.clfs.base.Classifier.get_sensitivity_analyzer` on a
classifier instance and you'll get an appropriate
:class:`~mvpa2.measures.base.Sensitivity` object. Additionally, it is possible
to force (re)training of the underlying classifier or simply report the weights
computed during a previous training run.

Examples of other classifier-based linear sensitivity analyzers are:
:class:`~mvpa2.clfs.smlr.SMLRWeights` and
:class:`~mvpa2.clfs.gpr.GPRLinearWeights`.


.. index:: noise perturbation, measure
.. _noise_perturbation:

Noise Perturbation
^^^^^^^^^^^^^^^^^^

Noise perturbation is a generic approach to determine feature sensitivity.  The
sensitivity analyzer
:class:`~mvpa2.measures.noiseperturbation.NoisePerturbationSensitivity`)
computes a scalar :class:`~mvpa2.measures.base.Measure` using the
original dataset. Afterwards, for each single feature a noise pattern is added
to the respective feature and the dataset measure is recomputed. The
sensitivity of each feature is the difference between the dataset measure of
the original dataset and the one with added noise. The reasoning behind this
algorithm is that adding noise to *important* features will impair a dataset
measure like cross-validated classifier transfer error. However, adding noise
to a feature that already only contains noise, will not change such a measure.

Depending on the used scalar :class:`~mvpa2.measures.base.Measure` using
the sensitivity analyzer might be really CPU-intensive! Also depending on the
measure, it might be necessary to use appropriate
:mod:`~mvpa2.misc.transformers` (see :mod:`~mvpa2.misc.transformers` constructor
arguments) to ensure that higher values represent higher sensitivities.


.. index:: meta measures

Meta Sensitivity Measures
-------------------------

Meta Sensitivity Measures are FeaturewiseMeasures that internally use
one of the `Basic Sensitivity (and related Measures)`_ to compute their
sensitivity scores.


.. index:: splitting measures, measure

Splitting Measures
^^^^^^^^^^^^^^^^^^

The SplittingFeaturewiseMeasure uses a
:class:`~mvpa2.datasets.splitters.Splitter` to generate dataset splits.  A
FeaturewiseMeasure is then used to compute sensitivity maps for all
these dataset splits. At the end a `combiner` function is called with all
sensitivity maps to produce the final sensitivity map. By default the mean
sensitivity maps across all splits is computed.

.. _SplitFeaturewiseMeasure: api/mvpa2.measures.splitmeasure.SplitFeaturewiseMeasure-class.html
