presage/TODO

287 lines
9.6 KiB
Plaintext

Copyright (C) 2008 Matteo Vescovi <matteo.vescovi@yahoo.co.uk>
___________________
The Presage project
~~~~~~~~~~~~~~~~~~~
TODO list
---------
GUI apps:
* qprompter
** integrate into build system
* gprompter
** gray in and out redo and undo menu items
** toolbar icon size
** autocomp max height
** would be nice to have status bar with KSR rate
Architectural restructure:
- n-gram language model database format and database connector
The current database format stores the string in all n-grams,
i.e. for "the quick brown" fragment we'll have
1-gram: <word, count>
<the, 20>
2-gram: <word_1, word, count>
<the, brown, 10>
3-gram: <word_2, word_1, word, count>
<the, quick, brown, 1>
A possibly more time-efficient and space-efficient approach to
structuring the language model involves having n-gram records refer
to (n-1)-gram records instead of repeating the word strings, i.e.:
1-gram: <uid, word, count>
<1023, the, 20>
2-gram: <uid, 1-gram, word, count>
<2204, 1023, brown, 10>
3-gram: <uid, 2-gram, word, count>
<3452, 2204, brown, 1>
To build up the full 3-gram string "the quick brown", the references
to the 2-gram and 1-gram need to be walked. However, the predictive
algorithm needs to look at the counts for all k-grams where k is in
[1, n], so this would not be an additional time cost. The database
size would reduce as it would not need to store repetitions of the
words in each n-gram table.
- selector
should be a class similar to current PredictorActivator i.e. a class
that invokes other classes' method to perform work.
Current Selector's functionality should be broken up in Filter
objects i.e. an abstract Filter class and implementation of various
filters (repetion filter, greedy filter, etc)
- combiner
clean up the mess that is our current Predictor implementation,
particularly with regards to the Combiner handling and
implementation. Considering making Combiner a concrete class that
uses different CombinationStrategy objects to do combine
predictions. Combiner object would know how to retrieve its config
values and which Strategy to create and use.
- registry [DONE]:
Predictor class functionality should be split up. There should be
one PluginRegitry class which holds the active plugins and whose
interface consists of a call that returns an iterator to the
plugins.
Predictor would obtain an iterator from PluginRegistry and invoke
the predict() method on each Plugin pointed to by the iterator.
A new Learner class could invoke the learn() method on them when
needed.
This way, the reverse dependency that implementing learning cause
between ContextTracker and Predictor would disappear, being
substituted by a single dependency on Registry and the introduction
of a new Learner class (name still to decide).
The registry should eventually just be a simple wrapper around
plump.
Short term:
* Logger
- implement logger level inheritance from parent module
- SqliteDatabaseConnector callback: had to disable logging there because
static method, investigate on how it can be re-enabled
* test performance with different n values in n-gram
* Consider removing the following public methods from Variable
interface:
. Variable(const std::vector<std::string>& variable);
. size_t size() const;
* consider removing src/tools/ngram.* code
* smoothed n-gram predictor
- is it possible to reduce calls to count() to improve performance?
* rationalise user-specific and system data files location and config files location
- option to comply with XDG basedir spec for config files and data files
* add proper unicode support
* determine whether to enable dictionary plugin by default
(dictionary file?)
* rewrite strtoupper and strtolower utility functions to use a pointer
to function to do the individual char conversion
* add ContextTracker tests for control chars
* put everything inside the presage namespace
* write more integration tests
* write Combiner implementations (various combination strategies)
* add more tests, increase test coverage
* bug: validate string passed to sql_exec query function, unsanitized
string can cause security problems
* implement activation map predictive plugin
- try to improve reverseTokenizer::progress() accuracy
currently it uses a delta of 0.7, should try to get it down to 0.3
- Class ContextTracker could initialize Tokenizer's members separator
and blankspace on a member initializer list. Also, Tokenizer could
take references to string instead of pointers.
Medium term:
* fix character codes
* integration of the plump framework
Long term:
* use timer alarm to implement threaded predictor activator
* improve exceptions handling
* add more predictive plugins
Longer term:
* add gettext support
VARIOUS NOTES
=============
Plugins and Profiles and Managers
---------------------------------
A problem arises when a profile requires that more than one instance
of a Plugin object is created.
profile: pluginA, pluginB, pluginA
plugins: pluginA, pluginB, pluginA
libraries: libpluginA, libpluginB
We need to be able to distinguish (therefore separately manage) plugin
objects and library objects and profile objects.
libpluginA ---> pluginA
|
libpluginB -+-> pluginB
|
`-> pluginA
ProfileManager should invoke the construction of Plugin objects and
initiate their option values using a PluginFactory class.
PluginManager should manager the association between a Plugin object
and the module (library) object that contains the Plugin.
Plump, the Pluggable Lightweight Multithreaded Platform, was created
to solve this and other problems and is going to become presage's
plugin framework implementation.
Plump framework integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The dynamic loading and plugin management system currently implemented
is going to be scrapped in favour of the more general and portable
plump framework.
Plump is a Pluggable Lightweight Ubiquitous Multithreaded Platform
which makes integration, usage and deployment of a plugin framework
dead easy.
Plump integration into presage will require a number of changes to
presage architecture, affecting Predictor and PluginManager
classes in particular.
Predictor and PluginManager classes will delegate much of their
current functionality to plump. Plump will render the functionality
provided by PluginManager redundant, as everything that
PluginManager does will be done by plump. Similarly, part of the
Predictor class functionality will be replaced by plump too.
Predictor was intended to be used to execute the plugins in a
serial or parallel mode. Plump will do that. Predictor will still
be in charge of collecting the result of each plugin's run and
combining them into a global prediction.
PluginManager was in fact a lesser plump. PluginManager can be
considered a precursor to plump. Plump has been designed to solve
the same problems that PluginManager was intended to solve, plus a
bit more.
Plugins creation and initialisation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A few things should happen:
plugin objects should be instantiated based on configuration files,
that is if the configuration file uses the plugin, then an instance
of the corresponding class implementing the plugin should be
instantiated
plugin objects should be initialised with the options contained in
the configuration file
The most sensible way to achieve this requirements seems to revolve
around having a plugin factory class which:
determines which and how many instances of plugin classes need to
be instantiated from the xml configuration file
passes a pointer to the root the xml representation of the options
specific to that plugin so that the plugin constructor can
initialise its internal state accordingly
This results in:
plugins know how to initialise themselves
the information required for initizialisation is passed to the
plugin's constructor
the information is passed in xml parse tree format
Points to ponder:
(o) the plugin factory needs to be able to determine which plugin
class to instantiate a plugin from based on the content of the
configuration file (xml file). A solution could be that the module
implementing the plugin class exports a string corresponding to the
plugin type/name.
(o) it is necessary to be able to associate a plugin object with
initialisation data. In other words, each plugin class needs to
have an associated string that describes its kind. Or we can use
run-time type information.
(o) in light of all this, it is probably worth designing a versioning
system for plugin classes to be implemented as exported symbols in
the plugin module.
STEP to autoconfiscate
~~~~~~~~~~~~~~~~~~~~~~
aclocal
libtoolize --force --ltdl
autoheader
autoconf
automake -a --copy
or source the bootstrap script provided (in svn repo):
. bootstrap
########/
Copyright (C) 2008 Matteo Vescovi <matteo.vescovi@yahoo.co.uk>
Presage is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
########\