Ken (Chanoch) Bloom's Blog

30th May 2007

Idea: WekaBuilder (for Groovy)

UPDATE: I have a preliminary version of this sitting on my hard drive.

Create a Groovy builder for complicated machine learning problem setups. This would let you do things like have an AttributeSelectedClassifier or a metaclassifier (which includes other classifiers) in a clean tree-like way, without having to specify things on the command-line and get lost in the noise.

For example

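Instances data = ...  // from somewhere
AttributeSelectedClassifier classifier = new AttributeSelectedClassifier();
CfsSubsetEval eval = new CfsSubsetEval();
GreedyStepwise search = new GreedyStepwise();
J48 base = new J48();
// wire the base learner, attribute evaluator, and search strategy into the meta-classifier
classifier.setClassifier(base);
classifier.setEvaluator(eval);
classifier.setSearch(search);

// 10-fold cross-validation
Evaluation evaluation = new Evaluation(data);
evaluation.crossValidateModel(classifier, data, 10, new Random(1));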

could become

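def data=//...
builder=new WekaBuilder()
// hypothetical builder syntax: the nested nodes become the parts of the meta-classifier
classifier=builder.attributeSelectedClassifier {
    cfsSubsetEval() //eval
    greedyStepwise() //search
    j48() //classifier
}

e=new Evaluation(data)
e.crossValidateModel(classifier, data, 10, new Random(1))
println e.toSummaryString()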

and things get simpler when you start adding options to these subclassifiers.

Permalink | ideas.
4th January 2007

Ideas: A consumer level E-commerce client

The card I described in my private-key credit/debit card post raises a difficult problem: how can one use this public-key card to make an online transaction through a web server? Although one could modify the web browser to support this kind of thing, I think a better answer is to create a totally new e-commerce protocol with a separate client. The client could talk to the credit card through a standard interface, and also display products etc. in a consistent interface that's coded into the e-commerce client (and different versions of the e-commerce client could exist to provide different kinds of accessibility).

This would also protect against things like cross-site scripting and cross-site request forgery vulnerabilities, since the client would not be running arbitrary scripts from the sites it talks to. Generally speaking, the time has come for a rethinking of web development interfaces, with an eye toward failsafe defaults.

Permalink | ideas.
12th October 2005

Ideas: A language like XSL for designing XML editors

So I have a lexicon I'm working on in XML, with about 2000 entries that each look kinda like this:

  <entry domain="appraisal">
    <set att="polarity" value="marked"/>
    <modify att="force" type="flip"/>
    <modify att="orientation" type="flip"/>
  </entry>

and I have an XSL stylesheet to lay it out as a table whose rows look like this:

no | RB | - | flip | marked | flip | -

when I view it in Firefox. What I'd like is an XML editor toolkit that I can use to quickly knock together a tabular editor for this format. Ideally this should be pretty general: I should be able to specify my interface in an XSL-like language, which I can write quickly and easily, and then load both the interface description and the data into a program that presents the data for editing according to that description.

Xample is an interesting idea, but not quite what I had in mind. XML Quick Editor by the now defunct netbryx might be closer, but I can't find any screenshots to see what it's like.

BixJ might be a good place to start.

In the true UNIX tradition, I've decided that a very easy way to do this kind of XML editing on a schema-by-schema basis is to use XSH2. I've also found xmlstarlet to be very helpful for document editing and querying.

Permalink | ideas.
23rd September 2005

Idea: a private-key credit/debit card

Prevent identity theft (as long as you can keep your hands on your card) by using challenge-response authentication. The POS terminal sends your card a challenge, the card encrypts the challenge with its private key (in effect, signs it) and sends it back, and the POS terminal checks the response using your card's public key (which it fetches from the credit card company). Bonus points: put a keypad on the card, so that your key is protected with a password, and you know your password isn't going into random hostile machines.
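To make the exchange concrete, here is a minimal sketch in Java using only the standard java.security classes. Everything in it is an assumption made for illustration: the class and method names (ChallengeResponseSketch, Card, respond, verify) are made up, the key is 2048-bit RSA with SHA256withRSA signatures, and the "card" is just an object in the same process rather than real hardware. Fetching the public key from the credit card company is not modelled at all.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;

/** Hypothetical sketch of card/terminal challenge-response; not a real payment protocol. */
class ChallengeResponseSketch {

    /** Stands in for the card: the private key never leaves this object. */
    static final class Card {
        private final KeyPair keyPair;

        Card() {
            try {
                KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
                gen.initialize(2048);
                keyPair = gen.generateKeyPair();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }

        /** What the card company would publish for terminals to fetch. */
        PublicKey publicKey() {
            return keyPair.getPublic();
        }

        /** Sign the terminal's challenge with the card's private key. */
        byte[] respond(byte[] challenge) {
            try {
                Signature signer = Signature.getInstance("SHA256withRSA");
                signer.initSign(keyPair.getPrivate());
                signer.update(challenge);
                return signer.sign();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Terminal side: a fresh random challenge for every transaction, so old responses are useless. */
    static byte[] newChallenge() {
        byte[] challenge = new byte[32];
        new SecureRandom().nextBytes(challenge);
        return challenge;
    }

    /** Terminal side: check the response against the card's public key. */
    static boolean verify(PublicKey cardKey, byte[] challenge, byte[] response) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(cardKey);
            verifier.update(challenge);
            return verifier.verify(response);
        } catch (GeneralSecurityException e) {
            return false; // a malformed response counts as a failed check
        }
    }

    public static void main(String[] args) {
        Card card = new Card();
        byte[] challenge = newChallenge();
        byte[] response = card.respond(challenge);
        System.out.println("fresh response accepted: " + verify(card.publicKey(), challenge, response));
        System.out.println("replayed response rejected: " + !verify(card.publicKey(), newChallenge(), response));
    }
}
```

The point of the random challenge is that a hostile terminal that records one response learns nothing it can reuse: the next terminal will ask a different question.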

So S-cubed also thought of this one. But we thought of it independently.

And here's all the more reason why!

Smart cards sound like a good idea for this, but maybe a USB device similar to a thumb drive is a better idea. That way we don't need new hardware on our computers to do e-commerce.

Permalink | ideas.
23rd September 2005

Ideas: Multicast File Distribution

Save even more bandwidth than BitTorrent by multicasting popular files out from a single server to people who are interested in downloading them. Deal with the reliability issues and latecomers by using a rateless erasure code, so that receivers can reconstruct the file from just about any sufficiently large set of encoded packets that they do manage to receive. (Sufficiently large means a few percent larger than the original file.) Of course, we'd need IPv6 multicast-aware routers to catch on so that this actually works over the open internet.
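To show why latecomers and lost packets stop mattering, here is a toy sketch in Java. It is not one of the practical codes: it is a dense random linear fountain code over GF(2), chosen only because it fits in a few dozen lines. Each packet is the XOR of a random subset of the file's blocks, and the receiver runs Gaussian elimination until it has enough independent packets. All names (FountainSketch, Packet, Decoder) are made up for illustration, and the sketch assumes at most 63 equal-sized blocks so that the subset fits in one long.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: a dense random linear fountain code over GF(2), for at most 63 blocks. */
class FountainSketch {

    /** One multicast packet: bit i of mask says whether source block i was XORed into payload. */
    static final class Packet {
        final long mask;
        final byte[] payload;
        Packet(long mask, byte[] payload) { this.mask = mask; this.payload = payload; }
    }

    static void xorInto(byte[] dst, byte[] src) {
        for (int j = 0; j < dst.length; j++) dst[j] ^= src[j];
    }

    /** Sender: XOR a random non-empty subset of the blocks. Can be called indefinitely. */
    static Packet encode(byte[][] blocks, Random rng) {
        int k = blocks.length;
        long mask = 0;
        while (mask == 0) mask = rng.nextLong() & ((1L << k) - 1);
        byte[] payload = new byte[blocks[0].length];
        for (int i = 0; i < k; i++) {
            if (((mask >>> i) & 1L) != 0) xorInto(payload, blocks[i]);
        }
        return new Packet(mask, payload);
    }

    /** Receiver: incremental Gaussian elimination; a filled row i has bit i as its lowest set bit. */
    static final class Decoder {
        private final long[] rowMask;
        private final byte[][] rowData;
        private int rank = 0;

        Decoder(int k) { rowMask = new long[k]; rowData = new byte[k][]; }

        /** Returns true if the packet added new information, false if it was redundant. */
        boolean receive(Packet p) {
            long mask = p.mask;
            byte[] data = p.payload.clone();
            for (int i = 0; i < rowMask.length; i++) {
                if (((mask >>> i) & 1L) == 0) continue;
                if (rowData[i] == null) {        // new pivot: keep this packet as row i
                    rowMask[i] = mask;
                    rowData[i] = data;
                    rank++;
                    return true;
                }
                mask ^= rowMask[i];              // cancel block i using the row we already hold
                xorInto(data, rowData[i]);
            }
            return false;
        }

        boolean complete() { return rank == rowMask.length; }

        /** Back-substitute so that row i holds exactly source block i. */
        byte[][] decode() {
            if (!complete()) throw new IllegalStateException("not enough packets yet");
            for (int i = rowMask.length - 1; i >= 0; i--) {
                for (int j = i + 1; j < rowMask.length; j++) {
                    if (((rowMask[i] >>> j) & 1L) != 0) {
                        rowMask[i] ^= rowMask[j];
                        xorInto(rowData[i], rowData[j]);
                    }
                }
            }
            return rowData;
        }
    }

    public static void main(String[] args) {
        byte[][] blocks = new byte[32][64];
        Random rng = new Random(1);
        for (byte[] b : blocks) rng.nextBytes(b);   // stand-in for the file contents

        Decoder latecomer = new Decoder(blocks.length);
        int sent = 0, received = 0;
        while (!latecomer.complete()) {
            Packet p = encode(blocks, rng);
            sent++;
            if (sent <= 20 || rng.nextInt(5) == 0) continue;  // joined late, then roughly 20% loss
            latecomer.receive(p);
            received++;
        }
        System.out.println("sent=" + sent + " received=" + received
                + " recovered=" + Arrays.deepEquals(blocks, latecomer.decode()));
    }
}
```

The receiver never asks for a retransmission and never cares which packets it missed; it just keeps listening until the decoder reports that it is complete. A dense code like this one pays for its simplicity with quadratic decoding work, which is why practical rateless codes such as LT and Raptor codes use sparse, carefully chosen subsets instead.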

On-the-Fly Verification of Rateless Erasure Codes for Efficient Content Distribution may be relevant.

Permalink | ideas.