Interest in a tiny kmeans library

Interest in a tiny kmeans library

Boost - Dev mailing list
Dear all,

We are finishing the cleanup of a tiny kmeans library. For those who do
not know, kmeans is a widely used data clustering algorithm.

This implementation runs faster than the standard algorithm because it uses
the triangle inequality between cluster centers and data points to skip
unnecessary distance computations at each iteration.
It is based on the paper by Charles Elkan:
https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf
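
To illustrate the kind of pruning the triangle inequality allows, here is a minimal sketch of one lemma from Elkan's paper: for a point x and two centers b and c, if d(b, c) >= 2 d(x, b), then d(x, c) >= d(x, b), so the distance to c need not be computed. This is not the library's code and not the full algorithm (which also keeps upper and lower bounds per point); the names Point, euclidean, center_distances and nearest_center are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;  // hypothetical point type for this sketch

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Precompute the symmetric matrix of center-to-center distances.
std::vector<std::vector<double>> center_distances(const std::vector<Point>& centers) {
    const std::size_t k = centers.size();
    std::vector<std::vector<double>> d(k, std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < k; ++i) {
        for (std::size_t j = i + 1; j < k; ++j) {
            d[i][j] = d[j][i] = euclidean(centers[i], centers[j]);
        }
    }
    return d;
}

struct Assignment {
    std::size_t center;          // index of the nearest center
    std::size_t distance_evals;  // point-to-center distances actually computed
};

// Nearest-center search with the pruning test described above:
// if d(best, j) >= 2 * d(x, best), center j cannot be closer than best.
Assignment nearest_center(const Point& x,
                          const std::vector<Point>& centers,
                          const std::vector<std::vector<double>>& center_dist) {
    assert(!centers.empty());
    std::size_t best = 0;
    double best_dist = euclidean(x, centers[0]);
    std::size_t evals = 1;
    for (std::size_t j = 1; j < centers.size(); ++j) {
        if (center_dist[best][j] >= 2.0 * best_dist) {
            continue;  // pruned without computing d(x, centers[j])
        }
        const double d = euclidean(x, centers[j]);
        ++evals;
        if (d < best_dist) {
            best_dist = d;
            best = j;
        }
    }
    return {best, evals};
}
```

With centers at (0,0), (10,0) and (0,10), the point (1,1) is assigned after a single distance computation, because both other centers are pruned.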

The library is fully generic on the data type, offers additional
initialization heuristics, and comes with Python and MATLAB bindings.

I would be happy to release this library as part of Boost.
Do you think the community would be interested?

Best regards,
Jean-Claude Passy and Raffi Enficiaud




_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Interest in a tiny kmeans library

Boost - Dev mailing list
Hi,

I have proposed to provide an implementation of KMeans under uBLAS, as part
of my GSoC project this summer. Currently, I am working on designing the
API, and have not implemented anything.

*My thoughts*:
I have proposed to implement a very basic form of kmeans, with three types
of initialization: random, kmeans++, and Bradley-Fayyad. It would be great if
we can work together to integrate your implementation as well.
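
For reference, kmeans++ seeding picks the first center uniformly at random and each further center with probability proportional to its squared distance to the nearest center chosen so far. Below is a minimal STL-only sketch of that rule. The names Point, squared_distance and kmeanspp_seed are hypothetical and do not come from either implementation discussed in this thread, and the sketch assumes the input contains at least k distinct points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;  // hypothetical point type for this sketch

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Returns the indices of k points chosen as initial centers.
// Assumption: `points` contains at least k distinct points, so the
// sampling weights are never all zero.
std::vector<std::size_t> kmeanspp_seed(const std::vector<Point>& points,
                                       std::size_t k,
                                       std::mt19937& rng) {
    assert(k >= 1 && k <= points.size());
    std::vector<std::size_t> chosen;
    chosen.reserve(k);

    // First center: uniform over all points.
    std::uniform_int_distribution<std::size_t> first(0, points.size() - 1);
    chosen.push_back(first(rng));

    // d2[i] = squared distance from point i to its nearest chosen center.
    std::vector<double> d2(points.size(), std::numeric_limits<double>::max());
    while (chosen.size() < k) {
        const Point& last = points[chosen.back()];
        for (std::size_t i = 0; i < points.size(); ++i) {
            d2[i] = std::min(d2[i], squared_distance(points[i], last));
        }
        // Sample the next center with probability proportional to d2.
        // Points already chosen have weight zero.
        std::discrete_distribution<std::size_t> pick(d2.begin(), d2.end());
        chosen.push_back(pick(rng));
    }
    return chosen;
}
```

The function returns indices instead of copies so that the caller decides how centers are stored.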

It would be helpful if we get inputs from David and Sharique (my mentors)
on how to proceed with this.

Regards,
Dattatreya Mohapatra



Re: Interest in a tiny kmeans library

Boost - Dev mailing list


On 12/05/2018 at 06:38, Dattatreya Mohapatra wrote:

> Hi,
>
> I have proposed to provide an implementation of KMeans under uBLAS, as
> part of my GSoC project this summer. Currently, I am working on
> designing the API, and have not implemented anything.
>
> *My thoughts*:
> I have proposed to implement a very basic form of kmeans, with three
> types of initialization: random, kmeans++, and Bradley-Fayyad. It would be
> great if we can work together to integrate your implementation as well.
>
> It would be helpful if we get inputs from David and Sharique (my
> mentors) on how to proceed with this.
>
> Regards,
> Dattatreya Mohapatra
>

Hi,

I hadn't seen the GSoC proposal; thanks for bringing it to my attention.
We already have a design and working code, and the implementation uses
an efficient algorithm. We also have the kmeans++ initialization,
but not Bradley-Fayyad (I am interested in any pointers).
Also, I think this would be better placed outside of uBLAS, because kmeans
has a very general use case. The implementation that we have has no
dependency other than STL.
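
As an illustration of what an STL-only design that is generic on the data type can look like, here is a hypothetical sketch of the assignment step, written against iterators and a user-supplied distance. The name assign_to_nearest and its signature are assumptions made for this example, not the library's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assigns each element of [first, last) to the index of its nearest center.
// Works for any point type, as long as dist(point, center) returns a value
// that can be compared with `<`.
template <class InputIt, class Point, class Distance>
std::vector<std::size_t> assign_to_nearest(InputIt first, InputIt last,
                                           const std::vector<Point>& centers,
                                           Distance dist) {
    assert(!centers.empty());
    std::vector<std::size_t> labels;
    for (; first != last; ++first) {
        std::size_t best = 0;
        auto best_dist = dist(*first, centers[0]);
        for (std::size_t j = 1; j < centers.size(); ++j) {
            const auto d = dist(*first, centers[j]);
            if (d < best_dist) {
                best_dist = d;
                best = j;
            }
        }
        labels.push_back(best);
    }
    return labels;
}
```

Because the distance is a template parameter, the same function can be used on integers, floating-point vectors, or user-defined types without depending on a particular linear algebra library.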

We are ready to release/integrate (tests, docs, and benchmarks are in place),
which is why I am asking whether there is interest and how to proceed.
So let's work together on this if you want.

Raffi


Re: Interest in a tiny kmeans library

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list


> -----Original Message-----
> From: Boost [mailto:[hidden email]] On Behalf Of Raffi Enficiaud via Boost
> Sent: 11 May 2018 20:24
> To: [hidden email]
> Cc: Raffi Enficiaud
> Subject: [boost] Interest in a tiny kmeans library
>
> Dear all,
>
> We are finishing the cleanup of a tiny kmeans library. For those who do
> not know, kmeans is a widely used data clustering algorithm.
>
> This implementation runs faster than the standard algorithm because it uses
> the triangle inequality between cluster centers and data points to skip
> unnecessary distance computations at each iteration.
> It is based on the paper by Charles Elkan:
> https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf
>
> The library is fully generic on the data type, offers additional
> initialization heuristics, and comes with Python and MATLAB bindings.
>
> I would be happy to release this library as part of Boost.
> Do you think the community would be interested?

This is niche stuff, but I suspect useful nonetheless.

Do not be discouraged by immediate lack of interest.

But you may need to find some users to press your case.

(And don't forget the need for good Boost-style docs).

Paul

Paul A. Bristow
Prizet Farmhouse
Kendal UK LA8 8AB
+44 1539 561830
+44 7714 33 02 04
+44 7541 40 37 60





Re: Interest in a tiny kmeans library

Boost - Dev mailing list
On 22 May 2018 at 15:02, Paul A. Bristow via Boost
<[hidden email]> wrote:

> This is niche stuff, but I suspect useful nonetheless.
>
> Do not be discouraged by immediate lack of interest.
>
> But you may need to find some users to press your case.
>
> (And don't forget the need for good Boost-style docs).


I second that.
I could potentially use it myself, so I'd be interested in seeing it proposed.
(with good docs, of course :))

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net


Re: Interest in a tiny kmeans library

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list

Hi,

Fri, May 11, 2018 at 09:23:35PM +0200, Raffi Enficiaud via Boost wrote:

> Dear all,
>
> We are finishing the cleanup of a tiny kmeans library. For those who do
> not know, kmeans is a widely used data clustering algorithm.
>
> This implementation runs faster than the standard algorithm because it uses
> the triangle inequality between cluster centers and data points to skip
> unnecessary distance computations at each iteration.
> It is based on the paper by Charles Elkan:
> https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf
>
> The library is fully generic on the data type, offers additional
> initialization heuristics, and comes with Python and MATLAB bindings.

MATLAB bindings are a frequently requested feature, at least in my
experience.

>
> I would be happy to release this library as part of Boost.
> Do you think the community would be interested?
>

I am interested in such a library and consider it very useful. Is it
available somewhere, so that I could have a look and experiment with it a
bit?

Thanks in advance
   Philipp

