History: Machine Learning
Source of version: 64
{img src="display1737" link="display1737" width="1000" rel="box[g]" imalign="center" desc="Click to expand" align="center" styleimage="border"} Machine Learning has been added to ((Tiki23)) as a built-in (but optional) feature, using the Rubix ML library, which provides 40+ supervised and unsupervised learning algorithms.

! Introduction
This basic documentation of the machine learning feature in Tiki shows how to create machine learning (ML) models, train them against data stored in trackers, and query them.

! What is Machine Learning?
Machine Learning (ML) is a form of Artificial Intelligence (AI) that uses data to train a computer to perform tasks. Unlike traditional programming, in which rules are programmed explicitly, machine learning uses algorithms to build rulesets automatically. At a high level, machine learning is a collection of techniques borrowed from many disciplines, including statistics, probability theory, and neuroscience, combined with novel ideas for the purpose of gaining insight through data and computation. Machine Learning is further broken down into subcategories based on how the learners are trained and the tasks they handle.

In this documentation we assume that you already have a basic understanding of the different types of machine learning, such as classification and regression. If not, we recommend starting with the section [https://docs.rubixml.com/en/latest/what-is-machine-learning.html|What is Machine Learning?].

! What is Rubix ML?
[https://rubixml.com/|Rubix ML] is a high-level machine learning and deep learning library for the PHP language. It includes implementations of many machine learning algorithms, so you can define a model object in one or a few lines of code, then use it to fit a set of points or predict a value. It is open source and free to use commercially.
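The difference between explicitly programmed rules and learned rules, and the "define a model, fit it, predict" flow, can be illustrated with a toy learner. This is only a minimal sketch in plain Python, not Rubix ML's API: the class name ==ThresholdClassifier==, its methods, and the sample data are all hypothetical. Instead of hard-coding a cut-off, the learner derives one from labelled examples.

```python
from dataclasses import dataclass


@dataclass
class ThresholdClassifier:
    """Hypothetical toy learner: finds the cut-off on one numeric
    feature that best separates exactly two labels."""

    threshold: float = 0.0
    low_label: str = ""
    high_label: str = ""

    def train(self, samples, labels):
        # Candidate cut-offs are the midpoints between distinct sorted values.
        values = sorted(set(samples))
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
        classes = sorted(set(labels))  # assumes exactly two classes
        best = None
        for cut in candidates:
            # Try both assignments of the two labels to the two sides.
            for low, high in (classes, classes[::-1]):
                correct = sum(
                    (low if x < cut else high) == y
                    for x, y in zip(samples, labels)
                )
                if best is None or correct > best[0]:
                    best = (correct, cut, low, high)
        _, self.threshold, self.low_label, self.high_label = best
        return self

    def predict(self, samples):
        # The learned "rule" is just the threshold found during training.
        return [
            self.low_label if x < self.threshold else self.high_label
            for x in samples
        ]


# Made-up training data: one numeric feature with a label per sample.
model = ThresholdClassifier().train(
    [1, 2, 3, 10, 11, 12],
    ["short", "short", "short", "long", "long", "long"],
)
predictions = model.predict([4, 9])
print(model.threshold, predictions)
```

The real learners in Rubix ML are far more capable, but they follow the same pattern: the rule is not written by hand, it is derived from the training data.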
The library provides tools for the entire machine learning life cycle, from ETL (extracting, transforming, loading, manipulating and summarising data) to training, cross-validation, and production, with over 40 supervised and unsupervised learning algorithms. A number of algorithms in the library support Deep Learning, including the Multilayer Perceptron classifier and the MLP Regressor. The library has been included in Tiki since version 23 to give Tiki users access to the power of artificial intelligence and machine learning.

! How can this be used in Tiki?
It opens up many possibilities for Tiki users:
* It can be used to build machine learning pipelines, train classifiers, and run evaluations without having to write a single line of code.
* To get an idea of the diversity of the use cases, look at the "Tutorials and Example Projects" section here: https://rubixml.com/ (Yes, we will now be able to do this in Tiki!)
* Machine Learning will be deployed all over Tiki to leverage both:
** system data (e.g. logs of user activity), and
** content managed by Tiki users, like ((Trackers)) and ((Webmail)).

! How to get started?
!!! step 01: Enable the ML feature in admin features.
{img src="display1739" link="display1739" width="800" rel="box[g]" imalign="center" desc="Click to expand" align="center" styleimage="border"}

!!! step 02: Grant relevant users the permission to administer ML models, and grant (other) users the permission to use the models.
{img src="display1740" link="display1740" width="800" rel="box[g]" imalign="center" desc="Click to expand" align="center" styleimage="border"}

!!! step 03: Preparing or importing a dataset using a Tiki tracker
To import a dataset into Tiki, we need to create a [Trackers|Tiki tracker] with the corresponding data fields, then [https://doc.tiki.org/Tracker-Tabular|import the CSV file]. The tracker will be used as our dataset to create a machine learning model.
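To make the shape of such a dataset concrete, here is a minimal sketch in plain Python of how rows from a CSV file can be split into a feature set and labels. It is an illustration only, not Tiki or Rubix ML code. The column names (==item_id==, ==age==, ==income==, ==rating==), the data, and the helper ==split_features_and_labels== are all hypothetical, and which column serves as the label depends on the model you build.

```python
import csv
import io

# Made-up CSV content standing in for a file prepared for tracker import.
CSV_TEXT = """item_id,age,income,rating
1,34,52000,4
2,27,38000,3
3,45,91000,5
"""


def split_features_and_labels(csv_text, feature_columns, label_column):
    """Return (samples, labels): one numeric feature list and one label per row."""
    samples, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each selected field becomes one feature of the sample.
        samples.append([float(row[name]) for name in feature_columns])
        labels.append(row[label_column])
    return samples, labels


samples, labels = split_features_and_labels(
    CSV_TEXT, ["age", "income", "rating"], "item_id"
)
print(samples)
print(labels)
```

In Tiki, the equivalent selection is made in the model configuration by choosing the tracker and its fields, so no code is required.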
We can even create a fake dataset (tracker) to train a model by populating fields and items manually. Trackers are thus defined as input sources: one or more tracker fields form the feature set, and the item ID is the label.
{img src="display1742" link="display1742" width="800" rel="box[g]" imalign="center" desc="Click to expand" align="center" styleimage="border"}

!!! step 04:

!! Video
{kaltura id="1_1nf2dfkt" width="100%" type="html5"}

!! Code
* ((Tiki23)) will be released in June 2021 and you can use and test it today via a ((dev:Daily Build)). After sufficient testing, the machine learning code could be backported to ((Tiki22)) (e.g. in 22.3 or 22.4), given that it is pretty self-contained (if you don't activate the feature, nothing changes).
* Here is some of the code:
** https://gitlab.com/tikiwiki/tiki/-/blob/master/lib/core/Services/ML/Controller.php
** https://gitlab.com/tikiwiki/tiki/-/blob/master/lib/ml/mllib.php

!! How to configure (better documentation is being worked on)
{QUOTE(replyto="Victor")}
!! Quick summary of how it works:
# Enable the ML feature in admin features.
# Grant relevant users the permission to administer ML models, and grant (other) users the permission to use the models.
# Go to ML models (a new menu entry that appears when the feature is enabled and you have permission to see it).
# Start with a new model. You can use the MLT template to bootstrap it.
# Select the relevant tracker and field. Depending on the model, you can select more than one field, but MLT works best with one field at the moment.
# Check the model creation process: you can choose from basically all Transformers in the Rubix base and extras packages, then specify a learner (regressor, classifier, etc.), then tweak the parameters and save.
# You can test models at any time against the real data.
# All exceptions/errors are shown during parameter tweaking and testing/training.
# If the data source is not big, you can train via the web interface.
# Otherwise, it is better to use the new console command (ml:train) and run it with a scheduler, e.g. to train every night.
# After the model is trained, you can use it. With MLT, that consists of entering query content and seeing the top 20 (by default) relevant matches. All parameters are tweakable, including the number (20) of "more like these" entries.

The MLT template uses the best approach Andrew and I have come up with so far. If we figure out something better, I can modify the default template. The plan is to add such templates for each use case we handle, but also to give people the opportunity to experiment with their own models.

We need help pages on the doc site: Machine+Learning and Machine+Learning+Models. It is probably best to describe implementation details and include suitable links to the Rubix ML docs and similar resources.

Something important I see here is that trained models are stored as cached content on disk. Clearing the Tiki cache will require re-training the model.

That's it in short. I suppose you will have a lot of feedback, so I am curious to hear how to continue/improve here. One way to experiment is with different source options beyond trackers, e.g. wiki pages, action log items, calendar items, files, etc. Files should probably give us the ability to upload a CSV of sample data to train against.
{QUOTE}

See also ((Machine Learning Models))
%%%
{HTML()} <style> .thumbcaption { display: none; } #page-data > p:first-of-type, .wikipreview .wikitext > p:first-of-type { font-size: 120%; padding: 2rem; background: #f0f0f0; border-radius: .0rem; } </style> {HTML}
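The MLT query described in the quick summary (enter query content, get back the top matching items) can be sketched as a simple similarity search. This is a hedged illustration in plain Python, not the Tiki MLT template: the source does not say which Rubix ML transformers or learner the template uses, so the bag-of-words cosine similarity here is a swapped-in stand-in. The function names, item IDs and texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def more_like_this(query, items, top_n=20):
    """Return up to top_n item IDs ranked by similarity to the query.

    `items` maps an item ID (the label) to the text of one field (the feature).
    Items with zero similarity are left out.
    """
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), item_id)
        for item_id, text in items.items()
    ]
    # Highest score first; ties are broken by item ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored[:top_n] if score > 0]


# Made-up tracker items: item ID -> content of the selected field.
items = {
    101: "Cannot log in after password reset",
    102: "Password reset email never arrives",
    103: "Calendar event shows wrong time zone",
}
matches = more_like_this("password reset not working", items, top_n=2)
print(matches)
```

The ==top_n== parameter plays the role of the tweakable "top 20" setting; the default of 20 mirrors the default mentioned above.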