Data instances (Instance)

The class Orange.data.Instance holds a data instance. Each data instance corresponds to a domain, which defines its length, the types of its values and, for discrete variables, the symbolic values that correspond to integer indices.

Features

The data instance is described by a list of features defined by the domain descriptor (Orange.data.Domain). Instances support indexing with integer indices, variable names (strings) or variable descriptors.

Since “age” is the first attribute in the lenses data set, the following statements are equivalent:

>>> data = Orange.data.Table("lenses")
>>> age = data.domain["age"]
>>> example = data[0]
>>> print example[0]
young
>>> print example[age]
young
>>> print example["age"]
young

Negative indices do not count from the end as they usually do in Python; instead, they refer to the values of meta attributes.

If the domain has a class, the last element of a data instance is the class label. It should be accessed using get_class() and set_class().

The list has a fixed length that equals the number of variables.
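To make these indexing rules concrete, here is a minimal pure-Python sketch. It is not Orange's implementation: the class ToyInstance is hypothetical, variable descriptors are reduced to plain names, and the variable names of the lenses data are assumed for illustration.

```python
class ToyInstance(object):
    """Hypothetical model of a data instance; not Orange's implementation."""

    def __init__(self, names, values, metas=None):
        if len(names) != len(values):
            raise ValueError("one value per variable is required")
        self.names = list(names)        # variable names, class (if any) last
        self.values = list(values)      # fixed-length list of feature values
        self.metas = dict(metas or {})  # meta attribute id (negative) -> value

    def __getitem__(self, key):
        if isinstance(key, int):
            if key < 0:
                # negative integers are meta ids, not offsets from the end
                return self.metas[key]
            return self.values[key]
        # any other key is treated as a variable name
        return self.values[self.names.index(key)]

    def get_class(self):
        # assumes the domain has a class: it is the last feature
        return self.values[-1]


inst = ToyInstance(["age", "prescription", "astigmatic", "tear_rate", "lenses"],
                   ["young", "myope", "no", "reduced", "none"],
                   {-2: 0.84})
print(inst["age"])   # young
print(inst[-2])      # 0.84
```

Because negative integers are routed to the meta dictionary, the usual Python idiom inst[-1] cannot reach the class; that is why the sketch, like the text, uses a dedicated get_class().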

Meta attributes

Meta attributes provide a way to attach additional information to data instances, such as a patient’s id or the number of times the instance was misclassified during some test procedure. The most common additional information is the instance’s weight. Meta attributes do not appear in induced models.

Instances from the same domain do not need to have the same meta attributes. Meta attributes are therefore not addressed by position but by their ids, which are negative integers. Ids are generated by the function Orange.feature.Descriptor.new_meta_id() and can be reused in multiple domains.
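A minimal sketch of this id scheme, assuming nothing about how Orange allocates ids internally: the function new_meta_id below is a hypothetical stand-in that hands out fresh negative integers, and each instance keeps its own id-to-value dictionary, so two instances of the same domain can carry different meta attributes.

```python
import itertools

# Hypothetical stand-in for Orange.feature.Descriptor.new_meta_id():
# every call returns a fresh negative integer (-1, -2, -3, ...).
_next_ids = itertools.count(-1, -1)


def new_meta_id():
    return next(_next_ids)


weight_id = new_meta_id()
patient_id = new_meta_id()

# Two instances from the same domain: meta values live in per-instance
# dictionaries keyed by id, so the second one can simply lack a patient id.
first_metas = {weight_id: 0.5, patient_id: "p-17"}
second_metas = {weight_id: 1.0}
print(sorted(first_metas))  # [-2, -1]
```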

A domain descriptor can, but does not need to, know about meta attributes. See the documentation on Orange.data.Domain for details.

If a descriptor of the meta attribute is registered with the domain, the descriptor or its name can also be used for indexing. When registering meta attributes with several domains, it is recommended to use the same id for the same attribute in all of them.
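The following sketch illustrates why reusing ids pays off. ToyDomain is a hypothetical class, not Orange.data.Domain: it only maps registered names to meta ids, so names work for indexing only after registration, while raw ids always work.

```python
class ToyDomain(object):
    """Hypothetical domain that only knows the meta attributes registered with it."""

    def __init__(self):
        self.meta_ids = {}  # registered name -> meta id

    def add_meta(self, meta_id, name):
        self.meta_ids[name] = meta_id

    def meta_key(self, key):
        # ids are used as they are; names work only after registration
        if isinstance(key, int):
            return key
        if key not in self.meta_ids:
            raise KeyError("meta attribute %r is not registered" % key)
        return self.meta_ids[key]


domain_a, domain_b = ToyDomain(), ToyDomain()
# Same id for the same attribute in both domains.
domain_a.add_meta(-3, "weight")
domain_b.add_meta(-3, "weight")
metas = {-3: 2.0}
print(metas[domain_a.meta_key("weight")] == metas[domain_b.meta_key("weight")])
```

With a shared id, the same stored meta value is found through either domain; had the domains registered "weight" under different ids, one of the lookups would miss.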

Meta values can also be loaded from files in tab-delimited format.

Meta attributes are often used as weights. Many procedures, such as learning algorithms, accept the id of the meta attribute defining the weights of instances as an additional argument.

The following example adds a meta attribute with a random value to each data instance.

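import random
import Orange

random.seed(42)
data = Orange.data.Table("lenses")
# get a fresh meta attribute id (a negative integer)
id = Orange.feature.Descriptor.new_meta_id()
for inst in data:
    # store a random value under this id in every instance
    inst[id] = random.random()
print data[0]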

The code prints out:

['young', 'myope', 'no', 'reduced', 'none'], {-2:0.84}

(possibly with a different random value). The data instance now consists of two parts: ordinary features, which resemble a list since they are addressed by position (e.g. the first value is “young”), and meta values, which are more like a dictionary, where the id (-2) is a key and 0.84 is a value (of type Orange.data.Value).

To tell the learning algorithm to use the weights, the id needs to be passed along with the data:

bayes = Orange.classification.bayes.NaiveLearner(data, id)

Many other functions accept weights in similar fashion.

The code

import orange  # legacy Orange module that provides getClassDistribution
print orange.getClassDistribution(data)
print orange.getClassDistribution(data, id)

prints out

<15.000, 5.000, 4.000>
<9.691, 3.232, 1.969>

where the first line is the unweighted class distribution and the second is the distribution computed with the random instance weights.
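The difference between the two lines comes from summing weights instead of counting instances. Below is a minimal sketch of that computation in plain Python; the data layout (a class value paired with a meta dictionary) is a hypothetical simplification, and the function does not reproduce orange.getClassDistribution itself.

```python
def class_distribution(instances, weight_id=None):
    """Sum instance weights per class; without weight_id each instance counts 1."""
    dist = {}
    for class_value, metas in instances:
        w = 1.0 if weight_id is None else metas[weight_id]
        dist[class_value] = dist.get(class_value, 0.0) + w
    return dist


# Toy data: (class value, meta dictionary) pairs with weights stored under id -2.
toy = [("none", {-2: 0.5}), ("soft", {-2: 0.25}), ("none", {-2: 1.0})]
print(class_distribution(toy))
print(class_distribution(toy, -2))
```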

Registering the meta attribute using Orange.data.Domain.add_meta changes how the data instance is printed out and how it can be accessed:

w = Orange.feature.Continuous("w")
data.domain.addmeta(id, w)

The meta attribute can now be indexed just like ordinary features. The following three statements are equivalent:

print data[0][id]
print data[0][w]
print data[0]["w"]

Another consequence of registering the meta attribute is that it allows for conversion from Python native types:

ok = Orange.feature.Discrete("ok?", values=["no", "yes"])
ok_id = Orange.feature.Descriptor.new_meta_id()
data.domain.addmeta(ok_id, ok)
data[0][ok_id] = "yes"

Without registering the attribute, the last line would fail, since Orange would not know which variable descriptor to use to convert the string “yes” into an attribute value.
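A sketch of the reasoning above: a string can only be stored as a discrete value if some descriptor says which values are allowed. ToyDiscrete and set_meta are hypothetical helpers, not Orange API; the id -7 is arbitrary.

```python
class ToyDiscrete(object):
    """Hypothetical discrete variable: converts strings to value indices."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def to_index(self, obj):
        if obj not in self.values:
            raise ValueError("%r is not a value of %s" % (obj, self.name))
        return self.values.index(obj)


def set_meta(metas, registered, meta_id, obj):
    """Store obj under meta_id, converting it with the registered descriptor."""
    if meta_id not in registered:
        # without a descriptor there is no way to interpret a string like "yes"
        raise TypeError("meta id %d has no registered descriptor" % meta_id)
    metas[meta_id] = registered[meta_id].to_index(obj)


ok = ToyDiscrete("ok?", ["no", "yes"])
registered = {-7: ok}   # id -7 chosen arbitrarily for the example
metas = {}
set_meta(metas, registered, -7, "yes")
print(metas)  # {-7: 1}
```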

Hashing

Data instances compute their hashes using CRC32 and can thus be used as dictionary keys or collected in Python sets.
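The sketch below shows the general pattern of a CRC32-based hash that keeps equal rows interchangeable as dictionary keys and set members. Which bytes Orange actually feeds to CRC32 is not described here, so hashing the repr of the values is an assumption, and ToyRow is a hypothetical class.

```python
import zlib


class ToyRow(object):
    """Hypothetical hashable row: equality by values, hash via CRC32."""

    def __init__(self, values):
        self.values = tuple(values)

    def __eq__(self, other):
        return isinstance(other, ToyRow) and self.values == other.values

    def __ne__(self, other):  # needed on Python 2 only
        return not self == other

    def __hash__(self):
        # CRC32 over a text serialization of the values; the mask keeps the
        # result non-negative on Python 2, where zlib.crc32 may return < 0
        return zlib.crc32(repr(self.values).encode("utf-8")) & 0xffffffff


rows = [ToyRow(["young", "myope"]), ToyRow(["young", "myope"]),
        ToyRow(["presbyopic", "myope"])]
print(len(set(rows)))  # 2: the duplicate collapses
counts = {}
for row in rows:
    counts[row] = counts.get(row, 0) + 1
```

The hash is derived only from the values that __eq__ compares, so equal rows always hash equally, which is the contract sets and dictionaries rely on.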

class Orange.data.Instance
domain

The domain to which the data instance corresponds. This attribute is read-only.

__init__(domain[, values])

Construct a data instance with the given domain and initialize the values. Values are given as a list of objects that can be converted into values of the corresponding variables: strings and integer indices (for discrete variables), strings or numbers (for continuous variables), or instances of Orange.data.Value.

If values are omitted, they are set to unknown.

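Parameters:
  • domain (Orange.data.Domain) – domain descriptor
  • values (list) – values of the instance’s features; optional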

The following example loads data on lenses and constructs another data instance from the same domain.

import Orange
lenses = Orange.data.Table("lenses")
domain = lenses.domain
inst = Orange.data.Instance(domain, ["young", "myope",
                               "yes", "reduced", "soft"])

The same can be done using other representations of the values:

inst = Orange.data.Instance(domain, ["young", 0, 1,
            Orange.data.Value(domain[3], "reduced"), "soft"])
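The conversion rules for the values list can be sketched in plain Python as follows. The (kind, values) description of a variable, the helpers to_value and make_instance, and the use of None for an unknown value are all hypothetical simplifications; Orange has its own representation of unknown values.

```python
def to_value(variable, obj):
    """Convert obj for a toy variable given as (kind, values)."""
    kind, values = variable
    if obj is None:
        return None                      # None stands for an unknown value
    if kind == "discrete":
        if isinstance(obj, int):         # integer index into the value list
            if not 0 <= obj < len(values):
                raise ValueError("index %d out of range" % obj)
            return values[obj]
        if obj not in values:            # otherwise a symbolic value
            raise ValueError("unknown value %r" % (obj,))
        return obj
    return float(obj)                    # continuous: a number or numeric string


def make_instance(domain, values=None):
    """Build a fixed-length value list; omitted values are all unknown."""
    if values is None:
        return [None] * len(domain)
    if len(values) != len(domain):
        raise ValueError("expected %d values" % len(domain))
    return [to_value(var, obj) for var, obj in zip(domain, values)]


toy_domain = [("discrete", ["young", "pre-presbyopic", "presbyopic"]),
              ("discrete", ["myope", "hypermetrope"]),
              ("continuous", None)]
print(make_instance(toy_domain, ["young", 1, "2.5"]))
print(make_instance(toy_domain))
```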
__init__([domain, ]instance)

Construct a new data instance as a shallow copy of the original. If a domain descriptor is given, the instance is converted to another domain.

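Parameters:
  • domain (Orange.data.Domain) – domain descriptor; optional
  • instance (Orange.data.Instance) – the original data instance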

The following example constructs a reduced domain and a data instance in this domain.

domain_red = Orange.data.Domain(["age", "lenses"], domain)
inst_red = Orange.data.Instance(domain_red, inst)
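Conceptually, converting to a reduced domain picks out the values of the variables the new domain keeps. The sketch below matches variables by name only, which is a simplifying assumption (Orange works with variable descriptors); the helper convert and the lenses variable names are illustrative.

```python
def convert(values, source_names, target_names):
    """Copy values into another domain by matching variable names.

    Variables the source does not have become None (unknown)."""
    by_name = dict(zip(source_names, values))
    return [by_name.get(name) for name in target_names]


names = ["age", "prescription", "astigmatic", "tear_rate", "lenses"]
inst = ["young", "myope", "yes", "reduced", "soft"]
print(convert(inst, names, ["age", "lenses"]))  # ['young', 'soft']
```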
__init__(domain, instances)

Construct a new data instance for the given domain, where the feature values are found in the provided instances using both their ordinary features and meta attributes that are registered with their corresponding domains. The new instance also includes the meta attributes that appear in the provided instances and whose values are not used for the instance’s features.

Parameters:
  • domain (Orange.data.Domain) – domain descriptor
  • instances (list of Orange.data.Instance) – data instances
import Orange

data1 = Orange.data.Table("merge1")
data2 = Orange.data.Table("merge2")

a1, a2 = data1.domain.attributes

metas = data1.domain.getmetas()
m1, m2 = data1.domain["m1"], data1.domain["m2"]
m1i, m2i = data1.domain.metaid(m1), data1.domain.metaid(m2)

a1, a3 = data2.domain.attributes
n1 = Orange.feature.Continuous("n1")
n2 = Orange.feature.Continuous("n2")

new_domain = Orange.data.Domain([a1, a3, m1, n1])
new_domain.addmeta(m2i, m2)
new_domain.addmeta(Orange.feature.Descriptor.new_meta_id(), a2)
new_domain.addmeta(Orange.feature.Descriptor.new_meta_id(), n2)

merge = Orange.data.Instance(new_domain, [data1[0], data2[0]])
print "First example: ", data1[0]
print "Second example: ", data2[0]
print "Merge: ", merge

The new domain consists of variables from data1 and data2: a1, a3 and m1 are ordinary features, and m2 and a2 are meta attributes in the new domain. m2 keeps the meta attribute id it has in data1, while a2 gets a new meta id. In addition, the new domain has two new variables: n1, an ordinary feature, and n2, a meta attribute.

Here is the output:

First example:  [1, 2], {"m1":3, "m2":4}
Second example:  [1, 2.5], {"m1":3, "m3":4.5}
Merge:  [1, 2.5, 3, ?], {"a2":2, "m2":4, -5:4.50, "n2":?}

Since a1 and m1 appear in the domains of both original instances, the new instance can only be constructed if their values match. a3 comes from the second instance, and the meta attributes a2 and m2 come from the first one. The meta attribute m3 is also copied from the second instance; since it is not registered with the new domain, it is printed with its id (-5) instead of its name. The values of the two new attributes, n1 and n2, are left undefined.
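The matching rule can be sketched as follows. This is a simplified, hypothetical model: each source instance is flattened into a dictionary from variable names to values (ordinary and registered meta values together), and the copying of leftover meta attributes such as m3 is left out.

```python
def merge(target_names, sources):
    """Fill each target variable from the sources that have a value for it.

    sources are dicts name -> value; None marks an unknown value."""
    merged = {}
    for name in target_names:
        found = [src[name] for src in sources
                 if src.get(name) is not None]
        if any(v != found[0] for v in found):
            raise ValueError("conflicting values for %r" % name)
        merged[name] = found[0] if found else None
    return merged


first = {"a1": 1, "a2": 2, "m1": 3, "m2": 4}
second = {"a1": 1, "a3": 2.5, "m1": 3, "m3": 4.5}
print(merge(["a1", "a3", "m1", "n1"], [first, second]))
```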

native([level])

Convert the instance into an ordinary Python list. If the optional argument level is 1 (default), the result is a list of instances of Orange.data.Value. If it is 0, it contains pure Python objects, that is, strings for discrete variables and numbers for continuous ones.

compatible(other, ignore_class=False)

Return True if the two instances are compatible, that is, equal in all features whose values are known in both of them. If ignore_class is True, the class is omitted from the comparison.
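A minimal sketch of this comparison on plain lists, assuming None marks a missing value and the class is the last element; the function is illustrative, not Orange's code.

```python
def compatible(a, b, ignore_class=False):
    """True if a and b agree wherever both values are known (None = missing).

    a and b are equal-length value lists with the class value last."""
    if ignore_class:
        a, b = a[:-1], b[:-1]
    return all(x is None or y is None or x == y for x, y in zip(a, b))


print(compatible(["young", None, "soft"], ["young", "myope", "soft"]))  # True
print(compatible(["young", "myope", "soft"], ["young", "myope", "none"]))  # False
print(compatible(["young", "myope", "soft"], ["young", "myope", "none"],
                 ignore_class=True))  # True
```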

get_class()

Return the instance’s class as Orange.data.Value.

get_classes()

Return the values of multiple classes as a list of Orange.data.Value.

set_class(value)

Set the instance’s class.

Parameters: value (Orange.data.Value, number or string) – the instance’s new class

set_classes(values)

Set the values of multiple classes.

Parameters: values (list) – a list of values; the length must match the number of multiple classes

get_metas([key_type])

Return a dictionary containing meta values of the data instance. The argument key_type can be int (default), str or Orange.feature.Descriptor and determines whether the dictionary keys are meta ids, variable names or variable descriptors. In the latter two cases, only registered attributes are returned.

data = Orange.data.Table("inquisition2")
example = data[4]
print example.get_metas()
print example.get_metas(int)
print example.get_metas(str)
print example.get_metas(Orange.feature.Descriptor)

Parameters: key_type (type) – the key type; either int, str or Descriptor

get_metas(optional[, key_type])

Similar to the above, but return a dictionary that contains only non-optional meta attributes (if optional is 0) or only optional ones (otherwise).

Parameters:
  • optional (bool) – tells whether to return optional or non-optional attributes
  • key_type (type) – the key type; either int, str or Descriptor

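Both variants of get_metas can be modelled with a small function over two dictionaries. This is a hypothetical sketch, not Orange's implementation: descriptors as keys are not modelled, and treating unregistered meta attributes as neither optional nor non-optional is an assumption.

```python
def get_metas(metas, registry, key_type=int, optional=None):
    """Toy model of get_metas.

    metas: meta id -> value; registry: meta id -> (name, is_optional) for
    the meta attributes registered with the domain."""
    if key_type not in (int, str):
        raise TypeError("this sketch supports only int and str keys")
    result = {}
    for meta_id, value in metas.items():
        entry = registry.get(meta_id)
        if optional is not None:
            # assumption: unregistered metas are skipped when filtering
            if entry is None or bool(entry[1]) != bool(optional):
                continue
        if key_type is int:
            result[meta_id] = value
        elif entry is not None:
            # name keys are only available for registered attributes
            result[entry[0]] = value
    return result


metas = {-2: 0.84, -3: "p-17", -4: 1.0}
registry = {-2: ("w", False), -3: ("patient", True)}
print(get_metas(metas, registry, str))
print(get_metas(metas, registry, optional=1))
```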
has_meta(attr)

Return True if the data instance has the specified meta attribute.

Parameters: attr (id, str or Descriptor) – meta attribute

remove_meta(attr)

Remove the specified meta attribute.

Parameters: attr (id, str or Descriptor) – meta attribute

get_weight(attr)

Return the value of the specified meta attribute. The attribute’s value must be continuous and is returned as float.

Parameters: attr (id, str or Descriptor) – meta attribute

set_weight(attr, weight=1)

Set the value of the specified meta attribute to weight.

Parameters:
  • attr (id, str or Descriptor) – meta attribute
  • weight (float) – weight of instance
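The four meta-attribute methods above share one addressing scheme: an attribute can be given by its id or, once registered, by name. A minimal sketch with a hypothetical ToyMetas class (descriptors are not modelled):

```python
class ToyMetas(object):
    """Hypothetical per-instance meta storage, addressed by id or registered name."""

    def __init__(self, names=None):
        self.values = {}                  # meta id -> value
        self.names = dict(names or {})    # registered name -> meta id

    def _id(self, attr):
        return attr if isinstance(attr, int) else self.names[attr]

    def has_meta(self, attr):
        return self._id(attr) in self.values

    def remove_meta(self, attr):
        del self.values[self._id(attr)]

    def set_weight(self, attr, weight=1):
        self.values[self._id(attr)] = float(weight)

    def get_weight(self, attr):
        # weights are continuous, so they are returned as float
        return float(self.values[self._id(attr)])


m = ToyMetas({"w": -2})
m.set_weight("w")            # default weight 1
print(m.get_weight(-2))      # 1.0
m.set_weight(-2, 0.25)
m.remove_meta("w")
print(m.has_meta("w"))       # False
```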