apoor package

Module contents

A small personal package created to store code and data I often reuse.

I’ll continue to update it with useful functions that I find myself reusing. The apoor.data module has some common datasets and functions for reading them in as pandas DataFrames.

apoor.fdir(o: Any) → List[str]

Filtered dir(). Same as the builtin dir() function, but with private attributes filtered out.

Parameters:o – The object being inspected
Returns:The “public” attributes of o
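The package source isn’t shown here, but assuming “private” means underscore-prefixed names (the usual Python convention), the filtering fdir performs can be sketched as:

```python
from typing import Any, List

def fdir(o: Any) -> List[str]:
    """Filtered dir(): like dir(), but without underscore-prefixed names."""
    return [attr for attr in dir(o) if not attr.startswith("_")]
```

For example, `fdir([])` lists list methods like `append` and `sort` while filtering out dunder methods such as `__len__`.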
apoor.ibuff(itr: Iterable[T], bsize: int = 1) → Iterable[List[T]]

Creates an iterable that yields elements from itr grouped into lists of size bsize.

If itr can’t be evenly grouped into lists of size bsize, the final list will contain the remaining elements.

Parameters:
  • itr – The iterable to be buffered.
  • bsize

    Positive integer, representing the number of values from itr to be yielded together.

    The final list yielded may not be of size bsize if len(itr) doesn’t evenly divide into groups of bsize.

Yields:

Buffered elements from itr, grouped into lists of size up to bsize.

Raises:
  • TypeError – If bsize isn’t an integer.
  • ValueError – If bsize isn’t positive.

Examples

>>> for b in apoor.ibuff(range(10),3):
...     print(b)
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
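The implementation isn’t shown in these docs; a minimal sketch matching the described behavior (eager argument validation, then lazily yielded groups, with a short final group) might look like:

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def ibuff(itr: Iterable[T], bsize: int = 1) -> Iterable[List[T]]:
    """Yield elements from itr grouped into lists of up to bsize elements."""
    if not isinstance(bsize, int) or isinstance(bsize, bool):
        raise TypeError("bsize must be an integer")
    if bsize < 1:
        raise ValueError("bsize must be positive")
    def gen():
        buf: List[T] = []
        for item in itr:
            buf.append(item)
            if len(buf) == bsize:
                yield buf
                buf = []
        if buf:  # leftover elements form a shorter final list
            yield buf
    return gen()
```

Validating bsize before returning the generator (rather than inside it) means bad arguments raise at call time instead of on first iteration.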
apoor.make_scale(dmin: float, dmax: float, rmin: float, rmax: float, clamp: bool = False) → Callable[[float], float]

Scale function factory.

Creates a scale function to map a number from a domain to a range.

Parameters:
  • dmin – Domain’s start value
  • dmax – Domain’s end value
  • rmin – Range’s start value
  • rmax – Range’s end value
  • clamp – If True, results outside the range are clamped to it (default: False)
Returns:

A scale function that takes one numeric argument and returns the value mapped from the domain to the range (clamped if the clamp flag is set).

Examples

>>> s = make_scale(0,1,0,10)
>>> s(0.1)
1.0
>>> s = make_scale(0,10,10,0)
>>> s(1.0)
9.0
>>> s = make_scale(0,1,0,1,clamp=True)
>>> s(100)
1.0
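A linear-interpolation sketch consistent with the examples above (the min/max handling for clamping with a reversed range is an assumption):

```python
from typing import Callable

def make_scale(dmin: float, dmax: float, rmin: float, rmax: float,
               clamp: bool = False) -> Callable[[float], float]:
    """Return a function mapping [dmin, dmax] linearly onto [rmin, rmax]."""
    def scale(x: float) -> float:
        # Linear interpolation from the domain to the range.
        y = rmin + (x - dmin) * (rmax - rmin) / (dmax - dmin)
        if clamp:
            # Clamp to the range; min/max handles reversed ranges (rmin > rmax).
            lo, hi = min(rmin, rmax), max(rmin, rmax)
            y = max(lo, min(hi, y))
        return y
    return scale
```

Note that reversed ranges (e.g. `make_scale(0, 10, 10, 0)`) fall out of the interpolation formula for free, since `rmax - rmin` is simply negative.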
apoor.set_seed(n: int)

Sets numpy’s random seed.

Parameters:n (int) – The value used to set numpy’s random seed.
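This is a thin wrapper over numpy’s global seed; its effect is that subsequent draws from numpy’s global random state become reproducible:

```python
import numpy as np

def set_seed(n: int):
    """Seed numpy's global random number generator for reproducibility."""
    np.random.seed(n)
```

Seeding twice with the same value replays the same sequence of random draws.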
apoor.to_onehot(y: numpy.ndarray, num_classes: int = None, dtype='float32') → numpy.ndarray

Expands a 1D categorical vector to a 2D, onehot-encoded categorical matrix.

Parameters:
  • y – 1D categorical vector
  • num_classes

    Number of categories in (and width of) the output matrix.

    If num_classes is None, it is set to max(y) + 1.

  • dtype – Data type of output matrix
Returns:

2D one-hot encoded category matrix

Examples

>>> data = np.array([0,2,1,3])
>>> apoor.to_onehot(data)
array([[1., 0., 0., 0.],
       [0., 0., 1., 0.],
       [0., 1., 0., 0.],
       [0., 0., 0., 1.]])
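A sketch of the encoding using integer array indexing (assuming, per the num_classes default, that categories are the integers 0 through max(y)):

```python
import numpy as np

def to_onehot(y: np.ndarray, num_classes: int = None,
              dtype="float32") -> np.ndarray:
    """Expand a 1D integer category vector into a 2D one-hot matrix."""
    y = np.asarray(y, dtype="int64")
    if num_classes is None:
        num_classes = int(y.max()) + 1  # categories assumed to be 0..max(y)
    out = np.zeros((y.size, num_classes), dtype=dtype)
    out[np.arange(y.size), y] = 1  # set one column per row
    return out
```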
apoor.train_test_split(*arrays, test_pct: float = 0.15, val_set: bool = False, val_pct: float = 0.15) → Tuple[numpy.ndarray]

Splits arrays into train & test sets.

Splits arrays into train, test, and (optionally) validation sets using the supplied percentages.

Parameters:
  • *arrays

    An arbitrary number of sequences to be split into train, test, and (optionally) validation sets. Must have at least one array.

  • test_pct

    Float in the range [0,1]. Percent of total n values to include in test set.

    The train set will have 1.0 - test_pct pct of values (or 1.0 - test_pct - val_pct pct of values if val_set == True).

  • val_set – Whether or not to return a validation set, in addition to a test set.
  • val_pct

    Float in the range [0,1]. Percent of total n values to include in the validation set.

    Ignored if val_set == False.

    The train set will have 1.0 - test_pct - val_pct pct of values.

Returns:

splits – Tuple of numpy arrays: the input arrays split into train, test, and (optionally) validation sets.

If val_set == False, len(splits) == 2 * len(arrays), or if val_set == True, len(splits) == 3 * len(arrays).

Example

>>> x = np.arange(10)
>>> train_test_split(x)
(array([3, 9, 4, 2, 1, 0, 7, 5, 8]), array([6]))
>>> x = np.arange(10)
>>> y = x[::-1]
>>> x_train, x_test, y_train, y_test = train_test_split(x,y)
>>> x_train, x_test, y_train, y_test
(array([1, 3, 5, 8, 4, 7, 6, 9]),
 array([0, 2]),
 array([8, 6, 4, 1, 5, 2, 3, 0]),
 array([9, 7]))
>>> splits = train_test_split(x, test_pct=0.3, val_set=True, val_pct=0.2)
>>> [s.size for s in splits]
[5, 3, 2]
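A sketch of the splitting logic, using one shared index permutation so parallel arrays stay aligned. The size rounding and the train/test/val ordering per array are assumptions inferred from the examples, not confirmed by the package source:

```python
from typing import Tuple
import numpy as np

def train_test_split(*arrays, test_pct: float = 0.15, val_set: bool = False,
                     val_pct: float = 0.15) -> Tuple[np.ndarray, ...]:
    """Shuffle-split each input array into train/test(/val) pieces."""
    n = len(arrays[0])
    idx = np.random.permutation(n)      # one shared shuffle keeps arrays aligned
    n_test = round(n * test_pct)        # rounding behavior is an assumption
    n_val = round(n * val_pct) if val_set else 0
    test_i = idx[:n_test]
    val_i = idx[n_test:n_test + n_val]
    train_i = idx[n_test + n_val:]
    splits = []
    for a in arrays:
        a = np.asarray(a)
        splits.append(a[train_i])       # per array: train, test(, val)
        splits.append(a[test_i])
        if val_set:
            splits.append(a[val_i])
    return tuple(splits)
```

Because every array is indexed by the same permutation, corresponding elements of x and y end up in the same split, which is what makes the `x_train, x_test, y_train, y_test` unpacking above line up.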