12.4. Introduction to Variables#

Variables are essential components of TensorFlow that represent shared, persistent state that your program can manipulate. This guide covers how to create, modify, and manage tf.Variable instances, the objects TensorFlow uses to represent this mutable state [TensorFlow Developers, 2023].

A tf.Variable wraps a tensor whose value can be changed by running operations on it; specific operations exist for reading and writing a variable's value. Higher-level frameworks such as tf.keras use tf.Variable to store model parameters, underscoring the central role variables play in machine learning workloads [TensorFlow Developers, 2023].

12.4.1. Creating a Variable#

To create a variable, you need to provide an initial value, which can be a tensor or any object that can be converted to a tensor. The tf.Variable you create will have the same data type (dtype) and shape as the initial value [TensorFlow Developers, 2023].

import tensorflow as tf

# Create a function to print bold text
def print_bold(txt, c=31):
    """
    Display text in bold with optional color.

    Parameters:
    - txt (str): The text to be displayed.
    - c (int): Color code for the text (default is 31 for red).
    """
    print(f"\033[1;{c}m" + txt + "\033[0m")

# Create variables with different data types as initializations
int_variable = tf.Variable(42, name="int_variable", dtype=tf.int32)
float_variable = tf.Variable(3.14, name="float_variable", dtype=tf.float32)
bool_variable = tf.Variable(True, name="bool_variable", dtype=tf.bool)
string_variable = tf.Variable("Hello, TensorFlow!", name="string_variable", dtype=tf.string)
complex_variable = tf.Variable(2 + 3j, name="complex_variable", dtype=tf.complex64)

# Perform operations with variables
float_result = float_variable * 2
bool_result = tf.logical_not(bool_variable)
string_concat = tf.strings.join([string_variable, tf.constant(" Welcome!")])

# Print information about the variables and results
print_bold("Integer Variable:")
print(int_variable)
print_bold("Float Variable:")
print(float_variable)
print_bold("Boolean Variable:")
print(bool_variable)
print_bold("String Variable:")
print(string_variable)
print_bold("Complex Variable:")
print(complex_variable)
print_bold("Result of Float Variable Multiplication:")
print(float_result)
print_bold("Logical NOT of Boolean Variable:")
print(bool_result)
print_bold("Concatenated String Variable:")
print(string_concat)
Integer Variable:
<tf.Variable 'int_variable:0' shape=() dtype=int32, numpy=42>
Float Variable:
<tf.Variable 'float_variable:0' shape=() dtype=float32, numpy=3.14>
Boolean Variable:
<tf.Variable 'bool_variable:0' shape=() dtype=bool, numpy=True>
String Variable:
<tf.Variable 'string_variable:0' shape=() dtype=string, numpy=b'Hello, TensorFlow!'>
Complex Variable:
<tf.Variable 'complex_variable:0' shape=() dtype=complex64, numpy=(2+3j)>
Result of Float Variable Multiplication:
tf.Tensor(6.28, shape=(), dtype=float32)
Logical NOT of Boolean Variable:
tf.Tensor(False, shape=(), dtype=bool)
Concatenated String Variable:
tf.Tensor(b'Hello, TensorFlow! Welcome!', shape=(), dtype=string)

Most tensor operations work on variables as expected. Note, however, that operations such as tf.reshape do not reshape a variable in place; they return a new tensor built from the variable’s current value.

import tensorflow as tf

# Create a variable
my_variable = tf.Variable([1, 2, 3, 4], name="my_variable")

# Print the variable, its conversion to a tensor, and the index of the highest value
print_bold("A variable:")
print(my_variable)
print_bold("\nViewed as a tensor:")
print(tf.convert_to_tensor(my_variable))
print_bold("\nIndex of highest value:")
print(tf.math.argmax(my_variable))

# Attempt to reshape the variable
# This creates a new tensor; it does not reshape the variable.
reshaped_variable = tf.reshape(my_variable, [1, 4])

# Print the reshaped tensor
print_bold("\nCopying and reshaping:")
print(reshaped_variable)
A variable:
<tf.Variable 'my_variable:0' shape=(4,) dtype=int32, numpy=array([1, 2, 3, 4])>

Viewed as a tensor:
tf.Tensor([1 2 3 4], shape=(4,), dtype=int32)

Index of highest value:
tf.Tensor(3, shape=(), dtype=int64)

Copying and reshaping:
tf.Tensor([[1 2 3 4]], shape=(1, 4), dtype=int32)

In this example:

  • We create a variable my_variable with shape (4,).

  • We print the variable itself, its conversion to a tensor using tf.convert_to_tensor, and the index of the highest value using tf.math.argmax.

  • We attempt to reshape the variable using tf.reshape, which actually creates a new tensor with the desired shape.

  • We print the reshaped tensor.

This illustrates that while operations like reshaping are possible with tensors, they do not directly reshape variables; instead, they create new tensors with the specified shape.
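While reshaping produces a new tensor, a variable’s value can be changed in place with the assign family of methods, provided the new value has the same shape and dtype. A minimal sketch:

```python
import tensorflow as tf

v = tf.Variable([1.0, 2.0, 3.0, 4.0])

# Overwrite the value in place (shape and dtype must match)
v.assign([10.0, 20.0, 30.0, 40.0])

# In-place addition and subtraction
v.assign_add([1.0, 1.0, 1.0, 1.0])   # now [11., 21., 31., 41.]
v.assign_sub([1.0, 1.0, 1.0, 1.0])   # back to [10., 20., 30., 40.]

print(v.numpy())
```

Unlike tf.reshape, these methods mutate the variable’s existing storage rather than allocating a new tensor.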

12.4.2. Lifecycles, Naming, and Watching#

In TensorFlow for Python, instances of tf.Variable share the same lifecycle characteristics as other standard Python objects. When no references to a variable remain, it is automatically deallocated by Python’s garbage collector. Furthermore, variables can be assigned names, facilitating the tracking and debugging process. It is possible to assign the same name to two separate variables [TensorFlow Developers, 2023].

However, this does not mean that the variables are identical or share the same value. Each variable is a distinct object with its own identity and attributes. To access the value of a variable, you can use the .value() method or the .numpy() method, which returns a NumPy array [TensorFlow Developers, 2023].

# Import TensorFlow
import tensorflow as tf

# Create a constant tensor
my_tensor = tf.constant([1, 2, 3])

# Create variables 'a' and 'b' with the same name
a = tf.Variable(my_tensor, name="ENGG")
# A new variable with the same name but different value
b = tf.Variable(my_tensor + 1, name="ENGG")

# Check if the variables are elementwise-unequal, despite having the same name
print_bold("Variables 'a' and 'b' are elementwise-unequal:")
print(a == b)

# To access the value of a variable, use .value() or .numpy()
print_bold("\nValue of variable 'a':")
print(a.value())
print_bold("\nValue of variable 'b':")
print(b.numpy())
Variables 'a' and 'b' are elementwise-unequal:
tf.Tensor([False False False], shape=(3,), dtype=bool)

Value of variable 'a':
tf.Tensor([1 2 3], shape=(3,), dtype=int32)

Value of variable 'b':
[2 3 4]

In this example:

  • We create a constant tensor my_tensor.

  • We create two variables, a and b, both with the name “ENGG”.

  • Variable a is initialized with my_tensor, while variable b is initialized with my_tensor + 1.

  • Despite having the same name, variables a and b have different values, as shown by the comparison a == b.

  • We also show how to access the value of a variable using the .value() method or the .numpy() method.
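Because each variable owns its own storage, initializing one variable from another copies the backing tensor; later updates to one do not affect the other. A small sketch:

```python
import tensorflow as tf

a = tf.Variable([2.0, 3.0])
b = tf.Variable(a)        # b receives a copy of a's current value

a.assign([5.0, 6.0])      # modify a in place

print(a.numpy())  # [5. 6.]
print(b.numpy())  # [2. 3.] -- b is unaffected
```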

12.4.3. Variable Names, Saving, and Gradients#

When saving and loading models in TensorFlow, variable names are preserved. By default, when variables are used within models, they are assigned unique names automatically. This eliminates the need for manual naming, unless you have specific requirements. For example, you can use the name argument to specify a custom name for a variable, or use the name_scope function to create a hierarchical name for a group of variables [TensorFlow Developers, 2023].
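As a brief illustration of how variables survive saving and loading, tf.train.Checkpoint can write a variable to disk and restore it later. The attribute name weights and the temporary path below are illustrative choices, not required names:

```python
import os
import tempfile

import tensorflow as tf

v = tf.Variable([1.0, 2.0], name="weights")
ckpt = tf.train.Checkpoint(weights=v)

# Save the variable, then clobber and restore its value
path = ckpt.write(os.path.join(tempfile.mkdtemp(), "ckpt"))
v.assign([0.0, 0.0])
ckpt.restore(path)

print(v.numpy())  # [1. 2.]
```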

While variables play a crucial role in differentiation, not all variables need to be included in the gradient computations. You can prevent gradients from being calculated for a variable by setting its trainable attribute to False during creation. An example of a variable that typically doesn’t need gradients is a training step counter [TensorFlow Developers, 2023].

Here’s a concise example:

import tensorflow as tf

# A training step counter: state that should not receive gradients
train_step_counter = tf.Variable(0, trainable=False)

# Any variable can be excluded from gradient computation the same way
non_trainable_variable = tf.Variable(42, trainable=False)

# Print information about the variables
print_bold("Train step counter:")
print(train_step_counter)
print_bold("Non-trainable variable:")
print(non_trainable_variable)
Train step counter:
<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=0>
Non-trainable variable:
<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=42>

In this example:

  • We create a non-trainable variable train_step_counter which will track the training steps.

  • We create another variable non_trainable_variable with trainable=False.

  • We print information about these variables. The train_step_counter variable, even though it’s a variable, is not intended for differentiation and is not trainable. The non_trainable_variable is explicitly set to be non-trainable.
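The practical effect of trainable=False appears during differentiation: tf.GradientTape watches trainable variables automatically, while the gradient with respect to a non-trainable variable comes back as None unless it is watched explicitly. A minimal sketch:

```python
import tensorflow as tf

w = tf.Variable(2.0)                         # trainable by default
counter = tf.Variable(0.0, trainable=False)  # excluded from tracking

with tf.GradientTape() as tape:
    loss = w * w + counter

grad_w, grad_counter = tape.gradient(loss, [w, counter])
print(grad_w)        # d(loss)/dw = 2w = 4.0
print(grad_counter)  # None -- the tape did not watch counter
```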

12.4.3.1. Placing Variables and Tensors#

For optimal performance, TensorFlow strives to place tensors and variables on the fastest compatible device according to their dtype. Consequently, most variables are placed on a GPU if one is available. However, you have the ability to override this placement behavior. In the following code snippet, a float tensor and a variable are explicitly placed on the CPU, even if a GPU is accessible. If you enable device placement logging, you can observe where the variable is actually placed [TensorFlow Developers, 2023].

Note

Although manual placement is possible, employing distribution strategies can be a more streamlined and scalable approach to enhancing computation efficiency.

import tensorflow as tf

# Create a float tensor placed on the CPU
with tf.device('/CPU:0'):
    cpu_tensor = tf.constant([1.0, 2.0, 3.0])

# Create a variable placed on the CPU
with tf.device('/CPU:0'):
    cpu_variable = tf.Variable([1.0, 2.0, 3.0])

# Print information about the placement
print_bold("CPU Tensor:")
print(cpu_tensor)
print_bold("CPU Variable:")
print(cpu_variable)
CPU Tensor:
tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)
CPU Variable:
<tf.Variable 'Variable:0' shape=(3,) dtype=float32, numpy=array([1., 2., 3.], dtype=float32)>

In this example:

  • We explicitly place a float tensor cpu_tensor and a variable cpu_variable on the CPU using the tf.device context manager.

  • We print information about these CPU-placed objects.

By reviewing the log outputs with device placement logging enabled, you can determine the actual placement of these variables and tensors.
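Besides the placement log (enabled with tf.debugging.set_log_device_placement), every tensor and variable exposes a .device attribute that you can inspect directly; a quick check:

```python
import tensorflow as tf

with tf.device('/CPU:0'):
    cpu_variable = tf.Variable([1.0, 2.0, 3.0])

# The full device string, e.g. '/job:localhost/replica:0/task:0/device:CPU:0'
print(cpu_variable.device)
```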

12.4.3.2. Cross-Device Placement#

You can place a variable or tensor on one device while performing computations on another. This introduces latency, because data must be copied between the devices. You might still do this when you have multiple GPU workers but want only a single copy of the variables shared across them [TensorFlow Developers, 2023].

To illustrate, consider the following code snippet:

import tensorflow as tf

# Create a variable placed on the CPU
cpu_variable = tf.Variable([1.0, 2.0, 3.0], name="cpu_variable")

# Place the computation on a GPU
with tf.device('/GPU:0'):
    gpu_result = cpu_variable * 2.0

# Print the GPU result
print_bold("GPU Result:")
print(gpu_result)
GPU Result:
tf.Tensor([2. 4. 6.], shape=(3,), dtype=float32)

In this example:

  • We create a variable cpu_variable placed on the CPU.

  • We then perform a computation, multiplying the cpu_variable by 2.0, within a tf.device context manager to place the computation on a GPU.

  • We print the GPU result.

While such cross-device placement can be advantageous for specific distributed scenarios, it’s important to note the associated data copying overhead. To avoid this overhead, you can use the tf.distribute API to distribute your variables and computations across multiple devices in a more efficient and scalable way [TensorFlow Developers, 2023].
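As a sketch of that approach, tf.distribute.MirroredStrategy creates variables inside its scope so that each replica holds a synchronized copy; on a machine with a single device it simply runs with one replica:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Replicas:", strategy.num_replicas_in_sync)

# Variables created inside the scope are mirrored across replicas
with strategy.scope():
    shared_variable = tf.Variable([1.0, 2.0, 3.0])

print(shared_variable)
```

The strategy, rather than manual tf.device blocks, then handles keeping the copies in sync during updates.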