If the model is not available, the agent has to learn both the model and an optimal policy by trial and error. In this setting, the agent uses a Q function, which is defined as follows:
The Q function maps each pair of a state s and an action a to a real number, Q(s, a), which denotes the expected total reward when the agent, in state s, selects action a.
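To make the idea concrete, here is a minimal sketch of a Q function stored as a table and learned by trial and error. Everything in it is an assumption made for illustration and does not come from the text above: the four-state chain environment, the `step` and `train` helpers, the hyperparameters `alpha`, `gamma` and `epsilon`, and the choice of the standard tabular Q-learning update as the learning rule.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (an assumption, not from the text):
# states 0..3 on a line, actions 0 (left) and 1 (right).
# Reaching state 3 yields reward 1 and ends the episode.
N_STATES = 4
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a tabular Q function with the standard Q-learning update."""
    rng = random.Random(seed)
    # q[(state, action)] -> current estimate of the expected total reward
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore with probability epsilon,
            # otherwise pick the action with the highest current estimate.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Value of the best action in the next state (0 at episode end).
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


q = train()
# Greedy action in each non-terminal state according to the learned table.
greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)
```

The table `q` plays the role of the Q function: it maps a (state, action) pair to a real number. The agent never reads the transition rules inside `step` directly; it only observes the outcomes of its own actions, which is what trial and error amounts to in this sketch.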