In dog training, a variable ratio reinforcement schedule, often referred to as an intermittent reinforcement schedule, is one of several reinforcement schedules. Reinforcement schedules are the rules that specify "how many or which responses will be reinforced" (Burch, Bailey 1999). A variable ratio reinforcement schedule typically follows a continuous reinforcement schedule, and there are several reasons why moving from one to the other is important. To better understand the goals of a variable ratio reinforcement schedule, it helps to take a closer look at continuous reinforcement schedules and the consequences associated with their prolonged use.
Continuous Reinforcement Schedules
During the acquisition stage of dog learning, dogs are introduced to a new task or skill. Just as with people learning a new skill, it's totally normal for dogs to be hesitant or uncertain. Dogs at this stage may only be able to perform a part of the skill.
A lot of repetition is needed, and feedback must be given for correct responses or close approximations, even if they are not yet in perfect form. Dogs must therefore receive continuous feedback on their progress through a continuous schedule of reinforcement (reinforcing every single correct response or approximation of the response).
This schedule is not limited to dog training. As humans, we can see plenty of examples of continuous schedules in our everyday lives. Every time we press the power button on our remote, the TV turns on; every time we flip a switch, the light turns on; every time we insert a dollar bill in the vending machine, it releases our favorite soda.
While a continuous reinforcement schedule works well during the acquisition stage, if we continue rewarding the dog for every single correct response, we will eventually end up also rewarding below-average responses, points out Ian Dunbar. For example, if we reward our dog every time he sits correctly, those sits will most likely include some slow-to-respond sits, and we can expect even some sloppy sits (with the legs spread out to the side) to mix in every now and then.
By continuing to dole out treats for every single correct response, we remove opportunities for improvement, and the quality of the behavior suffers. On top of that, the longer the dog is rewarded for every correct response, the harder it becomes to start phasing out those rewards once the dog has relied on them for so long. The result is a dog who expects a reward every single time and risks getting frustrated when he doesn't get it. Then there is satiation: if you give a treat for every single response, your dog will get full quickly and motivation will fall.
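The point about below-average responses is simple arithmetic, and a small Python sketch can make it concrete. Everything here is an illustrative assumption: the latency numbers are made up, "quality" is reduced to how fast the dog sits, and the median stands in for "average".

```python
from statistics import median

# Hypothetical latencies (seconds) for ten correct sits; lower means a snappier sit.
latencies = [0.8, 1.1, 1.5, 0.9, 2.4, 1.9, 1.0, 2.8, 1.3, 2.1]


def rewarded_split(latencies):
    """Under continuous reinforcement every correct sit earns a treat.

    Returns (faster_than_median, slower_than_median) among the rewarded sits.
    """
    mid = median(latencies)
    faster = sum(1 for t in latencies if t < mid)
    slower = sum(1 for t in latencies if t > mid)
    return faster, slower


def best_only(latencies):
    """Reward only the sits that were faster than the median."""
    mid = median(latencies)
    return [t for t in latencies if t < mid]


print(rewarded_split(latencies))  # how many treats went to fast vs. slow sits
print(best_only(latencies))       # the sits a pickier trainer would pay for
```

With these made-up numbers, rewarding every sit pays for as many slow sits as fast ones, while rewarding only the better half pays for fast sits alone.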
" If you reward a dog for every correct response, approximately 50% of the time you will reward the dog for above–average responses and 50% of the time you will reward a dog for below average responses. It is simply too silly to reward a dog for below-average responses." Ian Dunbar
Moving to a Variable Ratio Reinforcement Schedule
In dog training, the term "stretching the ratio" refers to the procedure of gradually increasing the number of responses required for the dog to earn reinforcement. When using food to reinforce behaviors, we don't want to phase it out completely; otherwise the behavior risks undergoing extinction and eventually disappearing from the dog's behavioral repertoire.
So at some point, once the dog shows signs of responding at a steady rate during the proficiency stage of learning, it's time to stretch the ratio and start moving from a continuous schedule to an intermittent one. On an intermittent schedule, behavior is rewarded unpredictably on some occasions and not others, which works great for maintaining behavior and preventing extinction.
This schedule leads to permanence of the behavior, allowing the dog to perform at a consistently high rate. Indeed, a dog on a variable ratio schedule of reinforcement will continue to work in the hope that the next behavior will produce the treat. A variable ratio reinforcement schedule also works great for gradually thinning out those food rewards, so that the dog doesn't rely on them too much. Yes, "gradually" is the important keyword here!
If the schedule is stretched too fast, your dog's behavior may start breaking apart, just as an elastic band may break when stretched too much. Ratio strain is the technical term for what happens when a dog's pattern of responding begins to break down because the ratio has been stretched too far.
By the way, this phenomenon is present in humans as well: just watch what happens in workplaces across the globe when workers are overworked and underpaid! Rebellion and subsequent strikes take place. In dogs, asking too much while providing a low rate of reinforcement may lead to frustration, displacement behaviors, and even a loss of interest, with the dog walking away and giving up. Some animals may even become aggressive.
Preventing Ratio Strain
When moving from a continuous schedule of reinforcement to an intermittent one, care must therefore be taken to do this gradually. Just picture what often happens when a person is used to the remote predictably turning on the TV at the touch of a button every time, every single day.
On the day the remote fails to work (coincidentally right when a big football match is on), watch that person pressing and shaking the remote, pressing harder, and perhaps, if the person has a low threshold for frustration, cursing and possibly even tossing the remote against the wall!
So to prevent this frustration from happening in our companions, we must stretch the ratio very gradually. With a dog being trained to sit, we would therefore start by giving a treat for every successful sit (CRF, continuous reinforcement). Then, as the dog responds at a steady rate, we can start giving the treat for every other sit, and after that we can start rewarding randomly: the third sit, the second sit, the fifth sit, and so on.
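The progression described above (every sit, then roughly every other sit, then an unpredictable number of sits) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the uniform draw around a target average, and the number of rewards per stage are all hypothetical choices, not something prescribed by the trainers cited here.

```python
import random


def variable_ratio_plan(mean_ratio, n_rewards, rng):
    """Return how many correct sits are required before each treat.

    Each requirement is drawn uniformly from 1 .. 2*mean_ratio - 1, so the
    long-run average is mean_ratio. A mean_ratio of 1 always yields 1,
    which is continuous reinforcement (CRF).
    """
    if mean_ratio < 1:
        raise ValueError("mean_ratio must be at least 1")
    return [rng.randint(1, 2 * mean_ratio - 1) for _ in range(n_rewards)]


def stretch_gradually(max_ratio, rewards_per_stage, seed=0):
    """Build one plan per stage, raising the average ratio by one each time."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return {
        ratio: variable_ratio_plan(ratio, rewards_per_stage, rng)
        for ratio in range(1, max_ratio + 1)
    }


for ratio, counts in stretch_gradually(max_ratio=3, rewards_per_stage=5).items():
    print(f"average ratio {ratio}: treat after {counts} sits")
```

Raising the average by only one step per stage mirrors the "very gradually" advice; jumping straight from stage 1 to a high ratio is the kind of leap that risks ratio strain.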
This is a good time to start raising criteria: basically, raising the bar and paying attention to what the dog does so we can pick out only the best sits to reward and improve quality. Once we have successfully stretched the ratio, we should see a dog who is on his toes and eager to work for that random reward, yes, just like a gambler playing the slots in Vegas!
Did you know? Stretching the ratio is astutely used in gambling establishments. Card sharks will let you win frequently during the early stages of play and then, once you're hooked, they'll stretch the ratio gradually and start winning more and more of the games, explains Paul Chance in the book "Learning and Behavior."
"Slot machines are designed to operate on a variable schedule because this schedule will maintain the highest rate of responding in relation to the number of reinforced trials provided. Unlike the vending machine, the slot machine delivers its payout in a seemingly random schedule."~James O'Heare, The Science and Technology of Dog Training
An Up and Down Process
Moving from a continuous schedule to an intermittent one is not a clear-cut process like flipping a light switch. So when should you move from a continuous schedule to a variable one?
Generally, dog owners are instructed to do so when the dog has learned to sit on cue and responds to the verbal cue "sit" most of the time. To have a better ballpark figure, you should expect to move to a variable schedule once your dog performs the behavior on cue at least 80 percent of the time. However, watch your schedule when exposing your dog to new criteria that may cause the behavior to break apart.
For example, when your dog learns to sit reliably in your living room (at least eight times out of ten), you may start giving treats randomly. But once you're out in the yard, where there are more distractions around, your best bet is to move back to a continuous schedule temporarily until your dog responds reliably in spite of those distractions.
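The 80 percent rule of thumb and the "drop back when the setting changes" advice amount to a small decision rule, sketched below in Python. The function name and the idea of tracking results separately per setting are assumptions made for illustration; only the 80 percent figure comes from the text above.

```python
def choose_schedule(recent_results, threshold=0.8):
    """Suggest a schedule from recent responses to the cue in ONE setting.

    recent_results: list of booleans, True when the dog responded on cue.
    With no track record in a setting, start on a continuous schedule.
    """
    if not recent_results:
        return "continuous"
    success_rate = sum(recent_results) / len(recent_results)
    return "variable" if success_rate >= threshold else "continuous"


# Hypothetical records, kept separately for each setting.
living_room = [True] * 8 + [False] * 2   # eight sits out of ten
yard = [True] * 5 + [False] * 5          # distractions take their toll

print("living room:", choose_schedule(living_room))
print("yard:", choose_schedule(yard))
```

Keeping a separate record for each setting is what lets the same dog be on a variable schedule indoors and back on a continuous one in the yard.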
Also, when training a behavior through shaping (a training method that entails rewarding successive approximations of the final behavior), you'll find yourself rewarding continuously at some times and intermittently at others as you establish new criteria along the way.
"Reinforcement may go from predictable to a little unpredictable back to predictable, as you climb, step by step, toward your ultimate goal... Marian Breland Bailey told me she called this a "shaping schedule." It's a natural part of the shaping process."~ Karen Pryor
Tip: If you pair a reward with praise (e.g., "good boy!"), your dog will associate those words with something good, so that when you're not giving treats, praise will still have enough value to communicate a job well done!
Variable Ratio Reinforcement Schedule or Reinforcement Variety?
Recently, Ken Ramirez pointed out the use of "reinforcement variety." Turns out, what many trainers use is reinforcement variety rather than a variable ratio reinforcement schedule.
Reinforcement variety is preferable because it helps prevent the frustration associated with ratio strain and the process of moving to a variable schedule. What's the difference between the two? Let's take a closer look.
A variable ratio reinforcement schedule, as already mentioned, entails reinforcing responses only some of the time. Mary Burch and Jon S. Bailey, in the book "How Dogs Learn," compare the unpredictability of reinforcement in a variable ratio schedule to the way slot machines, fishing and the lottery work. This means that at times no reinforcement is delivered at all, which can cause frustration, perhaps in part because dogs in their heart know they are performing correctly and therefore come to expect a reward.
Reinforcement variety, however, offers the opportunity to reinforce the dog every time, except that the type of reinforcement varies from one time to another and doesn't always involve food. You can switch between different types of primary reinforcers that dogs are naturally drawn to, such as food (chicken, hot dogs, freeze-dried liver) and natural activities the dog perceives as reinforcing (going out in the yard to explore and exercise, playing with other dogs, chasing a tossed ball, sniffing a bush).
Some reinforcement substitutes can also be mixed in, and these include several secondary reinforcers most of us are familiar with (praise, pats, belly rubs or even the opportunity to perform another behavior). Such reinforcers, though, need a strong conditioning history of being paired consistently with primary reinforcers before being used on their own, and they also need to be maintained to preserve their reinforcing power.
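The contrast between the two approaches can be shown side by side in a short Python sketch. It rests on loud assumptions: the reinforcer lists are just examples taken from the paragraphs above, the function names are hypothetical, and the variable ratio schedule is approximated by paying off each response with probability 1/mean_ratio, which is a simplification.

```python
import random

# Example reinforcers drawn from the text; a real menu depends on the dog.
PRIMARY = ["chicken", "freeze-dried liver", "chasing a tossed ball", "sniffing a bush"]
SECONDARY = ["praise", "pats", "belly rub"]  # assumed to be well conditioned


def reinforcement_variety(n_responses, rng):
    """Every correct response earns something; only the type changes."""
    menu = PRIMARY + SECONDARY
    return [rng.choice(menu) for _ in range(n_responses)]


def variable_ratio_outcomes(n_responses, mean_ratio, rng):
    """Each correct response earns food with probability 1/mean_ratio.

    None marks a correct response that went unrewarded.
    """
    return [
        "food" if rng.random() < 1 / mean_ratio else None
        for _ in range(n_responses)
    ]


rng = random.Random(1)  # seeded only so the sketch is repeatable
print(reinforcement_variety(6, rng))
print(variable_ratio_outcomes(6, mean_ratio=3, rng=rng))
```

The first list never contains an empty slot, while the second usually does, and those unrewarded responses are where the frustration described above can creep in.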
Did you know? Although playing is a primary reinforcer, toys are considered secondary reinforcers because to a dog, a motionless ball is not rewarding in itself, but becomes valuable once it's associated with play (being wiggled, being tossed).
"If you want an animal to accept variety in the types of reinforcement you offer, that acceptance must be taught. I encourage the use of reinforcement variety early in every animal’s training."~Ken Ramirez
- Mary R. Burch and Jon S. Bailey, How Dogs Learn. Howell Book House, 1999.
- James O'Heare, The Science and Technology of Dog Training. Dogwise Publishing, 2015.