Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that are reflexively elicited, and it doesn’t account for new behaviors such as riding a bike. He proposed a theory about how such behaviors come about. Skinner believed that behavior is motivated by the consequences we receive for the behavior: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike. According to the law of effect, behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again. An example of the law of effect is employment. One of the reasons (and often the main reason) we show up for work is that we get paid to do so. If we stop getting paid, we will likely stop showing up—even if we love our job.
Working with Thorndike’s law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a “Skinner box” ( [link] ). A Skinner box contains a lever (for rats) or disk (for pigeons) that the animal can press or peck for a food reward via the dispenser. Speakers and lights can be associated with certain behaviors. A recorder counts the number of responses made by the animal.
In discussing operant conditioning, we use several everyday words—positive, negative, reinforcement, and punishment—in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioral response. All punishers (positive or negative) decrease the likelihood of a behavioral response. Now let’s combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment ( [link] ).
| | Reinforcement | Punishment |
|---|---|---|
| Positive | Something is added to increase the likelihood of a behavior. | Something is added to decrease the likelihood of a behavior. |
| Negative | Something is removed to increase the likelihood of a behavior. | Something is removed to decrease the likelihood of a behavior. |
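These four combinations follow a simple two-by-two logic, which can be captured in a few lines of code. This is a minimal sketch for keeping the terminology straight; the function name and string labels are ours, not standard psychological notation:

```python
def classify(stimulus_change: str, behavior_effect: str) -> str:
    """Name the operant-conditioning procedure from its two dimensions.

    stimulus_change: "added" (positive) or "removed" (negative)
    behavior_effect: "increases" (reinforcement) or "decreases" (punishment)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior_effect]
    return f"{sign} {kind}"

# A seatbelt alarm that stops beeping once you buckle up:
# an annoying stimulus is removed, and buckling increases.
print(classify("removed", "increases"))  # negative reinforcement

# Scolding a student to stop texting in class:
# an unpleasant stimulus is added, and texting decreases.
print(classify("added", "decreases"))    # positive punishment
```

The point of the sketch is that "positive/negative" tracks only the stimulus change, and "reinforcement/punishment" tracks only the effect on behavior; the two dimensions are independent.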
The most effective way to teach a person or animal a new behavior is with positive reinforcement. In positive reinforcement, a desirable stimulus is added to increase a behavior.
For example, you tell your five-year-old son, Jerome, that if he cleans his room, he will get a toy. Jerome quickly cleans his room because he wants a new art set. Let’s pause for a moment. Some people might say, “Why should I reward my child for doing what is expected?” But in fact we are constantly and consistently rewarded in our lives. Our paychecks are rewards, as are high grades and acceptance into our preferred school. Being praised for doing a good job and for passing a driver’s test is also a reward. Positive reinforcement as a learning tool is extremely effective. It has been found that one of the most effective ways to increase achievement in school districts with below-average reading scores was to pay the children to read. Specifically, second-grade students in Dallas were paid $2 each time they read a book and passed a short quiz about the book. The result was a significant increase in reading comprehension (Fryer, 2010). What do you think about this program? If Skinner were alive today, he would probably think this was a great idea. He was a strong proponent of using operant conditioning principles to influence students’ behavior at school. In fact, in addition to the Skinner box, he also invented what he called a teaching machine that was designed to reward small steps in learning (Skinner, 1961)—an early forerunner of computer-assisted learning. His teaching machine tested students’ knowledge as they worked through various school subjects. If students answered questions correctly, they received immediate positive reinforcement and could continue; if they answered incorrectly, they did not receive any reinforcement. The idea was that students would spend additional time studying the material to increase their chance of being reinforced the next time (Skinner, 1961).
In negative reinforcement, an undesirable stimulus is removed to increase a behavior. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go “beep, beep, beep” until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behavior, increasing the likelihood that you will buckle up in the future. Negative reinforcement is also used frequently in horse training. Riders apply pressure—by pulling the reins or squeezing their legs—and then remove the pressure when the horse performs the desired behavior, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.
Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behavior. In contrast, punishment always decreases a behavior. In positive punishment, you add an undesirable stimulus to decrease a behavior. An example of positive punishment is scolding a student to get the student to stop texting in class. In this case, a stimulus (the reprimand) is added in order to decrease the behavior (texting in class). In negative punishment, you remove a pleasant stimulus to decrease a behavior. For example, when a child misbehaves, a parent can take away a favorite toy. In this case, a stimulus (the toy) is removed in order to decrease the behavior.
Punishment, especially when it is immediate, is one way to decrease undesirable behavior. For example, imagine that your four-year-old son, Brandon, hit his younger brother. You have Brandon write “I will not hit my brother” 100 times (positive punishment). Chances are he won’t repeat this behavior. While strategies like this are common today, in the past children were often subject to physical punishment, such as spanking. It’s important to be aware of some of the drawbacks of using physical punishment on children. First, punishment may teach fear. Brandon may stop hitting his brother, but he also may become fearful of the person who delivered the punishment—you, his parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010). Consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behavior and delinquency (Gershoff, 2002). Children see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behavior when they become angry and frustrated. For example, if you spank Brandon when you are angry with him for his misbehavior, he might start hitting his friends when they won’t share their toys.
While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today’s psychologists and parenting experts favor reinforcement over punishment—they recommend that you catch your child doing something good and reward her for it.
In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behavior, in shaping, we reward successive approximations of a target behavior. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behavior. Shaping is needed because it is extremely unlikely that an organism will display anything but the simplest of behaviors spontaneously. In shaping, behaviors are broken down into many small, achievable steps. The specific steps used in the process are the following:

1. Reinforce any response that resembles the desired behavior.
2. Then reinforce the response that more closely resembles the desired behavior. No longer reinforce the previously reinforced response.
3. Next, begin to reinforce the response that even more closely resembles the desired behavior.
4. Continue to reinforce closer and closer approximations of the desired behavior.
5. Finally, only reinforce the desired behavior.
Shaping is often used in teaching a complex behavior or chain of behaviors. Skinner used shaping to teach pigeons not only such relatively simple behaviors as pecking a disk in a Skinner box, but also many unusual and entertaining behaviors, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond to the tone of a bell, and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behavior.
It’s easy to see how shaping is effective in teaching behaviors to animals, but how does shaping work with humans? Let’s consider parents whose goal is to have their child learn to clean his room. They use shaping to help him master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, he cleans up one toy. Second, he cleans up five toys. Third, he chooses whether to pick up ten toys or put his books and clothes away. Fourth, he cleans up everything except two toys. Finally, he cleans his entire room.
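The parents’ plan amounts to reinforcing only the current approximation: earlier steps are no longer rewarded, and later steps have not yet been attempted. A toy sketch of this rule, with the step list and function name invented for illustration:

```python
# Successive approximations toward the target behavior "clean the whole room".
# Only the current step in the sequence earns reinforcement.
steps = [
    "clean up one toy",
    "clean up five toys",
    "pick up ten toys or put books and clothes away",
    "clean up everything except two toys",
    "clean the entire room",
]

def reinforce(performed_step: str, current_step_index: int) -> bool:
    """Reinforce only the current approximation: not earlier (already
    mastered) steps, and not the final behavior until it is reached."""
    return performed_step == steps[current_step_index]

# The child is currently on step index 2:
print(reinforce("pick up ten toys or put books and clothes away", 2))  # True
print(reinforce("clean up one toy", 2))  # False: earlier steps no longer rewarded
```

Withholding reinforcement for the previously rewarded step is what nudges behavior forward along the sequence.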
Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.
What would be a good reinforcer for humans? For your son Jerome, it was the promise of a toy if he cleaned his room. How about Joaquin, the soccer player? If you gave Joaquin a piece of candy every time he scored a goal, you would be using a primary reinforcer. Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing, and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.
A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Joaquin made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behavior chart? They also are secondary reinforcers.
Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Cangi and Daly (2013) found that use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of autistic school children. Autistic children tend to exhibit disruptive behaviors such as pinching and hitting. When the children in the study exhibited appropriate behavior (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.
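A token economy like the “quiet hands” system can be sketched as a simple ledger. The class name, token amounts, and exchange rate below are illustrative assumptions, not details from the Cangi and Daly study:

```python
# A minimal token-economy ledger: earn tokens for appropriate behavior,
# lose them for inappropriate behavior (response cost), and trade them
# in for minutes of playtime.
class TokenEconomy:
    def __init__(self, minutes_per_token: int = 2):
        self.tokens = 0
        self.minutes_per_token = minutes_per_token

    def appropriate_behavior(self) -> None:
        """Earn a token (e.g., a "quiet hands" token)."""
        self.tokens += 1

    def inappropriate_behavior(self) -> None:
        """Lose a token for hitting or pinching; the balance never goes negative."""
        self.tokens = max(0, self.tokens - 1)

    def exchange(self) -> int:
        """Trade all tokens for minutes of playtime."""
        minutes = self.tokens * self.minutes_per_token
        self.tokens = 0
        return minutes

child = TokenEconomy()
for _ in range(5):
    child.appropriate_behavior()
child.inappropriate_behavior()
print(child.exchange())  # 8 minutes of playtime (4 tokens * 2 minutes each)
```

The tokens themselves are secondary reinforcers; they only work because they are exchangeable for something the child actually wants.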
Parents and teachers often use behavior modification to change a child’s behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed ( [link] ). Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior. Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.
Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, she is removed from the desirable activity at hand ( [link] ). For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn’t throw blocks.
There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.
Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement. This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).
Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement, also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules ( [link] ). These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.
Reinforcement Schedule | Description | Result | Example |
---|---|---|---|
Fixed interval | Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes). | Moderate response rate with significant pauses after reinforcement | Hospital patient uses patient-controlled, doctor-timed pain relief |
Variable interval | Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes). | Moderate yet steady response rate | Checking Facebook |
Fixed ratio | Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses). | High response rate with pauses after reinforcement | Piecework—factory worker getting paid for every x number of items manufactured |
Variable ratio | Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses). | High and steady response rate | Gambling |
Now let’s combine these four terms. In a fixed interval reinforcement schedule, behavior is rewarded after a set amount of time. For example, June undergoes major surgery in a hospital. During recovery, she is expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. Her doctor sets a limit: one dose per hour. June pushes a button when pain becomes difficult, and she receives a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behavior when it will not be rewarded.
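June’s pump behaves like a small state machine: a button press is reinforced only if the fixed interval has elapsed since the last dose. A minimal sketch, with the class name and lockout times as illustrative assumptions:

```python
# A sketch of a patient-controlled analgesia pump: the button delivers a
# dose only if at least `lockout` seconds have passed since the last dose
# (a fixed interval schedule). Times are in seconds.
class PCAPump:
    def __init__(self, lockout: int = 3600):
        self.lockout = lockout
        self.last_dose = None  # time of the most recent dose, or None

    def press(self, t: int) -> bool:
        """Return True (dose delivered) only if the lockout has elapsed."""
        if self.last_dose is None or t - self.last_dose >= self.lockout:
            self.last_dose = t
            return True
        return False

pump = PCAPump(lockout=3600)
print(pump.press(0))     # True: the first press delivers a dose
print(pump.press(1800))  # False: only 30 minutes have passed
print(pump.press(3600))  # True: the full hour has elapsed
```

Presses during the lockout go unreinforced, which is why responding under this schedule tends to pause right after each reward and pick up as the interval runs out.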
With a variable interval reinforcement schedule, the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity in providing prompt service and keeping a clean restaurant is steady because he wants his crew to earn the bonus.
With a fixed ratio reinforcement schedule, there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care whether the person really needs the prescription sunglasses; Carla just wants her commission. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimizing the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.
In a variable ratio reinforcement schedule, the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.
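The two ratio schedules differ only in whether the required response count is fixed or drawn at random. A toy simulation makes the contrast concrete; the parameters and function names are arbitrary choices for illustration:

```python
import random

def fixed_ratio(n):
    """Yield True (reward) after every n-th response, False otherwise."""
    count = 0
    while True:
        count += 1
        yield count % n == 0

def variable_ratio(mean_n, rng):
    """Yield rewards after an unpredictable number of responses averaging
    roughly mean_n; this is how slot-machine payouts are scheduled."""
    target = rng.randint(1, 2 * mean_n - 1)
    count = 0
    while True:
        count += 1
        if count >= target:
            yield True
            count = 0
            target = rng.randint(1, 2 * mean_n - 1)
        else:
            yield False

fr = fixed_ratio(4)
rewards = [next(fr) for _ in range(8)]
print(rewards)  # [False, False, False, True, False, False, False, True]

vr = variable_ratio(4, random.Random(0))
print(sum(next(vr) for _ in range(100)))  # roughly 25 rewards, unpredictably spaced
```

The interval schedules would replace the response counter with a clock. On the fixed schedule every fourth response pays, so responders can predict the payoff and pause after it; on the variable schedule the very next response might always pay, which is what keeps responding high, steady, and resistant to extinction.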
In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time her doctor has approved, no medication is administered. She is on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish ( [link] ).
Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron’s money on a variable-ratio schedule” (p. 397).
Skinner uses gambling as an example of the power and effectiveness of conditioning behavior based on a variable ratio reinforcement schedule. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). Beyond the power of variable ratio reinforcement, gambling seems to work on the brain in the same way as some addictive drugs. The Illinois Institute for Addiction Recovery (n.d.) reports evidence suggesting that pathological gambling is an addiction similar to a chemical addiction ( [link] ). Specifically, gambling may activate the reward centers of the brain, much like cocaine does. Research has shown that some pathological gamblers have lower levels of the neurotransmitter (brain chemical) known as norepinephrine than do normal gamblers (Roy et al., 1988). According to a study conducted by Alec Roy and colleagues, norepinephrine is secreted when a person feels stress, arousal, or thrill; pathological gamblers use gambling to increase their levels of this neurotransmitter. Another researcher, neuroscientist Hans Breiter, has done extensive research on gambling and its effects on the brain. Breiter (as cited in Franzen, 2001) reports that “Monetary reward in a gambling-like experiment produces brain activation very similar to that observed in a cocaine addict receiving an infusion of cocaine” (para. 1). Deficiencies in serotonin (another neurotransmitter) might also contribute to compulsive behavior, including a gambling addiction.
It may be that pathological gamblers’ brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.
Although strict behaviorists such as Skinner and Watson refused to believe that cognition (such as thoughts and expectations) plays a role in learning, another behaviorist, Edward C. Tolman, had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.
In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way through it. He also studied a comparison group that was rewarded with food at the end of the maze. As the unreinforced rats explored the maze, they developed a cognitive map: a mental picture of the layout of the maze ( [link] ). After 10 sessions in the maze without reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats became aware of the food, they were able to find their way through the maze quickly, just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning: learning that occurs but is not observable in behavior until there is a reason to demonstrate it.
Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi’s dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he’s never driven there himself, so he has not had a chance to demonstrate that he’s learned the way. One morning Ravi’s dad has to leave early for a meeting, so he can’t drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.
Have you ever gotten lost in a building and couldn’t find your way back out? While that can be frustrating, you’re not alone. At one time or another we’ve all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation—or cognitive map—of the location, as Tolman’s rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight. Because of this, it’s often difficult to predict what’s around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.
Operant conditioning is based on the work of B. F. Skinner. Operant conditioning is a form of learning in which the motivation for a behavior happens after the behavior is demonstrated. An animal or a human receives a consequence after performing a specific behavior. The consequence is either a reinforcer or a punisher. All reinforcement (positive or negative) increases the likelihood of a behavioral response. All punishment (positive or negative) decreases the likelihood of a behavioral response. Several types of reinforcement schedules are used to reward behavior, depending on either a set or variable period of time, or a set or variable number of responses.
________ is when you take away a pleasant stimulus to stop a behavior.
Which of the following is not an example of a primary reinforcer?
Rewarding successive approximations toward a target behavior is ________.
Slot machines reward gamblers with money according to which reinforcement schedule?
What is a Skinner box and what is its purpose?
A Skinner box is an operant conditioning chamber used to train animals such as rats and pigeons to perform certain behaviors, like pressing a lever. When the animals perform the desired behavior, they receive a reward: food or water.
What is the difference between negative reinforcement and punishment?
In negative reinforcement you are taking away an undesirable stimulus in order to increase the frequency of a certain behavior (e.g., buckling your seat belt stops the annoying beeping sound in your car and increases the likelihood that you will wear your seatbelt). Punishment is designed to reduce a behavior (e.g., you scold your child for running into the street in order to decrease the unsafe behavior).
What is shaping and how would you use shaping to teach a dog to roll over?
Shaping is an operant conditioning method in which you reward closer and closer approximations of the desired behavior. If you want to teach your dog to roll over, you might reward him first when he sits, then when he lies down, and then when he lies down and rolls onto his back. Finally, you would reward him only when he completes the entire sequence: lying down, rolling onto his back, and then continuing to roll over to his other side.
Explain the difference between negative reinforcement and punishment, and provide several examples of each based on your own experiences.
Think of a behavior that you have that you would like to change. How could you use behavior modification, specifically positive reinforcement, to change your behavior? What is your positive reinforcer?
Operant Conditioning Copyright © 2014 by OpenStaxCollege is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.
Operant conditioning, sometimes called instrumental conditioning or Skinnerian conditioning, is a method of learning that uses rewards and punishment to modify behavior. Through operant conditioning, behavior that is rewarded is likely to be repeated, while behavior that is punished is less likely to recur.
For example, when you are rewarded at work with a performance bonus for exceptional work, you will want to continue performing at a higher level in hopes of receiving another bonus in the future. Because this behavior was followed by a positive outcome, the behavior will likely be repeated.
Operant conditioning was first described by psychologist B.F. Skinner. His theory was based on two assumptions. First, the cause of human behavior is something in a person’s environment. Second, the consequences of a behavior determine the possibility of it being repeated. Behaviors followed by a pleasant consequence are likely to be repeated and those followed by an unpleasant consequence are less likely to be repeated.
Through his experiments, Skinner identified three types of responses that can follow behavior:

- Neutral operants: responses from the environment that neither increase nor decrease the probability of the behavior being repeated.
- Reinforcers: responses that increase the probability of the behavior being repeated.
- Punishers: responses that decrease the probability of the behavior being repeated.
History of the theory
Though Skinner introduced the theory of operant conditioning, he was influenced by the work of another psychologist, Edward Lee Thorndike.
In 1905, Thorndike proposed a theory of behavior called the “law of effect.” It stated that if you behave in a certain way and you like the result of your behavior, you’re likely to behave that way again. If you don’t like the result of your behavior, you’re less likely to repeat it.
Thorndike put cats in a box to test his theory. If the cat found and pushed a lever, the box would open, and the cat would be rewarded with a piece of fish. The more they repeated this behavior, the more they were rewarded. So, the cats quickly learned to go right to the lever and push it.
The idea: positive results reinforce behaviors, making you more likely to repeat the same behaviors later on.
John B. Watson was another psychologist who influenced Skinner and his theory of operant conditioning. He studied behavior that could be observed and how that behavior could be controlled, as well as the ways that behaviors are learned. In fact, he coined the term “behaviorism,” a field of psychology focused on how things are learned.
When Skinner came along to advance this theory, he created his own box. Into it went pigeons and rats -- though not at the same time -- which quickly learned that certain behaviors brought them rewards of food.
He described his pigeons and rats as “free operants.” That meant they were free to behave how they wanted in their environment (the box). However, their behaviors were shaped or conditioned by what happened after their previous displays of those behaviors.
Operant conditioning and classical conditioning are very different. In operant conditioning, the results of your past behaviors have conditioned you to either repeat or avoid those behaviors. For example, your parents reward you for getting an ‘A’ on a test that requires you to study hard. As a result, you become more likely to study hard in the future in anticipation of more rewards.
Classical conditioning is used to train people or animals to respond automatically to certain triggers. The most famous example -- Pavlov’s dogs.
Ivan Pavlov was a Russian physiologist. He observed that dogs salivated when food was put in front of them. That’s natural, or what’s called an unconditioned response.
But then Pavlov noticed that the dogs began to salivate shortly before their food arrived, possibly because the sound of the food cart triggered their anticipation of mealtime. In his experiment, at mealtimes, he sounded a bell shortly before the food arrived. Eventually, the dogs began to salivate when they heard the bell. That was a trained, or conditioned, response to the sound of the bell.
You likely experience classical conditioning every day. How? Advertising. Companies use advertisements in hopes that you will associate something positive with their product, leading you to spend money on it.
In operant behavior, the way you choose to behave today is influenced by the consequences of that behavior in the past. Those consequences will either encourage and reinforce that behavior, or they will discourage and punish that behavior.
An example: When you were a kid, did you get sent to your room when you hit your sibling? That consequence, your parents hoped, would discourage you from doing that again.
Reinforcement and punishment in operant conditioning
Reinforcement and punishment are two ways to encourage or discourage behaviors. In the example above, the punishment of being sent to your room ideally will discourage you from behaving in the same way in the future.
But what if you behave in a way that your parents want to encourage, such as sharing toys with a younger sibling? Your parents can reinforce that behavior by rewarding you, perhaps with praise.
Reinforcement and punishment both can be positive or negative. Let’s take a quick look at each.
Types of operant behaviors
B.F. Skinner divided behavior into two different types: respondent and operant.
Respondent behavior. This is the type of behavior that you can’t control. It’s Skinner’s term for what happened with Pavlov’s dogs -- when they heard a bell, they responded by salivating. It was a reflex, not a choice. People have respondent behaviors, too. If someone puts your favorite food in front of you, you likely will start salivating, just like Pavlov’s dogs.
Operant behavior. These are voluntary behaviors that you choose to do based on previous consequences. You choose to behave in a certain way to get an expected result. For example, you study hard in anticipation of a reward from your parents. Or if you get punished for talking back to your parents, you are more likely to choose not to do that in the future.
Positive reinforcement involves providing a pleasant stimulus to increase the likelihood of a behavior happening in the future. For example, if your child does chores without being asked, you can reward them by taking them to a park or giving them a treat.
Skinner used a hungry rat in a Skinner box to show how positive reinforcement works. The box contained a lever on the side, and as the rat moved about the box, it would accidentally knock the lever. Immediately after it did so, a food pellet would drop into a container next to the lever. The consequence of receiving food every time the rat hit the lever ensured that the animal repeated the action again and again.
Positive reinforcement doesn't have to involve tangible items. Instead, you can positively reinforce your child through:

- Praise ("Great job sharing!")
- Affection, such as hugs or high-fives
- Privileges, such as extra playtime
In negative reinforcement, an unpleasant stimulus is removed after a behavior, which makes that behavior more likely to happen again. If, for example, a child refuses to eat vegetables at dinner time and a parent responds by taking the vegetables away, the removal of the vegetables negatively reinforces the refusal, making it more likely to recur.
A reinforcement schedule is a component of operant conditioning that states which behaviors will be reinforced. It involves a set of rules determined by the time and number of responses required to present or remove a reinforcer.
Different patterns of reinforcement have specific effects on the speed of learning. Schedules of reinforcement include:

- Continuous reinforcement: the behavior is reinforced every time it occurs.
- Fixed ratio: reinforcement is delivered after a set number of responses.
- Variable ratio: reinforcement is delivered after an unpredictable number of responses.
- Fixed interval: reinforcement is delivered for the first response after a set amount of time.
- Variable interval: reinforcement is delivered after unpredictable amounts of time.
In operant conditioning, punishment is defined as any change to the surrounding environment that reduces the probability of responses or behavior happening again. Punishment can work either by directly applying an unpleasant stimulus such as scolding or by removing a potentially rewarding stimulus, such as deducting someone’s daily allowance to punish undesirable behavior.
While punishment is effective in decreasing undesirable behavior, it is associated with many problems, such as:

- It may teach fear of the person delivering the punishment.
- It can make children more aggressive and prone to antisocial behavior.
- It suppresses an unwanted behavior without teaching a desirable one in its place.
How does operant conditioning work in real life? Let’s look at a few examples in different scenarios.
Operant conditioning in parenting
If you have children, you know they don’t always behave as you want them to. To change that behavior, you’ve probably tried operant conditioning even if you didn't know the name for it. For example, you have a child who doesn’t clean their room, so you offer an extra 15 minutes of screen time or another reward for time spent cleaning. Your child begins to associate keeping a clean room with getting something they want. That’s positive reinforcement operant conditioning, in which something is given in order to encourage a behavior.
Operant conditioning at school
Here’s an example of negative punishment operant conditioning, in which something gets taken away in order to discourage a behavior. This one’s set in the classroom. You’ve been acting up, talking nonstop during class. Your teacher tells you to stop or you won’t be allowed to go outside with your classmates during recess. That’s not something you want, so you change your behavior.
Operant conditioning at work
Here’s an example of negative reinforcement operant conditioning in the workplace. Remember that negative reinforcement means removing an unpleasant stimulus to encourage a behavior. You’re past your deadline for a big project, but you’ve been procrastinating. Your boss keeps emailing you to ask when you’ll be done. The only way to stop those emails is to get to work and finish the assignment.
Operant conditioning in relationships
Now let’s turn to positive punishment, in which something is added in order to discourage a behavior. Let’s say it’s your turn to take out the trash, which has really begun to smell, but you are glued to your phone. Eventually, your partner snaps at you and demands you get the trash out of the house now. Next time, to avoid getting scolded, you will be more likely to take the trash out earlier.
Operant conditioning in therapy
Some types of behavior therapy will use operant conditioning to help patients change their behaviors. For example, operant conditioning has been an effective method to help children with autism. The rewards they receive for behaving in a specific way will encourage them to continue that behavior.
The token economy is a system used in behavioral modification programs where desirable behaviors are reinforced using tangible rewards such as tokens, fake money, food, stickers, poker chips, or buttons that are later exchanged for rewards. In a hospital setting, for example, rewards of token money may be offered in exchange for food, access to television, and other bonuses.
A token economy has proven effective not only in managing psychiatric patients but also in schools. This system can be used in classrooms to reduce disruptive behavior and increase academic engagement.
Behaviors can be complex, and teaching them requires what Skinner called shaping. In shaping, complex behaviors get broken down into several simpler behaviors that make up the complex behavior. These are then taught in succession, with reinforcement along the way, until the complex behavior is learned. An example will demonstrate how this works.
You want your child to learn to use the toilet independently, but that’s not a simple behavior. You take them through the individual behaviors, or steps, such as sitting on the potty with their clothes on. When they do this, they get a reward. Next, you want them to sit on the toilet without a diaper. That behavior again earns them a reward. With rewards at each successive step to shape their behavior, they will eventually reach the goal behavior of using the toilet by themselves.
Operant conditioning is a powerful tool for changing behaviors. It’s based on a psychological theory that states that the consequences of our behavior will either encourage or discourage us from continuing that behavior. Parents and teachers, for example, use operant conditioning to help kids learn how to behave appropriately.
What are the four types of operant conditioning? They are positive reinforcement, negative reinforcement, positive punishment, and negative punishment: a stimulus is either added (positive) or removed (negative) in order to either increase (reinforcement) or decrease (punishment) a behavior.
© 2024 Psychologist World.
Learning objectives

By the end of this section, you will be able to:

- Define operant conditioning
- Explain the difference between reinforcement and punishment
- Distinguish between reinforcement schedules
The previous section of this chapter focused on the type of associative learning known as classical conditioning. Remember that in classical conditioning, something in the environment triggers a reflex automatically, and researchers train the organism to react to a different stimulus. Now we turn to the second type of associative learning, operant conditioning. In operant conditioning, organisms learn to associate a behavior and its consequence (Table 6.1). A pleasant consequence makes that behavior more likely to be repeated in the future. For example, Spirit, a dolphin at the National Aquarium in Baltimore, does a flip in the air when her trainer blows a whistle. The consequence is that she gets a fish.
| | Classical Conditioning | Operant Conditioning |
|---|---|---|
| Conditioning approach | An unconditioned stimulus (such as food) is paired with a neutral stimulus (such as a bell). The neutral stimulus eventually becomes the conditioned stimulus, which brings about the conditioned response (salivation). | The target behavior is followed by reinforcement or punishment to either strengthen or weaken it, so that the learner is more likely to exhibit the desired behavior in the future. |
| Stimulus timing | The stimulus occurs immediately before the response. | The stimulus (either reinforcement or punishment) occurs soon after the response. |
Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that are reflexively elicited, and it doesn’t account for new behaviors such as riding a bike. He proposed a theory about how such behaviors come about. Skinner believed that behavior is motivated by the consequences we receive for the behavior: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike. According to the law of effect, behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again. An example of the law of effect is in employment. One of the reasons (and often the main reason) we show up for work is because we get paid to do so. If we stop getting paid, we will likely stop showing up—even if we love our job.
Working with Thorndike’s law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a “Skinner box” (Figure 6.10). A Skinner box contains a lever (for rats) or disk (for pigeons) that the animal can press or peck for a food reward via the dispenser. Speakers and lights can be associated with certain behaviors. A recorder counts the number of responses made by the animal.
Watch this brief video, in which Skinner is interviewed and operant conditioning of pigeons is demonstrated, to learn more.
In discussing operant conditioning, we use several everyday words—positive, negative, reinforcement, and punishment—in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioral response. All punishers (positive or negative) decrease the likelihood of a behavioral response. Now let’s combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment (Table 6.2).
| | Reinforcement | Punishment |
|---|---|---|
| Positive | Something is added to increase the likelihood of a behavior. | Something is added to decrease the likelihood of a behavior. |
| Negative | Something is removed to increase the likelihood of a behavior. | Something is removed to decrease the likelihood of a behavior. |
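The four-way distinction above reduces to a simple two-key lookup. Here is a minimal Python sketch; the function name and label strings are ours, purely for illustration:

```python
def classify_consequence(stimulus_change: str, behavior_effect: str) -> str:
    """Name the operant-conditioning consequence for one cell of the 2x2.

    stimulus_change: "added" or "removed" (i.e., positive or negative)
    behavior_effect: "increases" or "decreases" (reinforcement or punishment)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior_effect]
    return f"{sign} {kind}"

# The seatbelt chime: an annoying sound is removed, and buckling up increases.
print(classify_consequence("removed", "increases"))  # negative reinforcement
```

The point of the sketch is that "positive/negative" names only the stimulus change, while "reinforcement/punishment" names only the effect on behavior; the two axes are independent.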
The most effective way to teach a person or animal a new behavior is with positive reinforcement. In positive reinforcement, a desirable stimulus is added to increase a behavior.
For example, you tell your five-year-old son, Jerome, that if he cleans his room, he will get a toy. Jerome quickly cleans his room because he wants a new art set. Let’s pause for a moment. Some people might say, “Why should I reward my child for doing what is expected?” But in fact we are constantly and consistently rewarded in our lives. Our paychecks are rewards, as are high grades and acceptance into our preferred school. Being praised for doing a good job and for passing a driver’s test is also a reward. Positive reinforcement as a learning tool is extremely effective. It has been found that one of the most effective ways to increase achievement in school districts with below-average reading scores was to pay the children to read. Specifically, second-grade students in Dallas were paid $2 each time they read a book and passed a short quiz about the book. The result was a significant increase in reading comprehension (Fryer, 2010). What do you think about this program? If Skinner were alive today, he would probably think this was a great idea. He was a strong proponent of using operant conditioning principles to influence students’ behavior at school. In fact, in addition to the Skinner box, he also invented what he called a teaching machine that was designed to reward small steps in learning (Skinner, 1961)—an early forerunner of computer-assisted learning. His teaching machine tested students’ knowledge as they worked through various school subjects. If students answered questions correctly, they received immediate positive reinforcement and could continue; if they answered incorrectly, they did not receive any reinforcement. The idea was that students would spend additional time studying the material to increase their chance of being reinforced the next time (Skinner, 1961).
In negative reinforcement, an undesirable stimulus is removed to increase a behavior. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go “beep, beep, beep” until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behavior, increasing the likelihood that you will buckle up in the future. Negative reinforcement is also used frequently in horse training. Riders apply pressure—by pulling the reins or squeezing their legs—and then remove the pressure when the horse performs the desired behavior, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.
Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behavior. In contrast, punishment always decreases a behavior. In positive punishment, you add an undesirable stimulus to decrease a behavior. An example of positive punishment is scolding a student to get the student to stop texting in class. In this case, a stimulus (the reprimand) is added in order to decrease the behavior (texting in class). In negative punishment, you remove a pleasant stimulus to decrease behavior. For example, when a child misbehaves, a parent can take away a favorite toy. In this case, a stimulus (the toy) is removed in order to decrease the behavior.
Punishment, especially when it is immediate, is one way to decrease undesirable behavior. For example, imagine your five-year-old son, Brandon, runs out into the street to chase a ball. You have Brandon write 100 times “I will not run into the street” (positive punishment). Chances are he won’t repeat this behavior. While strategies like this are common today, in the past children were often subject to physical punishment, such as spanking. It’s important to be aware of some of the drawbacks in using physical punishment on children. First, punishment may teach fear. Brandon may become fearful of the street, but he also may become fearful of the person who delivered the punishment—you, his parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010). Consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behavior and delinquency (Gershoff, 2002). They see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behavior when they become angry and frustrated. For example, if you spank your child when you are angry with them for their misbehavior, they might start hitting their friends when they won’t share their toys.
While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today’s psychologists and parenting experts favor reinforcement over punishment—they recommend that you catch your child doing something good and reward them for it.
In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behavior, in shaping, we reward successive approximations of a target behavior. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behavior. Shaping is needed because it is extremely unlikely that an organism will display anything but the simplest of behaviors spontaneously. In shaping, behaviors are broken down into many small, achievable steps. The specific steps used in the process are the following:

1. Reinforce any response that resembles the desired behavior.
2. Then reinforce the response that more closely resembles the desired behavior, and stop reinforcing the previously reinforced response.
3. Next, begin to reinforce the response that even more closely resembles the desired behavior.
4. Continue to reinforce closer and closer approximations of the desired behavior.
5. Finally, only reinforce the desired behavior.
Shaping is often used in teaching a complex behavior or chain of behaviors. Skinner used shaping to teach pigeons not only such relatively simple behaviors as pecking a disk in a Skinner box, but also many unusual and entertaining behaviors, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond to the tone of a bell, and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behavior.
Watch this brief video of Skinner's pigeons playing ping pong to learn more.
It’s easy to see how shaping is effective in teaching behaviors to animals, but how does shaping work with humans? Let’s consider parents whose goal is to have their child learn to clean his room. They use shaping to help him master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, he cleans up one toy. Second, he cleans up five toys. Third, he chooses whether to pick up ten toys or put his books and clothes away. Fourth, he cleans up everything except two toys. Finally, he cleans his entire room.
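The room-cleaning plan above is, at bottom, a loop over successively stricter criteria, with reinforcement at each one before the criterion advances. A minimal sketch, assuming we can simply list the steps and supply a reward callback (the function name and step strings are illustrative, not from any training protocol):

```python
def shape(approximations, reward):
    """Reinforce successive approximations of a target behavior.

    `approximations` is an ordered list of ever-closer steps toward the
    target; `reward` is called once as each criterion is met, after which
    only the next, stricter step earns reinforcement.
    """
    for step in approximations:
        # In real training, the learner must first emit the step before it
        # can be reinforced; this sketch assumes each criterion is met.
        reward(step)
    return approximations[-1]  # the final, target behavior

steps = [
    "cleans up one toy",
    "cleans up five toys",
    "picks up ten toys or puts books and clothes away",
    "cleans up everything except two toys",
    "cleans the entire room",
]
rewarded = []
target = shape(steps, rewarded.append)
print(target)  # cleans the entire room
```

Note the design choice the text emphasizes: the reward follows each intermediate step, not just the final one, which is what makes an otherwise improbable behavior reachable.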
Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.
What would be a good reinforcer for humans? For your child cleaning the room, it was the promise of a toy. How about Sydney, the soccer player? If you gave Sydney a piece of candy every time Sydney scored a goal, you would be using a primary reinforcer. Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.
A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Sydney made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behavior chart? They are also secondary reinforcers.
Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Adibsereshki and Abkenar (2014) found that use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of eighth-grade students. Similar studies show demonstrable gains in behavior and academic achievement for groups ranging from first grade to high school, and representing a wide array of abilities and disabilities. For example, during studies involving younger students, when children in the study exhibited appropriate behavior (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.
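The "quiet hands" arrangement just described amounts to earning and losing tokens, then exchanging them at some rate for a backup reward. A toy model in Python; the exchange rate and the never-below-zero response-cost rule are our own assumptions for illustration:

```python
class TokenEconomy:
    """Illustrative token economy: tokens are secondary reinforcers that
    are later exchanged for a backup reward (here, minutes of playtime)."""

    def __init__(self):
        self.tokens = 0

    def reinforce(self, behavior_appropriate: bool):
        """Award a token for appropriate behavior; remove one otherwise
        (response cost), never dropping below zero (an assumption here)."""
        if behavior_appropriate:
            self.tokens += 1
        elif self.tokens > 0:
            self.tokens -= 1

    def exchange(self, cost: int, minutes_per_token: int = 1) -> int:
        """Trade `cost` tokens for playtime; returns minutes earned."""
        if cost > self.tokens:
            return 0  # not enough tokens saved up yet
        self.tokens -= cost
        return cost * minutes_per_token

economy = TokenEconomy()
for appropriate in (True, True, False, True):  # one pinch loses a token
    economy.reinforce(appropriate)
print(economy.tokens)  # 2
```

The tokens themselves are worthless, which is exactly the point: they acquire reinforcing value only through the exchange step, just as the text says of secondary reinforcers.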
Behavior modification in children
Parents and teachers often use behavior modification to change a child’s behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed (Figure 6.11). Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior. Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.
Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, they are removed from the desirable activity at hand (Figure 6.12). For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn’t throw blocks.
There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.
Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement. This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).
Watch this video clip of veterinarian Dr. Sophia Yin shaping a dog's behavior using the steps outlined above to learn more.
Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement, also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules (Table 6.3). These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.
| Reinforcement Schedule | Description | Result | Example |
|---|---|---|---|
| Fixed interval | Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes). | Moderate response rate with significant pauses after reinforcement | Hospital patient uses patient-controlled, doctor-timed pain relief |
| Variable interval | Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes). | Moderate yet steady response rate | Checking social media |
| Fixed ratio | Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses). | High response rate with pauses after reinforcement | Piecework—factory worker getting paid for every x number of items manufactured |
| Variable ratio | Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses). | High and steady response rate | Gambling |
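The four partial schedules in the table differ only in what the criterion counts (responses versus time) and whether it is fixed or drawn at random. A simplified decision function sketching that logic; this is our own illustration under stated assumptions (real schedules also carry state from one reinforcement to the next):

```python
import random

def should_reinforce(schedule: str, param: float, responses: int,
                     elapsed: float, rng=random.random) -> bool:
    """Decide whether the current response earns reinforcement.

    `responses` counts responses since the last reinforcer, and `elapsed`
    is the time (in seconds) since the last reinforcer. `param` is the
    ratio requirement or the interval length.
    """
    if schedule == "fixed_ratio":        # every `param`-th response
        return responses >= param
    if schedule == "fixed_interval":     # first response after `param` seconds
        return elapsed >= param
    if schedule == "variable_ratio":     # on average, every `param` responses
        return rng() < 1.0 / param
    if schedule == "variable_interval":  # wait drawn at random, mean `param` s
        return elapsed >= rng() * 2 * param
    raise ValueError(f"unknown schedule: {schedule}")

# Fixed ratio 10: the factory worker is paid on the 10th item, not the 9th.
print(should_reinforce("fixed_ratio", 10, responses=9, elapsed=0))   # False
print(should_reinforce("fixed_ratio", 10, responses=10, elapsed=0))  # True
```

Modeling the "variable" schedules with a random draw is what produces their characteristic steady responding: the learner can never tell from the last reinforcer when the next one is due.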
Now let’s combine these four terms. A fixed interval reinforcement schedule is when behavior is rewarded after a set amount of time. For example, June undergoes major surgery in a hospital. During recovery, they are expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. Their doctor sets a limit: one dose per hour. June pushes a button when pain becomes difficult, and they receive a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behavior when it will not be rewarded.
With a variable interval reinforcement schedule, the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His performance in providing prompt service and keeping a clean restaurant is steady because he wants his crew to earn the bonus.
With a fixed ratio reinforcement schedule, there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care whether the person really needs the prescription sunglasses; Carla just wants her commission. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.
In a variable ratio reinforcement schedule , the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.
In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time the doctor has approved, no medication is administered. They are on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish ( Figure 6.13 ).
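The link between schedule variability and resistance to extinction can be sketched with a deliberately crude toy model. Everything below is an illustrative assumption, not drawn from the text: suppose an animal stops responding once it encounters an unreinforced run of responses several times longer than anything it experienced during training. Because a variable ratio schedule produces long unreinforced runs even while reinforcement is still on, the switch to extinction takes far longer to detect.

```python
import random

def responses_before_quitting(training_runs, factor=3):
    """Toy quitting rule: keep responding in extinction until an unreinforced
    run exceeds `factor` times the longest run experienced during training."""
    return factor * max(training_runs)

# Fixed ratio 5: every run between reinforcers is exactly 4 unreinforced presses.
fr_runs = [4] * 50

# Variable ratio with the same mean: run lengths vary unpredictably.
rng = random.Random(1)
vr_runs = [rng.randint(0, 8) for _ in range(50)]

print(responses_before_quitting(fr_runs))   # 12
print(responses_before_quitting(vr_runs))   # larger: the variable schedule sustains responding longer
```

Under this toy rule the fixed ratio learner quits after 3 × 4 = 12 unreinforced responses, while the variable ratio learner, having seen much longer runs during training, can keep going roughly twice as long even though both schedules paid off at the same average rate.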
Gambling and the brain.
Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron's money on a variable-ratio schedule” (p. 397).
Skinner uses gambling as an example of the power of the variable-ratio reinforcement schedule for maintaining behavior even during long periods without any reinforcement. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). It is indeed true that variable-ratio schedules keep behavior quite persistent—just imagine the frequency of a child’s tantrums if a parent gives in even once to the behavior. The occasional reward makes it almost impossible to stop the behavior.
Recent research in rats has failed to support Skinner’s idea that training on variable-ratio schedules alone causes pathological gambling (Laskowski et al., 2019). However, other research suggests that gambling does seem to work on the brain in the same way as most addictive drugs, and so there may be some combination of brain chemistry and reinforcement schedule that could lead to problem gambling ( Figure 6.14 ). Specifically, modern research shows the connection between gambling and the activation of the reward centers of the brain that use the neurotransmitter (brain chemical) dopamine (Murch & Clark, 2016). Interestingly, gamblers don’t even have to win to experience the “rush” of dopamine in the brain. “Near misses,” or almost winning but not actually winning, also have been shown to increase activity in the ventral striatum and other brain reward centers that use dopamine (Chase & Clark, 2010). These brain effects are almost identical to those produced by addictive drugs like cocaine and heroin (Murch & Clark, 2016). Based on the neuroscientific evidence showing these similarities, the DSM-5 now considers gambling an addiction, while earlier versions of the DSM classified gambling as an impulse control disorder.
In addition to dopamine, gambling also appears to involve other neurotransmitters, including norepinephrine and serotonin (Potenza, 2013). Norepinephrine is secreted when a person feels stress, arousal, or thrill. It may be that pathological gamblers use gambling to increase their levels of this neurotransmitter. Deficiencies in serotonin might also contribute to compulsive behavior, including a gambling addiction (Potenza, 2013).
It may be that pathological gamblers’ brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.
Strict behaviorists like Watson and Skinner focused exclusively on studying behavior rather than cognition (such as thoughts and expectations). In fact, Skinner was such a staunch believer that cognition didn't matter that his ideas were considered radical behaviorism . Skinner considered the mind a "black box"—something completely unknowable—and, therefore, something not to be studied. However, another behaviorist, Edward C. Tolman, had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.
In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way through it. He also studied a comparison group that was rewarded with food at the end of the maze. As the unreinforced rats explored the maze, they developed a cognitive map : a mental picture of the layout of the maze ( Figure 6.15 ). After 10 sessions in the maze without reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats became aware of the food, they were able to find their way through the maze quickly, just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning : learning that occurs but is not observable in behavior until there is a reason to demonstrate it.
Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi’s dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he’s never driven there himself, so he has not had a chance to demonstrate that he’s learned the way. One morning Ravi’s dad has to leave early for a meeting, so he can’t drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.
Have you ever gotten lost in a building and couldn’t find your way back out? While that can be frustrating, you’re not alone. At one time or another we’ve all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation—or cognitive map—of the location, as Tolman’s rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight. Because of this, it’s often difficult to predict what’s around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.
Watch this video about Carlson's studies on cognitive maps and navigation in buildings to learn more.
Access for free at https://openstax.org/books/psychology-2e/pages/1-introduction
© Jun 26, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.
After John B. Watson’s retirement from academic psychology, behaviorists were eager to propose forms of learning other than classical conditioning. The most influential of these was operant conditioning, proposed by Burrhus Frederic Skinner, commonly known as B. F. Skinner.
Skinner based his approach on the simple premise that observable behavior is much easier to study than internal mental events. His position was far less extreme than Watson’s (1913), but he regarded classical conditioning as too simplistic to be a complete explanation of complex human behavior.
B. F. Skinner is famous for his pioneering research in the field of learning and behavior. He proposed studying complex human behavior through the voluntary responses an organism shows when placed in a given environment, and he named these responses operants. Although he is often called the father of operant conditioning, he built his theory on the law of effect, first proposed by Edward Thorndike (1898).
B. F. Skinner developed his theory of operant conditioning by conducting various experiments on animals. He used a special box, known as the “Skinner box,” for his experiments on rats.
As the first step of his experiment, he placed a hungry rat inside the Skinner box. The rat was initially inactive, but as it adapted to the environment of the box, it began to explore. Eventually, the rat discovered a lever that, when pressed, released food into the box. After satisfying its hunger, the rat resumed exploring the box, and after a while it pressed the lever a second time as it grew hungry again. This continued for a third, fourth, and fifth time, until eventually the hungry rat pressed the lever immediately upon being placed in the box. At that point, the conditioning was deemed complete.
Here, pressing the lever is the operant response/behavior, and the food released inside the chamber is the reward. The procedure is also known as instrumental conditioning, because the response is instrumental in obtaining food.
This experiment also illustrates the effects of positive reinforcement: upon pressing the lever, the hungry rat received food, which satisfied its hunger, making the food a positive reinforcer.
B. F. Skinner also conducted an experiment that demonstrated negative reinforcement. He placed a rat in a similar chamber, but instead of keeping it hungry, he subjected the chamber to an unpleasant electric current. Experiencing the discomfort, the rat moved desperately around the box and accidentally knocked the lever; pressing the lever immediately stopped the flow of current. After a few trials, the rat learned to go directly to the lever to escape the discomfort.
The removal of the electric current acted as the negative reinforcement, and the consequence of escaping the current ensured that the rat repeated the action again and again. Here too, pressing the lever is the operant response, and the termination of the current is its reward.
Both experiments clearly illustrate how operant conditioning works. The important part of any operant conditioning analysis is to identify the operant behavior and the consequence it produces in that particular environment.
Before the works of Skinner, the namesake of the Skinner box, instrumental learning was typically studied using a maze or puzzle box.
Learning in these settings is well-suited to examining discrete trials or episodes of behavior instead of a continuous stream of behavior.
The Skinner box, meanwhile, was designed as an experimental environment better suited to examine the more natural flow of behavior in animals.
The design of the Skinner box varies considerably depending on the type of animal placed inside it and on the experimental variables.
Nonetheless, it includes at least one lever, bar, or key that the animal can manipulate. Besides the reinforcer and tracker, a Skinner box can include other variables, such as lights, sounds, or images. In some cases, the floor of the chamber may even be electrified (Boulay, 2019).
The design of the Skinner box is intended to keep an animal from experiencing other stimuli, allowing researchers to carefully study behavior in a very controlled environment.
This allows researchers to, for example, determine which schedule of reinforcement — or relation of rewards and punishment to the reinforcer — leads to the highest rate of response in the animal being studied (Boulay, 2019).
The reinforcer mechanism is the part of the Skinner box that delivers reinforcement for an action. For instance, pressing a lever a certain number of times may trigger the release of a food pellet; the pellet dispenser serves as the reinforcer mechanism (Boulay, 2019).
The tracker, meanwhile, provides quantitative data regarding the reinforcer. For example, the tracker may count the number of times that a lever is pressed or the number of electric shocks or pellets dispensed (Boulay, 2019).
Partial reinforcement occurs when reinforcement is only given under particular circumstances. For example, a pellet or shock may only be dispensed after a pigeon has pressed a lever a certain number of times.
There are several types of partial reinforcement schedules, classified as fixed or variable and as interval or ratio (Boulay, 2019).
Once data has been obtained from the Skinner box, researchers can look at the rate of response depending on the schedule.
Modified versions of the operant conditioning chamber, or Skinner box, are still widely used in research settings today.
Skinner developed his theory of operant conditioning by identifying four different types of consequence: positive reinforcement, negative reinforcement, positive punishment, and negative punishment.
To test the effect of these outcomes, he constructed a device called the “Skinner Box,” a cage in which a rat could be placed, with a small lever (which the rat would be trained to press), a chute that would release pellets of food, and a floor which could be electrified.
For example, a hungry rat was placed in the cage. Every time it pressed the lever, a food pellet fell into the food dispenser (positive reinforcement). After being put in the box a few times, the rat quickly learned to go straight to the lever.
This suggests that positive reinforcement increases the likelihood of the behavior being repeated.
In another experiment, a rat was placed in a cage in which it was subjected to an uncomfortable electrical current.
As it moved around the cage, the rat hit the lever, which immediately switched off the electrical current (negative reinforcement). After being put in the box a few times, the rat quickly learned to go straight to the lever.
This suggests that negative reinforcement increases the likelihood of the behavior being repeated.
The device allowed Skinner to deliver each of his four potential outcomes: positive reinforcement, negative reinforcement, positive punishment, and negative punishment.
The application of operant and classical conditioning and the corresponding idea of the Skinner Box in commercial settings is widespread, particularly with regard to advertising and video games.
Advertisers use a number of techniques based on operant conditioning to influence consumer behavior, such as the variable-ratio reinforcement schedule (the so-called “slot machine effect”), which encourages viewers to keep watching a particular channel in the hope of seeing a desirable outcome, such as winning a prize (Vu, 2017).
Similarly, video game designers often employ Skinnerian principles in order to keep players engaged in gameplay.
For instance, many games make use of variable-ratio schedules of reinforcement, whereby players are given rewards (e.g., points, new levels) at random intervals.
This encourages players to keep playing in the hope of receiving a reward. In addition, many games make use of Skinner’s principle of shaping, whereby players are gradually given more difficult tasks as they master the easy ones. This encourages players to persevere in the face of frustration in order to see results.
There are a number of potential problems with using operant conditioning principles in commercial settings.
First, advertisers and video game designers may inadvertently create addictive behaviors in consumers.
Second, operant conditioning is a relatively short-term phenomenon; that is, it only affects behavior while reinforcement is being given.
Once reinforcement is removed (e.g., the TV channel is changed, the game is turned off), the desired behavior is likely to disappear as well.
As such, operant conditioning techniques may backfire, fostering addiction without producing the engaged gameplay that developers hoped for (Vu, 2017).
In 1945, B. F. Skinner invented the air crib, a metal crib with walls and a ceiling made of removable safety glass.
The front pane of the crib was also made of safety glass, and the entire structure was meant to sit on legs so that it could be moved around easily.
The air crib was designed to create a climate-controlled, healthier environment for infants. The air crib was not commercially successful, but it did receive some attention from the media.
In particular, Time magazine ran a story about the air crib in 1947, describing it as a “baby tender” that would “give infant care a new scientific basis” (Joyce & Faye, 2010).
Misleading publicity around Skinner’s air crib, however, perpetuated the myth that the air crib was a Skinner Box and that the infants placed in it were being conditioned.
In reality, the air crib was nothing more than a simple bassinet with some features that were meant to make it easier for parents to care for their infants.
There is no evidence that Skinner ever used the air crib to condition children, and in fact, he later said that it was never his intention to do so.
One famous myth surrounding the air crib was that Skinner’s daughter, Deborah Skinner, was raised in a Skinner Box.
According to this rumor, Deborah Skinner had become mentally ill, sued her father, and died by suicide as a result of her experience. These rumors persisted until she publicly denied them in 2004 (Joyce & Faye, 2010).
One of the most common criticisms of the Skinner box is that it does not allow animals to understand their actions.
Because behaviorism does not require that an animal understand its actions, this theory can be somewhat misleading about the degree to which an animal actually understands what it is doing (Boulay, 2019).
Another criticism of the Skinner box is that it can be quite stressful for the animals involved. The design of the Skinner box is intended to keep an animal from experiencing other stimuli, which can lead to stress and anxiety.
Finally, some critics argue that the data obtained from Skinner boxes may not be generalizable to real-world situations.
Because the environment in a Skinner box is so controlled, it may not accurately reflect how an animal would behave in an environment outside the lab.
There are very few learning environments in the real world that replicate a perfect operant conditioning environment, with a single action or sequence of actions leading to a stimulus (Boulay, 2019).
Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice Hall.
Dezfouli, A., & Balleine, B. W. (2012). Habits, action sequences and reinforcement learning. European Journal of Neuroscience, 35 (7), 1036-1051.
Du Boulay, B. (2019). Escape from the Skinner Box: The case for contemporary intelligent learning environments. British Journal of Educational Technology, 50 (6), 2902-2919.
Chen, C., Zhang, K. Z., Gong, X., & Lee, M. (2019). Dual mechanisms of reinforcement reward and habit in driving smartphone addiction: the role of smartphone features. Internet Research.
Dad, H., Ali, R., Janjua, M. Z. Q., Shahzad, S., & Khan, M. S. (2010). Comparison of the frequency and effectiveness of positive and negative reinforcement practices in schools. Contemporary Issues in Education Research, 3 (1), 127-136.
Diedrich, J. L. (2010). Motivating students using positive reinforcement (Doctoral dissertation).
Dozier, C. L., Foley, E. A., Goddard, K. S., & Jess, R. L. (2019). Reinforcement. The Encyclopedia of Child and Adolescent Development, 1-10.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement . New York: Appleton-Century-Crofts.
Gunter, P. L., & Coutinho, M. J. (1997). Negative reinforcement in classrooms: What we’re beginning to learn. Teacher Education and Special Education, 20 (3), 249-264.
Joyce, N., & Faye, C. (2010). Skinner Air Crib. APS Observer, 23 (7).
Kamery, R. H. (2004, July). Motivation techniques for positive reinforcement: A review. In Allied Academies International Conference. Academy of Legal, Ethical and Regulatory Issues. Proceedings (Vol. 8, No. 2, p. 91). Jordan Whitney Enterprises, Inc.
Kohler, W. (1924). The mentality of apes. London: Routledge & Kegan Paul.
Staddon, J. E., & Niv, Y. (2008). Operant conditioning. Scholarpedia, 3 (9), 2318.
Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New York: Appleton-Century.
Skinner, B. F. (1948). “Superstition” in the pigeon. Journal of Experimental Psychology, 38, 168-172.
Skinner, B. F. (1951). How to teach animals. Scientific American, 185 (6), 26-29.
Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
Skinner, B. F. (1963). Operant behavior. American Psychologist, 18 (8), 503.
Smith, S., Ferguson, C. J., & Beaver, K. M. (2018). Learning to blast a way into crime, or just good clean fun? Examining aggressive play with toy weapons and its relation with crime. Criminal behaviour and mental health, 28 (4), 313-323.
Staddon, J. E., & Cerutti, D. T. (2003). Operant conditioning. Annual Review of Psychology, 54 (1), 115-144.
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Monographs: General and Applied, 2(4), i-109.
Vu, D. (2017). An Analysis of Operant Conditioning and its Relationship with Video Game Addiction.
Watson, J. B. (1913). Psychology as the behaviorist views it . Psychological Review, 20, 158–177.
Operant behavior is behavior “controlled” by its consequences. In practice, operant conditioning is the study of reversible behavior maintained by reinforcement schedules. We review empirical studies and theoretical approaches to two large classes of operant behavior: interval timing and choice. We discuss cognitive versus behavioral approaches to timing, the “gap” experiment and its implications, proportional timing and Weber's law, temporal dynamics and linear waiting, and the problem of simple chain-interval schedules. We review the long history of research on operant choice: the matching law, its extensions and problems, concurrent chain schedules, and self-control. We point out how linear waiting may be involved in timing, choice, and reinforcement schedules generally. There are prospects for a unified approach to all these areas.
The term operant conditioning was coined by B. F. Skinner in 1937 in the context of reflex physiology, to differentiate what he was interested in—behavior that affects the environment—from the reflex-related subject matter of the Pavlovians. The term was novel, but its referent was not entirely new. Operant behavior, though defined by Skinner as behavior “controlled by its consequences,” is in practice little different from what had previously been termed “instrumental learning” and what most people would call habit. Any well-trained “operant” is in effect a habit. What was truly new was Skinner's method of automated training with intermittent reinforcement and the subject matter of reinforcement schedules to which it led. Skinner and his colleagues and students discovered in the ensuing decades a completely unsuspected range of powerful and orderly schedule effects that provided new tools for understanding learning processes and new phenomena to challenge theory.
A reinforcement schedule is any procedure that delivers a reinforcer to an organism according to some well-defined rule. The usual reinforcer is food for a hungry rat or pigeon; the usual schedule is one that delivers the reinforcer for a switch closure caused by a peck or lever press. Reinforcement schedules have also been used with human subjects, and the results are broadly similar to the results with animals. However, for ethical and practical reasons, relatively weak reinforcers must be used—and the range of behavioral strategies people can adopt is of course greater than in the case of animals. This review is restricted to work with animals.
Two types of reinforcement schedule have excited the most interest. Most popular are time-based schedules such as fixed and variable interval, in which the reinforcer is delivered after a fixed or variable time period after a time marker (usually the preceding reinforcer). Ratio schedules require a fixed or variable number of responses before a reinforcer is delivered.
Trial-by-trial versions of all these free-operant procedures exist. For example, a version of the fixed-interval schedule specifically adapted to the study of interval timing is the peak-interval procedure, which adds to the fixed interval an intertrial interval (ITI) preceding each trial and a percentage of extra-long “empty” trials in which no food is given.
For theoretical reasons, Skinner believed that operant behavior ought to involve a response that can easily be repeated, such as pressing a lever, for rats, or pecking an illuminated disk (key) for pigeons. The rate of such behavior was thought to be important as a measure of response strength ( Skinner 1938 , 1966 , 1986 ; Killeen & Hall 2001 ). The current status of this assumption is one of the topics of this review. True or not, the emphasis on response rate has resulted in a dearth of experimental work by operant conditioners on nonrecurrent behavior such as movement in space.
Operant conditioning differs from other kinds of learning research in one important respect. The focus has been almost exclusively on what is called reversible behavior, that is, behavior in which the steady-state pattern under a given schedule is stable, meaning that in a sequence of conditions, XAXBXC…, where each condition is maintained for enough days that the pattern of behavior is locally stable, behavior under schedule X shows a pattern after one or two repetitions of X that is always the same. For example, the first time an animal is exposed to a fixed-interval schedule, after several daily sessions most animals show a “scalloped” pattern of responding (call it pattern A): a pause after each food delivery—also called wait time or latency —followed by responding at an accelerated rate until the next food delivery. However, some animals show negligible wait time and a steady rate (pattern B). If all are now trained on some other procedure—a variable-interval schedule, for example—and then after several sessions are returned to the fixed-interval schedule, almost all the animals will revert to pattern A. Thus, pattern A is the stable pattern. Pattern B, which may persist under unchanging conditions but does not recur after one or more intervening conditions, is sometimes termed metastable ( Staddon 1965 ). The vast majority of published studies in operant conditioning are on behavior that is stable in this sense.
Although the theoretical issue is not a difficult one, there has been some confusion about what the idea of stability (reversibility) in behavior means. It should be obvious that the animal that shows pattern A after the second exposure to procedure X is not the same animal as when it showed pattern A on the first exposure. Its experimental history is different after the second exposure than after the first. If the animal has any kind of memory, therefore, its internal state following the second exposure is likely to be different than after the first exposure, even though the observed behavior is the same. The behavior is reversible; the organism's internal state in general is not. The problems involved in studying nonreversible phenomena in individual organisms have been spelled out elsewhere (e.g., Staddon 2001a, Ch. 1); this review is mainly concerned with the reversible aspects of behavior.
Once the microscope was invented, microorganisms became a new field of investigation. Once automated operant conditioning was invented, reinforcement schedules became an independent subject of inquiry. In addition to being of great interest in their own right, schedules have also been used to study topics defined in more abstract ways such as timing and choice. These two areas constitute the majority of experimental papers in operant conditioning with animal subjects during the past two decades. Great progress has been made in understanding free-operant choice behavior and interval timing. Yet several theories of choice still compete for consensus, and much the same is true of interval timing. In this review we attempt to summarize the current state of knowledge in these two areas, to suggest how common principles may apply in both, and to show how these principles may also apply to reinforcement schedule behavior considered as a topic in its own right.
Interval timing is defined in several ways. The simplest is to define it as covariation between a dependent measure such as wait time and an independent measure such as interreinforcement interval (on fixed interval) or trial time-to-reinforcement (on the peak procedure). When the interreinforcement interval is doubled, then after a learning period wait time also approximately doubles ( proportional timing ). This is an example of what is sometimes called a time production procedure: The organism produces an approximation to the to-be-timed interval. There are also explicit time discrimination procedures in which on each trial the subject is exposed to a stimulus and is then required to respond differentially depending on its absolute ( Church & Deluty 1977 , Stubbs 1968 ) or even relative ( Fetterman et al. 1989 ) duration. For example, in temporal bisection , the subject (e.g., a rat) experiences either a 10-s or a 2-s stimulus, L or S . After the stimulus goes off, the subject is confronted with two choices. If the stimulus was L , a press on the left lever yields food; if S , a right press gives food; errors produce a brief time-out. Once the animal has learned, stimuli of intermediate duration are presented in lieu of S and L on test trials. The question is, how will the subject distribute its responses? In particular, at what intermediate duration will it be indifferent between the two choices? [Answer: typically in the vicinity of the geometric mean, i.e., √(L·S) ≈ 4.47 for 2 and 10.]
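The geometric-mean indifference point is easy to verify numerically. The sketch below is illustrative only; the function name is ours, not from the literature:

```python
import math

def bisection_indifference(short_s, long_s):
    """Geometric mean of the two anchor durations: the intermediate
    duration at which subjects are typically indifferent between the
    'short' and 'long' choice levers in temporal bisection."""
    return math.sqrt(short_s * long_s)

# For the 2-s vs. 10-s example in the text:
print(round(bisection_indifference(2, 10), 2))  # 4.47
```

Note that the geometric mean (4.47 s) lies well below the arithmetic mean (6 s), which is one reason bisection data are often taken to suggest a logarithmic-like representation of duration.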
Wait time is a latency; hence (it might be objected) it may vary on time-production procedures like fixed interval because of factors other than timing—such as degree of hunger (food deprivation). Using a time-discrimination procedure avoids this problem. It can also be mitigated by using the peak procedure and looking at performance during “empty” trials. “Filled” trials terminate with food reinforcement after (say) T s. “Empty” trials, typically 3 T s long, contain no food and end with the onset of the ITI. During empty trials the animal therefore learns to wait, then respond, then stop (more or less) until the end of the trial ( Catania 1970 ). The mean of the distribution of response rates averaged over empty trials ( peak time ) is then perhaps a better measure of timing than wait time because motivational variables are assumed to affect only the height and spread of the response-rate distribution, not its mean. This assumption is only partially true ( Grace & Nevin 2000 , MacEwen & Killeen 1991 , Plowright et al. 2000 ).
There is still some debate about the actual pattern of behavior on the peak procedure in each individual trial. Is it just wait, respond at a constant rate, then wait again? Or is there some residual responding after the “stop” [yes, usually (e.g., Church et al. 1991 )]? Is the response rate between start and stop really constant or are there two or more identifiable rates ( Cheng & Westwood 1993 , Meck et al. 1984 )? Nevertheless, the method is still widely used, particularly by researchers in the cognitive/psychophysical tradition. The idea behind this approach is that interval timing is akin to sensory processes such as the perception of sound intensity (loudness) or luminance (brightness). As there is an ear for hearing and an eye for seeing, so (it is assumed) there must be a (real, physiological) clock for timing. Treisman (1963) proposed the idea of an internal pacemaker-driven clock in the context of human psychophysics. Gibbon (1977) further developed the approach and applied it to animal interval-timing experiments.
The major similarity between acknowledged sensory processes, such as brightness perception, and interval timing is Weber's law . Peak time on the peak procedure is not only proportional to time-to-food ( T ), its coefficient of variation (standard deviation divided by mean) is approximately constant, a result similar to Weber's law obeyed by most sensory dimensions. This property has been called scalar timing ( Gibbon 1977 ). Most recently, Gallistel & Gibbon (2000) have proposed a grand principle of timescale invariance , the idea that the frequency distribution of any given temporal measure (the idea is assumed to apply generally, though in fact most experimental tests have used peak time) scales with the to-be-timed-interval. Thus, given the normalized peak-time distribution for T =60 s, say; if the x -axis is divided by 2, it will match the distribution for T = 30 s. In other words, the frequency distribution for the temporal dependent variable, normalized on both axes, is asserted to be invariant.
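Timescale invariance can be illustrated with a toy simulation. Assuming, purely for illustration, that peak times are roughly Gaussian with a constant coefficient of variation (the scalar property), dividing each sample by its T should make the T = 60 s and T = 30 s distributions coincide:

```python
import random

def peak_times(T, cv=0.15, n=10000, seed=0):
    """Simulate peak times with mean T and constant coefficient of
    variation (scalar property): SD = cv * T. Gaussian form and the
    cv value are illustrative assumptions, not data."""
    rng = random.Random(seed)
    return [rng.gauss(T, cv * T) for _ in range(n)]

# Normalizing the x-axis by T should superimpose the two distributions:
a = [x / 60 for x in peak_times(60)]
b = [x / 30 for x in peak_times(30, seed=1)]
mean_a = sum(a) / len(a)
mean_b = sum(b) / len(b)
print(round(mean_a, 2), round(mean_b, 2))  # both close to 1.0
```

The empirical tests discussed next ask whether real data actually superimpose in this way at different absolute intervals.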
Timescale invariance is in effect a combination of Weber's law and proportional timing. Like those principles, it is only approximately true. There are three kinds of evidence that limit its generality. The simplest is the steady-state pattern of responding (key-pecking or lever-pressing) observed on fixed-interval reinforcement schedules. This pattern should be the same at all fixed-interval values, but it is not. Gallistel & Gibbon wrote, “When responding on such a schedule, animals pause after each reinforcement and then resume responding after some interval has elapsed. It was generally supposed that the animals' rate of responding accelerated throughout the remainder of the interval leading up to reinforcement. In fact, however, conditioned responding in this paradigm … is a two-state variable (slow, sporadic pecking vs. rapid, steady pecking), with one transition per interreinforcement interval ( Schneider 1969 )” (p. 293).
This conclusion over-generalizes Schneider's result. Reacting to reports of “break-and-run” fixed-interval performance under some conditions, Schneider sought to characterize this feature more objectively than the simple inspection of cumulative records. He found a way to identify the point of maximum acceleration in the fixed-interval “scallop” by using an iterative technique analogous to attaching an elastic band to the beginning of an interval and the end point of the cumulative record, then pushing a pin, representing the break point, against the middle of the band until the two resulting straight-line segments best fit the cumulative record (there are other ways to achieve the same result that do not fix the end points of the two line-segments). The postreinforcement time ( x -coordinate) of the pin then gives the break point for that interval. Schneider showed that the break point is an orderly dependent measure: Break point is roughly 0.67 of interval duration, with standard deviation proportional to the mean (the Weber-law or scalar property).
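Schneider's elastic-band procedure amounts to fitting two connected line segments to the cumulative record. The sketch below is a simplified grid search, not his exact iterative algorithm: both segments are anchored at the record's endpoints, the candidate break point slides along the data, and the split minimizing total squared error is returned.

```python
def break_point(ts, ys):
    """Two-line ('elastic band') fit to a cumulative record (ts = times,
    ys = cumulative responses). Returns the postreinforcement time of
    the break point that minimizes total squared error."""
    def sse(i0, i1):
        # Squared error of points i0..i1 against the chord joining them.
        t0, y0, t1, y1 = ts[i0], ys[i0], ts[i1], ys[i1]
        slope = (y1 - y0) / (t1 - t0)
        return sum((ys[k] - (y0 + slope * (ts[k] - t0))) ** 2
                   for k in range(i0, i1 + 1))
    n = len(ts) - 1
    best = min(range(1, n), key=lambda i: sse(0, i) + sse(i, n))
    return ts[best]

# A synthetic break-and-run record: no responding for 40 s, then a
# steady run at 3 responses per time step.
ts = list(range(0, 61, 2))
ys = [0 if t < 40 else (t - 40) * 3 for t in ts]
print(break_point(ts, ys))  # 40
```

Schneider's point about the adequacy of a two-state description could be probed with the same machinery by asking whether adding a third segment reduces the error by more than a negligible amount.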
This finding is by no means the same as the idea that the fixed-interval scallop is “a two-state variable” ( Hanson & Killeen 1981 ). Schneider showed that a two-state model is an adequate approximation; he did not show that it is the best or truest approximation. A three- or four-line approximation (i.e., two or more pins) might well have fit significantly better than the two-line version. To show that the process is two-state, Schneider would have had to show that adding additional segments produced negligibly better fit to the data.
The frequent assertion that the fixed-interval scallop is always an artifact of averaging flies in the face of raw cumulative-record data: the many nonaveraged individual fixed-interval cumulative records in Ferster & Skinner (1957 , e.g., pp. 159, 160, 162) show clear curvature, particularly at longer fixed-interval values (> ∼2 min). The issue for timescale invariance, therefore, is whether the shape, or relative frequency of different-shaped records, is the same at different absolute intervals.
The evidence is that there is more, and more frequent, curvature at longer intervals. Schneider's data show this effect. In Schneider's Figure 3, for example, the time to shift from low to high rate is clearly longer at longer intervals than shorter ones. On fixed-interval schedules, apparently, absolute duration does affect the pattern of responding. (A possible reason for this dependence of the scallop on fixed-interval value is described in Staddon 2001a , p. 317. The basic idea is that greater curvature at longer fixed-interval values follows from two things: a linear increase in response probability across the interval, combined with a nonlinear, negatively accelerated, relation between overall response rate and reinforcement rate.) If there is a reliable difference in the shape, or distribution of shapes, of cumulative records at long and short fixed-interval values, the timescale-invariance principle is violated.
A second dataset that does not agree with timescale invariance is an extensive set of studies on the peak procedure by Zeiler & Powell (1994 ; see also Hanson & Killeen 1981) , who looked explicitly at the effect of interval duration on various measures of interval timing. They conclude, “Quantitative properties of temporal control depended on whether the aspect of behavior considered was initial pause duration, the point of maximum acceleration in responding [break point], the point of maximum deceleration, the point at which responding stopped, or several different statistical derivations of a point of maximum responding … . Existing theory does not explain why Weber's law [the scalar property] so rarely fit the results …” (p. 1; see also Lowe et al. 1979 , Wearden 1985 for other exceptions to proportionality between temporal measures of behavior and interval duration). Like Schneider (1969) and Hanson & Killeen (1981) , Zeiler & Powell found that the break point measure was proportional to interval duration, with scalar variance (constant coefficient of variation), and thus consistent with timescale invariance, but no other measure fit the rule.
Moreover, the fit of the breakpoint measure is problematic because it is not a direct measure of behavior but is itself the result of a statistical fitting procedure. It is possible, therefore, that the fit of breakpoint to timescale invariance owes as much to the statistical method used to arrive at it as to the intrinsic properties of temporal control. Even if this caveat turns out to be false, the fact that every other measure studied by Zeiler & Powell failed to conform to timescale invariance surely rules it out as a general principle of interval timing.
The third and most direct test of the timescale invariance idea is an extensive series of time-discrimination experiments carried out by Dreyfus et al. (1988) and Stubbs et al. (1994) . The usual procedure in these experiments was for pigeons to peck a center response key to produce a red light of one duration that is followed immediately by a green light of another duration. When the green center-key light goes off, two yellow side-keys light up. The animals are reinforced with food for pecking the left side-key if the red light was longer, the right side-key if the green light was longer.
The experimental question is, how does discrimination accuracy depend on relative and absolute duration of the two stimuli? Timescale invariance predicts that accuracy depends only on the ratio of red and green durations: For example, accuracy should be the same following the sequence red:10, green:20 as the sequence red:30, green:60, but it is not. Pigeons are better able to discriminate between the two short durations than the two long ones, even though their ratio is the same. Dreyfus et al. and Stubbs et al. present a plethora of quantitative data of the same sort, all showing that time discrimination depends on absolute as well as relative duration.
Timescale invariance is empirically indistinguishable from Weber's law as it applies to time, combined with the idea of proportional timing: The mean of a temporal dependent variable is proportional to the temporal independent variable. But Weber's law and proportional timing are dissociable—it is possible to have proportional timing without conforming to Weber's law and vice versa (cf. Hanson & Killeen 1981 , Zeiler & Powell 1994 ), and in any case both are only approximately true. Timescale invariance therefore does not qualify as a principle in its own right.
The cognitive approach to timing dates from the late 1970s. It emphasizes the psychophysical properties of the timing process and the use of temporal dependent variables as measures of (for example) drug effects and the effects of physiological interventions. It de-emphasizes proximal environmental causes. Yet when timing (then called temporal control; see Zeiler 1977 for an early review) was first discovered by operant conditioners (Pavlov had studied essentially the same phenomenon— delay conditioning —many years earlier), the focus was on the time marker , the stimulus that triggered the temporally correlated behavior. (That is one virtue of the term control : It emphasizes the fact that interval timing behavior is usually not free-running. It must be cued by some aspect of the environment.) On so-called spaced-responding schedules, for example, the response is the time marker: The subject must learn to space its responses more than T s apart to get food. On fixed-interval schedules the time marker is reinforcer delivery; on the peak procedure it is the stimulus events associated with trial onset. This dependence on a time marker is especially obvious on time-production procedures, but on time-discrimination procedures the subject's choice behavior must also be under the control of stimuli associated with the onset and offset of the sample duration.
Not all stimuli are equally effective as time markers. For example, an early study by Staddon & Innis (1966a ; see also 1969) showed that if, on alternate fixed intervals, 50% of reinforcers (F) are omitted and replaced by a neutral stimulus (N) of the same duration, wait time following N is much shorter than after F (the reinforcement-omission effect ). Moreover, this difference persists indefinitely. Despite the fact that F and N have the same temporal relationship to the reinforcer, F is much more effective as a time marker than N. No exactly comparable experiment has been done using the peak procedure, partly because the time marker there involves ITI offset/trial onset rather than the reinforcer delivery, so that there is no simple manipulation equivalent to reinforcement omission.
These effects do not depend on the type of behavior controlled by the time marker. On fixed-interval schedules the time marker is in effect inhibitory: Responding is suppressed during the wait time and then occurs at an accelerating rate. Other experiments ( Staddon 1970 , 1972 ), however, showed that given the appropriate schedule, the time marker can control a burst of responding (rather than a wait) of a duration proportional to the schedule parameters ( temporal go–no-go schedules) and later experiments have shown that the place of responding can be controlled by time since trial onset in the so-called tri-peak procedure ( Matell & Meck 1999 ).
A theoretical review ( Staddon 1974 ) concluded, “Temporal control by a given time marker depends on the properties of recall and attention, that is, on the same variables that affect attention to compound stimuli and recall in memory experiments such as delayed matching-to-sample.” By far the most important variable seems to be “the value of the time-marker stimulus—Stimuli of high value … are more salient …” (p. 389), although the full range of properties that determine time-marker effectiveness is yet to be explored.
Reinforcement omission experiments are transfer tests , that is, tests to identify the effective stimulus. They pinpoint the stimulus property controlling interval timing—the effective time marker—by selectively eliminating candidate properties. For example, in a definitive experiment, Kello (1972) showed that on fixed interval the wait time is longest following standard reinforcer delivery (food hopper activated with food, hopper light on, house light off, etc.). Omission of any of those elements caused the wait time to decrease, a result consistent with the hypothesis that reinforcer delivery acquires inhibitory temporal control over the wait time. The only thing that makes this situation different from the usual generalization experiment is that the effects of reinforcement omission are relatively permanent. In the usual generalization experiment, delivery of the reinforcer according to the same schedule in the presence of both the training stimulus and the test stimuli would soon lead all to be responded to in the same way. Not so with temporal control: As we just saw, even though N and F events have the same temporal relationship to the next food delivery, animals never learn to respond similarly after both. The only exception is when the fixed-interval is relatively short, on the order of 20 s or less ( Starr & Staddon 1974 ). Under these conditions pigeons are able to use a brief neutral stimulus as a time marker on fixed interval.
The closest equivalent to fixed-interval reinforcement–omission using the peak procedure is the so-called gap experiment ( Roberts 1981 ). In the standard gap paradigm the sequence of stimuli in a training trial (no gap stimulus) consists of three successive stimuli: the intertrial interval stimulus (ITI), the fixed-duration trial stimulus (S), and food reinforcement (F), which ends each training trial. The sequence is thus ITI, S, F, ITI. Training trials are typically interspersed with empty probe trials that last longer than reinforced trials but end with an ITI only and no reinforcement. The stimulus sequence on such trials is ITI, S, ITI, but the S is two or three times longer than on training trials. After performance has stabilized, gap trials are introduced into some or all of the probe trials. On gap trials the ITI stimulus reappears for a while in the middle of the trial stimulus. The sequence on gap trials is therefore ITI, S, ITI, S, ITI. Gap trials do not end in reinforcement.
What is the effective time marker (i.e., the stimulus that exerts temporal control) in such an experiment? ITI offset/trial onset is the best temporal predictor of reinforcement: Its time to food is shorter and less variable than any other experimental event. Most but not all ITIs follow reinforcement, and the ITI itself is often variable in duration and relatively long. So reinforcer delivery is a poor temporal predictor. The time marker therefore has something to do with the transition between ITI and trial onset, between ITI and S. Gap trials also involve presentation of the ITI stimulus, albeit with a different duration and within-trial location than the usual ITI, but the similarities to a regular trial are obvious. The gap experiment is therefore a sort of generalization (of temporal control) experiment. Buhusi & Meck (2000) presented gap stimuli more or less similar to the ITI stimulus during probe trials and found results resembling generalization decrement, in agreement with this analysis.
However, the gap procedure was not originally thought of as a generalization test, nor is it particularly well designed for that purpose. The gap procedure arose directly from the cognitive idea that interval timing behavior is driven by an internal clock ( Church 1978 ). From this point of view it is perfectly natural to inquire about the conditions under which the clock can be started or stopped. If the to-be-timed interval is interrupted—a gap—will the clock restart when the trial stimulus returns (reset)? Will it continue running during the gap and afterwards? Or will it stop and then restart (stop)?
“Reset” corresponds to the maximum rightward shift (from trial onset) of the response-rate peak from its usual position t s after trial onset to t + G_E, where G_E is the offset time (end) of the gap stimulus. Conversely, no effect (clock keeps running) leaves the peak unchanged at t, and “stop and restart” is an intermediate result, a peak shift to G_E − G_B + t, where G_B is the time of onset (beginning) of the gap stimulus.
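The three clock hypotheses make distinct arithmetic predictions for the peak shift, which can be written out directly. This is a minimal sketch; the function and mode names are ours:

```python
def predicted_peak(t, gap_start, gap_end, mode):
    """Predicted peak time (s from trial onset) after a gap, under the
    three clock hypotheses: 'run' (clock keeps running, no shift),
    'stop' (clock pauses during the gap, then restarts), and
    'reset' (clock restarts from zero at gap offset)."""
    if mode == "run":
        return t
    if mode == "stop":
        return t + (gap_end - gap_start)   # shift by gap duration
    if mode == "reset":
        return t + gap_end                 # shift by gap offset time
    raise ValueError(mode)

# Example: 30 s to food, with a 5-s gap beginning 10 s into the trial.
for mode in ("run", "stop", "reset"):
    print(mode, predicted_peak(30, 10, 15, mode))
# run 30, stop 35, reset 45
```

As the text notes, observed peak shifts typically fall between these predictions and depend on gap duration and placement, which is what the clock view by itself cannot explain.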
Both gap duration and placement within a trial have been varied. The results that have been obtained so far are rather complex (cf. Buhusi & Meck 2000 , Cabeza de Vaca et al. 1994 , Matell & Meck 1999 ). In general, the longer the gap and the later it appears in the trial, the greater the rightward peak shift. All these effects can be interpreted in clock terms, but the clock view provides no real explanation for them, because it does not specify which one will occur under a given set of conditions. The results of gap experiments can be understood in a qualitative way in terms of the similarity of the gap presentation to events associated with trial onset; the more similar, the closer the effect will be to reset, i.e., the onset of a new trial. Another resemblance between gap results and the results of reinforcement-omission experiments is that the effects of the gap are also permanent: Behavior on later trials usually does not differ from behavior on the first few ( Roberts 1981 ). These effects have been successfully simulated quantitatively by a neural network timing model ( Hopson 1999 , 2002 ) that includes the assumption that the effects of time-marker presentation decay with time ( Cabeza de Vaca et al. 1994 ).
The original temporal control studies were strictly empirical but tacitly accepted something like the psychophysical view of timing. Time was assumed to be a sensory modality like any other, so the experimental task was simply to explore the different kinds of effect, excitatory, inhibitory, discriminatory, that could come under temporal control. The psychophysical view was formalized by Gibbon (1977) in the context of animal studies, and this led to a static information-processing model, scalar expectancy theory (SET: Gibbon & Church 1984 , Meck 1983 , Roberts 1983 ), which comprised a pacemaker-driven clock, working and reference memories, a comparator, and various thresholds. A later dynamic version added memory for individual trials (see Gallistel 1990 for a review). This approach led to a long series of experimental studies exploring the clocklike properties of interval timing (see Gallistel & Gibbon 2000 , Staddon & Higa 1999 for reviews), but none of these studies attempted to test the assumptions of the SET approach in a direct way.
SET was for many years the dominant theoretical approach to interval timing. In recent years, however, its limitations, of parsimony and predictive range, have become apparent and there are now a number of competitors such as the behavioral theory of timing ( Killeen & Fetterman 1988 , MacEwen & Killeen 1991 , Machado 1997 ), spectral timing theory ( Grossberg & Schmajuk 1989 ), neural network models ( Church & Broadbent 1990 , Hopson 1999 , Dragoi et al. 2002 ), and the habituation-based multiple time scale theory (MTS: Staddon & Higa 1999 , Staddon et al. 2002 ). There is as yet no consensus on the best theory.
A separate series of experiments in the temporal-control tradition, beginning in the late 1980s, studied the real-time dynamics of interval timing (e.g., Higa et al. 1991 , Lejeune et al. 1997 , Wynne & Staddon 1988 ; see Staddon 2001a for a review). These experiments have led to a simple empirical principle that may have wide application. Most of these experiments used the simplest possible timing schedule, a response-initiated delay (RID) schedule. In this schedule the animal (e.g., a pigeon) can respond at any time, t , after food. The response changes the key color and food is delivered after a further T s. Time t is under the control of the animal; time T is determined by the experimenter. These experiments have shown that wait time on these and similar schedules (such as fixed interval) is strongly determined by the duration of the previous interfood interval (IFI). For example, wait time will track a cyclic sequence of IFIs, intercalated at a random point in a sequence of fixed (t + T = constant) intervals, with a lag of one interval; a single short IFI is followed by a short wait time in the next interval (the effect of a single long interval is smaller), and so on (see Staddon et al. 2002 for a review and other examples of temporal tracking). To a first approximation, these results are consistent with a linear relation between wait time in IFI N + 1 and the duration of IFI N :
t_{N+1} = aI_N + b,     (Equation 1)

where I is the IFI, a is a constant less than one, and b is usually negligible. This relation has been termed linear waiting ( Wynne & Staddon 1988 ). The principle is an approximation: an expanded model, incorporating the multiple time scale theory, allows the principle to account for the slower effects of increases as opposed to decreases in IFI (see Staddon et al. 2002 ).
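The lag-one tracking implied by linear waiting can be sketched in a few lines. The parameter values here (a = 0.3, b = 0.5) are purely illustrative; empirical estimates vary across experiments:

```python
def linear_waiting(ifis, a=0.3, b=0.5):
    """Wait time in interval N+1 predicted from the duration of
    interval N (Equation 1): t[N+1] = a * I[N] + b.
    The first interval's wait is seeded from its own duration."""
    waits = []
    prev = ifis[0]
    for I in ifis:
        waits.append(a * prev + b)
        prev = I
    return waits

# A single short IFI embedded in a run of fixed 60-s intervals produces
# a short wait in the *next* interval (lag-one tracking):
ifis = [60, 60, 15, 60, 60]
print([round(w, 1) for w in linear_waiting(ifis)])
# [18.5, 18.5, 18.5, 5.0, 18.5]
```

The short wait appears one interval after the short IFI, which is the tracking lag described in the text.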
Most importantly for this discussion, the linear waiting principle appears to be obligatory. That is, organisms seem to follow the linear waiting rule even if they delay or even prevent reinforcer delivery by doing so. The simplest example is the RID schedule itself. Wynne & Staddon (1988) showed that it makes no difference whether the experimenter holds delay time T constant or the sum of t + T constant ( t + T = K ): Equation 1 holds in both cases, even though the optimal (reinforcement-rate-maximizing) strategy in the first case is for the animal to set t equal to zero, whereas in the second case reinforcement rate is maximized so long as t < K . Using a version of RID in which T in interval N + 1 depended on the value of t in the preceding interval, Wynne & Staddon also demonstrated two kinds of instability predicted by linear waiting.
The fact that linear waiting is obligatory allows us to look for its effects on schedules other than the simple RID schedule. The most obvious application is to ratio schedules. The time to emit a fixed number of responses is approximately constant; hence the delay to food after the first response in each interval is also approximately constant on fixed ratio (FR), as on fixed- T RID ( Powell 1968 ). Thus, the optimal strategy on FR, as on fixed- T RID, is to respond immediately after food. However, in both cases animals wait before responding and, as one might expect based on the assumption of a roughly constant interresponse time on all ratio schedules, the duration of the wait on FR is proportional to the ratio requirement ( Powell 1968 ), although longer than on a comparable chain-type schedule with the same interreinforcement time ( Crossman et al. 1974 ). The phenomenon of ratio strain —the appearance of long pauses and even extinction on high ratio schedules ( Ferster & Skinner 1957 )—may also have something to do with obligatory linear waiting.
A chain schedule is one in which a stimulus change, rather than primary reinforcement, is scheduled. Thus, a chain fixed-interval–fixed-interval schedule is one in which, for example, food reinforcement is followed by the onset of a red key light in the presence of which, after a fixed interval, a response produces a change to green. In the presence of green, food delivery is scheduled according to another fixed interval. RID schedules resemble two-link chain schedules. The first link is time t , before the animal responds; the second link is time T , after a response. We may expect, therefore, that waiting time in the first link of a two-link schedule will depend on the duration of the second link. We describe two results consistent with this conjecture and then discuss some exceptions.
Davison (1974) studied a two-link chain fixed-interval–fixed-interval schedule. Each cycle of the schedule began with a red key. Responding was reinforced, on fixed-interval I_1 s, by a change in key color from red to white. In the presence of white, food reinforcement was delivered according to fixed-interval I_2 s, followed by reappearance of the red key. Davison varied I_1 and I_2 and collected steady-state rate, pause, and link-duration data. He reported that when programmed second-link duration was long in relation to the first-link duration, pause in the first link sometimes exceeded the programmed link duration. The linear waiting predictions for this procedure can therefore be most easily derived for those conditions where the second link is held constant and the first-link duration is varied (because under these conditions, the first-link pause was always less than the programmed first-link duration). The prediction for the terminal link is
t_2 = aI_2,     (Equation 2)

where a is the proportionality constant, I_2 is the duration of the terminal-link fixed interval, and t_2 is the pause in the terminal link. Because I_2 is constant in this phase, t_2 is also constant. The pause in the initial link is given by

t_1 = a(I_1 + I_2) = aI_1 + aI_2,     (Equation 3)

where I_1 is the duration of the first link. Because I_2 is constant, Equation 3 is a straight line with slope a and positive y-intercept aI_2.
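Under linear waiting, the predicted pause in each link follows directly from that link's time-to-reinforcement (TTR). A minimal sketch, with an illustrative a = 0.3:

```python
def chain_pauses(i1, i2, a=0.3):
    """Linear-waiting predictions for a two-link chain FI i1 FI i2
    schedule. Pause in each link is proportional to that link's
    time-to-reinforcement; a = 0.3 is illustrative, not an estimate."""
    t1 = a * (i1 + i2)   # first-link TTR is i1 + i2
    t2 = a * i2          # second-link TTR is i2
    return t1, t2

# Across conditions, (TTR, pause) points from both links fall on one
# line of slope a through the origin:
for i1, i2 in [(30, 60), (60, 60), (120, 60)]:
    t1, t2 = chain_pauses(i1, i2)
    print((i1 + i2, round(t1, 1)), (i2, round(t2, 1)))
```

Plotting these (TTR, pause) pairs reproduces the single straight line through the origin that the theory predicts for Davison's data.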
Linear waiting theory can be tested with Davison's data by plotting, for every condition, t_1 and t_2 versus time-to-reinforcement (TTR); that is, plot pause in each link against TTR for that link in every condition. Linear waiting makes a straightforward prediction: All the data points for both links should lie on the same straight line through the origin (assuming that b → 0). We show this plot in Figure 1 . There is some variability, because the data points are individual subjects, not averages, but points from first and second links fit the same line, and the deviations do not seem to be systematic.
Steady-state pause duration plotted against actual time to reinforcement in the first and second links of a two-link chain schedule. Each data point is from a single pigeon in one experimental condition (three data points from an incomplete condition are omitted). (From Davison 1974 , Table 1)
A study by Innis et al. (1993) provides a dynamic test of the linear waiting hypothesis as applied to chain schedules. Innis et al. studied two-link chain schedules with one link of fixed duration and the other varying from reinforcer to reinforcer according to a triangular cycle. The dependent measure was pause in each link. Their Figure 3, for example, shows the programmed and actual values of the second link of the constant-cycle procedure (i.e., the first link was a constant 20 s; the second link varied from 5 to 35 s according to the triangular cycle) as well as the average pause, which clearly tracks the change in second-link duration with a lag of one interval. They found similar results for the reverse procedure, cycle-constant , in which the first link varied cyclically and the second link was constant. The tracking was a little better in the first procedure than in the second, but in both cases first-link pause was determined primarily by TTR.
There are some data suggesting that linear waiting is not the only factor that determines responding on simple chain schedules. In the four conditions of Davison's experiment in which the programmed durations of the first and second links added to a constant (120 s)—which implies a constant first-link pause according to linear waiting—pause in the first link covaried with first-link duration, although the data are noisy.
The alternative to the linear waiting account of responding on chain schedules is an account in terms of conditioned reinforcement (also called secondary reinforcement)—the idea that a stimulus paired with a primary reinforcer acquires some independent reinforcing power. This idea is also the organizing principle behind most theories of free-operant choice. There are some data that seem to imply a response-strengthening effect quite apart from the linear waiting effect, but they do not always hold up under closer inspection. Catania et al. (1980) reported that “higher rates of pecking were maintained by pigeons in the middle component of three-component chained fixed-interval schedules than in that component of the corresponding multiple schedule (two extinction components followed by a fixed-interval component)” (p. 213), but the effect was surprisingly small, given that no responding at all was required in the first two components. Moreover, results of a more critical control condition, chain versus tandem (rather than multiple) schedule, were the opposite: Rate was generally higher in the middle tandem component than in the second link of the chain. (A tandem schedule is one with the same response contingencies as a chain but with the same stimulus present throughout.)
Royalty et al. (1987) introduced a delay into the peck-stimulus-change contingency of a three-link variable-interval chain schedule and found large decreases in response rate [wait time (WT) was not reported] in both first and second links. They concluded that “because the effect of delaying stimulus change was comparable to the effect of delaying primary reinforcement in a simple variable-interval schedule … the results provide strong evidence for the concept of conditioned reinforcement” (p. 41). The implications of the Royalty et al. data for linear waiting are unclear, however, for two reasons. (a) The linear waiting hypothesis does not deal with the assignment-of-credit problem, that is, the selection of the appropriate response by the schedule. Linear waiting makes predictions about response timing—when the operant response occurs—but not about which response will occur. Response-reinforcer contiguity may be essential for the selection of the operant response in each chain link (as it clearly is during “shaping”), and diminishing contiguity may reduce response rate, but contiguity may play little or no role in the timing of the response. The idea of conditioned reinforcement may well apply to the first function but not to the second. (b) Royalty et al. did not report obtained time-to-reinforcement data; the effect of the imposed delay may therefore have been via an increase in component duration rather than directly on response rate.
Williams & Royalty (1990) explicitly compared conditioned reinforcement and time to reinforcement as explanations for chain schedule performance in three-link chains and concluded “that time to reinforcement itself accounts for little if any variance in initial-link responding” (p. 381). That conclusion concerns response rate, however; timing was not measured. Moreover, these data are from chain schedules with both variable-interval and fixed-interval links, rather than fixed-interval only, and with respect to response rate rather than pause measures. In a later paper Williams qualified the claim: “The effects of stimuli in a chain schedule are due partly to the time to food correlated with the stimuli and partly to the time to the next conditioned reinforcer in the sequence” (1997, p. 145).
The conclusion seems to be that linear waiting plays a relatively major, and conditioned reinforcement (however defined) a relatively minor, role in the determination of response timing on chain fixed-interval schedules. Linear waiting also provides the best available account of a striking, unsolved problem with chain schedules: the fact that in chains with several links, pigeon subjects may respond at a low level or even quit completely in early links ( Catania 1979 , Gollub 1977 ). On fixed-interval chain schedules with five or more links, responding in the early links begins to extinguish and the overall reinforcement rate falls well below the maximum possible—even if the programmed interreinforcement interval is relatively short (e.g., 6×15=90 s). If the same stimulus is present in all links (tandem schedule), or if the six different stimuli are presented in random order (scrambled-stimuli chains), performance is maintained in all links and the overall reinforcement rate is close to the maximum possible (one reinforcer every 6I s, where I is the link duration). Other studies have reported very weak responding in early components of a simple chain fixed-interval schedule (e.g., Catania et al. 1980 , Davison 1974 , Williams 1994 ; review in Kelleher & Gollub 1962 ). Chains with as few as three fixed-interval 60-s links ( Kelleher & Fry 1962 ) occasionally produce extreme pausing in the first link. No formal theory of the kind that has proliferated to explain behavior on concurrent chain schedules (discussed below) has been offered to account for these strange results, even though they have been well known for many years.
The informal suggestion is that the low or zero response rates maintained by early components of a multi-link chain are a consequence of the same discrimination process that leads to extinction in the absence of primary reinforcement. Conversely, the stimulus at the end of the chain that is actually paired with primary reinforcement is assumed to be a conditioned reinforcer; stimuli in the middle sustain responding because they lead to production of a conditioned reinforcer ( Catania et al. 1980 , Kelleher & Gollub 1962 ). Pairing also explains why behavior is maintained on tandem and scrambled-stimuli chains ( Kelleher & Fry 1962 ). In both cases the stimuli early in the chain are either invariably (tandem) or occasionally (scrambled-stimulus) paired with primary reinforcement.
There are problems with the conditioned-reinforcement approach, however. It can explain responding in link two of a three-link chain but not in link one, which should be an extinction stimulus. The explanatory problem gets worse when more links are added. There is no well-defined principle to tell us when a stimulus changes from being a conditioned reinforcer, to a stimulus in whose presence responding is maintained by a conditioned reinforcer, to an extinction stimulus. What determines the stimulus property? Is it stimulus number, stimulus duration or the durations of stimuli later in the chain? Perhaps there is some balance between contrast/extinction, which depresses responding in early links, and conditioned reinforcement, which is supposed to (but sometimes does not) elevate responding in later links? No well-defined compound theory has been offered, even though there are several quantitative theories for multiple-schedule contrast (e.g., Herrnstein 1970 , Nevin 1974 , Staddon 1982 ; see review in Williams 1988 ). There are also data that cast doubt even on the idea that late-link stimuli have a rate-enhancing effect. In the Catania et al. (1980) study, for example, four of five pigeons responded faster in the middle link of a three-link tandem schedule than the comparable chain.
The lack of formal theories for performance on simple chains is matched by a dearth of data. Some pause data are presented in the study by Davison (1974) on pigeons in a two-link fixed-interval chain. The paper attempted to fit Herrnstein's (1970) matching law between response rates and link duration. The match was poor: The pigeon's rates fell more than predicted when the terminal links (contiguous with primary reinforcement) of the chain were long, but Davison did find that “the terminal link schedule clearly changes the pause in the initial link, longer terminal-link intervals giving longer initial-link pauses” (1974, p. 326). Davison's abstract concludes, “Data on pauses during the interval schedules showed that, in most conditions, the pause duration was a linear function of the interval length, and greater in the initial link than in the terminal link” (p. 323). In short, the pause (time-to-first-response) data were more lawful than response-rate data.
Linear waiting provides a simple explanation for excessive pausing on multi-link chain fixed-interval schedules. Suppose the chief function of the link stimuli on chain schedules is simply to signal changing times to primary reinforcement 4 . Thus, in a three-link fixed-interval chain, with link duration I , the TTR signaled by the end of reinforcement (or by the onset of the first link) is 3 I . The onset of the next link signals a TTR of 2 I and the terminal, third, link signals a TTR of I . The assumptions of linear waiting as applied to this situation are that pausing (time to first response) in each link is determined entirely by TTR and that the wait time in interval N +1 is a linear function of the TTR in the preceding interval.
To see the implications of this process, consider again a three-link chain schedule with I =1 (arbitrary time units). The performance to be expected depends entirely on the value of the proportionality constant, a , that sets the fraction of time-to-primary-reinforcement that the animal waits (for simplicity we can neglect b ; the logic of the argument is unaffected). All is well so long as a is less than one-third. If a is exactly 0.333, then for unit link duration the pause in the third link is 0.33, in the second link 0.67, and in the first link 1.0. However, if a is larger, for instance 0.5, the three pauses become 0.5, 1.0, and 1.5; that is, the pause in the first link is now longer than the programmed interval, which means the TTR in the first link will be longer than 3 the next time around, so the pause will increase further, and so on until the process stabilizes (which it always does: first-link pause never goes to ∞).
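This runaway-and-stabilize iteration can be simulated directly. The sketch below adds one assumption not stated in the text: each fixed-interval link lasts max(pause, I), i.e., a response made after the interval has elapsed ends the link immediately. With that assumption the simulation reproduces the numbers in the surrounding text: for three unit links and a = 0.5 the first-link pause grows from 1.5 to a stable 2.0, and for five links and a = 0.67 the interfood interval stabilizes near 84.

```python
def chain_steady_state(n_links, a, I=1.0, iters=500):
    """Iterate the linear-waiting rule (pause = a * signaled TTR) on an
    n-link fixed-interval chain with equal link durations I.

    Assumption (not from the text): each link lasts max(pause, I), i.e.,
    a response made after the interval elapses ends the link at once.
    """
    # First cycle: pauses set by the programmed TTR signaled at each link onset.
    pauses = [a * I * (n_links - k) for k in range(n_links)]
    for _ in range(iters):
        durs = [max(p, I) for p in pauses]               # experienced link durations
        ttrs = [sum(durs[k:]) for k in range(n_links)]   # experienced TTR at each onset
        pauses = [a * t for t in ttrs]                   # next cycle's pauses
    ifi = sum(max(p, I) for p in pauses)                 # steady-state interfood interval
    return pauses, ifi

p3, _ = chain_steady_state(3, 0.5)      # first-link pause grows 1.5 -> 2.0
p5, ifi5 = chain_steady_state(5, 0.67)  # early-link pauses explode; IFI near 84
```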
The steady-state wait times in each link predicted for a five-link chain, with unit-duration links, for two values of a are shown in Figure 2 . In both cases wait times in the early links are very much longer than the programmed link duration. Clearly, this process has the potential to produce very large pauses in the early links of multilink-chain fixed-interval schedules and so may account for the data Catania (1979) and others have reported.
Wait time (pause, time to first response) in each equal-duration link of a five-link chain schedule (as a multiple of the programmed link duration) as predicted by the linear-waiting hypothesis. The two curves are for two values of parameter a in Equation 1 ( b =0). Note the very long pauses predicted in early links—almost two orders of magnitude greater than the programmed interval in the first link for a =0.67. (From Mazur 2001 )
Gollub in his dissertation research (1958) noticed the additivity of this sequential pausing. Kelleher & Gollub (1962) in their subsequent review wrote, “No two pauses in [simple fixed interval] can both postpone food-delivery; however, pauses in different components of [a] five-component chain will postpone food-delivery additively” (p. 566). However, this additivity was only one of a number of processes suggested to account for the long pauses in early chain fixed-interval links, and its quantitative implications were never explored.
Note that the linear waiting hypothesis also accounts for the relative stability of tandem schedules and chain schedules with scrambled components. In the tandem schedule, reinforcement constitutes the only available time marker. Given that responding after the pause continues at a relatively high rate until the next time marker, Equation 1 (with b assumed negligible) and a little algebra shows that the steady-state postreinforcement pause for a tandem schedule with unit links will be

t = a(N − 1)/(1 − a),
where N is the number of links and a is the pause fraction. In the absence of any time markers, pauses in links after the first are necessarily short, so the experienced link duration equals the programmed duration. Thus, the total interfood-reinforcement interval will be t + N − 1 ( t ≥ 1): the pause in the first link (which will be longer than the programmed link duration for N > 1/ a ) plus the programmed durations of the succeeding links. For the case of a = 0.67 and unit link duration, which yielded a steady-state interfood interval (IFI) of 84 for the five-link chain schedule, the tandem yields 12. For a = 0.5, the two values are approximately 16 and 8.
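Given the steady-state pause t = a(N − 1)/(1 − a) that follows from Equation 1 with b ≈ 0, the tandem IFI values quoted above (about 12 for a = 0.67 and 8 for a = 0.5, with five unit links) can be reproduced directly:

```python
def tandem_ifi(n_links, a, I=1.0):
    """Steady-state interfood interval on a tandem schedule with equal links.

    Food is the only time marker, so only the postreinforcement pause is
    long: it solves t = a * (t + (n_links - 1) * I), giving
    t = a * (n_links - 1) * I / (1 - a).
    """
    t = a * (n_links - 1) * I / (1 - a)
    first_link = max(t, I)            # pause can exceed the first link's duration
    return first_link + (n_links - 1) * I

ifi_67 = tandem_ifi(5, 0.67)   # about 12.1, vs. roughly 84 for the chain
ifi_50 = tandem_ifi(5, 0.5)    # 8.0
```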
The long waits in early links shown in Figure 2 depend critically on the value of a . If, as experience suggests (there has been no formal study), a tends to increase slowly with training, we might expect the long pausing in initial links to take some time to develop, which apparently it does ( Gollub 1958 ).
On the scrambled-stimuli chain each stimulus occasionally ends in reinforcement, so each signals a time-to-reinforcement (TTR) 5 of I , and pause in each link should be less than the link duration—yielding a total IFI of approximately N , i.e., 5 for the example in the figure. These predictions yield the order IFI in the chain > tandem > scrambled, but parametric data are not available for precise comparison. We do not know whether an N -link scrambled schedule typically stabilizes at a shorter IFI than the comparable tandem schedule, for example. Nor do we know whether steady-state pause in successive links of a multilink chain falls off in the exponential fashion shown in Figure 2 .
In the final section we explore the implications of linear waiting for studies of free-operant choice behavior.
Although we can devote only limited space to it, choice is one of the major research topics in operant conditioning (see Mazur 2001 , p. 96 for recent statistics). Choice is not something that can be directly observed. The subject does this or that and, in consequence, is said to choose. The term has unfortunate overtones of conscious deliberation and weighing of alternatives for which the behavior itself—response A or response B—provides no direct evidence. One result has been the assumption that the proper framework for all so-called choice studies is in terms of response strength and the value of the choice alternatives. Another is the assumption that procedures that are very different are nevertheless studying the same thing.
For example, in a classic series of experiments, Kahneman & Tversky (e.g., 1979) asked a number of human subjects to make a single choice of the following sort: between $400 for sure and a 50% chance of $1000. Most went for the sure thing, even though the expected value of the gamble is higher. This is termed risk aversion , and the same term has been applied to free-operant “choice” experiments. In one such experiment an animal subject must choose repeatedly between a response leading to a fixed amount of food and one leading equiprobably to either a large or a small amount with the same average value. Here the animals tend to be either indifferent or risk averse, preferring the fixed alternative ( Staddon & Innis 1966b , Bateson & Kacelnik 1995 , Kacelnik & Bateson 1996 ).
In a second example pigeons responded repeatedly to two keys associated with equal variable-interval schedules. A successful response on the left key, for example, is reinforced by a change in the color of the pecked key (the other key light goes off). In the presence of this second stimulus, food is delivered according to a fixed-interval schedule (fixed-interval X ). The first stimulus, which is usually the same on both keys, is termed the initial link ; the second stimulus is the terminal link . Pecks on the right key lead in the same way to food reinforcement on variable-interval X . (This is termed a concurrent-chain schedule.) In this case subjects overwhelmingly prefer the initial-link choice leading to the variable-interval terminal link; that is, they are apparently risk seeking rather than risk averse ( Killeen 1968 ).
The fact that these three experiments (Kahneman & Tversky and the two free-operant studies) all produce different results is sometimes thought to pose a serious research problem, but, we contend, the problem is only in the use of the term choice for all three. The procedures (not to mention the subjects) are in fact very different, and in operant conditioning the devil is very much in the details. Apparently trivial procedural differences can sometimes lead to wildly different behavioral outcomes. Use of the term choice as if it denoted a unitary subject matter is therefore highly misleading. We also question the idea that the results of choice experiments are always best explained in terms of response strength and stimulus value.
Bearing these caveats in mind, let's look briefly at the extensive history of free-operant choice research. In Herrnstein's seminal experiment (1961 ; see Davison & McCarthy 1988 , Williams 1988 for reviews; for collected papers see Rachlin & Laibson 1997 ) hungry pigeons pecked at two side-by-side response keys, one associated with variable-interval v 1 s and the other with variable-interval v 2 s ( concurrent variable-interval–variable-interval schedule). After several experimental sessions and a range of v 1 and v 2 values chosen so that the overall programmed reinforcement rate was constant (1/ v 1 + 1/ v 2 = constant), the result was matching between steady-state relative response rates and relative obtained reinforcement rates:

x/(x + y) = R(x)/[R(x) + R(y)],   (Equation 5)
where x and y are the response rates on the two alternatives and R ( x ) and R ( y ) are the rates of obtained reinforcement for them. This relation has become known as Herrnstein's matching law. Although the obtained reinforcement rates are dependent on the response rates that produce them, the matching relation is not forced, because x and y can vary over quite a wide range without much effect on R ( x ) and R ( y ).
Because of the negative feedback relation intrinsic to variable-interval schedules (the less you respond, the higher the probability of payoff), the matching law on concurrent variable-interval–variable-interval is consistent with reinforcement maximization ( Staddon & Motheral 1978 ), although the maximum of the function relating overall payoff, R ( x ) + R ( y ), to relative responding, x /( x + y ), is pretty flat. However, little else on these schedules fits the maximization idea. As noted above, even responding on simple fixed- T response-initiated delay (RID) schedules violates maximization. Matching is also highly overdetermined, in the sense that almost any learning rule consistent with the law of effect—an increase in reinforcement probability causes an increase in response probability—will yield either simple matching ( Equation 5 ) or its power-law generalization ( Baum 1974 , Hinson & Staddon 1983 , Lander & Irwin 1968 , Staddon 1968 ). Matching by itself therefore reveals relatively little about the dynamic processes operating in the responding subject (but see Davison & Baum 2000 ). Despite this limitation, the strikingly regular functional relations characteristic of free-operant choice studies have attracted a great deal of experimental and theoretical attention.
Herrnstein (1970) proposed that Equation 5 can be derived from the function relating steady-state response rate, x , and reinforcement rate, R ( x ), to each response key considered separately. This function is negatively accelerated and well approximated by a hyperbola:

x = kR(x)/[R(x) + R_0],   (Equation 6)
where k is a constant and R 0 represents the effects of all other reinforcers in the situation. The denominator and parameter k cancel in the ratio x / y , yielding Equation 5 for the choice situation.
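Herrnstein's cancellation argument can be checked numerically. On the usual reading, in the choice situation the denominator of Equation 6 for each key contains all reinforcement in the situation (own, other, and background R_0), so both keys share the same denominator and it drops out of the ratio along with k. The parameter values below are illustrative.

```python
def response_rate(r_own, r_other, k=100.0, r0=10.0):
    """Equation 6 for one key, with the other key's reinforcement included
    in the denominator (so both keys share the same denominator)."""
    return k * r_own / (r_own + r_other + r0)

rx, ry = 40.0, 20.0            # obtained reinforcement rates (illustrative)
x = response_rate(rx, ry)
y = response_rate(ry, rx)

# k and the common denominator cancel in the ratio: matching (Equation 5).
assert abs(x / (x + y) - rx / (rx + ry)) < 1e-12
```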
There are numerous empirical details that are not accounted for by this formulation: systematic deviations from matching [undermatching and overmatching ( Baum 1974 )] as a function of different types of variable-interval schedules, dependence of simple matching on use of a changeover delay , extensions to concurrent-chain schedules, and so on. For example, if animals are pretrained with two alternatives presented separately, so that they do not learn to switch between them, when given the opportunity to respond to both, they fixate on the richer one rather than matching [extreme overmatching ( Donahoe & Palmer 1994 , pp. 112–113; Gallistel & Gibbon 2000 , pp. 321–322)]. (Fixation—extreme overmatching—is, trivially, matching, of course, but if only fixation were observed, the idea of matching would never have arisen. Matching implies partial, not exclusive, preference.) Conversely, in the absence of a changeover delay, pigeons will often just alternate between two unequal variable-interval choices [extreme undermatching ( Shull & Pliskoff 1967 )]. In short, matching requires exactly the right amount of switching. Nevertheless, Herrnstein's idea of deriving behavior in choice experiments from the laws that govern responding to the choice alternatives in isolation is clearly worth pursuing.
In any event, Herrnstein's approach—molar data, predominantly variable-interval schedules, rate measures—set the basic pattern for subsequent operant choice research. It fits the basic presuppositions of the field: that choice is about response strength , that response strength is equivalent to response probability, and that response rate is a valid proxy for probability (e.g., Skinner 1938 , 1966 , 1986 ; Killeen & Hall 2001 ). (For typical studies in this tradition see, e.g., Fantino 1981 ; Grace 1994 ; Herrnstein 1961 , 1964 , 1970 ; Rachlin et al. 1976 ; see also Shimp 1969 , 2001 .)
We can also look at concurrent schedules in terms of linear waiting. Although published evidence is skimpy, recent unpublished data ( Cerutti & Staddon 2002 ) show that even on variable-interval schedules (which necessarily always contain a few very short interfood intervals), postfood wait time and changeover time covary with mean interfood time. It has also long been known that Equation 6 can be derived from two time-based assumptions: that the number of responses emitted is proportional to the number of reinforcers received multiplied by the available time and that available time is limited by the time taken up by each response ( Staddon 1977 , Equations 23–25). Moreover, if we define mean interresponse time as the reciprocal of mean response rate, 6 x , and mean interfood interval is the reciprocal of obtained reinforcement rate, R ( x ), then linear waiting yields

1/x = a/R(x) + b,
where a and b are linear waiting constants. Rearranging yields

x = (1/b)R(x)/[R(x) + a/b],
where 1/ b = k and a / b = R 0 in Equation 6 . Both these derivations of the hyperbola in Equation 6 from a linear relation in the time domain imply a correlation between parameters k and R 0 in Equation 6 under parametric experimental variation of parameter b by (for example) varying response effort or, possibly, hunger motivation. Such covariation has been occasionally but not universally reported ( Dallery et al. 2000 , Heyman & Monaghan 1987 , McDowell & Dallery 1999 ).
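The algebra connecting the linear time-domain relation to the hyperbola of Equation 6 can be verified numerically; the constants a and b below are illustrative, not fitted values.

```python
a, b = 2.0, 0.01        # illustrative linear-waiting constants
k, r0 = 1.0 / b, a / b  # implied hyperbola parameters: k = 1/b, R0 = a/b

for r in (1.0, 5.0, 20.0, 100.0):   # obtained reinforcement rates R(x)
    irt = a / r + b                 # mean interresponse time, linear in 1/R(x)
    x = 1.0 / irt                   # mean response rate
    assert abs(x - k * r / (r + r0)) < 1e-6   # matches Equation 6
```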
Organisms can be trained to choose between sources of primary reinforcement (concurrent schedules) or between stimuli that signal the occurrence of primary reinforcement ( conditioned reinforcement : concurrent chain schedules). Many experimental and theoretical papers on conditioned reinforcement in pigeons and rats have been published since the early 1960s using some version of the concurrent chains procedure of Autor (1960 , 1969) . These studies have demonstrated a number of functional relations between rate measures and have led to several closely related theoretical proposals such as a version of the matching law, incentive theory, delay-reduction theory, and hyperbolic value-addition (e.g., Fantino 1969a , b ; Grace 1994 ; Herrnstein 1964 ; Killeen 1982 ; Killeen & Fantino 1990 ; Mazur 1997 , 2001 ; Williams 1988 , 1994 , 1997 ). Nevertheless, there is as yet no theoretical consensus on how best to describe choice between sources of conditioned reinforcement, and no one has proposed an integrated theoretical account of simple chain and concurrent chain schedules.
Molar response rate does not capture the essential feature of behavior on fixed-interval schedules: the systematic pattern of rate-change in each interfood interval, the “scallop.” Hence, the emphasis on molar response rate as a dependent variable has meant that work on concurrent schedules has emphasized variable or random intervals over fixed intervals. We lack any theoretical account of concurrent fixed-interval–fixed-interval and fixed-interval–variable-interval schedules. However, a recent study by Shull et al. (2001 ; see also Shull 1979) suggests that response rate may not capture what is going on even on simple variable-interval schedules, where the time to initiate bouts of relatively fixed-rate responding seems to be a more sensitive dependent measure than overall response rate. More attention to the role of temporal variables in choice is called for.
We conclude with a brief account of how linear waiting may be involved in several well-established phenomena of concurrent-chain schedules: preference for variable-interval versus fixed-interval terminal links, effect of initial-link duration, and finally, so-called self-control experiments.
Preference for variable-interval versus fixed-interval terminal links. On concurrent-chain schedules with equal variable-interval initial links, animals show a strong preference for the initial link leading to a variable-interval terminal link over the terminal-link alternative with an equal arithmetic-mean fixed interval. This result is usually interpreted as a manifestation of nonarithmetic (e.g., harmonic) reinforcement-rate averaging ( Killeen 1968 ), but it can also be interpreted as linear waiting. Minimum TTR is necessarily much less on the variable-interval than on the fixed-interval side, because some variable intervals are short. If wait time is determined by minimum TTR—hence shorter wait times on the variable-interval side—and ratios of wait times and overall response rates are (inversely) correlated ( Cerutti & Staddon 2002 ), the result will be an apparent bias in favor of the variable-interval choice.
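A toy illustration of why the kind of averaging matters (the interval values are assumed for illustration, not taken from a specific experiment): a two-value variable interval with the same arithmetic mean as a fixed interval has a far shorter harmonic mean and a far shorter minimum TTR.

```python
# Assumed VI terminal link: intervals of 5 s and 35 s, vs. an FI 20-s link.
intervals = [5.0, 35.0]

arith = sum(intervals) / len(intervals)                      # arithmetic mean: 20 s
harmonic = len(intervals) / sum(1.0 / i for i in intervals)  # harmonic mean: 8.75 s

assert arith == 20.0                 # same arithmetic mean as the FI side
assert abs(harmonic - 8.75) < 1e-9   # but much shorter harmonic mean
assert min(intervals) < 20.0         # and much shorter minimum TTR
```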
Effect of initial-link duration. Preference for a given pair of terminal-link schedules depends on initial link duration. For example, pigeons may approximately match initial-link relative response rates to terminal-link relative reinforcement rates when the initial links are 60 s and the terminal links range from 15 to 45 s ( Herrnstein 1964 ), but they will undermatch when the initial-link schedule is increased to, for example, 180 s. This effect is what led to Fantino's delay-reduction modification of Herrnstein's matching law (see Fantino et al. 1993 for a review). However, the same qualitative prediction follows from linear waiting: Increasing initial-link duration reduces the proportional TTR difference between the two choices. Hence the ratio of WTs or of initial-link response rates for the two choices should also approach unity, which is undermatching. Several other well-studied theories of concurrent choice, such as delay reduction and hyperbolic value addition, also explain these results.
The prototypical self-control experiment has a subject choosing between two outcomes: not-so-good cookie now or a good cookie after some delay ( Rachlin & Green 1972 ; see Logue 1988 for a review; Mischel et al. 1989 reviewed human studies). Typically, the subject chooses the immediate, small reward, but if both delays are increased by the same amount, D , he will learn to choose the larger reward, providing D is long enough. Why? The standard answer is derived from Herrnstein's matching analysis ( Herrnstein 1981 ) and is called hyperbolic discounting (see Mazur 2001 for a review and Ainslie 1992 and Rachlin 2000 for longer accounts). The idea is that the expected value of each reward is inversely related to the time at which it is expected according to a hyperbolic function:

V_i = A_i/(1 + kD_i),   (Equation 8)
where A i is the undiscounted value of the reward, D i is the delay until reward is received, i denotes the large or small reward, and k is a fitted constant.
Now suppose we set D L and D S to values such that the animal shows a preference for the shorter, sooner reward. This would be the case ( k =1) if A L =6, A S =2, D L = 6 s, and D S = 1 s: V L =0.86 and V S =1—preference for the small, less-delayed reward. If 10 s is added to both delays, so that D L = 16 s and D S =11 s, the values are V L =0.35 and V S =0.17—preference for the larger reward. Thus, Equation 8 predicts that added delay—sometimes awkwardly termed pre-commitment— should enhance self-control, which it does.
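The arithmetic of this example is easy to reproduce; the sketch below evaluates V = A/(1 + kD) for exactly the values given in the text and confirms the preference reversal.

```python
def value(amount, delay, k=1.0):
    """Hyperbolic discounting (Equation 8): V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

A_L, A_S = 6.0, 2.0
# Short delays: the small, sooner reward wins (V_S = 1.0 > V_L ~ 0.86).
assert value(A_L, 6) < value(A_S, 1)
# Add 10 s to both delays: preference reverses (V_L ~ 0.35 > V_S ~ 0.17).
assert value(A_L, 16) > value(A_S, 11)
```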
The most dramatic prediction from this analysis was made and confirmed by Mazur (1987 , 2001) in an experiment that used an adjusting-delay procedure (also termed titration ). “A response on the center key started each trial, and then a pigeon chose either a standard alternative (by pecking the red key) or an adjusting alternative (by pecking the green key) … the standard alternative delivered 2 s of access to grain after a 10-s delay, and the adjusting alternative delivered 6 s of access to grain after an adjusting delay” (2001, p. 97). The adjusting delay increased (on the next trial) when it was chosen and decreased when the standard alternative was chosen. (See Mazur 2001 for other procedural details.) The relevant independent variable is TTR. The discounted value of each choice is given by Equation 8 . When the subject is indifferent (does not discriminate between the two choices), V_L = V_S . Equating Equation 8 for the large and small choices yields

D_L = (A_L/A_S)D_S + (A_L − A_S)/(kA_S);   (Equation 9)
that is, an indifference curve that is a linear function relating D L and D S , with slope A L / A S > 1 and a positive intercept. The data ( Mazur 1987 ; 2001 , Figure 2 ) are consistent with this prediction, but the intercept is small.
It is also possible to look at this situation in terms of linear waiting. One assumption is necessary: that the waiting fraction, a , in Equation 1 is smaller when the upcoming reinforcer is large than when it is small ( Powell 1969 and Perone & Courtney 1992 showed this for fixed-ratio schedules; Howerton & Meltzer 1983 , for fixed-interval). Given this assumption, the linear waiting analysis is even simpler than hyperbolic discounting. The idea is that the subject will appear to be indifferent when the wait times to the two alternatives are equal. According to linear waiting, the wait time for the small alternative is given by

t_S = a_S·D_S + b_S,   (Equation 10)
where b_S is a small positive intercept and a_S > a_L . Equating the wait times for small and large alternatives yields

D_L = (a_S/a_L)D_S + (b_S − b_L)/a_L,   (Equation 11)
which is also a linear function with slope > 1 and a small positive intercept.
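Both indifference lines can be compared side by side. The reward sizes follow the text's worked example; the waiting parameters a_S, a_L, b_S, b_L are assumed values chosen only to satisfy a_S > a_L, so the specific numbers are hypothetical.

```python
A_L, A_S, k = 6.0, 2.0, 1.0               # reward sizes from the text; k fitted
a_S, a_L, b_S, b_L = 0.6, 0.3, 1.0, 0.5   # assumed waiting parameters, a_S > a_L

def d_large_hyperbolic(d_s):
    """Equation 9: delays at which A_L/(1 + k*D_L) = A_S/(1 + k*D_S)."""
    return (A_L / A_S) * d_s + (A_L - A_S) / (k * A_S)

def d_large_waiting(d_s):
    """Equation 11: delays at which a_L*D_L + b_L = a_S*D_S + b_S."""
    return (a_S / a_L) * d_s + (b_S - b_L) / a_L

# Both indifference curves are straight lines in D_S with slope > 1.
for f in (d_large_hyperbolic, d_large_waiting):
    assert f(2.0) - f(1.0) > 1.0

# At indifference the hyperbolic values really are equal: D_S = 5 -> D_L = 17,
# and 6/(1 + 17) = 2/(1 + 5) = 1/3.
assert abs(A_L / (1.0 + k * d_large_hyperbolic(5.0)) - A_S / (1.0 + k * 5.0)) < 1e-9
```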
Equations 9 and 11 are identical in form. Thus, the linear waiting and hyperbolic discounting models are almost indistinguishable in terms of these data. However, the linear waiting approach has three potential advantages: Parameters a and b can be independently measured by making appropriate measurements in a control study that retains the reinforcement-delay properties of the self-control experiments without the choice contingency; the linear waiting approach lacks the fitted parameter k in Equation 9 ; and linear waiting also applies to a wide range of time-production experiments not covered by the hyperbolic discounting approach.
Temporal control may be involved in unsuspected ways in a wide variety of operant conditioning procedures. A renewed emphasis on the causal factors operating in reinforcement schedules may help to unify research that has hitherto been defined in terms of more abstract topics like timing and choice.
We thank Catalin Buhusi and Jim Mazur for comments on an earlier version and the NIMH for research support over many years.
1 The first and only previous Annual Review contribution on this topic was as part of a 1965 article, “Learning, Operant Conditioning and Verbal Learning” by Blough & Millward. Since then there have been (by our estimate) seven articles on learning or learning theory in animals, six on the neurobiology of learning, and three on human learning and memory, but this is the first full Annual Review article on operant conditioning. We therefore include rather more old citations than is customary (for more on the history and philosophy of Skinnerian behaviorism, both pro and con, see Baum 1994 , Rachlin 1991 , Sidman 1960 , Staddon 2001b , and Zuriff 1985 ).
2 By “internal” we mean not “physiological” but “hidden.” The idea is simply that the organism's future behavior depends on variables not all of which are revealed in its current behavior (cf. Staddon 2001b , Ch. 7).
3 When there is no response-produced stimulus change, this procedure is also called a conjunctive fixed-ratio fixed-time schedule ( Shull 1970 ).
4 This idea surfaced very early in the history of research on equal-link chain fixed-interval schedules, but because of the presumed importance of conditioned reinforcement, it was the time to reinforcement from link stimulus offset, rather than onset, that was thought to be important. Thus, Gollub (1977) , echoing his 1958 Ph.D. dissertation in the subsequent Kelleher & Gollub (1962) review, wrote, “In chained schedules with more than two components … the extent to which responding is sustained in the initial components … depends on the time that elapses from the end of the components to food reinforcement” (p. 291).
5 Interpreted as time to the first reinforcement opportunity.
6 It is not, of course: The reciprocal of the mean IRT is the harmonic mean rate. In practice, “mean response rate” usually means the arithmetic mean, but note that the harmonic mean rate usually works better for choice data than the arithmetic mean (cf. Killeen 1968 ).
CogniFit Blog: Brain Health News
Brain Training, Mental Health, and Wellness
Operant conditioning might sound like something out of a dystopian novel. But it’s not. It’s a very real thing that was forged by a brilliant, yet quirky, psychologist. Today, we will take a quick look at his work, as well as a few odd experiments that went with it…
There are few names in psychology more well-known than B. F. Skinner. First-year psychology students scribble endless lecture notes on him. Doctoral candidates cite his work in their dissertations as they test whether a rat’s behavior can be used to predict behavior in humans.
Skinner is one of the most well-known psychologists of our time, famous for his experiments on operant conditioning. But how did he become such a central figure in these Intro to Psych courses? And how did he develop the theories and methodologies cited by those sleep-deprived Ph.D. students?
Skinner spent his life studying the way we behave and act. But, more importantly, how this behavior can be modified.
He viewed Ivan Pavlov’s classical model of behavioral conditioning as being “too simplistic a solution” to fully explain the complexities of human (and animal) behavior and learning. It was because of this that Skinner started to look for a better way to explain why we do things.
His early work was based on Edward Thorndike’s 1898 Law of Effect . Skinner went on to expand on the idea that most of our behavior is directly related to the consequences of said behavior. His expanded model of behavioral learning would be called operant conditioning. This centered around two things…
But, it’s important to note that the term “consequences” can be misleading. This is because there doesn’t need to be a causal relationship between the behavior and its consequence. Skinner broke these responses down into three parts.
1. REINFORCERS – These give the organism a desirable stimulus and serve to increase the frequency of the behavior.
2. PUNISHERS – These are environmental responses that present an undesirable stimulus and serve to reduce the frequency of the behavior.
3. NEUTRAL OPERANTS – As the name suggests, these present stimuli that neither increase nor decrease the tested behavior.
Throughout his long and storied career, Skinner performed a number of strange experiments trying to test the limits of how punishment and reinforcement affect behavior.
Though Skinner was a professional through and through, he was also quite a quirky person. And, his unique ways of thinking are very clear in the strange and interesting experiments he performed while researching the properties of operant conditioning.
The Operant Conditioning Chamber, better known as the Skinner Box , is a device that B.F. Skinner used in many of his experiments. At its most basic, the Skinner Box is a chamber where a test subject, such as a rat or a pigeon, must ‘learn’ the desired behavior through trial and error.
B.F. Skinner used this device for several different experiments. One such experiment involves placing a hungry rat into a chamber with a lever and a slot where food is dispensed when the lever is pressed. Another variation involves placing a rat into an enclosure that is wired with a slight electric current on the floor. When the current is turned on, the rat must turn a wheel in order to turn off the current.
Though this is the most basic experiment in operant conditioning research, there is an infinite number of variations that can be created based on this simple idea.
Building on the basic ideas from his work with the Operant Conditioning Chamber, B. F. Skinner eventually began designing more and more complex experiments.
One of these experiments involved teaching a pigeon to read words presented to it in order to receive food. Skinner began by teaching the pigeon a simple task, namely, pecking a colored disk, in order to receive a reward. He then began adding additional environmental cues (in this case, they were words), which were paired with a specific behavior that was required in order to receive the reward.
Through this evolving process, Skinner was able to teach the pigeon to ‘read’ and respond to several unique commands.
Though the pigeon can’t actually read English, the fact that Skinner was able to teach a bird multiple behaviors, each one linked to a specific stimulus, by using operant conditioning shows us that this form of behavioral learning can be a powerful tool for teaching both animals and humans complex behaviors based on environmental cues.
But Skinner wasn’t only concerned with teaching pigeons how to read. It seems he also made sure they had time to play games as well. In one of his more whimsical experiments , B. F. Skinner taught a pair of common pigeons how to play a simplified version of table tennis.
The pigeons in this experiment were placed on either side of a box and were taught to peck the ball to the other bird’s side. If a pigeon was able to peck the ball across the table and past their opponent, they were rewarded with a small amount of food. This reward served to reinforce the behavior of pecking the ball past their opponent.
Though this may seem like a silly task to teach a bird, the ping-pong experiment shows that operant conditioning can be used not only for a specific, robot-like action but also to teach dynamic, goal-based behaviors.
Thought pigeons playing ping-pong was as strange as things could get? Skinner pushed the envelope even further with his work on pigeon-guided missiles.
While this may sound like the crazy experiment of a deluded mad scientist, B. F. Skinner did actually do work to train pigeons to control the flight paths of missiles for the U.S. Army during the Second World War.
Skinner began by training the pigeons to peck at shapes on a screen. Once the pigeons reliably tracked these shapes, Skinner was able to use sensors to track whether the pigeon’s beak was in the center of the screen, to one side or the other, or towards the top or bottom of the screen. Based on the relative location of the pigeon’s beak, the tracking system could direct the missile towards the target location.
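The tracking scheme just described amounts to a proportional controller: the farther the peck lands from the screen's center, the larger the steering correction. A toy sketch of the idea (the function name, coordinate convention, and gain constant are all hypothetical, not details from Skinner's actual Project Pigeon hardware):

```python
# Toy sketch of the pigeon-guidance idea: the pecking position relative
# to screen center is converted into a steering correction. All names
# and the gain value are illustrative assumptions.

GAIN = 0.1  # hypothetical proportional gain

def steering_correction(peck_x, peck_y, center=(0.0, 0.0)):
    """Return (yaw, pitch) corrections proportional to how far the
    peck lands from the screen center."""
    dx = peck_x - center[0]
    dy = peck_y - center[1]
    return (GAIN * dx, GAIN * dy)

# A peck to the upper-right of center steers the missile right and up:
print(steering_correction(2.0, 1.0))  # (0.2, 0.1)
```

A peck dead center produces no correction, which is exactly the behavior the sensors were meant to reward.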
Though the system was never used in the field due in part to advances in other scientific areas, it highlights the unique applications that can be created using operant training for animal behaviors.
B. F. Skinner is one of the most recognizable names in modern psychology, and with good reason. Though many of his experiments seem outlandish, the science behind them continues to impact us in ways we rarely think about.
The most prominent example is in the way we train animals for tasks such as search and rescue, companion services for the blind and disabled, and even how we train our furry friends at home—but the benefits of his research go far beyond teaching Fido how to roll over.
Operant conditioning research has found its way into the way schools motivate and discipline students, how prisons rehabilitate inmates, and even in how governments handle geopolitical relationships .
Operant conditioning is a type of learning in which behaviors are strengthened or weakened by their consequences, called reinforcement or punishment. Operant conditioning works by applying a consequence, that is, a reward or punishment, after a behavior.
This article collects 65 examples of operant conditioning from everyday life, the classroom, parenting, child development, animals, therapy, education, relationships, ABA, work, and classic experiments.
The difference between classical and operant conditioning and common misconceptions will be discussed. Other components and concepts of operant conditioning include schedules of reinforcement, extinction, extinction burst, spontaneous recovery, and resistance to extinction.
Operant conditioning, or instrumental conditioning, is a type of associative learning in which behaviors are strengthened or weakened by their consequences, called reinforcement or punishment. When a behavior is paired with a consequence repeatedly, an association is formed to create new behavior.
Psychologist B.F. Skinner, the father of operant conditioning, proposed the reinforcement theory, stating that behavior could be shaped through stimuli contingencies or reinforcements.
Operant conditioning works by applying a consequence, which is a reward or punishment, after a behavior. Reward, also known as reinforcement, strengthens a behavior by increasing the likelihood that a behavior will repeat in the future. Punishment weakens a behavior by decreasing the likelihood it will repeat.
Examples of operant conditioning behavior are in everyday life, classroom, parenting, child development, at home, animals, therapy, education, relationships, ABA, work, fear conditioning, and experiments.
There are four types of operant conditioning.
The main difference between reinforcement and punishment is their goals. Reinforcement aims to increase a desired behavior, while punishment aims to decrease an undesired behavior.
Positive denotes adding a stimulus as a consequence, while negative denotes removing a stimulus as a consequence.
Positive reinforcement adds a rewarding stimulus as a positive reinforcer to strengthen a desired behavior.
A positive reinforcement example is a parent giving their child an extra allowance for completing chores. In this example, extra allowance (positive reinforcer) is added (positive) to encourage (reinforcement) completing chores (desired behavior).
Negative reinforcement removes an unpleasant stimulus to strengthen a desired behavior.
A negative reinforcement example is that a child doesn’t have to clear the table after the meal if they eat vegetables. Here, clearing the table is an aversive stimulus that is removed (negative) to encourage (reinforcement) vegetable eating (desired behavior).
Positive punishment adds an unpleasant stimulus to weaken or eliminate an undesired behavior.
A positive punishment example is when a teacher gives a student extra homework for making noise in class. In this example, extra homework is an aversive stimulus that is added (positive) to discourage (punishment) students from making noise in class (undesirable behavior).
Negative punishment removes a pleasant stimulus to stop undesired behavior.
A negative punishment example is when the police revoke a driver’s license (pleasant stimulus) to discourage (punishment) reckless driving (unwanted behavior).
Here are 5 famous operant conditioning experiments.
The law of effect was a theory proposed by Edward Thorndike, based on his puzzle box experiments, long before Skinner developed operant conditioning.
The law of effect states that if, in the presence of a stimulus, a response was followed by a satisfying event (reinforcer), the bond between stimulus and response was strengthened. Conversely, if the response was followed by an unsatisfying event (punisher), the bond was weakened.
What distinguishes classical conditioning from operant conditioning? Classical conditioning is a form of associative learning where a neutral stimulus becomes a conditioned stimulus through consistent pairing with an unconditioned stimulus. This process establishes an association, allowing the initially neutral stimulus to evoke a conditioned response similar to the unconditioned stimulus.
Both classical and operant conditioning involve associative learning. However, the key difference between classical and operant conditioning is that classical conditioning associates two stimuli to elicit an automatic, involuntary response, while operant conditioning uses consequences to modify a voluntary action.
The symptoms of obsessive-compulsive disorder (OCD) are believed to be linked to operant conditioning through negative reinforcement of compulsions.
People with OCD may experience intrusive, anxiety-provoking thoughts, which lead to an obsession. According to Mowrer’s two-factor theory, the obsession could also be due to neutral stimuli becoming associated with anxiety through classical conditioning.
Negative reinforcement occurs when engaging in repetitive behavior or compulsions temporarily relieves this anxiety. This teaches the brain that the compulsion “works” to reduce distress, making it more likely to repeat in the future.
Several common misconceptions about operant conditioning often arise due to oversimplification or misunderstanding of its principles.
In operant conditioning, schedules of reinforcement are the rules or plans for delivering reinforcement. These schedules can significantly impact the strength and rate of the learned behavior.
Here are the types of reinforcement schedules.
In operant conditioning, extinction refers to the gradual weakening and eventual disappearance of a learned behavior. This occurs when the reinforcements that maintained the behavior are no longer provided.
For example, an employee who regularly receives bonuses for submitting reports early stops doing so when the bonuses are discontinued.
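The gradual weakening described here can be sketched as a simple decay process: each unreinforced trial reduces response strength. A toy model (the starting strength and decay rate are arbitrary illustrative values, not empirical parameters):

```python
# Toy extinction curve: once reinforcement stops, response strength
# decays on every unreinforced trial. The starting strength and decay
# rate are arbitrary illustrative values.

def extinction(strength=1.0, decay=0.8, trials=10):
    history = []
    for _ in range(trials):
        history.append(round(strength, 3))
        strength *= decay  # no reinforcement, so the behavior weakens
    return history

curve = extinction()
print(curve[0], curve[-1])  # 1.0 0.134
```

The curve falls steadily toward zero, mirroring the employee who gradually stops submitting reports early once the bonuses end.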
Here are some concepts connected to extinction.
Disclaimer: The content of this article is intended for informational purposes only and should not be considered medical advice. Always consult your healthcare provider for medical concerns.
Learning objectives.
By the end of this section, you will be able to:
The previous section of this chapter focused on the type of associative learning known as classical conditioning. Remember that in classical conditioning, something in the environment triggers a reflex automatically, and researchers train the organism to react to a different stimulus. Now we turn to the second type of associative learning, operant conditioning . In operant conditioning, organisms learn to associate a behavior and its consequence ( [link] ). A pleasant consequence makes that behavior more likely to be repeated in the future. For example, Spirit, a dolphin at the National Aquarium in Baltimore, does a flip in the air when her trainer blows a whistle. The consequence is that she gets a fish.
| | Classical Conditioning | Operant Conditioning |
|---|---|---|
Conditioning approach | An unconditioned stimulus (such as food) is paired with a neutral stimulus (such as a bell). The neutral stimulus eventually becomes the conditioned stimulus, which brings about the conditioned response (salivation). | The target behavior is followed by reinforcement or punishment to either strengthen or weaken it, so that the learner is more likely to exhibit the desired behavior in the future. |
Stimulus timing | The stimulus occurs immediately before the response. | The stimulus (either reinforcement or punishment) occurs soon after the response. |
Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that are reflexively elicited, and it doesn’t account for new behaviors such as riding a bike. He proposed a theory about how such behaviors come about. Skinner believed that behavior is motivated by the consequences we receive for the behavior: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike . According to the law of effect , behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again. An example of the law of effect is in employment. One of the reasons (and often the main reason) we show up for work is because we get paid to do so. If we stop getting paid, we will likely stop showing up—even if we love our job.
Working with Thorndike’s law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a “Skinner box” ( [link] ). A Skinner box contains a lever (for rats) or disk (for pigeons) that the animal can press or peck for a food reward via the dispenser. Speakers and lights can be associated with certain behaviors. A recorder counts the number of responses made by the animal.
(a) B. F. Skinner developed operant conditioning for systematic study of how behaviors are strengthened or weakened according to their consequences. (b) In a Skinner box, a rat presses a lever in an operant conditioning chamber to receive a food reward. (credit a: modification of work by “Silly rabbit”/Wikimedia Commons)
Link to Learning
Watch this brief video clip to learn more about operant conditioning: Skinner is interviewed, and operant conditioning of pigeons is demonstrated.
In discussing operant conditioning, we use several everyday words—positive, negative, reinforcement, and punishment—in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioral response. All punishers (positive or negative) decrease the likelihood of a behavioral response. Now let’s combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment ( [link] ).
| | Reinforcement | Punishment |
|---|---|---|
| Positive | Something is added to increase the likelihood of a behavior. | Something is added to decrease the likelihood of a behavior. |
| Negative | Something is removed to increase the likelihood of a behavior. | Something is removed to decrease the likelihood of a behavior. |
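The four combinations described above reduce to a two-bit lookup: whether a stimulus is added or removed, and whether the behavior increases or decreases. A minimal sketch (the function and argument names are my own, chosen for illustration):

```python
# Illustrative lookup for the four operant-conditioning quadrants:
# "positive/negative" = stimulus added/removed,
# "reinforcement/punishment" = behavior increased/decreased.

def operant_quadrant(stimulus_added: bool, behavior_increases: bool) -> str:
    sign = "positive" if stimulus_added else "negative"
    kind = "reinforcement" if behavior_increases else "punishment"
    return f"{sign} {kind}"

# A treat added to encourage sitting:
print(operant_quadrant(True, True))    # positive reinforcement
# A privilege removed to discourage misbehavior:
print(operant_quadrant(False, False))  # negative punishment
```

Note that the two dimensions are independent, which is exactly why the everyday readings of "positive" and "negative" as good and bad break down here.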
The most effective way to teach a person or animal a new behavior is with positive reinforcement. In positive reinforcement , a desirable stimulus is added to increase a behavior.
For example, you tell your five-year-old son, Jerome, that if he cleans his room, he will get a toy. Jerome quickly cleans his room because he wants a new art set. Let’s pause for a moment. Some people might say, “Why should I reward my child for doing what is expected?” But in fact we are constantly and consistently rewarded in our lives. Our paychecks are rewards, as are high grades and acceptance into our preferred school. Being praised for doing a good job and for passing a driver’s test is also a reward. Positive reinforcement as a learning tool is extremely effective. It has been found that one of the most effective ways to increase achievement in school districts with below-average reading scores was to pay the children to read. Specifically, second-grade students in Dallas were paid $2 each time they read a book and passed a short quiz about the book. The result was a significant increase in reading comprehension (Fryer, 2010). What do you think about this program? If Skinner were alive today, he would probably think this was a great idea. He was a strong proponent of using operant conditioning principles to influence students’ behavior at school. In fact, in addition to the Skinner box, he also invented what he called a teaching machine that was designed to reward small steps in learning (Skinner, 1961)—an early forerunner of computer-assisted learning. His teaching machine tested students’ knowledge as they worked through various school subjects. If students answered questions correctly, they received immediate positive reinforcement and could continue; if they answered incorrectly, they did not receive any reinforcement. The idea was that students would spend additional time studying the material to increase their chance of being reinforced the next time (Skinner, 1961).
In negative reinforcement , an undesirable stimulus is removed to increase a behavior. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go “beep, beep, beep” until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behavior, increasing the likelihood that you will buckle up in the future. Negative reinforcement is also used frequently in horse training. Riders apply pressure—by pulling the reins or squeezing their legs—and then remove the pressure when the horse performs the desired behavior, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.
Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behavior. In contrast, punishment always decreases a behavior. In positive punishment , you add an undesirable stimulus to decrease a behavior. An example of positive punishment is scolding a student to get the student to stop texting in class. In this case, a stimulus (the reprimand) is added in order to decrease the behavior (texting in class). In negative punishment , you remove a pleasant stimulus to decrease a behavior. For example, when a child misbehaves, a parent can take away a favorite toy. In this case, a stimulus (the toy) is removed in order to decrease the behavior (misbehaving).
Punishment, especially when it is immediate, is one way to decrease undesirable behavior. For example, imagine your four-year-old son, Brandon, runs into the busy street to get his ball. You give him a time-out (negative punishment: he is removed from a desirable activity) and tell him never to go into the street again. Chances are he won’t repeat this behavior. While strategies like time-outs are common today, in the past children were often subject to physical punishment, such as spanking. It’s important to be aware of some of the drawbacks in using physical punishment on children. First, punishment may teach fear. Brandon may become fearful of the street, but he also may become fearful of the person who delivered the punishment—you, his parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010). Consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behavior and delinquency (Gershoff, 2002). They see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behavior when they become angry and frustrated. For example, because you spank Brenda when you are angry with her for her misbehavior, she might start hitting her friends when they won’t share their toys.
While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today’s psychologists and parenting experts favor reinforcement over punishment—they recommend that you catch your child doing something good and reward her for it.
In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behavior, in shaping , we reward successive approximations of a target behavior. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behavior. Shaping is needed because it is extremely unlikely that an organism will display anything but the simplest of behaviors spontaneously. In shaping, behaviors are broken down into many small, achievable steps. The specific steps used in the process are the following:
1. Reinforce any response that resembles the desired behavior.
2. Then reinforce the response that more closely resembles the desired behavior. You will no longer reinforce the previously reinforced response.
3. Next, begin to reinforce the response that even more closely resembles the desired behavior.
4. Continue to reinforce closer and closer approximations of the desired behavior.
5. Finally, only reinforce the desired behavior.
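The successive-approximation procedure can be sketched as a loop that progressively tightens the reinforcement criterion. A toy simulation (every number here is arbitrary, chosen only to illustrate the logic, not taken from any experiment):

```python
import random

# Toy sketch of shaping: reinforce responses that fall within a tolerance
# of the target, then tighten the tolerance, reinforcing successively
# closer approximations. All parameter values are illustrative.

def shape(target=1.0, tolerances=(0.8, 0.4, 0.2, 0.1), trials_per_step=50, seed=0):
    rng = random.Random(seed)
    behavior = 0.0  # the organism's current typical response
    for tol in tolerances:  # each step demands a closer approximation
        for _ in range(trials_per_step):
            response = behavior + rng.gauss(0, 0.3)  # natural variability
            if abs(response - target) <= tol:        # close enough: reinforce
                behavior += 0.5 * (response - behavior)  # drift toward reinforced response
    return behavior

print(round(shape(), 2))  # ends near the target of 1.0
```

The key property the loop captures is that early, loose criteria get the behavior started, while later, strict criteria refine it; reinforcing only the final target from the outset would almost never pay off.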
Shaping is often used in teaching a complex behavior or chain of behaviors. Skinner used shaping to teach pigeons not only such relatively simple behaviors as pecking a disk in a Skinner box, but also many unusual and entertaining behaviors, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond to the tone of a bell, and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behavior.
Here is a brief video of Skinner’s pigeons playing ping pong.
It’s easy to see how shaping is effective in teaching behaviors to animals, but how does shaping work with humans? Let’s consider parents whose goal is to have their child learn to clean his room. They use shaping to help him master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, he cleans up one toy. Second, he cleans up five toys. Third, he chooses whether to pick up ten toys or put his books and clothes away. Fourth, he cleans up everything except two toys. Finally, he cleans his entire room.
Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.
What would be a good reinforcer for humans? For your daughter Sydney, it was the promise of a toy if she cleaned her room. How about Joaquin, the soccer player? If you gave Joaquin a piece of candy every time he made a goal, you would be using a primary reinforcer . Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.
A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Joaquin made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behavior chart? They also are secondary reinforcers.
Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Cangi and Daly (2013) found that use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of autistic school children. Autistic children tend to exhibit disruptive behaviors such as pinching and hitting. When the children in the study exhibited appropriate behavior (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.
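A token economy like the "quiet hands" system reduces to simple bookkeeping: earn tokens for appropriate behavior, lose them for inappropriate behavior, and exchange them for a backup reinforcer. A minimal sketch (the class name and the exchange rate are hypothetical, not from the Cangi and Daly study):

```python
# Minimal sketch of a token-economy tracker like the "quiet hands"
# system described in the text. The exchange rate is hypothetical.

class TokenEconomy:
    def __init__(self, minutes_per_token=2):
        self.tokens = 0
        self.minutes_per_token = minutes_per_token

    def reinforce(self):       # appropriate behavior earns a token
        self.tokens += 1

    def response_cost(self):   # hitting or pinching loses a token
        self.tokens = max(0, self.tokens - 1)

    def exchange(self):        # trade all tokens for playtime minutes
        minutes = self.tokens * self.minutes_per_token
        self.tokens = 0
        return minutes

child = TokenEconomy()
for _ in range(5):
    child.reinforce()    # five instances of appropriate behavior
child.response_cost()    # one token lost for hitting
print(child.exchange())  # 8 minutes of playtime (4 tokens x 2 minutes)
```

The tokens themselves are worthless, which is the point: they acquire value only through their exchange for the primary reinforcer, just as the text describes for secondary reinforcers.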
Parents and teachers often use behavior modification to change a child’s behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed ( [link] ). Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior. Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.
Sticker charts are a form of positive reinforcement and a tool for behavior modification. Once this little girl earns a certain number of stickers for demonstrating a desired behavior, she will be rewarded with a trip to the ice cream parlor. (credit: Abigail Batchelder)
Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, she is removed from the desirable activity at hand ( [link] ). For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn’t throw blocks.
There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.
Time-out is a popular form of negative punishment used by caregivers. When a child misbehaves, he or she is removed from a desirable activity in an effort to decrease the unwanted behavior. For example, (a) a child might be playing on the playground with friends and push another child; (b) the child who misbehaved would then be removed from the activity for a short period of time. (credit a: modification of work by Simone Ramella; credit b: modification of work by “JefferyTurner”/Flickr)
Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement . This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).
Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement , also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules ( [link] ). These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.
| Reinforcement Schedule | Description | Result | Example |
|---|---|---|---|
| Fixed interval | Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes). | Moderate response rate with significant pauses after reinforcement | Hospital patient uses patient-controlled, doctor-timed pain relief |
| Variable interval | Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes). | Moderate yet steady response rate | Checking Facebook |
| Fixed ratio | Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses). | High response rate with pauses after reinforcement | Piecework: a factory worker getting paid for every x number of items manufactured |
| Variable ratio | Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses). | High and steady response rate | Gambling |
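As an illustrative sketch (not part of the original text), the two ratio schedules in the table can be modeled as small rules that decide whether a given response earns a reinforcer. The function names and parameters here are hypothetical, chosen only for the example:

```python
import random

def fixed_ratio(n):
    """Reinforce every n-th response (e.g., FR-3: every third lever press)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcer delivered
        return False
    return respond

def variable_ratio(mean_n):
    """Reinforce after an unpredictable number of responses,
    averaging mean_n responses per reinforcer (like a slot machine)."""
    state = {"count": 0, "target": random.randint(1, 2 * mean_n - 1)}
    def respond():
        state["count"] += 1
        if state["count"] >= state["target"]:
            state["count"] = 0
            state["target"] = random.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

# A fixed-ratio-3 schedule reinforces exactly every third response:
fr3 = fixed_ratio(3)
print([fr3() for _ in range(6)])  # [False, False, True, False, False, True]
```

Under the fixed schedule the payoff is perfectly predictable, which is why responding pauses right after each reinforcer; under the variable schedule the next reinforcer might always be one response away, which matches the steady responding the table describes.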
Now let’s combine these four terms. A fixed interval reinforcement schedule is when behavior is rewarded after a set amount of time. For example, June undergoes major surgery in a hospital. During recovery, she is expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. Her doctor sets a limit: one dose per hour. June pushes a button when pain becomes difficult, and she receives a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behavior when it will not be rewarded.
With a variable interval reinforcement schedule , the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity in providing prompt service and keeping a clean restaurant is steady because he wants his crew to earn the bonus.
With a fixed ratio reinforcement schedule , there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care whether the person really needs the prescription sunglasses; she just wants her commission. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.
In a variable ratio reinforcement schedule , the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.
In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time her doctor has approved, no medication is administered. She is on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish ( [link] ).
The four reinforcement schedules yield different response patterns. The variable ratio schedule is unpredictable and yields high and steady response rates, with little if any pause after reinforcement (e.g., gambler). A fixed ratio schedule is predictable and produces a high response rate, with a short pause after reinforcement (e.g., eyeglass saleswoman). The variable interval schedule is unpredictable and produces a moderate, steady response rate (e.g., restaurant manager). The fixed interval schedule yields a scallop-shaped response pattern, reflecting a significant pause after reinforcement (e.g., surgery patient).
Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron’s money on a variable-ratio schedule” (p. 397).
Skinner uses gambling as an example of the power and effectiveness of conditioning behavior based on a variable ratio reinforcement schedule. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). Beyond the power of variable ratio reinforcement, gambling seems to work on the brain in the same way as some addictive drugs. The Illinois Institute for Addiction Recovery (n.d.) reports evidence suggesting that pathological gambling is an addiction similar to a chemical addiction ( [link] ). Specifically, gambling may activate the reward centers of the brain, much like cocaine does. Research has shown that some pathological gamblers have lower levels of the neurotransmitter (brain chemical) known as norepinephrine than do normal gamblers (Roy et al., 1988). According to a study conducted by Alec Roy and colleagues, norepinephrine is secreted when a person feels stress, arousal, or thrill; pathological gamblers use gambling to increase their levels of this neurotransmitter. Another researcher, neuroscientist Hans Breiter, has done extensive research on gambling and its effects on the brain. Breiter (as cited in Franzen, 2001) reports that “Monetary reward in a gambling-like experiment produces brain activation very similar to that observed in a cocaine addict receiving an infusion of cocaine” (para. 1). Deficiencies in serotonin (another neurotransmitter) might also contribute to compulsive behavior, including a gambling addiction.
It may be that pathological gamblers’ brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.
Some research suggests that pathological gamblers use gambling to compensate for abnormally low levels of the hormone norepinephrine, which is associated with stress and is secreted in moments of arousal and thrill. (credit: Ted Murphy)
Although strict behaviorists such as Skinner and Watson refused to believe that cognition (such as thoughts and expectations) plays a role in learning, another behaviorist, Edward C. Tolman , had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.
In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way through it. He also studied a comparison group that was rewarded with food at the end of the maze. As the unreinforced rats explored the maze, they developed a cognitive map : a mental picture of the layout of the maze ( [link] ). After 10 sessions in the maze without reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats became aware of the food, they were able to find their way through the maze quickly, just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning : learning that occurs but is not observable in behavior until there is a reason to demonstrate it.
Psychologist Edward Tolman found that rats use cognitive maps to navigate through a maze. Have you ever worked your way through various levels on a video game? You learned when to turn left or right, move up or down. In that case you were relying on a cognitive map, just like the rats in a maze. (credit: modification of work by “FutUndBeidl”/Flickr)
Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi’s dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he’s never driven there himself, so he has not had a chance to demonstrate that he’s learned the way. One morning Ravi’s dad has to leave early for a meeting, so he can’t drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.
Have you ever gotten lost in a building and couldn’t find your way back out? While that can be frustrating, you’re not alone. At one time or another we’ve all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation—or cognitive map—of the location, as Tolman’s rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight. Because of this, it’s often difficult to predict what’s around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.
Watch this video to learn more about Carlson’s studies on cognitive maps and navigation in buildings.
Operant conditioning is based on the work of B. F. Skinner. Operant conditioning is a form of learning in which the motivation for a behavior happens after the behavior is demonstrated. An animal or a human receives a consequence after performing a specific behavior. The consequence is either a reinforcer or a punisher. All reinforcement (positive or negative) increases the likelihood of a behavioral response. All punishment (positive or negative) decreases the likelihood of a behavioral response. Several types of reinforcement schedules are used to reward behavior depending on either a set or variable period of time.
Critical thinking questions.
1. What is a Skinner box and what is its purpose?
2. What is the difference between negative reinforcement and punishment?
3. What is shaping and how would you use shaping to teach a dog to roll over?
4. Explain the difference between negative reinforcement and punishment, and provide several examples of each based on your own experiences.
5. Think of a behavior that you have that you would like to change. How could you use behavior modification, specifically positive reinforcement, to change your behavior? What is your positive reinforcer?
1. A Skinner box is an operant conditioning chamber used to train animals such as rats and pigeons to perform certain behaviors, like pressing a lever. When the animals perform the desired behavior, they receive a reward: food or water.
2. In negative reinforcement you are taking away an undesirable stimulus in order to increase the frequency of a certain behavior (e.g., buckling your seat belt stops the annoying beeping sound in your car and increases the likelihood that you will wear your seatbelt). Punishment is designed to reduce a behavior (e.g., you scold your child for running into the street in order to decrease the unsafe behavior).
3. Shaping is an operant conditioning method in which you reward closer and closer approximations of the desired behavior. If you want to teach your dog to roll over, you might reward him first when he sits, then when he lies down, and then when he lies down and rolls onto his back. Finally, you would reward him only when he completes the entire sequence: lying down, rolling onto his back, and then continuing to roll over to his other side.
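The logic of shaping described above, rewarding closer and closer approximations of a target behavior, can be sketched as a simple loop. This is a toy model under assumed names (`observe`, `shape`), not an actual training protocol:

```python
def shape(observe, target, step=1):
    """Toy shaping loop: reinforce whenever the observed behavior meets
    the current criterion, then raise the criterion until the full
    target behavior (e.g., a complete roll-over) is reached."""
    criterion = step
    reinforcements = 0
    while criterion <= target:
        if observe() >= criterion:    # a close-enough approximation...
            reinforcements += 1       # ...earns a reward
            criterion += step         # then we demand a closer one
    return reinforcements

# A 'dog' whose behavior improves a little on each trial
# (1 = sits, 2 = lies down, 3 = rolls onto back, 4 = full roll-over):
progress = iter(range(1, 100))
print(shape(lambda: next(progress), target=4))  # prints 4
```

The key design point the sketch captures is that the reward criterion moves: early approximations that once earned a treat stop being reinforced once the learner can do better.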
Operant conditioning, sometimes also known as Skinnerian conditioning and associated with Skinner’s radical behaviorism, is a behaviorist learning approach related to classical conditioning , most influenced by the early theoretical and experimental work of the American psychologist Burrhus Frederic Skinner, beginning in the 1930s. The main difference between the two theories is that classical conditioning modifies only existing reflexive reactions, whereas operant conditioning shapes new behavior .
The most famous apparatus associated with operant learning is the Skinner box , also known as the operant conditioning chamber . In one experiment Skinner demonstrated the principles of operant conditioning and behavior shaping on a rat, using food as reinforcement. A hungry rat was put in a box in which pressing a small lever released some food. The rat soon learned that pressing the lever would get it food.
In another experiment, two lights (red and green) were introduced into the box, and the rat would only get the food if one of them was on. The rat soon learned to discriminate between the lights, and stopped pressing the lever, or pressed it less, when the “wrong” light was on.
Unlike Pavlovian conditioning, where an existing behavior (salivating for food) is shaped by associating it with a new stimulus (the sound of a bell), operant conditioning is the rewarding of an act that approaches a new desired behavior ; it can also work in the opposite direction, punishing undesirable behavior (punishment, which is distinct from negative reinforcement). [1]
After once accidentally running short on rat food, Skinner also began observing the effects of different schedules of reinforcement [2], fixed or variable, interval or ratio, as described above.
An interesting observation he made was that with a fixed interval, rats managed to find a “rhythm” in displaying the behavior, which was never the case with variable schedules. Variable schedules have also, surprisingly, proven very resistant to extinction. Gambling addiction offers an example of this: although reinforcement comes rarely, one can never be sure whether it will come the next time, so one keeps trying.
Operant conditioning can also be used to shape more complex behaviors by starting from a behavior similar to the intended one and, once it is learned, slowly shaping it until it matches exactly what was desired . An example of this is how Skinner and his students managed to teach pigeons to bowl. [3]
Skinner incorporated some of his ideas into his novel “Walden Two,” about a utopian society based on behavioral control. He is also remembered for claiming that, if his house were on fire, he would rather save his books than his children, since his writings could make greater contributions than his genes. [4]
There are many examples of operant conditioning in everyday life. Completing homework to earn a reward from a teacher, or finishing projects to receive praise or a promotion from an employer, is a form of operant conditioning [5]. In these examples, the increased probability of a behavior results from the possibility of reward .
Conversely, operant conditioning can also be used to decrease the probability of a behavior through punishment (an aversive stimulus ). For example, children in a classroom may be told they will have to sit at the back of the classroom if they talk out of turn [6]. The possibility of punishment may decrease the probability of unwanted behaviors.
Criticisms of operant conditioning largely mirror the criticisms leveled at behaviorism in general.
Boeree, G. Personality Theories: B. F. Skinner. Retrieved February 22, 2011.
Blackman, D. E. Operant Conditioning: An Experimental Analysis of Behaviour. Routledge, 1974.
Skinner, B. F. About Behaviorism. Vintage Books, 1974.
Skinner, B. F. “Superstition” in the Pigeon. Journal of Experimental Psychology 38, pp. 168–172, 1948.
Peterson, G. B. A Day of Great Illumination: B. F. Skinner’s Discovery of Shaping. Journal of the Experimental Analysis of Behavior 82, no. 3, pp. 317–328, 2004.
Wolf, M., Risley, T., Johnston, M., Harris, F., and Allen, E. Application of Operant Conditioning Procedures to the Behavior Problems of an Autistic Child: A Follow-up and Extension. Behaviour Research and Therapy 5, no. 2, pp. 103–111, May 1967.
Levene, H. I., Engel, B. T., and Pearson, J. A. Differential Operant Conditioning of Heart Rate. Psychosomatic Medicine 30, no. 6, pp. 837–845, November 1968.
Have you ever wondered how you learned to behave in a specific situation, whether well or badly? Of course, our parents and teachers have a great hand in shaping our behavior. But what are the mechanisms that drive behavior in our lives?
Psychologist B. F. Skinner explained learned behavior through a theory called operant conditioning. According to Skinner, the behavior of an individual is influenced by its consequences: operant conditioning is the form of conditioning that explains the relationship between behaviors and their consequences or rewards (reinforcements and punishments).
Two principal kinds of consequences drive operant conditioning:
a. Reinforcements (positive or negative): increase the rate of a behavior.
b. Punishments (positive or negative): decrease the rate of a behavior.
Now let’s look at how operant conditioning operates in our daily activities.
In positive reinforcement, one is rewarded for a certain kind of behavior, which increases the probability that the behavior will continue. Here are some relevant examples of positive reinforcement:
A student tends to complete his or her homework daily because he or she knows the reward will be candy or praise.
A child may learn to clean his or her room regularly because he or she is rewarded with extra TV hours every time the room is cleaned.
Workers are often offered incentives and bonuses for completing their targets on time or for regular attendance. This encourages them to perform better so that they can keep earning those incentives and bonuses.
Salespeople often give discounts and prizes to customers to encourage them to shop there again in the future. Similarly, many gyms offer discounts to customers who work out a certain number of times or use their diet products.
Negative reinforcement takes away something unpleasant, which strengthens the behavior. Here are some relevant examples of negative reinforcement:
Students and children follow rules strictly to avoid being nagged by teachers or parents; to escape the nagging, a child may end up following the rules closely. Similarly, army personnel follow a strict routine to avoid disciplinary action against them, which shapes them into disciplined individuals.
Class presentations are a regular part of student life. If a student is praised or complimented, he or she will be encouraged to do well; but if the student is laughed at or criticized in front of everyone, future presentations will be nothing more than a formality.
A child throws a tantrum because he didn’t get a candy bar, so his father gets him one and the tantrum stops. Something unpleasant (the tantrum) is removed, so the father’s candy-buying behavior is negatively reinforced and will increase.
A man turns up the TV volume to mask the irritating sounds coming from outside his house, perhaps honking vehicles or a construction site. Turning up the volume removes the unpleasant sound.
Positive punishment is presenting something unpleasant after a behavior, which tends to decrease that behavior. Here are some relevant examples of positive punishment:
A student who always comes late to class is scolded by the teacher in front of everyone. To avoid the scolding, he or she may stop coming late.
After hitting a classmate, a student is made to sit alone in class, and no one is allowed to talk to or sit with him. This makes it less likely that he will hit his classmates again.
A student who neglects his or her studies or regularly fails exams is often scolded by parents and teachers. Sometimes the allowance (pocket money) is also reduced or cut off entirely; the student, though reluctantly, may then be forced to focus on studying to avoid failing again.
Negative punishment is removing something pleasant after a behavior. It also tends to decrease that behavior. Here are some relevant examples of negative punishment:
An employee who is criticized in front of the whole office by his boss, and who has certain privileges taken away as a consequence of bad behavior at work, may be motivated to stay in line and be more sincere.
A driver is fined and has his driving license suspended for not following traffic rules. Here, money and the license, both pleasant things, are taken away.
Pavlov's dog experiments played a critical role in the discovery of one of the most important concepts in psychology: classical conditioning.
While it happened quite by accident, Pavlov's famous experiments had a major impact on our understanding of how learning takes place as well as the development of the school of behavioral psychology. Classical conditioning is sometimes called Pavlovian conditioning.
How did experiments on the digestive response in dogs lead to one of the most important discoveries in psychology? Ivan Pavlov was a noted Russian physiologist who won the 1904 Nobel Prize for his work studying digestive processes.
While studying digestion in dogs, Pavlov noted an interesting occurrence: His canine subjects would begin to salivate whenever an assistant entered the room.
The concept of classical conditioning is studied by every entry-level psychology student, so it may be surprising to learn that the man who first noted this phenomenon was not a psychologist at all.
In his digestive research, Pavlov and his assistants would introduce a variety of edible and non-edible items and measure the saliva production that the items produced.
Salivation, he noted, is a reflexive process. It occurs automatically in response to a specific stimulus and is not under conscious control.
However, Pavlov noted that the dogs would often begin salivating in the absence of food and smell. He quickly realized that this salivary response was not due to an automatic, physiological process.
Based on his observations, Pavlov suggested that the salivation was a learned response. Pavlov's dog subjects were responding to the sight of the research assistants' white lab coats, which the animals had come to associate with the presentation of food.
Unlike the salivary response to the presentation of food, which is an unconditioned reflex, salivating to the expectation of food is a conditioned reflex.
Pavlov then focused on investigating exactly how these conditioned responses are learned or acquired. In a series of experiments, he set out to provoke a conditioned response to a previously neutral stimulus.
He opted to use food as the unconditioned stimulus , or the stimulus that evokes a response naturally and automatically. The sound of a metronome was chosen to be the neutral stimulus.
The dogs would first be exposed to the sound of the ticking metronome, and then the food was immediately presented.
After several conditioning trials, Pavlov noted that the dogs began to salivate after hearing the metronome. "A stimulus which was neutral in and of itself had been superimposed upon the action of the inborn alimentary reflex," Pavlov wrote of the results.
"We observed that, after several repetitions of the combined stimulation, the sounds of the metronome had acquired the property of stimulating salivary secretion."
In other words, the previously neutral stimulus (the metronome) had become what is known as a conditioned stimulus that then provoked a conditioned response (salivation).
To review, the key components of Pavlov's theory are the unconditioned stimulus (the food), the unconditioned response (salivating to the food), the conditioned stimulus (the metronome, which was originally neutral), and the conditioned response (salivating to the metronome).
Pavlov's discovery of classical conditioning remains one of the most important in psychology's history.
In addition to forming the basis of what would become behavioral psychology , the classical conditioning process remains important today for numerous applications, including behavioral modification and mental health treatment.
Principles of classical conditioning are used in the treatment of mental health conditions such as anxiety disorders and specific phobias.
For instance, a specific type of treatment called exposure therapy uses conditioning principles to help people with anxiety or a specific phobia.
A therapist will help a person face the object of their fear gradually, while helping them manage any fear responses that arise. Over time, the person will form a neutral response to the object.
Pavlov’s work has also inspired research on how to apply classical conditioning principles to taste aversions . The principles have been used to discourage coyotes from preying on domestic livestock: an initially neutral stimulus (eating a particular food) is paired with an unconditioned response (nausea after eating it) to create an aversion to that food.
Unlike other forms of classical conditioning, this type of conditioning does not require multiple pairings in order for an association to form. In fact, taste aversions generally occur after just a single pairing. Ranchers have found ways to put this form of classical conditioning to good use to protect their herds.
In one example, mutton was injected with a drug that produces severe nausea. After eating the poisoned meat, coyotes then avoided sheep herds rather than attack them.
While Pavlov's discovery of classical conditioning formed an essential part of psychology's history, his work continues to inspire further research today. His contributions to psychology have helped make the discipline what it is today and will likely continue to shape our understanding of human behavior for years to come.
By Kendra Cherry, MSEd, a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."