I want to get into a topic that might be the question in baseball. The one that people really want to know. Who is going to break out next year?
I want to approach the question from a slightly different way than it is usually done. Normally, people who look for breakouts are looking to identify a magic list. These five guys are the ones to watch out for. Yeah, eventually, there need to be names attached (because otherwise, what’s the point?) but I’m a little wary of magic lists. The problem with “must watch” lists is that someone will often pick some factor which s/he believes will lead to change (the good ones will actually have some reasonable data to try to justify the factor) and then list players who fit that mold. This is where you get the 26+3 (under age 26, more than 3 years in the majors) lists. It’s not a bad heuristic, but it’s also not very informative to the bigger question. What promotes growth and development in players?
Magic lists have the unfortunate habit of invariably swinging and missing on a few guys (remember Scott Sizemore, Rookie of the Year candidate?) and completely missing out on the guy who actually appears from nowhere (Josh Donaldson). So, I’m going to start with a disclaimer. The lists are just to illustrate the point.
Growth and development are hard to predict for a simple reason. There are several ways to become a better player. Sometimes it’s as simple as making a single decision that snaps everything into place. Sometimes, it’s a matter of the growth of a couple of skills at the same time. I think that in the rush to make the list, people have forgotten to really study what’s going on at a deeper, more molecular level. So, that’s what I want to do. What are the ways—plural—in which we can model (numerically) growth and development in baseball players?
Warning! Gory Mathematical Details Ahead!
We’re going to look at breakouts from a very specific place. Baseball stats are most often quoted in their full season form (e.g., “He hit .280 last year.”) because the season is the most important unit of measurement in the game for two reasons. One is that they hand out only one “World” Series Championship each year (yeah, I know, Toronto), and the other is that player contracts pretty much exclusively run through the end of a season. General managers (whether of the fantasy or real variety) are generally looking back at last year’s stats, trying to divine who will shine this year.
I’m going to define a breakout as a change in some statistic of note from one year to the next. (For example, in a moment, I’m going to use a reduction in strikeout rate.) That can take two forms. One is a raw change (he struck out in 18.0 percent of his plate appearances last year and 16.2 percent this year, a difference of 1.8 percentage points); the other is a percent change over baseline (his strikeout rate fell by 10 percent relative to last year). There’s a third method, which I have discussed before, called the reliable change index, which adjusts for some of the inherent unreliability in baseball stats, especially at smaller sample sizes. For right now, I’m going to use the first two.
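To make the two definitions concrete, here is a minimal sketch using the numbers from the example above. The function names are made up for illustration and are not part of any existing tool.

```python
def raw_change(prev_rate, curr_rate):
    """Year-over-year change in percentage points (rates given in percent)."""
    return curr_rate - prev_rate


def percent_change(prev_rate, curr_rate):
    """Year-over-year change as a percent of the previous year's rate."""
    return (curr_rate - prev_rate) / prev_rate * 100.0


# The example from the text: 18.0 percent last year, 16.2 percent this year.
prev_k, curr_k = 18.0, 16.2
print(round(raw_change(prev_k, curr_k), 1))      # -1.8 (percentage points)
print(round(percent_change(prev_k, curr_k), 1))  # -10.0 (percent of baseline)
```

The same 1.8-point drop would be a much larger percent change for a hitter starting from a 10 percent strikeout rate, which is why it is worth tracking both.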
To start looking for evidence that a player might change his stripes in the year to come, let’s start in an obvious place and see whether our player showed any indications that he was changing last year. I have previously suggested a method for looking at talent level variations within a season for an individual player. The basic idea is that we might use a moving average approach toward modeling a player’s performance.
For example, what is the better predictor of what Smith will do in a particular plate appearance? Is his overall seasonal average the best predictor, or perhaps his last 100 PA? If the answer is the last 100 PA, then we can safely say that there were probably some peaks and valleys in Smith’s true talent level over the course of the year. If his full-season rate is the best predictor of his plate appearances, then we can assume that his talent level was fairly consistent over the course of the year. We run the same analyses for all of Jones’s plate appearances in a given year as well. In this way, we get a read on Smith and Jones as individuals. For these analyses, I looked to see whether a player’s full-season average was the best predictor or whether a moving average of his last 50 PA, 60 PA, 70 PA, and so on up to 200 PA was the best predictor. (For the initiated, I used a stepwise binary logistic model and picked out the first variable to enter the equation.) I looked at all player-seasons from 2009-2013, minimum of 250 PA in a season, and determined the best predictor for each one.
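The idea of asking which average predicts a hitter’s plate appearances best can be sketched in a few lines. This is not the stepwise logistic model described above. As a simpler stand-in, it scores each candidate (the season rate, or a trailing average of a given width) by its log loss on the same plate appearances and keeps the lowest. The function names and the toy hitter are assumptions for illustration only.

```python
import math


def log_loss(preds, actuals, eps=1e-6):
    """Average negative log-likelihood of 0/1 outcomes given predicted rates."""
    total = 0.0
    for p, y in zip(preds, actuals):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0 and 1
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(actuals)


def best_predictor(outcomes, widths=range(50, 201, 10)):
    """Return 'season' or the trailing-average width with the lowest log loss.

    outcomes: one 0/1 entry per plate appearance (1 = strikeout), in order.
    """
    start = max(widths)            # score every candidate on the same PAs
    actuals = outcomes[start:]
    season_rate = sum(outcomes) / len(outcomes)
    scores = {"season": log_loss([season_rate] * len(actuals), actuals)}
    for w in widths:
        preds = [sum(outcomes[i - w:i]) / w for i in range(start, len(outcomes))]
        scores[w] = log_loss(preds, actuals)
    return min(scores, key=scores.get), scores


# Toy hitter (made up): strikes out in 10% of PAs early, 25% late.
changeling = [1 if i % 10 == 0 else 0 for i in range(300)] + \
             [1 if i % 4 == 0 else 0 for i in range(300)]
best, _ = best_predictor(changeling)
print(best)  # a window width rather than "season" for this hitter
```

A hitter for whom "season" wins would be one of the steady types. A hitter for whom some window width wins is showing the peaks and valleys described above.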
We’re going to call the group that shows peaks and valleys over the course of the year “changelings” and those who stay the same “solids.” For a changeling, once we know the best “width” for his tracking average, we can map out his projected true talent levels over the course of a season. We can also take those points and shoot a regression line through them to see whether, overall, the trend line is pointing upward or downward (for the initiated, whether the regression coefficient is positive or negative). That leaves us with three groups: changelings who are trending upward, changelings who are trending downward, and solids. Now we know whether Smith’s strikeout rate was moving up, down, or staying level over the course of the previous year.
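The three-way split described above can be sketched as follows, assuming the best window width has already been found (with None standing in for a hitter whose season rate was the best predictor). The helper names are hypothetical, and the slope is an ordinary least-squares fit of the tracking average against plate appearance number.

```python
def moving_average(outcomes, width):
    """Trailing strikeout rate after each PA, once `width` PAs are available."""
    return [sum(outcomes[i - width:i]) / width
            for i in range(width, len(outcomes) + 1)]


def ols_slope(ys):
    """Least-squares slope of ys against 0, 1, 2, ... (change per PA)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def classify(outcomes, best_width):
    """Label a player-season as solid, trending upward, or trending downward."""
    if best_width is None:          # season rate was the best predictor
        return "solid"
    slope = ols_slope(moving_average(outcomes, best_width))
    # A slope of exactly zero is lumped in with "downward" in this sketch.
    return "trending upward" if slope > 0 else "trending downward"


# Toy hitter (made up): 10% strikeout rate early, 25% late.
rising = [1 if i % 10 == 0 else 0 for i in range(300)] + \
         [1 if i % 4 == 0 else 0 for i in range(300)]
print(classify(rising, 100))        # trending upward
print(classify(rising[::-1], 100))  # trending downward
print(classify(rising, None))       # solid
```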
Now, what happens in the next year to each of those groups? Again, I’m defining a breakout based on full-season stats, because those are the ones that are easiest to look at. So, I looked to see how many in each group had a year-over-year increase of at least 1 percentage point in their strikeout rate. Then 2 percentage points, then 3. I also looked to see how many showed an increase of at least 10 percent of their previous rate. Then I ran the same counts for decreases.
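The counting step is simple enough to sketch. The rows below are invented purely to show the mechanics and are not the data behind the article’s results. The function and group labels are likewise hypothetical.

```python
from collections import defaultdict


def increase_share(rows, threshold, relative=False):
    """Share of each group whose strikeout rate rose by at least `threshold`.

    rows: (group, previous_rate, next_rate) tuples, rates in percent.
    relative=False: threshold is in percentage points.
    relative=True:  threshold is a percent of the previous rate.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for group, prev_rate, next_rate in rows:
        change = next_rate - prev_rate
        if relative:
            change = change / prev_rate * 100.0
        totals[group] += 1
        if change >= threshold:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Made-up player-seasons, two per group, for illustration only.
rows = [
    ("up", 18.0, 20.5), ("up", 22.0, 23.5),
    ("solid", 15.0, 16.2), ("solid", 20.0, 19.0),
    ("down", 25.0, 21.0), ("down", 17.0, 18.1),
]
print(increase_share(rows, 1.0))                  # at least 1 percentage point
print(increase_share(rows, 10.0, relative=True))  # at least 10% over baseline
```

Counting decreases works the same way with the sign of the change flipped.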
The results:
| Year-over-year change in strikeout rate | Trending Upward | Solids | Trending Downward |
| --- | --- | --- | --- |
| Increased by 1 percentage point or more | 45.3% | 40.5% | 40.0% |
| Increased by 2 percentage points or more | 31.8% | 27.8% | 29.8% |
| Increased by 3 percentage points or more | 19.3% | 17.0% | 19.5% |
| Increased by 10% or more over baseline | 36.5% | 31.4% | 32.7% |
| Decreased by 1 percentage point or more | 33.2% | 31.8% | 39.5% |
| Decreased by 2 percentage points or more | 20.2% | 20.9% | 26.8% |
| Decreased by 3 percentage points or more | 11.7% | 12.9% | 18.5% |
| Decreased by 10% or more over baseline | 20.6% | 24.6% | 31.2% |
One thing that we clearly see is that there are a lot of players who randomly move around in their strikeout rate. Even a big move like 3 percentage points up happened for about 20 percent of the sample that had previously been trending downward. But we do see a pattern that we might expect. Changelings who were trending upward last year were somewhat more likely (for the initiated, chi-squares were not significant) to show an increase in their strikeout rates. There are only a couple of points of separation there. Some of that is probably due to the fact that if a player starts to strike out a lot more than he had been, he’s likely to get demoted or released for such an offense, and not reach the 250 PA inclusion limit.
On the decrease side, the effect is a little more pronounced (and statistically significant). Hitters who showed signs of a decreasing strikeout rate the year before were more likely to show a decrease the next year. There’s consistently about 6-7 percentage points’ worth of separation between the groups. Not bad.
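For readers who want to see what the chi-square comparison involves, here is a minimal sketch for a table with two outcomes (broke out or not) across the three groups. The article reports only percentages, so the counts below are invented for illustration. With three groups the test has 2 degrees of freedom, and for 2 degrees of freedom the p-value has the closed form exp(-chi2 / 2), which keeps the sketch free of any statistics library.

```python
import math


def chi_square_2xk(yes_counts, totals):
    """Pearson chi-square for a yes/no outcome across k groups.

    Returns (statistic, degrees_of_freedom).
    """
    n = sum(totals)
    yes_total = sum(yes_counts)
    chi2 = 0.0
    for yes, total in zip(yes_counts, totals):
        no = total - yes
        exp_yes = total * yes_total / n   # expected count if groups don't differ
        exp_no = total - exp_yes
        chi2 += (yes - exp_yes) ** 2 / exp_yes + (no - exp_no) ** 2 / exp_no
    return chi2, len(totals) - 1


def p_value_df2(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df."""
    return math.exp(-chi2 / 2)


# Hypothetical counts (NOT the article's data): players with a 10%+ drop,
# out of 200 in each of the upward, solid, and downward groups.
chi2, df = chi_square_2xk([41, 49, 62], [200, 200, 200])
print(df, round(chi2, 2), round(p_value_df2(chi2), 3))
```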
We can sharpen our focus a little bit though. I mentioned that I determined whether the trend was up or down from the regression coefficient of the line that passed through each player’s moving average graph. For the changelings who showed a downward trend, I ran a logistic regression using a binary outcome of whether, in the next year, they showed a 10 percent decrease over baseline. I used the slope of the initial tracking average line as a predictor. If in 2012, a hitter was tilting seriously downward, he’s probably a better bet to show a big drop in 2013. And that’s what I found. Players who had steeper lines were more likely to experience a big drop in their strikeout rate in the following year.
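A one-predictor logistic regression like the one just described can be sketched without any libraries. The slopes and outcomes below are invented, the slope units (strikeout-rate points per 100 PA) are an assumption made so the numbers are easy to read, and the plain gradient-ascent fit is only a stand-in for whatever statistics package one would normally use.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(a + b * x) by gradient ascent on the log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * x)
            grad_a += err
            grad_b += err * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


# Hypothetical downward-trending changelings (NOT the article's data):
# trend slope last year, and whether K rate fell 10%+ the next year.
slopes = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.8, -0.5, -0.3, -0.2, -0.1]
big_drop = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

a, b = fit_logistic(slopes, big_drop)
print(b < 0)  # True: steeper downward slopes go with a higher chance of a drop
for s in (-3.0, -0.1):
    print(s, round(sigmoid(a + b * s), 2))  # modeled likelihood of a big drop
```

The fitted probabilities are the kind of number that appears in the "model prediction" column of a candidate list.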
But not all is rosy. Now, using 2013 stats, here are the players whom the model predicted would be in line for a big drop in their strikeout rates in 2014. (And remember, there are no magic lists…)
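| Player | Model Prediction of Likelihood of Drop in K Rate | 2013 Strikeout Rate | 2014 Strikeout Rate | Absolute Change |
| --- | --- | --- | --- | --- |
| (name missing) | 52.6% | 16.0% | 17.1% | +1.1% |
| Marcell Ozuna | 49.1% | 19.6% | 26.8% | +7.2% |
| Yonder Alonso | 48.8% | 12.5% | 12.5% | 0.0% |
| (name missing) | 48.1% | 17.4% | 19.9% | +2.5% |
| (name missing) | 47.7% | 17.4% | 16.4% | -1.0% |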
If a team had looked at this list and this list only, they would have seen one guy who didn’t move his strikeout rate at all, two guys who moved a point, and then Nava and Ozuna who struck out more. Score one for the aggregate, and none for the specific!
The Problem with Lists
There’s a reasonable rebuttal to these analyses. I’ve shown a method that can separate out a group with a 31 percent chance of breaking out (on strikeouts, anyway) vs. a 24 percent chance. Even when we take the top five candidates, we’re talking about a list of five guys who aren’t even 50 percent chances to break out, and the names the model would have told us to focus on ended up being duds. This is hardly the stuff of certainty, but there are no certainties in baseball. Here’s where I think we see the problem with lists. I think that people expect a perfectly discriminant function. This mythical equation will separate the sheep from the goats, the Sharks from the Jets, and the people who like Cincinnati chili from the normal people. This method isn’t it.
We’ve identified one factor that could indicate that someone is ready for some growth. If you make the article about the list, you will be disappointed. If you recognize that we’ve identified one factor (likely of many) that can predict growth, and take this as the starting point for finding others, then you’ll be happy. If you can identify a few more of these, then you can start to build a growth profile of a player. If he has three of these growth factors on his profile, maybe he’s worth taking a risk on. Note the word “risk.” You really should have a preponderance of evidence before taking such a risk, and that sort of understanding requires a deep study of the subject. And maybe (gasp!) talking to the scouts.
Think of this from the point of view of a team. Unlike a fantasy draft, real teams do not have the ability to simply target any player they like. If a team wants to go take a chance on a player, he either has to conveniently be a free agent or be pryable in a trade. And it’s not like teams can trade for 20 players and hope that a few work out, the way people who are making lists on the internet can. You have to pick a guy or two based on who’s available and hope you get lucky. A breakout is a low-frequency event. There’s a certain amount of drawing the magic lottery ticket that goes with it. The best that you might hope for is to have a slightly better and more-informed list of guys to think about. But I don’t think that we need to throw up our hands and say it can’t be done. It’s just going to take some work.