
April 30, 2006

The Effective Use of Simulation with TOC. Part 2 – In Production

In my entry titled Beyond DBR - the need for basic data, I talked about some of the reasons that simple data collection was needed. I also noted that it was often not required in the early phase of the process of finding bottlenecks, which I’ll call Phase 1. In Phase 2, the working data collection system generates Basic Data for each workstation that gives us some idea of the current bottleneck location. This is most effective in simple, serial systems with large buffers and few feeder lines. As the process to attack bottlenecks becomes more mature and the system becomes more balanced, it typically becomes more difficult to isolate the bottleneck. Here, in Phase 3, analysis tools like simulation can be used. Finally, in Phase 4, as the process changes the culture, the desire to know where the bottlenecks will be in the future becomes apparent.

For my purposes here, I’ll assume the reader is familiar with Phase 1, and I’ll cover Phases 2 and 4 in future entries. In this entry, I’ll talk primarily about Phase 3: Analysis using simulation tools.
 

1.  Observation – looking at where the flow is interrupted
2.  Basic data – Stand Alone Throughput (SAT), Efficiency
3.  Analysis – Using C-More or Simulation
4.  Prediction – Determine where the bottlenecks will be in the future.

There are a few algorithms that can be used with simulation to help predict the throughput improvement that can be generated by improving each workstation. I’ll be glad to talk to people offline about how to set this up. For now, let’s assume one of those algorithms has been run, and the bottleneck has been identified and quantified; quantification is often an important difference between methods. The quantified improvement in jobs per hour (JPH) can be used in financial decisions: is it more cost effective, for example, to run overtime or to spend money to improve the performance of the machine? If the goal of the company is to make money, then quantifying the financial impact of improving the bottleneck ties in very nicely and makes it difficult to make sub-optimal decisions.
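One brute-force way to get that quantified answer, shown here only as a sketch and not as the algorithm used at GM or in C-More, is a one-station-at-a-time what-if: simulate the line, shave a fixed percentage off one workstation's cycle time, rerun, and rank the stations by JPH gained. The sketch assumes a serial line with fixed buffer sizes, blocking after service, and cycle times that vary uniformly by ±30% around each station's mean. The function names (`departure_times`, `jph`, `rank_improvements`) and all numbers are hypothetical.

```python
import random


def departure_times(cycle_times, buffers):
    """Departure-time recursion for a serial line with blocking after service.

    cycle_times[j][n] is the processing time of part n at station j, and
    buffers[j] is the number of buffer slots between stations j and j + 1.
    Station 0 is never starved; the last station is never blocked.
    """
    stations, parts = len(cycle_times), len(cycle_times[0])
    dep = [[0.0] * parts for _ in range(stations)]
    for n in range(parts):
        for j in range(stations):
            arrived = dep[j - 1][n] if j > 0 else 0.0
            free = dep[j][n - 1] if n > 0 else 0.0
            done = max(arrived, free) + cycle_times[j][n]
            # A finished part stays on station j until there is room
            # downstream, i.e. until part n - b - 1 has left station j + 1.
            k = n - buffers[j] - 1 if j < stations - 1 else -1
            if k >= 0:
                done = max(done, dep[j + 1][k])
            dep[j][n] = done
    return dep


def jph(cycle_times, buffers):
    """Jobs per hour, with cycle times given in seconds."""
    last = departure_times(cycle_times, buffers)[-1]
    return len(last) / last[-1] * 3600.0


def rank_improvements(mean_cycle, buffers, improvement=0.10, parts=2000, seed=1):
    """Cut one station's cycle time at a time and rank stations by JPH gained."""
    rng = random.Random(seed)
    # Common random numbers: every what-if run reuses the same +/-30% noise.
    noise = [[rng.uniform(0.7, 1.3) for _ in range(parts)] for _ in mean_cycle]

    def scaled(factors):
        return [[m * f * z for z in row]
                for m, f, row in zip(mean_cycle, factors, noise)]

    base = jph(scaled([1.0] * len(mean_cycle)), buffers)
    gains = []
    for j in range(len(mean_cycle)):
        factors = [1.0] * len(mean_cycle)
        factors[j] = 1.0 - improvement
        gains.append((jph(scaled(factors), buffers) - base, j))
    gains.sort(reverse=True)
    return base, gains


base, gains = rank_improvements([55, 60, 70, 58], [3, 3, 3])
print(f"baseline {base:.1f} JPH")
for gain, station in gains:
    print(f"station {station}: +{gain:.2f} JPH")
```

Reusing the same random draws in every run keeps sampling noise from swamping small differences between candidate fixes. Multiplying each JPH gain by the margin per job turns the ranking into a figure you can set against the cost of overtime or of a machine improvement.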

How is an Analysis using simulation better than the Basic Data method? The basic reason is demonstrated in the dice game, where the student concludes that variation plays a role in throughput. With the average roll being 3.5, many students assume that, after 10 rounds, 35 parts will appear at the end of the production line. As Alex found out in The Goal, they never do. The “good” variation (when a player rolls a six) does not make up for the “bad” variation (when a player rolls a one), since the incoming buffer usually does not have six parts in it when the six is rolled. Averages like SAT do not capture this interaction between variation and buffers, while a simulation does.
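The dice game is small enough to simulate in a few lines. This sketch assumes the usual rules: five players, ten rounds, players roll in order within a round, the first player draws from an unlimited supply, and every other player can pass on at most what is waiting in front of them. The function name and defaults are illustrative.

```python
import random


def play_dice_game(stations=5, rounds=10, seed=None):
    """Parts shipped by the last player in one game of the dice game."""
    rng = random.Random(seed)
    waiting = [0] * stations  # parts waiting in front of each player
    shipped = 0
    for _ in range(rounds):
        for i in range(stations):
            roll = rng.randint(1, 6)
            if i == 0:
                moved = roll  # unlimited raw material for the first player
            else:
                moved = min(roll, waiting[i])  # can't move parts you don't have
                waiting[i] -= moved
            if i == stations - 1:
                shipped += moved
            else:
                waiting[i + 1] += moved
    return shipped


games = [play_dice_game(seed=s) for s in range(1000)]
print(sum(games) / len(games))  # prints an average below 35
```

Each player's average roll is still 3.5; the shortfall comes entirely from sixes rolled in front of a buffer that cannot supply six parts.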

With that said, I have found that the following conditions influence the location of the bottleneck:

Downtime duration – When looking at two workstations that both have low SATs, we may find that they have different downtime characteristics: one fails very often for a very short period of time, while the other does not fail very often, but when it does, it is down for a long time. In most cases, the latter will be the bottleneck, since the buffers can absorb the short downtimes but cannot cope with the long ones.

Variation in part cycle time/model mix – Two workstations have the same low SAT, but one has a long cycle time for part A and a short one for part B. The second workstation has about the same cycle time for each part. The first is more likely to be the bottleneck, especially if the model mix sends a large number of part A’s through the workstation in a short amount of time. As with downtime, the buffers may not be able to recover from the long cycle times, and during the short cycle times the workstation ends up starved or blocked instead of catching up.

Location – Two workstations may have the same low SAT in a production system, but the one that is closest to the end of the line will most likely be the bottleneck, especially if large buffers are present throughout the system. The same is true if station A has more buffer around it than station B. The larger buffer may absorb station A’s variation, thus making station B the bottleneck.
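The downtime-duration condition above can be checked with a small time-stepped sketch. It assumes a hypothetical two-station line with a one-step cycle time at both stations, a 10-part buffer between them, and geometrically distributed up and down times. The upstream station is given the same availability (about 91%) in two ways: frequent short failures (MTBF 20, MTTR 2) or rare long ones (MTBF 200, MTTR 20). All parameters and names are illustrative.

```python
import random


def two_station_line(mtbf_up, mttr_up, mtbf_down=50, mttr_down=3,
                     buffer_cap=10, steps=200_000, seed=0):
    """Parts per step shipped by a two-station line with random failures.

    Both stations need one step per part. Each step a running station fails
    with probability 1/MTBF and a failed station is repaired with
    probability 1/MTTR, giving geometric up and down times.
    """
    rng = random.Random(seed)
    up_running = down_running = True
    buffer = shipped = 0
    for _ in range(steps):
        if up_running:
            up_running = rng.random() >= 1 / mtbf_up
        else:
            up_running = rng.random() < 1 / mttr_up
        if down_running:
            down_running = rng.random() >= 1 / mtbf_down
        else:
            down_running = rng.random() < 1 / mttr_down
        # Downstream pulls first (starved if the buffer is empty) ...
        if down_running and buffer > 0:
            buffer -= 1
            shipped += 1
        # ... then upstream pushes (blocked if the buffer is full).
        if up_running and buffer < buffer_cap:
            buffer += 1
    return shipped / steps


# Same upstream availability (20/22 = 200/220), different failure lengths.
print(f"frequent short failures: {two_station_line(20, 2):.3f}")
print(f"rare long failures:      {two_station_line(200, 20):.3f}")
```

Under these assumptions the long-failure case should come out lower, because a 10-part buffer covers most 2-step outages but not most 20-step ones. The size of the gap depends heavily on the buffer size and on the other station.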
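The model-mix condition above can be sketched with the departure-time recursion for a serial line with blocking after service. The sketch assumes a hypothetical three-station line in which the outer stations take 2 time units on every part and each buffer holds 2 parts. The middle station either takes 2 units on both models, or 3 units on model A and 1 unit on model B, with models launched in alternating batches of five. Both versions of the middle station have the same average cycle time; the names and numbers are illustrative.

```python
def departure_times(cycle_times, buffers):
    """Departure-time recursion for a serial line with blocking after service.

    cycle_times[j][n] is the processing time of part n at station j, and
    buffers[j] is the number of buffer slots between stations j and j + 1.
    Station 0 is never starved; the last station is never blocked.
    """
    stations, parts = len(cycle_times), len(cycle_times[0])
    dep = [[0.0] * parts for _ in range(stations)]
    for n in range(parts):
        for j in range(stations):
            arrived = dep[j - 1][n] if j > 0 else 0.0
            free = dep[j][n - 1] if n > 0 else 0.0
            done = max(arrived, free) + cycle_times[j][n]
            # Blocked until part n - b - 1 has left station j + 1.
            k = n - buffers[j] - 1 if j < stations - 1 else -1
            if k >= 0:
                done = max(done, dep[j + 1][k])
            dep[j][n] = done
    return dep


def mix_finish(a_time, b_time, batch=5, parts=200, other=2.0, buffer=2):
    """Time the last part leaves a three-station line.

    The middle station takes a_time on model A and b_time on model B;
    models are launched in alternating batches of `batch` parts.
    """
    models = ["A" if (n // batch) % 2 == 0 else "B" for n in range(parts)]
    middle = [a_time if m == "A" else b_time for m in models]
    line = [[other] * parts, middle, [other] * parts]
    return departure_times(line, [buffer, buffer])[-1][-1]


print(mix_finish(2.0, 2.0))  # 404.0 with a balanced middle station
print(mix_finish(3.0, 1.0))  # later, even though the average is still 2.0
```

During a batch of A's the middle station blocks the station ahead of it and starves the one behind it; during the following B's it is fast but has little work waiting, so the time lost on the A's is never recovered.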

Are these differences worth devoting the time and effort to install a simulation methodology? My answer is to let the system tell you. Use Observation until it is no longer dependable. Then install data collection, based upon the benefits of finding the bottlenecks more consistently. The same is true at the simulation phase: buy the simulation system when the Basic Data method proves to be inadequate, and only then with a business case. Many companies with simple production systems will never get there, but those with larger, more complex systems will. As in most cases, default toward action; it’s usually better to improve a workstation than to get trapped in analysis paralysis. Improving a non-bottleneck workstation may not improve throughput, but it will help make the bottleneck more obvious the next time around.

What has changed to make simulation more viable? At one time, the time and cost to construct a simulation model were prohibitive in production, so simulation was usually only used in designing new systems. With the maturity of simulation products, however, this has changed. Current simulation packages have stencils that allow rapid model development for plants with similar operations, and “drag and drop” tools allow for rapid model creation. Finally, connectivity to databases allows for easy access to data. Once set up and tied to a production database, these packages have the potential to do “near real-time” analysis, something that was impossible just a few years ago.

The speed of today’s computers has also shortened the analysis time to predict throughput. A few years ago, computers toiled for hours to complete just one analysis. Now, analyses can be done in minutes, and with C-More, in seconds. Thus, the time from the end of a shift to posting the bottleneck report on the web is just minutes, allowing shift management to make decisions to improve throughput before they go home for the day.

Thus, simulation has become a viable tool for the plant floor, as long as it stands upon the bedrock for analysis – accurate and timely data.

April 28, 2006

The Effective Use of Simulation with TOC. Part 1 - Games

In some of the forums I have been following, there has been a noticeable lack of support for simulation as either a manufacturing tool or a TOC tool. But my experience with simulation has shown it to be a valuable asset. Over the next few entries, I’ve decided to share my experiences with simulation and see what kind of response I get. The first area I want to talk about is the use of simulation games to help sell TOC and change the current paradigm of the customer.

Why Games?

I have found games to be an effective way to get past the first two layers in the Five (or is it 6? 9?) Layers of Resistance that I learned in Jonah school.  These first two layers are:

Layer 1
Raising problems having one thing in common – it’s out of our hands: vendors do not always deliver, clients change their minds at the last minute, workers are not properly trained...

Layer 2
Arguing that the proposed solution cannot possibly yield the desired outcome.

Using the games described below helped demonstrate that our customers can impact the problem, and the proposed solution will yield the desired outcome.

Production Games
I’ve always felt that using simple simulations as games, as is done in most TOC throughput courses I have sat through, is a great use of the tool.  A simple model is a safe place to challenge the paradigm that you have about how to run a plant.  Play the game once with your methods, and then play it again with mine, and we’ll see which method generates the most profit. There are no excuses and no one else to blame, and if you think you got unlucky, then play the game again. 

At GM, we developed a simple game to help sell C-More and the Throughput Improvement Process. It was Excel-based and used C-More as the simulation engine. Players had all the data they could possibly need, and had to pick which workstation and which metric (downtime, speed, failure frequency, etc.) they wanted to improve.

They generally did fine in the beginning, when the bottleneck was obvious and the data clearly showed its location.  But after a few rounds, the fixes started to have no impact on throughput, meaning they had chosen the wrong bottleneck to improve (or, in some cases, the wrong metric to improve on the bottleneck). 

The second time through, they had access to the bottleneck report generated by C-More. If we had time, we let them “test” each of the possible fixes to help them decide which fix to install. Their throughput rates at the end of this round were significantly higher, and we made a point of calculating the difference in revenue (assuming the product was in demand) between their method and using C-More. Finally, we compared that increased revenue to the combined salaries in the room!

Design Games
We played a slightly different version of the production game to help engineers and process managers change their view about how to design a plant. In the previous version, the players were not allowed to add a workstation or change the number of buffers. In the design game, the players are not allowed to change the downtime characteristics of the machines, but they can choose from five different types of machines for any process, have as many machines as they want, have any buffer size they want, design the configuration of the line, etc. Again, we play it first using their methods, which is usually very chaotic. When we teach the class to a room full of two-player teams, we rarely get consensus on choosing one type of machine for any process step. Results are all over the board, but in the end, no design meets the original criteria for success.

The second time, we use TOC methods to select which machines to use, how many, the buffer sizes, etc. To the surprise of the lean folks, the design comes out with less investment and higher throughput. The resulting change in ROI is impressive, and usually results in a call for using simulation to help design the next set of manufacturing processes. Once again, we compare the difference in results for this simple game to the combined salaries in the room, to make sure they understand the magnitude of the problem and the impact of the solution.

Finally, having only just started doing critical chain work, I am surprised that I have not run across a simple simulation game for critical chain (CC) that does the above. I have seen a few simple tools that demonstrate the difference, and perhaps that is enough. Or perhaps there is not a perceived need for a game to change the paradigm of the customer. Or, there just might be one out there that I don’t know about. It’s something I would like to explore further.

Kevin

April 27, 2006

Measures drive Behaviors

One of the most fascinating processes that I had a chance to study at one of the companies I worked for was their design process for new products. I had a chance to look at it from the “design factory” perspective, and I was working with a few experts who have a strong background in critical chain implementation. Although we did not have a chance to finish the project, the study itself was very worthwhile. I could go through all the undesirable effects, and put up a very crowded current reality tree, but I think the root cause came down to how each task was being measured.

The major metric, so to speak, was having your task deliverable done on-time.  It was also clear, as we looked at the process, that there was really no way to validate from the outside that a task had been completed satisfactorily.  It only became apparent that there was a problem when the next engineer tried to use the deliverable to complete the next task. 

And, as with many projects, the product was constantly changing.  At this company, we saw that if an engineer completed their task early, they generally did not deliver the task until the due date.  They used the time between completion and due date as a buffer - in case a change occurred to the product while they had the task. The bottom line is that the system could not take advantage of any positive variation. The task was completed on-time, at best.
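The loss of positive variation is easy to see in a sketch. Assume, hypothetically, a chain of ten tasks, each planned at 10 days, whose actual durations vary uniformly between 6 and 14 days. The sketch compares handing work off as soon as it is done with holding it until the due date, as the engineers here did. Holding work can only delay the chain: early finishes are absorbed, while late finishes still push everything downstream. The names and numbers are illustrative.

```python
import random


def chain_finish(durations, planned, hold_until_due):
    """Finish time of a serial chain of tasks.

    Task i is due at planned * (i + 1). With hold_until_due=True, a task
    that finishes early is not handed over until its due date.
    """
    t = 0.0
    for i, d in enumerate(durations):
        t += d
        if hold_until_due:
            t = max(t, planned * (i + 1))
    return t


rng = random.Random(42)
trials = [[rng.uniform(6, 14) for _ in range(10)] for _ in range(2000)]
passed = sum(chain_finish(d, 10, False) for d in trials) / len(trials)
held = sum(chain_finish(d, 10, True) for d in trials) / len(trials)
print(f"hand off when done: {passed:.1f} days; hold until due: {held:.1f} days")
```

Under these assumptions the hand-off-when-done average lands close to the planned 100 days, while the hold-until-due average comes out later, even though each task's average duration matches its plan.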

Of course, if something went wrong and you could not complete a task on-time, most engineers still reported their task as completed on-time. With so much pressure from management to complete tasks on-time, and no method to validate the quality of the completed task, reporting it as complete seemed a small risk. If it was only slightly late, chances were that you could complete the task before the next engineer realized that there was a problem. Savvy engineers already had their excuses in place in case the issue became apparent to management:

"Well, the previous engineer did not deliver their task to me on-time, and they reported it to me as complete, so that's what I did."

"I thought it was more important to deliver what I had completed to the next engineer so that they could get started, while I finished up the details of the task."

And many others. The managers who had the task of monitoring the flow of work through the design pipeline started to see a common pattern. Early on, all programs reported that everything was "green," and that all of the tasks had been completed on-time. At some point, one of the engineers reaches the due date of their task with no inputs. They usually have no choice but to report a problem, and thus the project goes from on-time to six months late.

Top management looks at all the people and investment involved in the design process and can't believe it. They decide to take one of the projects they have a lot of passion for and make sure it gets through the design factory on-time. Suddenly, the priority is clear, and engineers will drop other projects (which may be more profitable for the company) to make sure they don't get their ass in a sling over this task. If they get done early, they send the deliverable to the next engineer, to get the “hot potato” out of their hands. The design gets done on-time, and senior management concludes that the problem must be a lack of a sense of urgency on the part of middle management. Combined with poor financial performance and the need to cut costs and eliminate non-productive resources, the inevitable reorg follows. In this company, the mean time between reorgs (MTBR?) is about two years.

And so it goes on, beginning to look like the downward spiral discussed in Malcolm Gladwell’s The Tipping Point. The key lesson, I think, is to understand the pros and cons of the major metrics you are using to evaluate your people and processes. Tie the metrics into Net Profit, ROI or Cash Flow, if possible. Overall, it always seems that the easy metrics (efficiency, on-time delivery, etc.) are the ones that give us the most trouble. The ability to accurately validate a measure (with an outside party, if necessary) seems important as well. As Eli has often said, “Tell me how you are going to measure me, and I will tell you how I will react.”

And why did we not have a chance to finish the project?  Well, there was this reorg right in the middle of it…

Beyond DBR - the need for basic data

My experience at GM has colored my view of Drum-Buffer-Rope. We often used the basic methods in The Goal to help find the first set of “obvious” bottlenecks in a plant (although they were rarely obvious to the plants themselves). Fairly quickly, however, these methods became obsolete, and the correlation between improving a station and improving overall throughput began to drop.

GM has the advantage, however, of using C-More, a tool that will take a very basic set of data and tell you the location of the bottleneck.  Accurate and fast, the first versions of this tool were not a simulation, but an algorithm that quantified the impact, in JPH, of the bottleneck.

The need for data drove us to design a very basic template for the PLC (programmable logic controller) programs that run the workstations. Over time, we found we only needed a small set of data from each workstation to drive C-More accurately. Armed with this information, we then used the TIP (Throughput Improvement Process) to begin the process of improvement. Focusing this process on our most profitable products generated a lot of revenue for GM, and when I left, there were no plants that could not make enough products to meet demand.

The difference between most other TOC efforts to improve throughput and the one we used at GM is the data and analysis part. I became a strong believer in the leverage this data has on improving the organization, now and in the future. But I do not get the impression that others in the TOC world value this data, perhaps for many valid reasons. Perhaps the difference in viewpoint can be attributed to how a vehicle assembly plant is different from most other manufacturing plants:

Buffer Sizes - The small vehicle buffers in assembly plants (think of the space 10 full-size SUVs consume) are certainly different from the buffers for, say, camshafts and pistons. These smaller buffers can lead to more starving and blocking than the large ones that can be used in other manufacturing plants. However, between major departments, there can be hundreds of vehicles stored in overhead or storage buffers. The amount of blocking and starving time at these buffers can usually be used to determine in which department the bottleneck is currently located.

Balanced Cycle Times - The emphasis in lean manufacturing on balancing workstations to a common takt time also makes finding the bottleneck without data difficult. Balanced workstation cycle times mean that it takes a long time to refill a buffer after a long downtime, because the upstream station has little spare capacity with which to catch up.

Uneven schedules & work practices – Working through lunch and breaks, overtime, stripping lines at the end of a shift, etc., are just a few of the work practices that fill or empty buffers and change the dynamics of blocking and starving. These practices are used not to make more jobs overall, but to improve other, local measures for a group or department.

Large number of workstations – in a typical assembly plant department, there are hundreds of workstations.  Given the other factors listed above, this also hinders rapid identification of the bottleneck.
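The blocking-and-starving rule in the Buffer Sizes paragraph above can be sketched as a simple arrow heuristic of the kind described in the serial-line literature; this is not C-More's algorithm. Between each pair of adjacent departments, compare how often the upstream one is blocked with how often the downstream one is starved, and point an arrow toward the department doing the limiting. Departments that no arrow points away from are bottleneck candidates. The department names and percentages below are hypothetical.

```python
def arrow_bottlenecks(blocked, starved):
    """Bottleneck candidates in a serial line from blocked/starved percentages.

    blocked[i]: % of time unit i could not pass work downstream.
    starved[i]: % of time unit i had no work from upstream.
    For each adjacent pair, the arrow points downstream if unit i is blocked
    more than unit i + 1 is starved, otherwise upstream (ties go upstream).
    Units that no arrow points away from are the candidates.
    """
    points_away = [False] * len(blocked)
    for i in range(len(blocked) - 1):
        if blocked[i] > starved[i + 1]:
            points_away[i] = True       # arrow i -> i + 1
        else:
            points_away[i + 1] = True   # arrow i + 1 -> i
    return [i for i, away in enumerate(points_away) if not away]


departments = ["Body", "Paint", "Trim", "Chassis", "Final"]
blocked = [30, 25, 5, 2, 0]
starved = [0, 3, 4, 20, 28]
print([departments[i] for i in arrow_bottlenecks(blocked, starved)])  # ['Trim']
```

When more than one candidate appears, you still need a tie-breaker, such as comparing how much blocking and starving surrounds each one.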
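The Balanced Cycle Times point above comes down to simple arithmetic: after a downtime drains a buffer, the buffer only refills at the difference between the upstream and downstream rates. This sketch assumes both stations run at their rated jobs per hour with no further failures; the function name and numbers are hypothetical.

```python
import math


def refill_hours(parts_to_refill, upstream_jph, downstream_jph):
    """Hours for a buffer to recover after a downtime drains it.

    Assumes both stations run at their rated jobs per hour with no further
    failures, so the buffer grows only by the upstream surplus rate.
    """
    surplus = upstream_jph - downstream_jph
    if surplus <= 0:
        return math.inf  # a perfectly balanced line never refills the buffer
    return parts_to_refill / surplus


print(refill_hours(20, 70, 60))  # 2.0 hours with 10 JPH of spare capacity
print(refill_hours(20, 61, 60))  # 20.0 hours with 1 JPH of spare capacity
```

The closer the line is to perfect balance, the longer a single long downtime keeps the buffer depleted, which is why the bottleneck gets harder to spot by eye.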

So, data collection became a way of life at GM, resulting in the formation of its own group and a dedicated implementation team. I firmly believe that it can play a role in a plant that matches some of the conditions listed above, or in a mature DBR process that is beginning to doubt the location of its bottleneck.
 
The downside usually cited is the time and cost to install, but common templates and the small amount of data required meant that was not an issue for GM. Typically, the installation process began while the Goal methods were being implemented, so that by the time the bottleneck became harder to find, C-More could use the data from the recently installed system to accurately locate it.

For plants in this situation, I think there are solutions available in the market today that can do as good a job as (or better than!) what we developed at GM. But before I get to that part of the blog, I’d like to hear other viewpoints. What do you think?

Kevin 

