Then we arrive at values on the order of 75 to 80, which is far more realistic. As fatigue or wear-out occurs in components, failure rates increase sharply. Wear-out in power supplies is usually due to the breakdown of electrical components that are subject to physical wear and electrical and thermal stress.
A product with an MTBF of 10 years can still exhibit wear-out after only 2 years. The wear-out time of components cannot be predicted by the parts count method. Electronics in general, and Vicor power supplies in particular, are designed so that the useful life extends past the design life; in this way wear-out should never occur during the useful life of a module.
There are two major categories of system outages: (1) unplanned outages (failures) and (2) planned outages (maintenance), both of which lead to downtime. In terms of cost, unplanned outages are weighed against planned ones, and the use of redundant components may mitigate their impact. A planned outage usually has a tolerable impact on system availability, provided it is scheduled appropriately.
Planned outages mostly happen due to maintenance. Causes include periodic backups, configuration changes, software upgrades, and patches, all of which can lead to planned downtime. This downtime can be very costly. Specification and design flaws, manufacturing defects and wear-out are categorized as internal factors. Radiation, electromagnetic interference, operator error and natural disasters can be considered external factors.
However, even when a system is well designed and its components are highly reliable, failures are unavoidable; what is possible is to mitigate their impact on the system. The most common ways to obtain failure rate data are the following: historical data about the device or system under consideration.
Many organizations record failure information for the equipment or systems they produce, and failure rates can be calculated from these records for those devices or systems. For recently introduced equipment or systems, the historical data of similar equipment or systems can serve as a useful estimate. Handbooks of failure rate data for various equipment are available from government and commercial sources. MIL-HDBK-217, Reliability Prediction of Electronic Equipment, is a military standard that provides failure rate data for many military electronic components.
Several failure rate data sources are available commercially that focus on commercial components, including some non-electronic components. The most accurate source of data is to test samples of the actual devices or systems in order to generate failure data.
This is often prohibitively expensive or impractical, so the previously described data sources are often used instead.
The different types of failure distribution are provided in Table 2. For other distributions, such as a Weibull distribution or a log-normal distribution, the hazard function is not constant with respect to time.
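As a quick numerical illustration, the sketch below (plain Python with NumPy; the parameter values are arbitrary examples) evaluates the hazard function of an exponential distribution, which is constant in time, and of a Weibull distribution, which is not:

```python
import numpy as np

def exponential_hazard(t, lam):
    """Hazard of the exponential distribution: constant, independent of t."""
    return np.full_like(np.asarray(t, dtype=float), lam)

def weibull_hazard(t, beta, eta):
    """Hazard of the Weibull distribution: h(t) = (beta/eta) * (t/eta)**(beta - 1).
    beta < 1 gives a decreasing hazard (early life), beta = 1 reduces to the
    exponential case, and beta > 1 gives an increasing hazard (wear-out)."""
    t = np.asarray(t, dtype=float)
    return (beta / eta) * (t / eta) ** (beta - 1)

t = np.array([100.0, 1000.0, 5000.0])          # operating hours (example values)
print(exponential_hazard(t, lam=2e-5))          # same value at every time
print(weibull_hazard(t, beta=3.0, eta=8000.0))  # grows with t (wear-out behavior)
```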
This section shows the derivations of the system failure rates for series and parallel configurations of constant failure rate components in Lambda Predict. Consider a system consisting of n components in series. For this configuration, the system reliability, Rs, is given by [4]:

$R_s(t) = \prod_{i=1}^{n} R_i(t) = e^{-\left(\sum_{i=1}^{n} \lambda_i\right) t}$

so the system failure rate is $\lambda_s = \sum_{i=1}^{n} \lambda_i$. Note that since the component failure rates are constant, the system failure rate is constant as well.
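Before moving on, here is a minimal numerical check of the series result (a Python sketch; the three component failure rates are invented for illustration):

```python
import math

# Hypothetical constant failure rates of three components in series (failures/hour)
lambdas = [2e-6, 5e-6, 1e-5]

lambda_series = sum(lambdas)  # series-system failure rate: sum of component rates

def series_reliability(t, rates):
    """Product of the component reliabilities e^(-lambda_i * t)."""
    r = 1.0
    for lam in rates:
        r *= math.exp(-lam * t)
    return r

t = 10_000.0  # mission time in hours
print(series_reliability(t, lambdas))   # product of exponentials
print(math.exp(-lambda_series * t))     # same value: single exponential with summed rate
```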
In other words, when constant failure rate components are arranged in a series configuration, the system failure rate at any mission time is equal to the steady-state failure rate. It should be pointed out that if n blocks with non-constant (i.e., time-dependent) failure rates are arranged in series, the system failure rate is likewise time-dependent. Now consider a system with n identical constant failure rate components arranged in a simple parallel configuration. For this case, the system reliability equation is given by:

$R_s(t) = 1 - \left(1 - e^{-\lambda t}\right)^n$
Notice that this equation does not reduce to the form of a simple exponential distribution, as it does for a system of components arranged in series. In other words, the reliability of a system of constant failure rate components arranged in parallel cannot be modeled using a constant system failure rate model.
To find the failure rate of a system of n components in parallel, the relationship between the reliability function, the probability density function and the failure rate is employed. The failure rate is defined as the ratio between the probability density and reliability functions:

$\lambda(t) = \frac{f(t)}{R(t)}$

Because the probability density function can be written in terms of the time derivative of the reliability function, $f(t) = -\frac{dR(t)}{dt}$, the previous equation becomes:

$\lambda(t) = -\frac{1}{R(t)}\,\frac{dR(t)}{dt}$
Thus, the failure rate for identical constant failure rate components arranged in parallel is time-dependent. Taking the limit of the system failure rate as t approaches infinity leads to the following expression for the steady-state system failure rate:

$\lim_{t \to \infty} \lambda_s(t) = \lambda$

So the steady-state failure rate for a system of constant failure rate components in a simple parallel arrangement is the failure rate of a single component.
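The short sketch below (Python; the failure rate and redundancy level are arbitrary example values) evaluates this time-dependent parallel-system failure rate and shows it approaching the single-component rate as t grows:

```python
import math

def parallel_failure_rate(t, lam, n):
    """Time-dependent failure rate of n identical constant-failure-rate
    components in parallel: lambda_s(t) = -Rs'(t) / Rs(t)."""
    q = 1.0 - math.exp(-lam * t)                            # P(one component failed by t)
    rs = 1.0 - q ** n                                       # system reliability
    drs_dt = -n * q ** (n - 1) * lam * math.exp(-lam * t)   # dRs/dt
    return -drs_dt / rs

lam, n = 1e-4, 3   # example: three redundant units, lambda = 1e-4 failures/hour
for t in (1e2, 1e4, 5e4, 2e5):
    print(t, parallel_failure_rate(t, lam, n))   # rises toward lam = 1e-4 as t grows
```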
It can be shown that for a k-out-of-n parallel configuration with identical components, the steady-state system failure rate is

$\lim_{t \to \infty} \lambda_s(t) = k\lambda$

which reduces to the simple parallel result above when k = 1.
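A quick numerical check of that limit (a Python sketch with illustrative values; the k-out-of-n reliability used here is the probability that at least k of the n components survive to time t):

```python
import math

def k_out_of_n_reliability(t, lam, k, n):
    """Probability that at least k of n identical components (rate lam) survive to t."""
    p = math.exp(-lam * t)  # single-component survival probability
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def hazard(t, lam, k, n, dt=100.0):
    """Numerical hazard -R'(t)/R(t) via a central difference."""
    r = k_out_of_n_reliability(t, lam, k, n)
    dr = (k_out_of_n_reliability(t + dt, lam, k, n) -
          k_out_of_n_reliability(t - dt, lam, k, n)) / (2 * dt)
    return -dr / r

lam, k, n = 1e-4, 2, 4
print(hazard(2e5, lam, k, n))   # tends toward k * lam = 2e-4 for large t
```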
Abstract

Failure prediction is one of the key challenges that have to be mastered for a new arena of fault tolerance techniques: the proactive handling of faults.

Keywords: failure, component analysis, reliability, probability.

Introduction

Failure prediction is one of the key challenges that have to be mastered for a new arena of fault tolerance techniques: the proactive handling of faults. First, we define common terms related to failure rate.

Failure: A failure occurs when a component is not available.
Error: In reliability engineering, an error is a mistake that is the root cause of a failure.

Fault: In reliability engineering, a fault is defined as a malfunction that is the root cause of an error.

Table 1. Component failures during use hours.

Mean time to repair (MTTR)

Mean time to repair (MTTR) is the total time spent performing all corrective or preventive maintenance repairs divided by the total number of repairs.
Four failure frequencies are commonly used in reliability analyses:

Failure density f(t): the failure density of a component or system describes how likely the first failure is to occur in the component or system at time t.

Constant failure rates

If the failure rate λ is constant, then the following expressions (6) apply:

$f(t) = \lambda e^{-\lambda t}, \quad R(t) = e^{-\lambda t}, \quad \mathrm{MTBF} = 1/\lambda$

However, MTTR may not be identical to MDT (mean downtime) because: the breakdown may not be noticed as soon as it has happened; the decision may be made not to repair the equipment immediately; or the equipment may not be put back into service as soon as it is repaired. Whichever of MDT or MTTR you use, it is important that it reflects the total time for which the equipment is unavailable for service; otherwise the computed availability will be incorrect.
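To make the availability point concrete, here is a small sketch (Python; the MTBF, MTTR, and MDT values are invented) that computes steady-state availability as uptime over total time. Using MTTR when the equipment is actually out of service for the longer MDT overstates availability:

```python
def availability(mtbf_hours: float, downtime_hours: float) -> float:
    """Steady-state availability: uptime / (uptime + downtime per failure)."""
    return mtbf_hours / (mtbf_hours + downtime_hours)

mtbf = 5_000.0   # hypothetical mean time between failures, hours
mttr = 4.0       # hands-on repair time per failure, hours
mdt  = 24.0      # total downtime per failure incl. detection and return to service

print(availability(mtbf, mttr))  # optimistic if the unit is really down for 24 h
print(availability(mtbf, mdt))   # reflects the full time out of service
```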
Useful to remember

If an item works for a long time without breaking down, it can be said to be highly reliable.

Early life period

Many methods are used to ensure the integrity of the design.

Useful life period

As the product matures, the weaker units die off, the failure rate becomes nearly constant, and the modules enter what is considered the normal life period.
MTBF vs. useful life

Wear-out period

As fatigue or wear-out occurs in components, failure rates increase sharply.

Failure sources

There are two major categories of system outages: (1) unplanned outages (failures) and (2) planned outages (maintenance). Another categorization is internal versus external outages. Specification and design flaws, manufacturing defects and wear-out are categorized as internal factors.
Failure rate data

The most common ways to obtain failure rate data are the following: historical data about the device or system under consideration; government and commercial failure rate data; and testing.

Testing

The most accurate source of data is to test samples of the actual devices or systems in order to generate failure data.
Failure distribution types

The different types of failure distributions are provided in Table 2.

Measuring software reliability remains a difficult problem because we do not have a good understanding of the nature of software.
There is no clear definition of which aspects are related to software reliability. We cannot find a suitable way to measure software reliability, nor can we measure most of the aspects related to it.
Even the most obvious product metrics, such as software size, do not have a uniform definition. If we cannot measure reliability directly, it is tempting to measure something related to it that reflects its characteristics. The current practices of software reliability measurement can be divided into four categories [RAC96]. Software size is thought to be reflective of complexity, development effort and reliability.
But there is no standard way of counting, and this method cannot faithfully compare software written in different languages.
The advent of new technologies such as code reuse and code generation also casts doubt on this simple method. The function point metric is a method of measuring the functionality of a proposed software development based upon a count of inputs, outputs, master files, inquiries, and interfaces. The method can be used to estimate the size of a software system as soon as these functions can be identified. It is a measure of the functional complexity of the program.
It measures the functionality delivered to the user and is independent of the programming language. It is used primarily for business systems; it is not proven in scientific or real-time applications. Complexity is directly related to software reliability, so representing complexity is important.
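Before turning to complexity metrics, here is a hedged illustration of a function point count (Python; the element counts and degree-of-influence total are invented, and the weights follow the classic Albrecht-style averages):

```python
# Rough function point sketch using classic average weights
# (inputs 4, outputs 5, inquiries 4, internal files 10, interfaces 7).
# The counts and the degree-of-influence ratings below are invented.

average_weights = {"inputs": 4, "outputs": 5, "inquiries": 4,
                   "files": 10, "interfaces": 7}

counts = {"inputs": 12, "outputs": 8, "inquiries": 5,
          "files": 6, "interfaces": 3}   # hypothetical system

unadjusted_fp = sum(counts[k] * average_weights[k] for k in counts)

# 14 general system characteristics, each rated 0-5 (here a made-up total of 38)
total_degree_of_influence = 38
adjusted_fp = unadjusted_fp * (0.65 + 0.01 * total_degree_of_influence)

print(unadjusted_fp, round(adjusted_fp, 1))
```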
Complexity-oriented metrics are a method of determining the complexity of a program's control structure by simplifying the code into a graphical representation. A representative metric is McCabe's cyclomatic complexity. Test coverage metrics are a way of estimating faults and reliability by performing tests on software products, based on the assumption that software reliability is a function of the portion of software that has been successfully verified or tested.
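For example, McCabe's cyclomatic complexity can be computed from the control-flow graph as V(G) = E - N + 2P (edges minus nodes plus twice the number of connected components); the sketch below applies that formula to a made-up graph:

```python
def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's V(G) = E - N + 2P for a control-flow graph."""
    return edges - nodes + 2 * components

# Hypothetical control-flow graph of a small function:
# 7 nodes, 8 edges, one connected component (one routine).
print(cyclomatic_complexity(edges=8, nodes=7))   # -> 3 independent paths
```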
Detailed discussion of various software testing methods can be found in the Software Testing topic. Researchers have realized that good management can result in better products. Research has demonstrated that a relationship exists between the development process and the ability to complete projects on time and within the desired quality objectives.
Costs increase when developers use inadequate processes. Higher reliability can be achieved by using a better development process, risk management process, configuration management process, and so on. Based on the assumption that the quality of the product is a direct function of the process, process metrics can be used to estimate, monitor and improve the reliability and quality of software. ISO 9000 certification, or "quality management standards", is the generic reference for a family of standards developed by the International Organization for Standardization (ISO).
The goal of collecting fault and failure metrics is to be able to determine when the software is approaching failure-free execution. Minimally, both the number of faults found during testing (i.e., before delivery) and the failures or problems reported after delivery are collected, summarized and analyzed. Test strategy is highly related to the effectiveness of fault metrics, because if the testing scenario does not cover the full functionality of the software, the software may pass all tests and yet be prone to failure once delivered.
Usually, failure metrics are based upon customer information regarding failures found after release of the software. The failure data collected is therefore used to calculate failure density, mean time between failures (MTBF) or other parameters to measure or predict software reliability.
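As a simple sketch of such a calculation (Python; the failure log and fleet hours are fabricated purely for illustration), MTBF can be estimated as total operating time divided by the number of failures:

```python
# Illustrative post-release failure log: cumulative operating hours at which
# each field failure was reported (made-up numbers).
failure_times_hours = [1_200, 3_100, 4_050, 7_800, 9_900]
total_fielded_hours = 12_000  # total accumulated operating time across the fleet

n_failures = len(failure_times_hours)
mtbf = total_fielded_hours / n_failures            # mean time between failures
failure_rate = n_failures / total_fielded_hours    # failures per operating hour

print(f"MTBF ~ {mtbf:.0f} h, failure rate ~ {failure_rate:.2e} failures/h")
```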
Software Reliability Improvement Techniques

Good engineering methods can largely improve software reliability. Before the deployment of software products, testing, verification and validation are necessary steps. Software testing is heavily used to trigger, locate and remove software defects. Software testing is still in its infancy; testing is crafted to suit specific needs in various software development projects in an ad hoc manner. Various analysis tools such as trend analysis, fault-tree analysis, Orthogonal Defect Classification and formal methods can also be used to minimize the possibility of defect occurrence after release and therefore improve software reliability.
After deployment of the software product, field data can be gathered and analyzed to study the behavior of software defects. Software reliability is a part of software quality and relates to many areas where software quality is concerned. The initial quest in the study of software reliability was based on an analogy with traditional hardware reliability. Many of the concepts and analytical methods that are used in traditional reliability can be used to assess and improve software reliability as well.
Software fault tolerance is a necessary part of a system with high reliability. It is a way of handling unknown and unpredictable software and hardware failures (faults) [Lyu95] by providing a set of functionally equivalent software modules developed by diverse and independent production teams. The underlying assumption is design diversity of the software, which itself is difficult to achieve. Software testing serves as a way to measure and improve software reliability.
It plays an important role in the design, implementation, validation and release phases. It is not a mature field, and advances in this field will have a great impact on the software industry. As software permeates every corner of our daily life, software-related problems and the quality of software products can cause serious consequences, such as the Therac-25 accidents. The defects in software are significantly different from those in hardware and other components of the system: they are usually design defects, and a lot of them are related to problems in specification.
The infeasibility of completely testing a software module complicates the problem, because bug-free software cannot be guaranteed for a moderately complex piece of software. No matter how hard we try, a defect-free software product cannot be achieved.
Losses caused by software defects cause more and more social and legal concerns. Guaranteeing no known bugs is certainly not a good-enough approach to the problem.
Software reliability is a key part of software quality. The study of software reliability can be categorized into three parts: modeling, measurement and improvement. Software reliability modeling has matured to the point that meaningful results can be obtained by applying suitable models to the problem. Many models exist, but no single model can capture a sufficient amount of the software's characteristics. Assumptions and abstractions must be made to simplify the problem.
There is no single model that is universal to all situations. Software reliability measurement is still naive. Measurement is far from being as commonplace in software as it is in other engineering fields.
Software reliability cannot be directly measured, so other related factors are measured to estimate software reliability and compare it among products.
Development process, faults and failures found are all factors related to software reliability. Software reliability improvement is hard.
Several of the most widely used Reliability Prediction standards for reliability analysis are reviewed below, beginning with MIL-HDBK-217. It was originally developed and published for use by the Department of Defense. The Part Stress section leads off the document and includes a number of equations that predict the failure rate for a wide variety of electrical components. The factors in the equations are the various operating, rated, temperature, and environmental conditions of the device in the system. A representative Part Stress model, the MIL-HDBK-217F equation for microcircuits, has the form

$\lambda_p = (C_1 \pi_T + C_2 \pi_E)\,\pi_Q\,\pi_L$

where $\lambda_p$ is the predicted part failure rate, $C_1$ the die complexity failure rate, $\pi_T$ the temperature factor, $C_2$ the package failure rate, $\pi_E$ the environment factor, $\pi_Q$ the quality factor, and $\pi_L$ the learning factor.
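A minimal sketch of evaluating such a Part Stress model (Python; the factor values below are placeholders and must in practice be taken from the handbook tables):

```python
def mil_hdbk_217_microcircuit(c1, pi_t, c2, pi_e, pi_q, pi_l):
    """Part Stress failure rate lambda_p = (C1*piT + C2*piE) * piQ * piL,
    in failures per million hours (factor values must come from the handbook)."""
    return (c1 * pi_t + c2 * pi_e) * pi_q * pi_l

# Placeholder factor values purely for illustration
lam_p = mil_hdbk_217_microcircuit(c1=0.02, pi_t=1.5, c2=0.01, pi_e=2.0, pi_q=1.0, pi_l=1.0)
print(f"{lam_p:.4f} failures per 10^6 hours")
```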
The equations, the variables, and the data parameters needed vary across the different components modeled. The Parts Count reliability prediction is useful in early design stages, when the design is still in progress and not all operating parameters are known. Parts Count predictions do not require as many data parameters as Part Stress predictions. By using Parts Count models, you can obtain early failure rate assessments and then refine them as your product design evolves and is finalized.
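For reference, the Parts Count method simply rolls up generic failure rates by part type, roughly lambda_equip = sum of N_i * (lambda_g * pi_Q)_i; a hedged sketch with invented generic rates:

```python
# Parts Count rollup: quantity * generic failure rate * quality factor, summed
# over part types. Generic rates below are invented placeholders (failures/10^6 h).
parts = [
    # (part type, quantity N, generic rate lambda_g, quality factor pi_Q)
    ("ceramic capacitor", 40, 0.0036, 1.0),
    ("film resistor",     55, 0.0012, 1.0),
    ("logic IC",           6, 0.0500, 2.0),
]

lambda_equipment = sum(n * lam_g * pi_q for _, n, lam_g, pi_q in parts)
print(f"Parts Count estimate: {lambda_equipment:.3f} failures per 10^6 hours")
```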
In many cases, Parts Count is used to start a Reliability Prediction analysis. Then, as the product design becomes more solidified and data parameters are established, the Parts Count prediction is moved over to Part Stress, maintaining all the data already entered during the Parts Count assessment.
Another widely used and accepted Reliability Prediction standard is commonly referred to as Telcordia SR-332. Early on, Telcordia was referred to as the Bellcore standard. The Telcordia standard has also been through several updates and revisions, which are designated by the Issue Number.
Today, Telcordia is commonly used in the commercial sector; however, its use over the years has become widespread, and it is now found throughout a broad range of industries, including those related to military and defense applications. An example Telcordia formula to compute the black-box steady-state failure rate of a device is

$\lambda_{BB} = \lambda_G\,\pi_Q\,\pi_S\,\pi_T$

where $\lambda_G$ is the generic steady-state failure rate of the device, $\pi_Q$ the quality factor, $\pi_S$ the electrical stress factor, and $\pi_T$ the temperature factor. Once the device level black-box steady-state failure rates are determined, the unit level and system level failure rates can be calculated.
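A small sketch of that calculation together with a unit-level rollup (Python; the device list and factor values are invented for illustration):

```python
def black_box_rate(lambda_g, pi_q, pi_s, pi_t):
    """Device black-box steady-state failure rate: generic rate adjusted by
    quality, electrical stress, and temperature factors."""
    return lambda_g * pi_q * pi_s * pi_t

# Hypothetical devices on one unit: (generic rate in FITs, piQ, piS, piT),
# where a FIT is one failure per 10^9 operating hours.
devices = [
    (5.0, 1.0, 1.1, 1.3),
    (2.0, 2.0, 1.0, 1.0),
    (8.0, 1.0, 0.9, 1.5),
]

unit_rate = sum(black_box_rate(*d) for d in devices)  # unit = sum of its devices
print(f"Unit-level failure rate: {unit_rate:.2f} FITs")
```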
Using the black-box steady state failure rates as a basis, the Telcordia standard includes additional methodologies for augmenting failure assessments by taking into account other data that may be available about the devices, units, or systems under analysis. This additional information is not required, but can be used if available to adjust failure rates to reflect actual product performance.
Essentially, real-world data available can be used to further refine the estimated failure rate values. It should be noted that any of this additional data is not required to perform a reliability prediction based on the Telcordia standard.
It is up to the analyst to determine if any of this additional data is available and if it is helpful to include in the reliability prediction analysis.
In some cases, Telcordia analyses are initially performed to obtain the black-box steady-state failure rates, and then updated as laboratory, field, and burn-in data become available. The failure rate models of 217Plus have their roots in MIL-HDBK-217, but include enhancements to account for the effects of operating profiles, cycling factors, and process grades on reliability.
217Plus includes, for example, a dedicated failure rate equation for capacitors (in 217Plus, Notice 1). The equations, the variables, and the data parameters vary based on the specific device being modeled. Once the device failure rates are evaluated, they are summed to determine a base system failure rate. At this point, further analysis can be done at the system level if more data about the system is available, such as test or field data.
By factoring in this information, the 217Plus analysis will provide a more accurate predicted failure rate estimate. At the system level, 217Plus can incorporate environmental stresses, operating profile factors, and process grades. If this data is not known, default values are used.