Challenges and Opportunities in

    Implementing Speech Recognition

              in a Medical Enterprise

By Leonard A. Phillips 

Copyright © 2000. All rights reserved.

 

The integration of speech recognition with Electronic Medical Record systems can significantly enhance the efficiency of record generation and improve overall productivity in the healthcare environment.

 

CONTENTS 

     I. Introduction: What Is Speech Recognition and How Is It Used?

    II. Summary of Benefits of Speech Recognition for Medical Professionals

   III. Challenges of Implementation and Use

    IV. Pilot Trial Rationale and Methodology

     V. Determining Return on Investment from Speech Recognition

    VI. Sample Worksheet: Return on Investment for Implementing Speech Recognition in a Medical Enterprise

     _______________________________________________________

I. Introduction: What Is Speech Recognition and How Is It Used?

Speech Recognition—Fundamentals

Speech recognition technology enables users to communicate with computers and other speech-enabled devices by voice. [1] Typical products that use speech recognition range from software that produces formatted transcriptions of dictated text to devices that respond to spoken commands.

In office environments, transcription software applications provide a practical alternative to keyboarding text and data into a computer, which can be a challenge for people who cannot type well or at all, or who prefer not to type. For some devices and in some environments, speech may be the only practical input method, or may provide benefits superior to the alternatives.

Typical Users

Potential users of speech recognition include virtually anyone who is able to speak and who uses computers to generate documents and communications. Speech recognition products specifically designed for use by medical professionals can be used to produce Electronic Medical Records (EMRs) almost instantly—patient records, medical reports, lab requests—as well as insurance forms, operations reports, and general correspondence. Some healthcare user profiles include:

Executives, administrators, clinicians, researchers, and other medical professionals. A growing number of medical professionals work with minimal administrative staff. Many have never learned to type or are uncomfortable with a keyboard, and can create text at higher rates of speed by dictating than by keyboarding.

Small practices. Typically with minimal staff, medical professionals in small practices may find that dictating documents reduces the time needed for paperwork. The resulting free time can be used for additional patient care, research, education, and other primary responsibilities.

Mobile professionals. People who are frequently away from their desks can benefit from the convenience of recording dictation on a portable recording device for subsequent automatic transcription on a computer.

People with disabilities. Speech recognition technology is a practical and viable alternative for personnel who find it difficult or impossible to keyboard due to disability or injury.

Senior citizens. Keyboarding may pose special difficulty for older users who have never learned to type, suffer from arthritis, or tire easily. Speech recognition technology enables them to dictate letters, e-mail, Web addresses, and hyperlinks, eliminating the physical work of keyboarding.

Fundamentals Summary

Speech recognition provides a practical alternative to keyboarding text and data into a computer, which can be a challenge for people who cannot type well or at all, or who prefer not to type. In some environments and situations, speech may be the only practical input method, or may provide benefits superior to the alternatives.

 

II. Summary of Benefits of Speech Recognition for Medical Professionals

1.  Increased productivity—Speech recognition technology allows users to immediately produce EMRs and/or typeset hardcopy of patient notes, reports, lab requests, invoices, and correspondence. They can get more done in less time, freeing them to focus on their primary tasks. This time is gained by dictating into computers, which produces text faster than traditional keyboarding. Users can create, edit, and format documents, send e-mail, access and update records, and navigate their desktop and the Web by speaking at a normal pace.

2.  Economy—Speech recognition can eliminate the need for transcription services. Electronic and hardcopy transcriptions of dictated reports, letters, and other documents can be produced accurately and rapidly by a computer. In comparison, manual transcription incurs additional costs and turnaround time, requires skilled transcriptionists, and risks misreading of handwritten materials as well as the potential loss of the original recording or handwritten document in transit. [2]

3.  Speed—Speech recognition generates text faster than keyboarding. The best typists average 50 to 90 words per minute. In contrast, text can be dictated successfully for automatic transcription at 100 to 180 words per minute—twice as fast as typing, on average, and with little or no physical effort.

In addition, a string of computer commands can be combined into a "macro" command, activated by a single word or brief phrase, to perform complex tasks. Using speech-enabled macro commands, the time required for normally lengthy tasks can be reduced to seconds. Applications can be launched, for example, by saying, "start Word." Boilerplate text of virtually any length can be inserted by saying, for example, "standard company description." The combination of using templates to create, navigate, and complete forms and calling them up by saying, "patient record John Smith" or "invoice Jane Jones" can dramatically reduce the time it takes to produce and revise such documents. (An illustrative sketch of a macro table follows this list.)

4.  Accuracy—Handwritten notes can be minimized, along with the concomitant possibility of misinterpretation and potentially serious consequences. Transcribed text can be reviewed on screen as an accuracy check. Transcription accuracy with leading speech recognition products can be very high. Users can train software in minutes to accurately recognize their voice. Overall accuracy improves with software that "learns" by accumulating corrections in "user" files. To facilitate accuracy, some products include extensive vocabularies that can be customized by users, and include specialized terms, phrases, abbreviations, and acronyms. Typically, when software "misrecognizes" dictation, the result is syntactically obvious, and the misrecognition can be easily detected and corrected. Recognition accuracy of 99 percent has been documented in third-party evaluations of over-the-counter products. [3]

5.  Mobility—Documents can be created virtually anywhere by means of dictation into a high-quality recording device and subsequent transcription on a computer. This capability provides substantial productivity benefits for users who need to generate text when away from their computer, for example on rounds or on the road. In situations requiring uninterrupted vision and physical involvement, such as in a car, or on a plane or train, dictation into a lightweight mobile recorder can be more convenient and faster than using a laptop or handheld computer.

6.  Uniformity—When used across an enterprise, system administrators can ensure that standards and generally accepted professional and institutional practices are supported throughout a workgroup by distributing only standardized macros and vocabularies over the network. Standardized macros can help ensure adherence to medical protocols and standards throughout the enterprise. With such centralized administrative control, speech recognition can help organizations improve accuracy, consistency, and legibility and thereby reduce the liability and risks associated with nonconformance.

7.  ADA compliance—In the U.S.A., speech recognition technology can be a "reasonable accommodation" by employers for disabled but qualified personnel—a stipulation of the Americans with Disabilities Act of 1990. Such users may be sight-challenged or have limited dexterity due to physical disability, arthritis, or Repetitive Stress Injury (RSI) such as carpal tunnel syndrome or tendinitis. Speech recognition can enable users with disabilities to dictate into their computers hands-free and eyes-free, and can also eliminate the possibility of RSI discomfort or injury from repetitive keyboard and mouse manipulation.

8.  Expedience—Speech recognition can be a valuable asset in the patient care environment, for example, in emergency medical situations when uninterrupted vision and physical involvement are critical. Information can be accessed and documents can be created by voice, hands-free and eyes-free. In a managed care setting, physicians can immediately access a patient's electronic medical record with a few words, facilitating communications with the patient's HMO for pre-certification of additional tests and prescriptions, and reducing delays in patient care.

9.  Reduced stress—Speech recognition can provide significant physical and psychological benefits. Documents can be produced in less time and with significantly less physical effort and stress than by keyboarding.

10.  Convenience—Speech recognition installed on a network allows users to dictate into any configured computer on the network, creating documents in real time virtually wherever they are in an enterprise. After transcription by the computer, an electronic record is available that can be immediately printed out, e-mailed, or linked to a Web page. If they wish, users can also play back their digitized dictation in their own voice or save it as a .wav file.
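To make the macro capability described in item 3 above concrete, the following is a minimal sketch in Python. The phrases, boilerplate text, and stand-in application launcher are all hypothetical and do not represent any particular product's macro facility; in a real deployment the text-insertion callback would be supplied by the integration with the word processor or EMR application.

    # Minimal sketch of a voice "macro" table: a short recognized phrase expands
    # into an action or a block of boilerplate text. All entries are hypothetical.
    BOILERPLATE = {
        "standard company description": (
            "Example Clinic is a multi-specialty group practice providing ..."
        ),
    }

    COMMANDS = {
        # Stand-in for launching an application such as a word processor.
        "start word": lambda: print("Launching word processor ..."),
    }

    def handle_phrase(recognized_phrase, insert_text):
        """Dispatch a recognized phrase to a command, boilerplate, or literal dictation."""
        phrase = recognized_phrase.lower().strip()
        if phrase in COMMANDS:
            COMMANDS[phrase]()
        elif phrase in BOILERPLATE:
            insert_text(BOILERPLATE[phrase])
        else:
            insert_text(recognized_phrase)  # ordinary dictated text

    # Example use, with print standing in for insertion into a document:
    handle_phrase("start word", print)
    handle_phrase("standard company description", print)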

Speech-Enabled Applications

Many existing software applications can be speech-enabled with software development kits (SDKs), substantially enhancing their power and value. Even sophisticated speech features can be readily incorporated into existing applications with state-of-the-art SDKs.

Benefits Summary

Speech recognition technology can provide benefits including: rapid generation of EMR documents, productivity enhancements that free up more time for primary tasks at hand, reduction of transcription cost and delay, accuracy, mobility, uniformity, ADA compliance, an expedient means of computer input in environments requiring uninterrupted vision and physical involvement, reduction of stress, and convenience.

 

III. Challenges of Implementation and Use

Environmental Constraints

Speech recognition applications require end-users to speak audibly, possibly for extended periods. In general, environments that are appropriate for telephone calls are likely to be appropriate for speech recognition, but some issues arise that are unique to speech recognition:

·    Poor acoustic conditions, such as high ambient noise levels or high echo levels, can compromise recognition accuracy. Although such problems may be mitigated by the use of microphones, microphone headsets, or recorders that are directional and/or noise canceling, results will vary on a case-by-case basis.

·    When ongoing dictation is suddenly interrupted by telephone calls or unexpected visitors, the speaker must become accustomed to turning off the input microphone to prevent irrelevant content from entering the active document.

·    Depending on workplace layout, audible dictation could be a simple nuisance to those nearby, or convey information not meant to be overheard.

User Skill Requirements

End-users must be familiar with the correct use of the particular software and headset, microphone, or recorder. They must also be able to speak thoughts clearly and fluently in a language supported by the speech recognition application. For some people, expressing themselves extemporaneously will be more difficult than for others, and a period of time will be required to make the shift from the keyboard. Occasionally, the acoustic properties of a particular voice may not yield the desired recognition accuracy.

Hardware Specifications

Computer systems (including network components and wiring or optical fiber cabling) must be configured to meet or exceed the requirements specified by the application manufacturer. The quality of input speech is critical to good recognition accuracy, so microphones and recorders should be models approved by the speech recognition software manufacturer.

Because speech recognition software places high demands on computer systems and networks, minimum specifications may be insufficient if additional applications must run, or significant network traffic must flow, simultaneously with speech recognition. For example, networked speech recognition applications can require downloading 10–20 MB user speech files to client PCs and then uploading them to a speech server. As a result, introducing speech recognition on a network that is already experiencing high traffic may require upgrades in wiring, cabling, hardware, and/or software.
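As a rough illustration of this load, the following sketch estimates how long a 20 MB speech file takes to cross shared links. The file size is taken from the range above; the nominal link speeds and the 50 percent effective-utilization figure are assumptions, and actual results depend on the installation.

    # Rough estimate of speech-file transfer time on shared network links.
    # Assumptions: 20 MB file, nominal link speeds, 50% effective utilization.
    FILE_MB = 20
    EFFECTIVE_UTILIZATION = 0.5

    for name, nominal_mbps in [("10 Mbps Ethernet", 10), ("100 Mbps Ethernet", 100)]:
        effective_mbps = nominal_mbps * EFFECTIVE_UTILIZATION
        seconds = (FILE_MB * 8) / effective_mbps  # megabytes converted to megabits
        print(f"{name}: roughly {seconds:.0f} seconds per speech-file transfer")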

Training

Training greatly expedites success with speech recognition. While end-users can train themselves to some level of proficiency using product documentation and tutorials, instruction from trainers certified by the speech recognition software manufacturer is more likely to develop mastery of the product. Leading speech recognition software manufacturers have training professionals on staff who may instruct end-users and also certify trainers in customer, Value-Added Reseller (VAR), and independent training provider organizations. Additional support may include product documentation, manufacturers' scheduled training classes and published curriculum materials, telephone and online support, e-mail support, newsletters, and online forums.

There are three general types of training:

1.  Training end-users. Users are trained to use the software and headset, microphone, or recorder correctly. This requires learning some basic spoken commands and becoming familiar with the speech user interface. Training by instructors certified by the manufacturer is advisable.

2.  Training the software. Large-vocabulary, continuous speech recognition software is "user dependent": each end-user must train the software by creating a "speech" file that the computer associates with his or her dictation to produce the intended words and phrases. Creating custom vocabularies requires access to representative source documents and assumes end-user expertise in the application being speech-enabled. (A sketch of building a candidate vocabulary list follows this list.)

3.  Training the trainers. Leading speech recognition product manufacturers maintain a staff of highly skilled professional trainers to train and certify new trainers in VAR or customer organizations.
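To illustrate the custom-vocabulary step in item 2 above, here is a minimal sketch that scans source documents for frequent terms missing from a base lexicon. The file paths, base lexicon, and frequency threshold are hypothetical; commercial products supply their own vocabulary-building tools.

    # Minimal sketch: find frequent terms in source documents that a base lexicon
    # does not already contain, as candidates for a custom medical vocabulary.
    import re
    from collections import Counter

    def candidate_terms(document_paths, base_lexicon, min_count=3):
        counts = Counter()
        for path in document_paths:
            with open(path, encoding="utf-8", errors="ignore") as f:
                counts.update(re.findall(r"[A-Za-z][A-Za-z'-]+", f.read().lower()))
        # Keep frequent terms the base lexicon does not already contain.
        return [(term, n) for term, n in counts.most_common()
                if n >= min_count and term not in base_lexicon]

    # Hypothetical usage:
    # print(candidate_terms(["clinic_notes_q1.txt"], base_lexicon={"patient", "record"}))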

Customization

Some speech recognition applications include features that allow end-users, consultants, or IT departments to create custom voice macro commands and custom vocabularies, and to personalize the standard commands that come with the software. It is prudent to have such customization performed or overseen by consultants or trainers from, or certified by, the application manufacturer, and to prototype it before the software is deployed to end-users.

Standalone or Network Solution?

Standalone installations are adequate for individual end-users or small offices. However, the most efficient way to use speech across a workgroup is with a network solution. Users train the software and create their own speech files on any properly configured PC on the network. Their speech files are maintained on a shared network server and can be downloaded as needed at any client node. Users need only train once and can then move from machine to machine, including creating records or documents on one and editing them on another. In a medical setting, patient care can be improved because clinical healthcare providers can quickly access patient information at any properly configured computer on the network by speaking a simple command such as "patient record John Smith."

Networked speech recognition applications can do more than provide the convenience of ubiquity and legibility. Applications that enable system/network administrators to download user voice files, custom vocabularies, and standardized macros and templates to clients from a central server can ensure that generally accepted professional, corporate, or institutional practices and industry standards are supported throughout the organization. Such centralized administrative control can help organizations improve consistency and thereby substantively help avoid liability and risks associated with professional, industry, or statutory nonconformance.

Maintenance and Upgrades

Provision for user and instructor training, technical support, and product upgrades should be included with a purchase contract. Some manufacturers or VARs provide upgrade assurance options, so customers can contract in advance for future software upgrades and interim releases. This provides a way to budget for anticipated technology advances.

Open License Programs for Large User Bases

An open license program allows multi-user organizations to streamline the purchase and deployment of software. In one typical open license program, a point value or "unit of measure" (UOM) is assigned to each product in a product line. The manufacturer establishes a series of pricing thresholds or levels as a function of product UOM values multiplied by the quantity of each product on the initial purchase order. On the basis of this initial purchase, customers qualify for a pricing level, which applies to all orders and reorders for a year. Pricing levels are reviewed annually. Savings can be substantial, and renewal contracts are required only annually.
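A minimal sketch of the pricing mechanics described above follows. The UOM point values, thresholds, and level names are hypothetical; actual values are set by each manufacturer.

    # Hypothetical UOM point values per product and pricing-level thresholds.
    UOM = {"desktop_medical": 2, "network_server": 10}
    LEVELS = [(500, "Level C"), (200, "Level B"), (0, "Level A")]  # (points required, level)

    def pricing_level(initial_order):
        """initial_order: {product: quantity}. Returns (points, level) earned."""
        points = sum(UOM[product] * qty for product, qty in initial_order.items())
        for required, level in LEVELS:
            if points >= required:
                return points, level

    print(pricing_level({"desktop_medical": 120, "network_server": 5}))  # (290, 'Level B')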

Challenges Summary

Implementing speech recognition can involve a number of challenges and decisions, including environmental constraints, user skill requirements, hardware specifications, training, customization, maintenance and upgrades, and questions related to the purchase decision itself.

 

IV. Pilot Trial Rationale and Methodology

Purpose of a Pilot Trial

A small-scale evaluation effort called a "pilot" trial allows an organization to test the performance and quantify the value of speech recognition in a real setting. The results of a pilot can be used to confirm expectations before making a broader commitment to speech recognition. Pilot trials typically last for 4 to 8 weeks.

Phase 1: Needs Assessment/Workflow

1.      A core pilot implementation team is established, consisting of representatives from the VAR organization, the manufacturer's pilot support personnel (sales representative, field engineer), and motivated personnel within the client organization.

2.      The scope and structure of the pilot trial are developed:

a.     All areas of the organization within which speech recognition may be used are identified. Noncritical areas suitable for pilot testing are identified and prioritized.

b.     The company and team establish the goals that will be used to evaluate the results of the speech recognition pilot.

c.      Weighted criteria are established for evaluating how well the pilot achieves each goal and for providing the basis for a final decision. Specific, quantifiable parameters are agreed upon for evaluating the performance and value of the pilot and for calculating the likely return on investment (ROI) of deploying speech recognition across the entire organization. (A sketch of weighted scoring follows this list.)

d.     A timeline is created showing the milestones and responsibilities.

e.     A commitment is made to hold a weekly forum meeting or conference call with the client to review status and plans for the following week. A pilot progress form is created and revised weekly with the previous week's achievements and the next week's goals.

f.       Participants are selected for the pilot who can also serve as its internal evangelists.

g.     Pilot implementation team member(s) meet with each pilot participant and plot workflow. Each participant's skill levels are assessed and the training curriculum is adjusted appropriately.

h.     A technical support plan is established to assure the availability of support for the users.

i.       The pilot implementation team decides which applications, templates, and macros should be speech-enabled and which vocabularies should be built. Key parameters may include potential productivity improvement, frequency of use throughout the organization, and the value of enhanced consistency in adhering to practice or corporate guidelines; these decisions also provide the basis for a pilot implementation timeline.

j.       A final timeline for pilot implementation with specific milestones for development, training, deployment in a noncritical area, testing, and evaluation is established.
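A minimal sketch of the weighted-criteria scoring referred to in item c above follows. The criteria, weights, and ratings are hypothetical placeholders for whatever the pilot implementation team agrees upon.

    # Hypothetical weighted scoring of pilot goals: weights sum to 1.0,
    # ratings are 0-10 values assigned by the pilot implementation team.
    criteria = {
        "recognition accuracy":       (0.30, 8),
        "documents produced per day": (0.30, 7),
        "user satisfaction":          (0.25, 9),
        "network impact":             (0.15, 6),
    }

    overall = sum(weight * rating for weight, rating in criteria.values())
    print(f"Overall pilot score: {overall:.2f} of 10")  # 7.65 with these placeholders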

Phase 2: Pilot Testing

1.      The selected applications, templates, and macros are speech-enabled and custom vocabularies are built.

2.      The speech-enabled applications, templates, and macros and custom vocabularies are deployed to pilot end-user participants.

3.      Training of participants is implemented.

4.      Pilot testing is launched in the designated noncritical areas. Direct involvement by a manufacturer's representative in conjunction with the VAR throughout this phase is desirable to ensure success. End-users involved in the pilot should be observed during basic software training, while using voice for their applications. They should be given tips on improving speed and accuracy as well as other help as needed. Frustrations and questions should be addressed promptly. Customizations should be refined as necessary. Milestones should be monitored, documented, and modified as appropriate.

Phase 3: Post-Pilot Evaluation and Decision

1.      Pilot performance is evaluated, the qualitative benefits of full-scale deployment are determined, and the ROI is calculated.

2.      The purchase decision is made.

3.      If the decision is to move forward with speech recognition, full-scale deployment, training, maintenance, support, and upgrade programs are implemented.

 Pilot Summary

A limited-scale pilot trial of speech recognition enables a selected group of users in a prospective customer organization to test the performance and quantify the value of speech recognition in a real setting. The results are then used as the basis for larger-scale implementation.

 

V. Determining Return on Investment from Speech Recognition

The value of speech recognition to an organization may be considered in several strategic contexts, within which the determination of ROI is subject to specific workplace and industry factors. Some of these contexts and factors include:

·    Transcription applications—The calculation of the value of a transcription application, whether off-the-shelf or developed by speech-enabling an existing application, as an alternative to manually transcribing or keyboarding documents is reasonably straightforward. However, beyond savings from the reduction of transcription service costs and delays, there are a number of other, nonintuitive factors to consider in evaluating ROI. These factors include the freeing up of time to perform primary tasks, enhanced conformance with workplace-related statutes and guidelines, and gains in overall productivity.

·    Human Resources-related benefits—As noted earlier, in the U.S.A., speech recognition is a means of providing a "reasonable accommodation" by employers for disabled but qualified personnel to work with computers—a stipulation of the broadly applicable Americans with Disabilities Act of 1990. It can also help to eliminate the risk of Repetitive Stress Injury (RSI), such as carpal tunnel syndrome and tendinitis, caused by keyboard manipulation. As a result, the calculation of ROI should consider two items as downside costs of noncompliance with the ADA: (1) the indirect costs of RSI-related productivity loss, and (2) the direct litigation costs of ADA and RSI issues. An organization's own guidelines and records, and/or those of the relevant profession or industry, can be used as a rational basis for determining these costs.

·    Accuracy—The transcription accuracy of leading speech recognition products can be very high, particularly when transcribed text is printed out or reviewed on screen as an accuracy check. To the extent that machine transcription can eliminate handwritten notes and misspellings, the potential consequences of misinterpreting those notes are also reduced.

·    The value of standardization—Speech recognition can reduce risk by making generally accepted protocols and standards available throughout an organization with networked speech macros and templates. The calculation of the value of preventing errors of nonconformance can be based on actual cost records and/or the reasonable and expectable costs within the industry, based on actuarial data.

·    EMR generation—Speech recognition is an ideal means of producing EMRs and eliminating handwritten information. In the healthcare industry, the ability to produce standard documents and text fragments immediately suggests, on an intuitive or anecdotal basis, an enhancement of productivity—particularly with the use of mobile recorders. However, quantifying that value in dollars-and-cents terms requires establishing a rational basis for the purchase decision.

This basis can be established by combining (1) the potential downside costs of noncompliance with EMR requirements and failure to document adherence to government regulations, and (2) the value of productivity gains. In (1), noncompliance may be quantified as the sum of costs such as loss of insurance reimbursements due to disqualification, plus potential revenue loss, for example, pursuant to loss of accreditation. This total can be quite substantial. In (2), productivity gains are based on a theory of eliminating redundancy and/or assuring the best use of personnel.

A rational approach is to determine the dollar value of the time currently spent by physicians, nurses, other caregivers, and support personnel keyboarding their own or others' records. With speech recognition, this time could be available instead for additional patient care, research, administrative work, education, and so on. Arithmetically, multiplying the average hourly compensation of each group of personnel relieved of transcription tasks by the hours per year that group gains, and summing the results, yields an estimate of the value of the time made available. (A worked sketch follows at the end of this list of factors.)

·    The limits of pilot trial data—While a pilot trial can yield useful, scalable data with which to project the expenses and benefits of larger-scale installations, expense and savings items that might not be apparent during a pilot should also be included in ROI calculations. Examples of such expenses include the capital and maintenance/operations costs of enterprise-wide network improvements: server and router hardware, software, cabling and connectors, architectural planning and construction required for recabling, loss of plant/office productivity during recabling, and network configuration and debugging. Examples of savings, as noted previously, include increases in overall productivity and avoidance of the costs of noncompliance with industry standards and with workplace-related statutes and guidelines.
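A minimal worked sketch of the freed-time calculation outlined above follows. The roles, hourly rates, and hours are hypothetical; an organization would substitute its own payroll and time-study figures.

    # Hypothetical roles: (average hourly compensation, hours per year currently
    # spent keyboarding or transcribing that speech recognition would free up).
    roles = {
        "physicians":        (90.0, 2000),
        "nurses":            (30.0, 3000),
        "support personnel": (18.0, 4000),
    }

    value_of_time_freed = sum(rate * hours for rate, hours in roles.values())
    print(f"Estimated annual value of time made available: ${value_of_time_freed:,.0f}")
    # 90*2000 + 30*3000 + 18*4000 = $342,000 in this hypothetical example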



VI. SAMPLE WORKSHEET:

Return on Investment for Implementing Speech Recognition in a Medical Enterprise

Note: This sample worksheet may not contain all line items applying to specific installations. Contingencies should be addressed according to generally accepted practices of individual organizations.

I. CURRENT ANNUAL COSTS

      A. Outside transcription costs

             1. TOTAL Annual cost of outside transcription services (include

                   incidental costs—messenger, postage, etc.)……………………………………...…      __________

             2. Total pages transcribed by outside service per year…………...        __________

             3. Outside transcription service cost per page

                    (Line I-A-1 ÷ Line I-A-2)………………………………………………           __________

      B. Direct costs of in-house transcription

             1. Total number of pages transcribed in-house per year………...         __________

             2. Total time (hours) per year spent on in-house transcription       __________

             3. Average hourly salary of transcriptionist(s)

                   Note that transcriptionists may be physicians, nurses, or other

                   caregivers keyboarding their own documents, as well as

                   support personnel transcribing live or recorded dictation

                   or handwritten text……………………………………………..…          __________

             4. TOTAL annual in-house transcription cost (Line I-B-2 x I-B-3)…………....….          __________

      C. Other costs of in-house transcription

             1. Intrinsic loss of personnel working on transcription tasks

                   instead of on medical tasks, based on average hourly salary

                   (Enter the value from Line I-B-4)………………………………...         __________

             2. Intrinsic loss of net billings from lost patient care services…....    __________

             3. TOTAL other costs of in-house transcription

                   (Total of Lines I-C-1 and I-C-2)…………………..…………………….…..…….             __________

      D. Additional personnel-related costs

             1. Proofreading and correction costs………………………………         __________

             2. Estimated annual cost of noncompliance with the Americans with

                   Disabilities Act in computer operations (legal fees,

                   lawsuit awards/settlements, lost business opportunities) ……...… __________

             3. Cost of computer-related RSI and similar claims………….…..        __________

             4. Estimated annual loss of personnel productivity from RSI...…         __________

             5. TOTAL Annual personnel-related costs

                   (Total of Lines I-D-1 through 4)………..…………………………………….…..             __________

       


E. Opportunity costs of failure to use enterprise-wide standards and protocols

                Estimated annual cost of noncompliance with standards/protocols,

                such as business losses and actual or potential legal fees and/or

                lawsuit awards or settlements………………………………….……………….……           __________

      F. Medical Industry: Electronic Medical Record generation costs

                Total annual costs exclusively to generate EMRs (in addition to I-A and I-B) …......   __________

      G. OPTIONAL (Non-Transcription): Design engineering opportunity cost

                Estimated value of annual sales or other activities that cannot be pursued

                due to the lack of a practical hands-free, eyes-free speech interface………….…..…    __________

      H. TOTAL Current annual costs

             (Enter total of all boxed items in Section I)…………………………………….…            __________

 

 

II. SPEECH RECOGNITION—STARTUP COSTS FOR FIRST YEAR

      A. Capital costs

             1. Desktop

                     a. Hardware costs for startup

                             1. For Pilot Trial……………………………………..…..…          __________

                             2. For balance of first-year implementation

                                   (not including Pilot Trial)...………………………...…...        __________

                             3. TOTAL Desktop Hardware (Total of Lines II-A-1-a-1 and 2)…………...       __________

                     b. Software costs for startup

                             1. For Pilot Trial………………………………………....…          __________

                             2. For balance of first-year implementation

                                   (not including Pilot Trial)...…………………………..…        __________

                             3. TOTAL Desktop Software (Total of Lines II-A-1-b-1 and 2)…………...        __________

                     c. TOTAL Desktop capital costs for startup (Total of Lines II-A-1-a-3

                           and II-A-1-b-3)……………………………………………….……………….              __________

             2. Network costs for startup

                     a. Hardware

                             1. For Pilot Trial (PCs, servers, routers, drives, etc.)……….    __________

                             2. For balance of full implementation

                                     (not including Pilot Trial)...…………………………...        __________

                             3. TOTAL Network Hardware (Total of Lines II-A-2-a-1 and 2)…………...       __________

                     b. Software

                             1. For Pilot Trial………………………………………….…         __________

                             2. For full implementation (not including Pilot Trial)…...…     __________

                             3. Annual maintenance and upgrades……………………....    __________

                             4. TOTAL Network Software (Total of Lines II-A-2-b-1, 2, and 3)……..…        __________

                     c. Network wiring/optical fiber cabling and connectors

                             1. For Pilot Trial……………………………………………          __________

                             2. For balance of full implementation

                                     (not including Pilot Trial)...…………………………...        __________

                             3. TOTAL Network wiring (Total of Lines II-A-2-c-1 and 2)…………..….         __________

 


                     d. Network-related plant construction

                             1. Required for Pilot Trial (if any)…………………..…...…        __________

                             2. Required for balance of full implementation

                                     (not including Pilot Trial)...…………………………....       __________

                             3. TOTAL Construction (Total of Lines II-A-2-d-1 and 2)……………...….       __________

                     e. Miscellaneous (furniture, office equipment, and supplies)…………………….     __________

             3. TOTAL Capital network costs for startup

                   (Total of boxed subtotals for Sections II-A-2-a, b, c, d, and e)…………………....        __________

          4. TOTAL Capital costs for startup (Total of Lines II-A-1-c and II-A-3)…………          __________

 

B.     Maintenance and Operation costs for startup

1. Payroll costs

                     a. Managerial oversight

                             1. For Pilot Trial (if any)……………………………………         __________

                             2. For balance of first year implementation

                                   (not including Pilot Trial)..…………………………...…        __________

                             3. TOTAL Managerial oversight for startup

                                   (Total of Lines II-B-1-a-1 and 2)………………………………………....          __________

                     b. Installation

                             1. For Pilot Trial (if any)……………………………...….…         __________

                             2. For balance of first-year implementation

                                     (not including Pilot Trial)…………………………...…       __________

                             3. TOTAL Installation (Total of Lines II-B-1-b-1 and 2).…………….….....        __________

                     c. Estimated productivity loss during installation and training

                           (salary plus value of work not performed)

                             1. During Pilot Trial (if any)……………………………..…        __________

                             2. During balance of first year (not including Pilot Trial).....    __________

                             3. TOTAL Estimated productivity loss

                                     (Total of Lines II-B-1-c-1 and 2)……….…………………………..…...         __________

                     d. TOTAL Payroll costs

                           (Total of boxed items II-B-1-a-3, II-B-1-b-3, and II-B-1-c-3)……………….         __________

             2. Consulting costs for installation and first-year maintenance

                     a. For hardware installation ………………………………......          __________

                     b. For software installation and customization………..….….         __________

                     c. For user training…………………………..…….…….….….         __________

                     d. For technical support……………………………….…..…...          __________

                     e. TOTAL consulting costs for startup

                           (Enter total of Lines II-B-2-a through d)……………………..………….…....          __________

             3. Upgrade costs for first year (salary plus value of work not performed

                   during installation and shakedown)…………………………………………....                __________

             4. Miscellaneous costs during first year

                   (Additional HVAC, electricity, security costs)………………………………….….        __________

             5. TOTAL Maintenance and operations costs for startup

                   (Enter total of Lines II-B-1-d, II-B-2-e, II-B-3, and II-B-4)………………………          __________


 

      C. TOTAL Startup costs of speech recognition

          (Enter total of Capital Costs from Line II-A-4 and M & O on Line II-B-5)………        __________

 

 

III. SPEECH RECOGNITION—ANNUAL COSTS AFTER STARTUP YEAR

      A. Capital costs

             1. Desktop

                     a. Hardware upgrades and replacements…………………….          __________

                     b. Software upgrades………………………………………..…           __________

                     c. TOTAL Desktop………………………………………………………………              __________

             2. Network

                     a. Hardware upgrades and replacements exclusively

                           due to speech recognition requirements…………………..        __________

                     b. Software upgrades exclusively due to speech

                           recognition requirements………………………….……….         __________

                     c. TOTAL Network (Total of Lines III-A-2-a and b)…………………………              __________

             3. Miscellaneous (furniture, office equipment, and supplies

                     exclusively due to speech recognition requirements)………………………..…...       __________

             4. TOTAL Annual capital costs

                   (Total of Lines III-A-1-c, III-A-2-c, and III-A-3)………………………………..              __________

      B. Annual maintenance and operation costs

             1. Payroll costs

                     a. Managerial oversight……………………………………..…          __________

                     b. Training……………………………………………………..            __________

                     c. Estimated productivity loss during training………………         __________

                     d. Technical support………………………………………...…           __________

                     e. Proofreading and correction costs…………………………         __________

                     f. TOTAL Payroll costs (Total of Lines III-B-1-a through e)……….…….……         __________

             2. TOTAL consulting costs (typically, outsourced training and technical support).…  __________

             3. Miscellaneous costs (HVAC, electricity, security costs)…………………………..       __________

             4. TOTAL Annual maintenance and operation costs

                     (Total of Lines III-B-1 through 3)………………….………………………….…           __________

      C. TOTAL Annual operating costs after startup year

             (Total of Lines III-A-4 and III-B-4)……………………….……...…….…    __________

 


IV. NET ANNUAL SAVINGS (LOSS) FROM IMPLEMENTING SPEECH RECOGNITION TECHNOLOGY

      A. Total current annual costs

             (Enter Total from Section I-H)……………………………..…………………….……             __________

        B. Costs of speech recognition startup year

          (Enter Total from Section II-C)……………………………..……………………..…              __________

      C. Net savings (loss) of implementing speech recognition in Year 1

             Compare Line IV-B with Line IV-A.

1.       If the amount on Line IV-B is smaller than the amount on Line IV-A,

        breakeven will occur during the first year.

        Calculate how many months are required to achieve breakeven:

        Divide Line IV-B by Line IV-A and multiply the result by 12.

        Enter the result here:……………………………….……………………….           ______months

2. If Line IV-B is larger than Line IV-A, go to Line IV-D.

        D. Annual speech recognition operating costs after startup year

           (Enter amount from Line III-C)…………………………………………………….…              __________

    E. Net savings (loss) of speech recognition after 2 years

             1. Current costs. Multiply Line IV-A x 2. Enter result here………..      __________

2. Two-year speech recognition costs. Add Lines IV-B and IV-D.

        Enter sum here…………………………………………………..          __________

             3. Compare Line IV-E-2 with Line IV-E-1:

a. If Line IV-E-2 is smaller than Line IV-E-1, breakeven will occur

                           during the second year.

Calculate how many months are required to achieve breakeven:

Divide the amount on Line IV-E-2 by the amount on Line IV-E-1 and

multiply the result by 24.

                           Enter result here:…………………………..………………….……..…             ______months

                     b. If Line IV-E-2 is larger than Line IV-E-1, go to Line IV-F.

        F. Net savings (loss) of speech recognition after 3 years

             1. Current costs. Multiply Line IV-A x 3. Enter product here……...     __________

             2. Three-year speech recognition costs.

                   Add Line IV-B and 2 x Line IV-D. Enter sum here……………...        __________

             3. Compare Line IV-F-2 with Line IV-F-1:

                     a. If Line IV-F-2 is smaller than Line IV-F-1, breakeven will occur

                           during the third year.

Calculate how many months are required to achieve breakeven:

Divide the amount on Line IV-F-2 by the amount on Line IV-F-1 and

multiply the result by 36.

                           Enter result here:…………………………..………………….……..…             ______months

                     b. If Line IV-F-2 is larger than Line IV-F-1, continue adding years to annual

                           calculations until breakeven is achieved.
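The breakeven logic of Section IV can be restated as a short sketch. The dollar amounts below are hypothetical placeholders for Lines IV-A, IV-B, and IV-D; the month calculation mirrors the worksheet formulas (cumulative speech recognition cost divided by cumulative current cost, times the number of months elapsed).

    # Hypothetical amounts for worksheet Lines IV-A, IV-B, and IV-D.
    current_annual_cost = 400_000.0   # Line IV-A: total current annual costs
    startup_year_cost   = 550_000.0   # Line IV-B: speech recognition startup-year costs
    annual_cost_after   = 120_000.0   # Line IV-D: annual operating costs thereafter

    cumulative_current = cumulative_speech = 0.0
    for year in range(1, 11):
        cumulative_current += current_annual_cost
        cumulative_speech += startup_year_cost if year == 1 else annual_cost_after
        if cumulative_speech <= cumulative_current:
            months = cumulative_speech / cumulative_current * year * 12
            print(f"Breakeven during year {year}, after roughly {months:.0f} months")
            break
    else:
        print("No breakeven within 10 years at these amounts")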




[1] How Does It Work?

The basis of practical speech recognition technology is statistical, and generally involves the use and interrelationship of three fundamental types of information:

·      A lexicon—a group of words and their pronunciations

·      A language model—which specifies the relative likelihood of a sequence of words

·      An acoustic model—the sound-related variables of a given pronunciation

Automatic speech recognition algorithms relate acoustic data (user speech) to the intended linguistic equivalent (transcription or control action). This task can range from identifying a few, readily distinguishable commands from a finite grammar—a relatively "easy" speech recognition task—to the far more computationally intensive challenge of accurately discriminating and recognizing long sequences of words, numbers, and system commands in continuous-speech, large-vocabulary dictation. The larger and more varied the lexicon, the greater the number of hypotheses the computer must evaluate to make the most plausible assessment of a given utterance— and the greater the demand placed on computer processor and memory resources. With some software, if a "misrecognition" is made, users can correct by voice command, and "teach" the application the intended transcription of the spoken words.
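The interplay of these information types can be suggested with a toy sketch. The words, bigram probabilities, and acoustic scores below are invented values; real recognizers evaluate far larger hypothesis spaces with far more sophisticated lexicons, language models, and acoustic models.

    # Toy illustration: acoustic evidence alone slightly favors "wreck," but the
    # language model makes "patient record" the more plausible overall hypothesis.
    bigram = {("patient", "record"): 0.20, ("patient", "wreck"): 0.001}  # language model
    acoustic = {"record": 0.55, "wreck": 0.60}                           # acoustic scores

    def best_next_word(previous_word, candidates):
        """Pick the candidate maximizing acoustic score x language-model probability."""
        return max(candidates,
                   key=lambda w: acoustic[w] * bigram.get((previous_word, w), 1e-6))

    print(best_next_word("patient", ["record", "wreck"]))  # -> record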

[2] Witt, D.J. Transcription service in the ED. Amer. J. Emerg. Med. 1995; 13:34-36.

[3] PC Magazine, October 22, 1998, Dragon NaturallySpeaking Preferred.