Models, code, and papers for "V M Thakare":
Form about four decades human beings have been dreaming of an intelligent machine which can master the natural speech. In its simplest form, this machine should consist of two subsystems, namely automatic speech recognition (ASR) and speech understanding (SU). The goal of ASR is to transcribe natural speech while SU is to understand the meaning of the transcription. Recognizing and understanding a spoken sentence is obviously a knowledge-intensive process, which must take into account all variable information about the speech communication process, from acoustics to semantics and pragmatics. While developing an Automatic Speech Recognition System, it is observed that some adverse conditions degrade the performance of the Speech Recognition System. In this contribution, speech enhancement system is introduced for enhancing speech signals corrupted by additive noise and improving the performance of Automatic Speech Recognizers in noisy conditions. Automatic speech recognition experiments show that replacing noisy speech signals by the corresponding enhanced speech signals leads to an improvement in the recognition accuracies. The amount of improvement varies with the type of the corrupting noise.
The time domain waveform of a speech signal carries all of the auditory information. From the phonological point of view, it little can be said on the basis of the waveform itself. However, past research in mathematics, acoustics, and speech technology have provided many methods for converting data that can be considered as information if interpreted correctly. In order to find some statistically relevant information from incoming data, it is important to have mechanisms for reducing the information of each segment in the audio signal into a relatively small number of parameters, or features. These features should describe each segment in such a characteristic way that other similar segments can be grouped together by comparing their features. There are enormous interesting and exceptional ways to describe the speech signal in terms of parameters. Though, they all have their strengths and weaknesses, we have presented some of the most used methods with their importance.
Speech is a natural form of communication for human beings, and computers with the ability to understand speech and speak with a human voice are expected to contribute to the development of more natural man-machine interfaces. Computers with this kind of ability are gradually becoming a reality, through the evolution of speech recognition technologies. Speech is being an important mode of interaction with computers. In this paper Feature extraction is implemented using well-known Mel-Frequency Cepstral Coefficients (MFCC).Pattern matching is done using Dynamic time warping (DTW) algorithm.
In this age of information technology, information access in a convenient manner has gained importance. Since speech is a primary mode of communication among human beings, it is natural for people to expect to be able to carry out spoken dialogue with computer. Speech recognition system permits ordinary people to speak to the computer to retrieve information. It is desirable to have a human computer dialogue in local language. Hindi being the most widely spoken Language in India is the natural primary human language candidate for human machine interaction. There are five pairs of vowels in Hindi languages; one member is longer than the other one. This paper describes an overview of speech recognition system that includes how speech is produced and the properties and characteristics of Hindi Phoneme.
Automatic speech recognition enables a wide range of current and emerging applications such as automatic transcription, multimedia content analysis, and natural human-computer interfaces. This paper provides a glimpse of the opportunities and challenges that parallelism provides for automatic speech recognition and related application research from the point of view of speech researchers. The increasing parallelism in computing platforms opens three major possibilities for speech recognition systems: improving recognition accuracy in non-ideal, everyday noisy environments; increasing recognition throughput in batch processing of speech data; and reducing recognition latency in realtime usage scenarios. This paper describes technical challenges, approaches taken, and possible directions for future research to guide the design of efficient parallel software and hardware infrastructures.
Acoustical mismatch among training and testing phases degrades outstandingly speech recognition results. This problem has limited the development of real-world nonspecific applications, as testing conditions are highly variant or even unpredictable during the training process. Therefore the background noise has to be removed from the noisy speech signal to increase the signal intelligibility and to reduce the listener fatigue. Enhancement techniques applied, as pre-processing stages; to the systems remarkably improve recognition results. In this paper, a novel approach is used to enhance the perceived quality of the speech signal when the additive noise cannot be directly controlled. Instead of controlling the background noise, we propose to reinforce the speech signal so that it can be heard more clearly in noisy environments. The subjective evaluation shows that the proposed method improves perceptual quality of speech in various noisy environments. As in some cases speaking may be more convenient than typing, even for rapid typists: many mathematical symbols are missing from the keyboard but can be easily spoken and recognized. Therefore, the proposed system can be used in an application designed for mathematical symbol recognition (especially symbols not available on the keyboard) in schools.
The main motivation for Automatic Speech Recognition (ASR) is efficient interfaces to computers, and for the interfaces to be natural and truly useful, it should provide coverage for a large group of users. The purpose of these tasks is to further improve man-machine communication. ASR systems exhibit unacceptable degradations in performance when the acoustical environments used for training and testing the system are not the same. The goal of this research is to increase the robustness of the speech recognition systems with respect to changes in the environment. A system can be labeled as environment-independent if the recognition accuracy for a new environment is the same or higher than that obtained when the system is retrained for that environment. Attaining such performance is the dream of the researchers. This paper elaborates some of the difficulties with Automatic Speech Recognition (ASR). These difficulties are classified into Speakers characteristics and environmental conditions, and tried to suggest some techniques to compensate variations in speech signal. This paper focuses on the robustness with respect to speakers variations and changes in the acoustical environment. We discussed several different external factors that change the environment and physiological differences that affect the performance of a speech recognition system followed by techniques that are helpful to design a robust ASR system.
From the existing research it has been observed that many techniques and methodologies are available for performing every step of Automatic Speech Recognition (ASR) system, but the performance (Minimization of Word Error Recognition-WER and Maximization of Word Accuracy Rate- WAR) of the methodology is not dependent on the only technique applied in that method. The research work indicates that, performance mainly depends on the category of the noise, the level of the noise and the variable size of the window, frame, frame overlap etc is considered in the existing methods. The main aim of the work presented in this paper is to use variable size of parameters like window size, frame size and frame overlap percentage to observe the performance of algorithms for various categories of noise with different levels and also train the system for all size of parameters and category of real world noisy environment to improve the performance of the speech recognition system. This paper presents the results of Signal-to-Noise Ratio (SNR) and Accuracy test by applying variable size of parameters. It is observed that, it is really very hard to evaluate test results and decide parameter size for ASR performance improvement for its resultant optimization. Hence, this study further suggests the feasible and optimum parameter size using Fuzzy Inference System (FIS) for enhancing resultant accuracy in adverse real world noisy environmental conditions. This work will be helpful to give discriminative training of ubiquitous ASR system for better Human Computer Interaction (HCI).
Achieving and maintaining the performance of ubiquitous (Automatic Speech Recognition) ASR system is a real challenge. The main objective of this work is to develop a method that will improve and show the consistency in performance of ubiquitous ASR system for real world noisy environment. An adaptive methodology has been developed to achieve an objective with the help of implementing followings, -Cleaning speech signal as much as possible while preserving originality / intangibility using various modified filters and enhancement techniques. -Extracting features from speech signals using various sizes of parameter. -Train the system for ubiquitous environment using multi-environmental adaptation training methods. -Optimize the word recognition rate with appropriate variable size of parameters using fuzzy technique. The consistency in performance is tested using standard noise databases as well as in real world environment. A good improvement is noticed. This work will be helpful to give discriminative training of ubiquitous ASR system for better Human Computer Interaction (HCI) using Speech User Interface (SUI).