**5. Conclusions and future work**

*Deep Learning Applications*

**Table 2.** *MATLAB codes and specification of codes for Example 2\*.*

**i. Train the LSTM network**

Descriptions: Train the LSTM network using the `trainNetwork` function. `'GradientThreshold'`: clip gradient values that exceed the threshold.

MATLAB codes:

```
'GradientThreshold',8, ...
'Shuffle','every-epoch', ...
'ValidationData',{XValidation,YValidation}, ...
'Plots','training-progress', ...
'Verbose',false);
net = trainNetwork(XTrain,YTrain,layers, ...
    options);
```

**j. Predict using new data**

Descriptions: Classify the event type of three new reports. Create a string array containing the new reports.

MATLAB codes:

```
reportsNew = [ ...
    "The text definition here."];
```

**k. Preprocess, Convert, and Classify**

Descriptions: Preprocess the new reports, convert them to sequences, and classify them with the trained network.

MATLAB codes:

```
documentsNew = preprocessText(reportsNew);
XNew = doc2sequence(enc, ...
    documentsNew,'Length',sequenceLength);
labelsNew = classify(net,XNew)
```

\* *All descriptions are based on MATLAB 2020b and related examples from the MATLAB documentation.*

**Figure 12.** *Visualizing the training data text file for SEER-2017 cancer types by age groups using a word cloud of LSTM. The MATLAB codes to create this figure are given in Table 2. The bigger the word, the more often the cancer type was diagnosed in 2017.*
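A word cloud like the one in Figure 12 can be produced with the `wordcloud` function from the Text Analytics Toolbox. The sketch below is illustrative only: the variable name `textData` and the title string are assumptions, not the chapter's exact code.

```
% Sketch: word cloud of the training text (Text Analytics Toolbox).
% 'textData' is an assumed name for the string array of training reports.
figure
wordcloud(textData);
title("SEER-2017 Cancer Types by Age Group")
```

Word size is proportional to frequency in `textData`, which is what the Figure 12 caption describes: the bigger the word, the more often that cancer type was diagnosed.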

LSTM is a powerful ANN architecture for modeling disease subtypes and time series, and for text generation, handwriting recognition, music generation, language translation, and image captioning. The LSTM approach is effective for prediction because equal attention is given to all input sequences through the information flowing along the cell state; owing to this mechanism, a small change in the input sequence does not harm the prediction accuracy of the LSTM.

Future work on LSTM has several directions. Most LSTM architectures are designed to handle data whose consecutive elements are evenly spaced in elapsed time (days, months, years, etc.). More studies are needed to improve the predictive ability of LSTM when the elapsed times between consecutive observations are not constant. Further studies are also needed on possible overfitting when training with smaller data sets. Rather than using early stopping to avoid overfitting, a Bayesian regularized approach would be more effective at ensuring that the neural network halts training at the point where further training would result in overfitting. However, because the Bayesian regularized approach uses a different loss function with additional hyperparameters, it demands costly computational resources.
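For reference, the early-stopping baseline discussed above is available directly in MATLAB's `trainingOptions` through the `'ValidationPatience'` parameter; training halts once the validation loss has failed to improve a set number of times. The numeric values below are illustrative assumptions, not settings from this chapter's example.

```
% Early stopping sketch: stop when validation loss has not improved
% for 5 consecutive validation evaluations (values are illustrative).
options = trainingOptions('adam', ...
    'ValidationData',{XValidation,YValidation}, ...
    'ValidationFrequency',50, ...
    'ValidationPatience',5, ...
    'Verbose',false);
% For shallow networks, MATLAB also offers Bayesian regularization
% directly via the 'trainbr' training function, e.g.:
%   net = feedforwardnet(10,'trainbr');
```

The `'trainbr'` route trades the early-stopping heuristic for the regularized loss described above, at the higher computational cost the text notes.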
