Understanding the shared and distinct characteristics of sublanguages in clinical documents through corpus analysis is essential for downstream applications of clinical natural language processing (NLP). Here, we conducted a section-level sublanguage analysis of a corpus of 500,000 clinical documents. We analyzed sublanguage characteristics per practice setting or document type for the ten most frequent clinical sections. Named entities (NEs) for problem, test, and treatment concepts were extracted using a fine-tuned bio-clinical Bidirectional Encoder Representations from Transformers (BERT) model. Fast clustering with sentence-BERT was applied, and the clustering results for a case study of terms containing 'pain' were visualized using SandDance. Our results confirmed that document types with a narrow scope (e.g., Limited Evaluation) presented higher term frequencies in diverse, disjoint clusters than document types with a broad scope (e.g., Discharge Summary). Family Medicine and Primary Care practice settings presented similar cluster distributions (i.e., frequent use of similar words co-occurring with 'pain'), implying a similar sublanguage. In contrast, Emergency Medicine showed a distinct sublanguage, with higher term frequencies in disjoint clusters than other practice settings. These findings suggest that analyzing term distributions with respect to different combinations of section, practice setting, and document type provides important information for developing or implementing NLP systems.
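
As a rough illustration of what analyzing term distribution per combination of section, practice setting, and document type could look like, the following minimal Python sketch counts extracted terms for each combination. It is not the pipeline used in the study: the records, the section name, and the helper names `term_distribution` and `relative_frequencies` are hypothetical, and the terms stand in for the output of an NE extraction step.

```python
from collections import Counter, defaultdict

# Hypothetical toy records, not data from the study:
# (section, practice setting, document type, extracted term)
records = [
    ("History of Present Illness", "Emergency Medicine", "Limited Evaluation", "chest pain"),
    ("History of Present Illness", "Emergency Medicine", "Limited Evaluation", "chest pain"),
    ("History of Present Illness", "Emergency Medicine", "Limited Evaluation", "abdominal pain"),
    ("History of Present Illness", "Family Medicine", "Discharge Summary", "back pain"),
    ("History of Present Illness", "Primary Care", "Discharge Summary", "back pain"),
]


def term_distribution(rows):
    """Count terms for each (section, setting, document type) combination."""
    dist = defaultdict(Counter)
    for section, setting, doc_type, term in rows:
        dist[(section, setting, doc_type)][term] += 1
    return dist


def relative_frequencies(counter):
    """Turn raw counts into proportions so groups of different size compare."""
    total = sum(counter.values())
    return {term: count / total for term, count in counter.items()}


distribution = term_distribution(records)
for combination, counts in distribution.items():
    print(combination, relative_frequencies(counts))
```

Normalizing to relative frequencies is one simple design choice for comparing groups with very different document counts; other normalizations would be equally plausible.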